GB2379820A - Interpolating values for sub-pixels - Google Patents

Interpolating values for sub-pixels

Info

Publication number
GB2379820A
Authority
GB
United Kingdom
Prior art keywords
sub
pixels
values
unit
pixel
Prior art date
Legal status
Withdrawn
Application number
GB0122396A
Other versions
GB0122396D0 (en)
Inventor
Antti Hallapuro
Current Assignee
Nokia Oyj
Original Assignee
Nokia Oyj
Priority date
Filing date
Publication date
Application filed by Nokia Oyj filed Critical Nokia Oyj
Priority to GB0122396A
Publication of GB0122396D0
Publication of GB2379820A
Status: Withdrawn


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/01: Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
    • H04N7/0135: Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level involving interpolation processes
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51: Motion estimation or motion compensation
    • H04N19/523: Motion estimation or motion compensation with sub-pixel accuracy

Abstract

A method of interpolation in which an image comprising pixels arranged in rows and columns and represented by values, the pixels in the rows residing at unit horizontal locations and the pixels in the columns residing at unit vertical locations, is interpolated to generate values for sub-pixels at fractional horizontal and vertical locations, the method comprising:

a) interpolating values for sub-pixels at half unit horizontal and unit vertical locations, unit horizontal and half unit vertical locations, and quarter unit horizontal and unit vertical locations directly using weighted sums of pixels residing at unit horizontal and unit vertical locations;

b) interpolating values for sub-pixels at half unit horizontal and half unit vertical locations using a weighted sum of values for sub-pixels residing at half unit horizontal and unit vertical locations calculated according to step (a);

c) interpolating values for sub-pixels at quarter unit horizontal and half unit vertical locations using a weighted sum of values for sub-pixels residing at quarter unit horizontal and unit vertical locations calculated according to step (a); and

d) interpolating values for sub-pixels at quarter unit vertical locations by taking the average of the values of a first pixel or sub-pixel located at a horizontal location corresponding to that of the sub-pixel being calculated and unit vertical location and a second pixel or sub-pixel located at a horizontal location corresponding to that of the sub-pixel being calculated and half unit vertical location.

Description

METHOD FOR SUB-PIXEL VALUE INTERPOLATION

The present invention relates to a method for sub-pixel value interpolation in the encoding and decoding of data. It relates particularly, but not exclusively, to encoding and decoding of digital video.
Background of the invention

Digital video sequences, like ordinary motion pictures recorded on film, comprise a sequence of still images, the illusion of motion being created by displaying the images one after the other at a relatively fast frame rate, typically 15 to 30 frames per second. Because of the relatively fast frame rate, images in consecutive frames tend to be quite similar and thus contain a considerable amount of redundant information. For example, a typical scene may comprise some stationary elements, such as background scenery, and some moving areas, which may take many different forms, for example the face of a newsreader, moving traffic and so on. Alternatively, the camera recording the scene may itself be moving, in which case all elements of the image have the same kind of motion. In many cases, this means that the overall change between one video frame and the next is rather small. Of course, this depends on the nature of the movement. For example, the faster the movement, the greater the change from one frame to the next. Similarly, if a scene contains a number of moving elements, the change from one frame to the next is likely to be greater than in a scene where only one element is moving.
It should be appreciated that each frame of a raw, that is uncompressed, digital video sequence comprises a very large amount of image information. Each frame of an uncompressed digital video sequence is formed from an array of image pixels. For example, in a commonly used digital video format, known as the Quarter Common Intermediate Format (QCIF), a frame comprises an array of 176 x 144 pixels, in which case each frame has 25,344 pixels. In turn, each pixel is represented by a certain number of bits,
which carry information about the luminance and/or colour content of the region of the image corresponding to the pixel. Commonly, a so-called YUV colour model is used to represent the luminance and chrominance content of the image. The luminance, or Y, component represents the intensity (brightness) of the image, while the colour content of the image is represented by two chrominance components, labelled U and V.
Colour models based on a luminance/chrominance representation of image content provide certain advantages compared with colour models that are based on a representation involving primary colours (that is Red, Green and Blue, RGB). The human visual system is more sensitive to intensity variations than it is to colour variations; YUV colour models exploit this property by using a lower spatial resolution for the chrominance components (U, V) than for the luminance component (Y). In this way the amount of information needed to code the colour information in an image can be reduced with an acceptable reduction in image quality.
The lower spatial resolution of the chrominance components is usually attained by sub-sampling. Typically, a block of 16x16 image pixels is represented by one block of 16x16 pixels comprising luminance information and the corresponding chrominance components are each represented by one block of 8x8 pixels representing an area of the image equivalent to that of the 16x16 pixels of the luminance component. The chrominance components are thus spatially sub-sampled by a factor of 2 in the x and y directions. The resulting assembly of one 16x16 pixel luminance block and two 8x8 pixel chrominance blocks is commonly referred to as a YUV macroblock, or macroblock, for short.
A QCIF image comprises 11 x 9 macroblocks. If the luminance blocks and chrominance blocks are represented with 8 bit resolution (that is, by numbers in the range 0 to 255), the total number of bits required per macroblock is (16 x 16 x 8) + 2 x (8 x 8 x 8) = 3072 bits. The number of bits needed to represent a video frame in QCIF format is thus 99 x 3072 = 304,128 bits.
This means that the amount of data required to transmit/record/display a video sequence in QCIF format, represented using a YUV colour model, at a rate of 30 frames per second, is more than 9 Mbps (million bits per second). This is an extremely high data rate and is impractical for use in video recording, transmission and display applications because of the very large storage capacity, transmission channel capacity and hardware performance required.
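The arithmetic above is easily verified. The following short C program, a purely illustrative sketch that is not part of the patent text, reproduces the macroblock, frame and bit-rate figures:

```c
#include <stdio.h>

int main(void) {
    /* One YUV macroblock: a 16x16 luminance block plus two 8x8 chrominance
       blocks, every sample stored with 8-bit resolution. */
    int bits_per_macroblock = 16 * 16 * 8 + 2 * (8 * 8 * 8);   /* 3072 */
    int macroblocks_per_frame = 11 * 9;                        /* QCIF: 99 */
    long bits_per_frame = (long)macroblocks_per_frame * bits_per_macroblock;
    double mbps = bits_per_frame * 30.0 / 1e6;                 /* 30 frames/s */

    printf("bits per macroblock: %d\n", bits_per_macroblock);  /* 3072 */
    printf("bits per frame:      %ld\n", bits_per_frame);      /* 304128 */
    printf("raw bit rate:        %.2f Mbps\n", mbps);          /* about 9.12 */
    return 0;
}
```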
If video data is to be transmitted in real-time over a fixed line network such as an ISDN (Integrated Services Digital Network) or a conventional PSTN (Public Switched Telephone Network), the available data transmission bandwidth is typically of the order of 64 kbits/s. In mobile videotelephony, where transmission takes place at least in part over a radio communications link, the available bandwidth can be as low as 20 kbits/s. This means that a significant reduction in the amount of information used to represent video data must be achieved in order to enable transmission of digital video sequences over low bandwidth communication networks. For this reason video compression techniques have been developed which reduce the amount of information transmitted while retaining an acceptable image quality.
Video compression methods are based on reducing the redundant and perceptually irrelevant parts of video sequences. The redundancy in video sequences can be categorised into spatial, temporal and spectral redundancy. 'Spatial redundancy' is the term used to describe the correlation between neighbouring pixels within a frame. The term 'temporal redundancy' expresses the fact that the objects appearing in one frame of a sequence are likely to appear in subsequent frames, while 'spectral redundancy' refers to the correlation between different colour components of the same image.
Sufficiently efficient compression cannot usually be achieved by simply reducing the various forms of redundancy in a given sequence of images.
Thus, most current video encoders also reduce the quality of those parts of the video sequence which are subjectively the least important. In addition, the redundancy of the compressed video bit-stream is itself reduced by means of efficient loss-less encoding. Typically, this is achieved using a technique known as 'variable length coding' (VLC).
Modern video compression standards, such as ITU-T recommendations H.261, H.263(+)(++) and H.26L and the Moving Picture Experts Group recommendation MPEG-4, make use of 'motion compensated temporal prediction'. This is a form of temporal redundancy reduction in which the content of some (often many) frames in a video sequence is 'predicted' from other frames in the sequence by tracing the motion of objects or regions of an image between frames.
Compressed images which do not make use of temporal redundancy reduction are usually called INTRA-coded or I-frames, whereas temporally predicted images are called INTER-coded or P-frames. In the case of INTER frames, the predicted (motion-compensated) image is rarely precise enough to represent the image content with sufficient quality, and therefore a spatially compressed prediction error (PE) frame is also associated with each INTER frame. Many video compression schemes can also make use of bi-directionally predicted frames, which are commonly referred to as B-pictures or B-frames. B-pictures are inserted between reference or so-called 'anchor' picture pairs (I or P frames) and are predicted from either one or both of the anchor pictures. B-pictures are not themselves used as anchor pictures, that is, no other frames are predicted from them, and therefore they can be discarded from the video sequence without causing deterioration in the quality of future pictures.
The different types of frame that occur in a typical compressed video sequence are illustrated in Figure 3 of the accompanying drawings. As can be seen from the figure, the sequence starts with an INTRA or I frame 30. In Figure 3, arrows 33 denote the 'forward' prediction process by which P-frames (labelled 34) are formed. The bi-directional prediction process by which B-frames (36) are formed is denoted by arrows 31a and 31b, respectively.
A schematic diagram of an example video coding system using motion compensated prediction is shown in Figures 1 and 2. Figure 1 illustrates an encoder 10 employing motion compensation and Figure 2 illustrates a corresponding decoder 20. The encoder 10 shown in Figure 1 comprises a Motion Field Estimation block 11, a Motion Field Coding block 12, a Motion Compensated Prediction block 13, a Prediction Error Coding block 14, a Prediction Error Decoding block 15, a Multiplexing block 16, a Frame Memory 17, and an adder 19. The decoder 20 comprises a Motion Compensated Prediction block 21, a Prediction Error Decoding block 22, a Demultiplexing block 23 and a Frame Memory 24.
The operating principle of video coders using motion compensation is to minimise the amount of information in a prediction error frame E_n(x, y), which is the difference between a current frame I_n(x, y) being coded and a prediction frame P_n(x, y). The prediction error frame is thus:

$$E_n(x, y) = I_n(x, y) - P_n(x, y) \qquad (1)$$
The prediction frame P_n(x, y) is built using pixel values of a reference frame R_n(x, y), which is generally one of the previously coded and transmitted frames, for example the frame immediately preceding the current frame, and is available from the Frame Memory 17 of the encoder 10. More specifically, the prediction frame P_n(x, y) is constructed by finding so-called 'prediction pixels' in the reference frame R_n(x, y) which correspond substantially with pixels in the current frame. Motion information, describing the relationship (e.g. relative location, rotation, scale etc.) between pixels in the current frame and their corresponding prediction pixels in the reference frame, is derived and the prediction frame is constructed by moving the prediction
pixels according to the motion information. In this way, the prediction frame is constructed as an approximate representation of the current frame, using pixel values in the reference frame. The prediction error frame referred to above therefore represents the difference between the approximate representation of the current frame provided by the prediction frame and the current frame itself. The basic advantage provided by video encoders that use motion compensated prediction arises from the fact that a comparatively compact description of the current frame can be obtained by representing it in terms of the motion information required to form its prediction together with the associated prediction error information in the prediction error frame.
However, due to the very large number of pixels in a frame, it is generally not efficient to transmit separate motion information for each pixel to the decoder. Instead, in most video coding schemes, the current frame is divided into larger image segments Sk and motion information relating to the segments is transmitted to the decoder. For example, motion information is typically provided for each macroblock of a frame and the same motion information is then used for all pixels within the macroblock. In some video coding standards, such as H.26L, a macroblock can be divided into smaller blocks, each smaller block being provided with its own motion information.
The motion information usually takes the form of motion vectors [Δx(x, y), Δy(x, y)]. The pair of numbers Δx(x, y) and Δy(x, y) represents the horizontal and vertical displacements of a pixel at location (x, y) in the current frame I_n(x, y) with respect to a pixel in the reference frame R_n(x, y).

The motion vectors [Δx(x, y), Δy(x, y)] are calculated in the Motion Field Estimation block 11 and the set of motion vectors of the current frame is referred to as the motion vector field.
Typically, the location of a macroblock in a current video frame is specified by the (x, y) co-ordinate of its upper left-hand corner. Thus, in a video coding scheme in which motion information is associated with each macroblock of a frame, each motion vector describes the horizontal and vertical displacement Δx(x, y) and Δy(x, y) of a pixel representing the upper left-hand corner of a macroblock in the current frame I_n(x, y) with respect to a pixel in the upper left-hand corner of a substantially corresponding block of prediction pixels in the reference frame R_n(x, y) (as shown in Figure 4b).
Motion estimation is a computationally intensive task. Given a reference frame R_n(x, y) and, for example, a square macroblock comprising N x N pixels in a current frame (as shown in Figure 4a), the objective of motion estimation is to find an N x N pixel block in the reference frame that matches the characteristics of the macroblock in the current picture according to some criterion. This criterion can be, for example, a sum of absolute differences (SAD) between the pixels of the macroblock in the current frame and the block of pixels in the reference frame with which it is compared. This process is known generally as 'block matching'. It should be noted that, in general, the geometry of the block to be matched and that in the reference frame do not have to be the same, as real-world objects can undergo scale changes, as well as rotation and warping. However, in current international video coding standards, only a translational motion model is used (see below) and thus fixed rectangular geometry is sufficient.
Ideally, in order to achieve the best chance of finding a match, the whole of the reference frame should be searched. However, this is impractical as it imposes too high a computational burden on the video encoder. Instead, the search region is restricted to a region around the original location of the macroblock in the current frame, as shown in Figure 4c.
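To make the block-matching procedure concrete, here is a minimal C sketch of an exhaustive search over a restricted window using the SAD criterion. The QCIF frame dimensions, function names and boundary handling are illustrative assumptions, not details taken from the patent:

```c
#include <stdlib.h>
#include <limits.h>

#define W 176   /* illustrative QCIF luminance dimensions */
#define H 144
#define N 16    /* macroblock size */

/* Sum of absolute differences between an N x N macroblock in the current
   frame and a candidate block in the reference frame (both row-major). */
static int sad_nxn(const unsigned char *cur, const unsigned char *ref) {
    int sad = 0;
    for (int y = 0; y < N; y++)
        for (int x = 0; x < N; x++)
            sad += abs(cur[y * W + x] - ref[y * W + x]);
    return sad;
}

/* Full search within [-range, +range] around the macroblock whose top-left
   corner is at (mbx, mby); returns the best full-pixel vector in (*dx, *dy). */
static void full_search(const unsigned char *cur, const unsigned char *ref,
                        int mbx, int mby, int range, int *dx, int *dy) {
    int best = INT_MAX;
    *dx = *dy = 0;
    for (int v = -range; v <= range; v++)
        for (int u = -range; u <= range; u++) {
            int rx = mbx + u, ry = mby + v;
            if (rx < 0 || ry < 0 || rx + N > W || ry + N > H)
                continue;   /* keep the candidate block inside the frame */
            int sad = sad_nxn(cur + mby * W + mbx, ref + ry * W + rx);
            if (sad < best) { best = sad; *dx = u; *dy = v; }
        }
}
```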
In order to reduce the amount of motion information to be transmitted from the encoder 10 to the decoder 20, the motion vector field is coded in the Motion Field Coding block 12 of the encoder 10, by representing it with a motion model. In this process, the motion vectors of image segments are re-expressed using certain predetermined functions or, in other words, the motion vector field is represented with a model. Almost all currently used motion vector field models are additive motion models, complying with the following general formula:

$$\Delta x(x, y) = \sum_{i=0}^{N-1} a_i f_i(x, y) \qquad (2)$$

$$\Delta y(x, y) = \sum_{i=0}^{N-1} b_i g_i(x, y) \qquad (3)$$

where coefficients a_i and b_i are called motion coefficients. The motion coefficients are transmitted to the decoder 20 (information stream 2 in Figures 1 and 2). Functions f_i and g_i are called motion field basis functions, and are known both to the encoder and decoder. An approximate version of the motion vector field can be constructed using the coefficients and the basis functions. As the basis functions are known to (that is, stored in) both the encoder 10 and the decoder 20, only the motion coefficients need to be transmitted to the decoder, thus reducing the amount of information required to represent the motion information of the frame.
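As an illustration of equations (2) and (3), the sketch below evaluates an additive model with the basis functions f0 = g0 = 1, f1 = g1 = x and f2 = g2 = y, which gives the classic affine model; this particular basis is an illustrative choice, not one prescribed by the patent:

```c
/* Evaluate an additive motion model for one segment. With
   a[1] = a[2] = b[1] = b[2] = 0 this reduces to the translational
   model described next. */
static void motion_vector(const double a[3], const double b[3],
                          double x, double y, double *dx, double *dy) {
    *dx = a[0] + a[1] * x + a[2] * y;   /* sum over i of a_i * f_i(x, y) */
    *dy = b[0] + b[1] * x + b[2] * y;   /* sum over i of b_i * g_i(x, y) */
}
```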
The simplest motion model is the translational motion model which requires only two coefficients to describe the motion vectors of each segment. The values of the motion vectors are given by:

$$\Delta x(x, y) = a_0, \qquad \Delta y(x, y) = b_0 \qquad (4)$$
This model is widely used in various international standards (ISO MPEG-1, MPEG-2, MPEG-4, ITU-T Recommendations H.261 and H.263) to describe the motion of 16x16 and 8x8 pixel blocks. Systems which use a translational motion model typically perform motion estimation at full pixel resolution or some integer fraction of full pixel resolution, for example at half or one quarter pixel resolution.
The prediction frame P_n(x, y) is constructed in the Motion Compensated Prediction block 13 in the encoder 10, and is given by:

$$P_n(x, y) = R_n[x + \Delta x(x, y),\; y + \Delta y(x, y)] \qquad (5)$$
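A hedged sketch of equation (5) in C, together with the prediction error of equation (1), is given below; the array layouts, one full-pixel motion vector per macroblock, and the clamping of displaced coordinates to the frame border are illustrative assumptions:

```c
#define W 176   /* illustrative QCIF luminance dimensions */
#define H 144
#define N 16    /* macroblock size */

static int clamp(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

/* Build the prediction frame P_n from the reference frame R_n and the
   per-macroblock motion vectors, then form the residual E_n = I_n - P_n. */
static void predict_and_residual(const unsigned char ref[H][W],
                                 const unsigned char cur[H][W],
                                 const int mvx[H / N][W / N],
                                 const int mvy[H / N][W / N],
                                 unsigned char pred[H][W], int err[H][W]) {
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++) {
            int bx = x / N, by = y / N;                 /* macroblock index   */
            int rx = clamp(x + mvx[by][bx], 0, W - 1);  /* displaced location */
            int ry = clamp(y + mvy[by][bx], 0, H - 1);
            pred[y][x] = ref[ry][rx];                   /* P_n(x, y), eq. (5) */
            err[y][x]  = cur[y][x] - pred[y][x];        /* E_n(x, y), eq. (1) */
        }
}
```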
In the Prediction Error Coding block 14, the prediction error frame E_n(x, y) is typically compressed by representing it as a finite series (transform) of some 2-dimensional functions. For example, a 2-dimensional Discrete Cosine Transform (DCT) can be used. The transform coefficients are quantised and entropy (for example Huffman) coded before they are transmitted to the decoder (information stream 1 in Figures 1 and 2). Because of the error introduced by quantisation, this operation usually produces some degradation (loss of information) in the prediction error frame E_n(x, y). To compensate for this degradation, the encoder 10 also comprises a Prediction Error Decoding block 15, where a decoded prediction error frame Ẽ_n(x, y) is constructed using the transform coefficients. This locally decoded prediction error frame is added to the prediction frame P_n(x, y) in the adder 19 and the resulting decoded current frame Ĩ_n(x, y) is stored in the Frame Memory 17 for further use as the next reference frame R_{n+1}(x, y). The information stream 2 carrying information about the motion vectors is combined with information about the prediction error in multiplexer 16 and an information stream 3 containing typically at least those two types of information is sent to the decoder 20.
The operation of a corresponding video decoder 20 will now be described.
The Frame Memory 24 of the decoder 20 stores a previously reconstructed reference frame R_n(x, y). The prediction frame P_n(x, y) is constructed in the Motion Compensated Prediction block 21 of the decoder 20 according to equation 5, using received motion coefficient information and pixel values of the previously reconstructed reference frame R_n(x, y). The transmitted transform coefficients of the prediction error frame E_n(x, y) are used in the Prediction Error Decoding block 22 to construct the decoded prediction error frame Ẽ_n(x, y). The pixels of the decoded current frame Ĩ_n(x, y) are then reconstructed by adding the prediction frame P_n(x, y) and the decoded prediction error frame Ẽ_n(x, y):

$$\tilde{I}_n(x, y) = P_n(x, y) + \tilde{E}_n(x, y) \qquad (6)$$

This decoded current frame may be stored in the Frame Memory 24 as the next reference frame R_{n+1}(x, y).
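The decoder-side reconstruction of equation (6) amounts to an addition followed by clipping back into the 8-bit pixel range, as in this minimal sketch (array shapes are illustrative):

```c
#define W 176   /* illustrative QCIF dimensions, as in the encoder sketch */
#define H 144

/* Reconstruct the decoded frame: out = clip(pred + err). */
static void reconstruct(const unsigned char pred[H][W], const int err[H][W],
                        unsigned char out[H][W]) {
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++) {
            int v = pred[y][x] + err[y][x];
            out[y][x] = (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
        }
}
```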
In the description of motion compensated encoding and decoding of digital video presented above, the motion vector [Δx(x, y), Δy(x, y)] describing the motion of a macroblock in the current frame with respect to the reference frame R_n(x, y) can point to any of the pixels in the reference frame. This means that motion between frames of a digital video sequence can only be represented at a resolution which is determined by the image pixels in the frame (so-called full pixel resolution). Real motion, however, has arbitrary precision, and thus the system described above can only provide approximate modelling of the motion between successive frames of a digital video sequence. Typically, modelling of motion between video frames with full pixel resolution is not sufficiently accurate to allow efficient minimisation of the prediction error (PE) information associated with each macroblock/frame. Therefore, to enable more accurate modelling of real motion and to help reduce the amount of PE information that must be transmitted from encoder to decoder, many video coding standards, such as
H.263(+)(++) and H.26L, allow motion vectors to point 'in between' image pixels. In other words, the motion vectors can have 'sub-pixel' resolution.
Allowing motion vectors to have sub-pixel resolution adds to the complexity of the encoding and decoding operations that must be performed, so it is still advantageous to limit the degree of spatial resolution a motion vector may have. Thus, video coding standards, such as those previously mentioned, typically only allow motion vectors to have full-, half- or quarter-pixel resolution.
Motion estimation with sub-pixel resolution is usually performed as a two-stage process, as illustrated in Figure 5, for a video coding scheme which allows motion vectors to have full- or half-pixel resolution. In the first step, a motion vector having full-pixel resolution is determined using any appropriate motion estimation scheme, such as the block-matching process described in the foregoing. The resulting motion vector, having full-pixel resolution, is shown in Figure 5.
In the second stage, the motion vector determined in the first stage is refined to obtain the desired half-pixel resolution. In the example illustrated in Figure 5, this is done by forming eight new search blocks of 16 x 16 pixels, the location of the top-left corner of each block being marked with an X in Figure 5. These locations are denoted as [Δx + m/2, Δy + n/2], where m and n can take the values -1, 0 and +1, but cannot be zero at the same time. As only the pixel values of original image pixels are known, the values (for example luminance and/or chrominance values) of the sub-pixels residing at half-pixel locations must be estimated for each of the eight new search blocks, using some form of interpolation scheme.
Having interpolated the values of the sub-pixels at half-pixel resolution, each of the eight search blocks is compared with the macroblock whose motion vector is being sought. As in the block matching process performed in order to determine the motion vector with full pixel resolution, the macroblock is compared with each of the eight search blocks according to some criterion, for example a SAD. As a result of the comparisons, a minimum SAD value will generally be obtained. Depending on the nature of the motion in the video sequence, this minimum value may correspond to the location specified by the original motion vector (having full-pixel resolution), or it may correspond to a location having a half-pixel resolution. Thus, it is possible to determine whether a motion vector should point to a full-pixel or sub-pixel location and, if sub-pixel resolution is appropriate, to determine the correct sub-pixel resolution motion vector. It should also be appreciated that the scheme just described can be extended to other sub-pixel resolutions (for example, one-quarter-pixel resolution) in an entirely analogous fashion.
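The refinement stage can be sketched as follows; motion vectors are held in half-pixel units, and sad_at() is a hypothetical callback standing in for the SAD of the macroblock against the interpolated reference block at a given half-pixel displacement:

```c
/* Around the full-pixel vector (dx, dy), test the eight half-pixel
   candidates [dx + m/2, dy + n/2], m, n in {-1, 0, +1}, not both zero,
   and return the winner (in half-pixel units) in (*out_x2, *out_y2). */
static void refine_half_pixel(int dx, int dy, int *out_x2, int *out_y2,
                              int (*sad_at)(int mv2x, int mv2y)) {
    int best_x2 = 2 * dx, best_y2 = 2 * dy;    /* full-pixel candidate */
    int best = sad_at(best_x2, best_y2);
    for (int n = -1; n <= 1; n++)
        for (int m = -1; m <= 1; m++) {
            if (m == 0 && n == 0)
                continue;
            int sad = sad_at(2 * dx + m, 2 * dy + n);
            if (sad < best) {
                best = sad;
                best_x2 = 2 * dx + m;
                best_y2 = 2 * dy + n;
            }
        }
    *out_x2 = best_x2;
    *out_y2 = best_y2;
}
```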
In practice the estimation of a sub-pixel value is performed by interpolating the value of the sub-pixel from the values of pixels at locations corresponding to full pixel resolution, according to a predefined scheme. In general, interpolation can be formulated as a two-dimensional filtering operation, represented mathematically as:

$$F(x, y) = \sum_{k=-M+1}^{M} \sum_{l=-M+1}^{M} a(k, l)\, f(\lfloor x \rfloor + k,\; \lfloor y \rfloor + l) \qquad (7)$$

where (x, y) are the x and y co-ordinates of the sub-pixel to be interpolated, F is the interpolated sub-pixel value, a() is an interpolation function, f() are the original pixels, and ⌊x⌋ and ⌊y⌋ are obtained by truncating x and y, respectively, to integer values. Constant M determines how many pixels from the original image are used in interpolation of the sub-pixel value.
If the same interpolation function is used in both the horizontal and vertical directions, the interpolation operation can be formulated as a one-dimensional filtering operation in which the same 1-D filter is applied separately in both the x and y directions. Mathematically, such a one-dimensional filter can be represented as:

$$F(x, y) = \sum_{k=-M+1}^{M} a(k)\, f(\lfloor x \rfloor + k,\; y) \qquad (8)$$

when applied in the x direction and

$$F(x, y) = \sum_{k=-M+1}^{M} a(k)\, f(x,\; \lfloor y \rfloor + k) \qquad (9)$$

when applied in the y direction.
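A small sketch of equation (8) in C follows; the kernel a[] is deliberately left unspecified (for M = 1 with a = {0.5, 0.5} it reduces to the bilinear case mentioned below):

```c
#define M 3   /* filter support: taps k = -M+1 .. M, i.e. 2M = 6 taps */

/* One-dimensional interpolation along a row: the value at the half-sample
   position between row[x0] and row[x0 + 1] is a weighted sum of the 2M
   surrounding samples, sum over k of a(k) * f(floor(x) + k). */
static double interp_half_x(const double *row, int x0, const double a[2 * M]) {
    double sum = 0.0;
    for (int k = -M + 1; k <= M; k++)
        sum += a[k + M - 1] * row[x0 + k];
    return sum;
}
```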
In order to limit the additional complexity introduced by sub-pixel interpolation, in video coding applications M is typically set to 1 and a() is chosen so that simple bilinear interpolation is performed. Typically, in image processing, the interpolation function is some form of low pass filter.
The motion vectors are calculated in the encoder. Once the corresponding motion coefficients are transmitted to the decoder, it is a straightforward matter to interpolate the required sub-pixels using an interpolation method identical to that used in the encoder. In this way, a frame following a reference frame in the Frame Memory 24 can be reconstructed from the reference frame and the motion vectors.
The simplest way of applying sub-pixel interpolation in a video codec is to interpolate a sub-pixel every time its value is needed. However, this is not an efficient solution in a video encoder, because it is likely that the same sub-pixel will be required several times and thus calculations to interpolate the same sub-pixel value will be performed multiple times. The complexity is especially high if interpolation is a complex operation involving, for example, complicated interpolation functions and numerous neighbouring pixels.
An alternative approach, which limits the increase in complexity, is to pre-calculate and store all sub-pixel values, so that whenever needed, they are immediately available. This solution is called the 'before-hand interpolation scheme' hereafter. While having the advantage of decreasing the complexity of the encoder, it has the disadvantage of increasing memory usage by a large margin. For example, if the motion vector resolution is one quarter pixel in both the horizontal and vertical dimensions, storing pre-calculated sub-pixel values for a complete video frame results in a memory usage which is 16 times that required to store the original, non-interpolated image. In addition, it involves the calculation of some sub-pixels which might not actually be required in calculating motion vectors in the encoder. Before-hand interpolation is also particularly inefficient in a video decoder, as the vast majority of the pre-calculated pixel values will never be required by the decoder. Thus, it is advantageous not to use pre-calculation in the decoder.
A compromise solution is to pre-calculate some of the sub-pixel values and then interpolate the rest when they are required. This solution is called the 'subsequent interpolation scheme' hereafter. According to this approach, if quarter pixel resolution is used and only every second sub-pixel in both the x and y directions is pre-calculated (that is, every half pixel), memory usage is 4 times that required for the original image. This is considerably less than the memory needed when using the before-hand scheme.
Two interpolation schemes have been developed as part of the work ongoing in the ITU Telecommunication Standardization Sector, Study Group 16, Video Coding Experts Group (VCEG), Questions 6 and 15. These approaches were proposed for incorporation into ITU-T recommendation H.26L and have been implemented in test models (TML) for the purposes of evaluation and further development. The test model corresponding to Question 15 is referred to as Test Model 5 (TML5), while that resulting from Question 6 is known as Test Model 6 (TML6). The interpolation schemes proposed in both TML5 and TML6 will now be described.
Throughout the description of the sub-pixel value interpolation schemes used in test models TML5 and TML6, reference will be made to Figure 9, which defines the nomenclature used in this text to describe pixel and sub-pixel locations. For consistency, the same notation will also be used later in the detailed description of the sub-pixel value interpolation method proposed by the present invention.
In Figure 9, the letter A is used to denote original image pixels (full pixel resolution). In other words, the letter A represents the location of pixels in the image data representing a frame of a video sequence, the pixel values of pixels A being either received as current frame I_n(x, y) from a video source, or reconstructed and stored as a reference frame R_n(x, y) in the Frame Memory 17, 24 of the encoder 10 or the decoder 20. All other letters in Figure 9 (b, c, d1, d2, d3, d4, e1, e2, e3, e4 and f) represent sub-pixel locations, the values of the sub-pixels situated at the sub-pixel locations being obtained by interpolation.
The term 'unit horizontal location' is used to describe the location of any sub-pixel that is constructed in a column of the original image data. As can be seen from Figure 9, sub-pixels d1 and e1 fall into this category. Similarly, the term 'unit vertical location' is used to describe any sub-pixel that is constructed in a row of the original image data. Sub-pixels b and c fall into this category. By definition, pixels A have unit horizontal and unit vertical locations. The term 'half horizontal location' is used to describe the location of any sub-pixel that is constructed in a column that lies at half-pixel resolution. Sub-pixels b, d3, and e3 fall into this category. In a similar manner, the term 'half vertical location' is used to describe the location of any sub-pixel that is constructed in a row that lies at half-pixel resolution (sub-pixels d1, d2, d3, and d4 in Figure 9). Furthermore, the term 'quarter horizontal location' refers to any sub-pixel that is constructed in a column which lies at quarter-pixel resolution, such as sub-pixels c, d2, and e2 and c, d4, and e4 in Figure 9. Analogously, the term 'quarter vertical location' refers to sub-pixels that are constructed in a row which lies at quarter-pixel resolution. In Figure 9, sub-pixels e1, e2, e3, e4 and f have quarter vertical locations. The definition of each of the terms described above is shown by an 'envelope' in Figure 9. It should further be noted that it is often convenient to denote a particular pixel with a two-dimensional reference. In this case, the appropriate two-dimensional reference can be obtained by examining the intersection of the envelopes in Figure 9. Applying this principle, sub-pixel d3, for example, has a half horizontal and half vertical location and sub-pixel e1 has a unit horizontal and quarter vertical location.
The sub-pixel value interpolation scheme used in TML5 adopts a two-step process for the calculation of sub-pixel values. In the first step, sub-pixel values at (i) half horizontal and unit vertical locations (sub-pixels b), (ii) unit horizontal and half vertical locations (sub-pixels d1), and (iii) half horizontal and half vertical locations (sub-pixels d3) are determined and then, in the second step, sub-pixel values at quarter-pixel resolution are calculated.
These comprise values for sub-pixels at quarter pixel locations between original image pixels and sub-pixels at half-pixel locations (such as sub-pixels c and e1) and values for sub-pixels at quarter pixel locations between adjacent sub-pixels at half-pixel locations (sub-pixels d2, d4, e2, e3, e4, f, for example).
Calculation of the sub-pixel values referred to in the preceding paragraph will now be described with reference to Figures 10a and 10b.
Values for sub-pixels at half horizontal and unit vertical locations, such as sub-pixel b in Figure 10a, are calculated using a 6-tap filter. The filter interpolates a value for sub-pixels b based upon the values of the 6 pixels (A1-A6) situated in a row at unit horizontal locations and unit vertical locations symmetrically about b, according to the formula b = (A1 - 5A2 + 20A3 + 20A4 - 5A5 + A6 + 16)/32. The value of b is truncated to the nearest integer and clipped to the range 0 to 255. The value 16 is added to the sum so that the value of b is rounded to the nearest integer value when integer arithmetic is used.
Values for sub-pixels at unit horizontal and half vertical locations, such as sub-pixel d1 in Figure 10b, are calculated using the same 6-tap filter used to calculate the b values. Referring now to Figure 10b, the filter interpolates a value for sub-pixels d1 based upon the values of the 6 pixels (A1-A6) situated in a column at unit horizontal locations and unit vertical locations symmetrically about d1, according to the formula d1 = (A1 - 5A2 + 20A3 + 20A4 - 5A5 + A6)/32. Similarly, values for sub-pixels at half horizontal and half vertical locations (sub-pixels d3) are calculated according to the relationship: d3 = (b1 - 5b2 + 20b3 + 20b4 - 5b5 + b6 + 16)/32. The values of d1 and d3 are truncated to the nearest integer and clipped to the range 0 to 255.
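In C, the half-pixel rules quoted above reduce to one 6-tap routine; p points at A3 (the sample immediately before the half position) and step selects a row (sub-pixels b) or a column (sub-pixels d1), while d3 is obtained by running the same filter over six stored b values. Note that the '+ 16' rounding offset is applied uniformly here, which is an assumption for d1, whose formula is quoted above without it:

```c
static int clip255(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

/* 6-tap half-pixel interpolation with TML5 weights (1, -5, 20, 20, -5, 1):
   divide by 32 with rounding, truncate, then clip to the range 0..255. */
static int half_pixel(const unsigned char *p, int step) {
    int sum = p[-2 * step] - 5 * p[-step] + 20 * p[0]
            + 20 * p[step]  - 5 * p[2 * step] + p[3 * step];
    return clip255((sum + 16) >> 5);
}
```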
At this point in the interpolation process the values of all sub-pixels at half horizontal locations and half vertical locations have been calculated and the process proceeds to the calculation of sub-pixel values at quarter horizontal locations and quarter vertical locations. Values for sub-pixels located at quarter horizontal and unit vertical locations, such as sub-pixels c in Figure 10a, are calculated using linear interpolation. Specifically, the c sub-pixels are calculated by taking the average of the immediately neighbouring pixel at unit horizontal and unit vertical location (pixel A) and the immediately neighbouring sub-pixel at half horizontal and unit vertical location (sub-pixel b).
The result is truncated to the nearest integer. In some video coding standards there is optional rounding of the bilinear interpolation result, but such small rounding does not improve coding results consistently.
Values for sub-pixels located at quarter vertical locations, such as the row of sub-pixels e1, e2, e3, and e4 shown in Figures 10a and 10b, are also calculated by linear interpolation, using the nearest pixel or sub-pixel values at unit vertical locations and half vertical locations. More specifically, sub-pixels e1 are calculated by taking the average of the immediately neighbouring pixel at unit horizontal and unit vertical location (pixel A) and the immediately neighbouring sub-pixel at unit horizontal and half vertical location (sub-pixel d1). Sub-pixels e3 are calculated by taking the average of the immediately neighbouring sub-pixel at half horizontal and unit vertical location (sub-pixel b) and the immediately neighbouring sub-pixel at half horizontal and half vertical location (sub-pixel d3). Furthermore, sub-pixels e2 and e4 are calculated by taking the average of the immediately neighbouring sub-pixel at quarter horizontal and unit vertical location (sub-pixels c) and the corresponding sub-pixel at quarter horizontal and half vertical location (sub-pixels d2 and d4 respectively). The results obtained for all sub-pixels e1, e2, e3, and e4 are truncated to the nearest integer. Values for sub-pixels located at quarter vertical locations, such as sub-pixels e1, e2, e3, in the row containing sub-pixel f, are calculated in a similar manner.
The value for sub-pixel f is constructed in a manner rather different from that described above. Although sub-pixel f is situated at a quarter horizontal, quarter vertical location, like sub-pixel e4, the value of sub-pixel f is formed by averaging the values of the four closest pixels at unit horizontal and vertical locations, according to f = (A1 + A2 + A3 + A4 + 2)/4, where the locations of pixels A1, A2, A3 and A4 are defined in Figure 9. The value of f is truncated to the nearest integer value. The value 2 is added to the sum in order to control rounding effects in such a way that f is always rounded to the nearest integer value.
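The quarter-pixel rules of TML5 are then plain integer averages over final (already rounded and clipped) values, as in this sketch:

```c
/* Average of two neighbouring final values, truncated to an integer:
   e.g. c = (A + b)/2, e1 = (A + d1)/2, e3 = (b + d3)/2. */
static int quarter_average(int p0, int p1) {
    return (p0 + p1) >> 1;
}

/* The special case f: rounded average of the four closest full pixels. */
static int quarter_f(int a1, int a2, int a3, int a4) {
    return (a1 + a2 + a3 + a4 + 2) >> 2;
}
```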
A disadvantage of TML5 is that the decoder is computationally complex. This results from the fact that TML5 uses a two-stage approach in which interpolation of sub-pixel values at quarter horizontal locations and quarter vertical locations depends upon the interpolation of sub-pixels at half horizontal locations and/or half vertical locations. This means that in order to interpolate the values of sub-pixels at quarter horizontal locations and quarter vertical locations, the values of the sub-pixels at half horizontal locations and/or half vertical locations from which they are determined must be calculated first. Furthermore, since the values of the sub-pixels at quarter horizontal locations and quarter vertical locations depend upon the interpolated values obtained for sub-pixels at half horizontal locations and/or half vertical locations, rounding and clipping of the sub-pixel values at half horizontal locations and half vertical locations has a deleterious effect on the precision of the sub-pixel values at quarter horizontal locations and quarter vertical locations. Specifically, the sub-pixel values at quarter horizontal locations and quarter vertical locations are less precise than they would be if calculated from intermediate values that had not been rounded and clipped. Another disadvantage of TML5 is that it is necessary to store the values of the sub-pixels at half horizontal locations and/or half vertical locations in order to interpolate the sub-pixel values at quarter horizontal locations and quarter vertical locations. Therefore, excess memory is required to store a result which is not ultimately required.
TML6 solves some of the problems associated with TML5 by using 6-tap filters to calculate all sub-pixel values directly. Firstly, the sub-pixels at (i) half horizontal and unit vertical locations (sub-pixels b), (ii) unit horizontal and half vertical locations (sub-pixels d1), and (iii) half horizontal and half vertical locations (sub-pixels d3) are determined. Secondly, values of sub-pixels at quarter-pixel resolution between the original image pixels and sub-pixels at half-pixel resolution and between adjacent sub-pixels at half-pixel resolution (sub-pixels c, d2, d4, e1, e2, e3, e4, f) are calculated. However, unlike TML5, values of all but one of the sub-pixels at quarter-pixel resolution are also calculated using 6-tap filters. The remaining sub-pixel at quarter-pixel resolution, the f sub-pixel (see Figures 9, 10a or 10b), is calculated in a manner similar to that used in TML5. Specifically, in TML6, the 6-tap filters are used to calculate intermediate values from which the half and quarter resolution sub-pixels are derived. Therefore, the sub-pixel values can be considered as being calculated directly because no intermediate rounding and clipping steps are performed. The only rounding and clipping operations which are performed occur when the pixel values determined for the sub-pixels at half and quarter pixel locations are themselves rounded and clipped as a final step in the calculation. Therefore, in TML6, a higher level of precision compared with TML5 is retained throughout the interpolation process, as the intermediate values used in the calculation of sub-pixel values are neither rounded nor clipped.
It should be noted that in TML6, sub-pixel values at quarter-pixel locations are obtained directly using the intermediate values referred to above and are not derived from rounded and clipped values for sub-pixels at half-pixel locations. Therefore, in obtaining sub-pixel values at quarter-pixel locations, it is not necessary to calculate final values for any of the sub-pixels at half-pixel resolution. Specifically, it is not necessary to carry out the rounding and clipping operations associated with the calculation of final values for the sub-pixels at half-pixel locations. Neither is it necessary to have stored final values for sub-pixels at half-pixel locations for use in the calculation of sub-pixel values at quarter-pixel locations. Therefore TML6 is less computationally complex than TML5. However, a disadvantage of TML6 is that high precision arithmetic is required both in the encoder and in the decoder. High precision interpolation requires more silicon area in ASICs and requires more computations in some CPUs.
In view of the previously presented discussion, it should be appreciated that due to the different requirements of the video encoder and decoder with regard to sub-pixel interpolation, there exists a significant problem in developing a method of sub-pixel value interpolation capable of providing satisfactory performance in both the encoder and decoder. Furthermore, neither of the current test models (TML5, TML6) described in the foregoing can provide a solution that is optimum for application in both encoder and decoder.
Summary of the Invention

According to a first aspect of the invention there is provided a method of interpolation in video coding in which an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows residing at unit horizontal locations and the pixels in the columns residing at unit vertical locations, is interpolated to generate values for sub-pixels located at fractional horizontal and vertical locations, the method comprising:
(a) when a value of an interpolated sub-pixel in a first row located at unit vertical location is required:

(i) calculating a first weighted sum for a sub-pixel located at half unit horizontal and unit vertical location by using an n-tap filter which interpolates based upon values of n pixels at unit horizontal and unit vertical locations, the first weighted sum having an extended dynamic range exceeding the specified dynamic range and being dependent upon a first set of weighting factors; and

(ii) dividing the first weighted sum calculated in step (a)(i) by a first divisor which is dependent on the first set of weighting factors to produce a first result having a reduced dynamic range compared to the extended dynamic range and, if the first result having the reduced dynamic range exceeds the specified dynamic range, clipping the first result having the reduced dynamic range to produce a value for the sub-pixel located at half unit horizontal and unit vertical location such that it has the specified dynamic range; and

(iii) when a value of a sub-pixel located at quarter unit horizontal and unit vertical location is required, calculating a second weighted sum by using the first weighted sum calculated according to step (a)(i) and the value of a pixel located at unit horizontal and unit vertical location and dividing the second weighted sum by a second divisor which is dependent on the first set of weighting factors to produce a second result having a reduced dynamic range compared to the extended dynamic range and, if the second result having the reduced dynamic range exceeds the specified dynamic range, clipping the second result having the reduced dynamic range to produce a value of the sub-pixel located at quarter unit horizontal and unit vertical location such that it has the specified dynamic range;

(b) when the value of an interpolated sub-pixel for a second row located at half unit vertical location is required:

(i) calculating a third weighted sum for the sub-pixel by using an n-tap filter which interpolates based upon values of n pixels or sub-pixels at a horizontal location corresponding to the horizontal location of the sub-pixel being calculated and at unit vertical locations, the third weighted sum having an extended dynamic range exceeding the specified dynamic range and being dependent upon a second set of weighting factors; and

(ii) dividing the third weighted sum calculated in step (b)(i) by a third divisor which is dependent on the second set of weighting factors to produce a third result having a reduced dynamic range compared to the extended dynamic range and, if the third result having the reduced dynamic range exceeds the specified dynamic range, clipping the third result having the reduced dynamic range to produce the value for the sub-pixel located at half unit vertical location such that it has the specified dynamic range;

(c) when the value of an interpolated sub-pixel for a third row located at quarter unit vertical location is required, taking the average of the values of a first pixel or sub-pixel located at a horizontal location corresponding to the horizontal location of the sub-pixel being calculated and at unit vertical location and a second pixel or sub-pixel located at a horizontal location corresponding to the horizontal location of the sub-pixel being calculated and at half unit vertical location calculated according to step (b)(ii).
Preferably the value of a pixel located at unit horizontal and unit vertical location in sub-step (a)(iii) is the value of the nearest pixel at unit horizontal and vertical location to the sub-pixel located at quarter unit horizontal and unit vertical location.
Sub-pixels at quarter unit horizontal location are to be interpreted as being sub-pixels having as their left-hand nearest neighbour a pixel at unit horizontal location and as their right-hand nearest neighbour a sub-pixel at half unit horizontal location, as well as sub-pixels having as their left-hand nearest neighbour a sub-pixel at half unit horizontal location and as their right-hand nearest neighbour a pixel at unit horizontal location. Correspondingly, sub-pixels at quarter unit vertical location are to be interpreted as being sub-pixels having as their upper nearest neighbour a pixel at unit vertical location and as their lower nearest neighbour a sub-pixel at half unit vertical location, as well as sub-pixels having as their upper nearest neighbour a sub-pixel at half unit vertical location and as their lower nearest neighbour a pixel at unit vertical location.
In the context of the invention, the terms half unit horizontal location, half unit vertical location, quarter unit horizontal location, and quarter unit vertical location are relative terms. Accordingly, if the invention is applied to interpolate between pixels at unit horizontal locations such as pixels A1 and A2 in Figure 9, it interpolates a value for a sub-pixel at a half unit horizontal and unit vertical location. Correspondingly, if it is applied to interpolate between pixels at unit vertical locations such as pixels A1 and A3 in Figure 9, it interpolates a value for a sub-pixel at a half vertical location. If the invention is applied to interpolate between pixels at unit horizontal locations and sub-pixels at half unit horizontal locations, it interpolates a value for a sub-pixel at a quarter unit horizontal location. Similarly, if it is used to interpolate between pixels at unit vertical locations and sub-pixels at half unit vertical locations, it interpolates a value for a sub-pixel at a quarter vertical location. The invention can further be applied to any degree of interpolation, to produce pixels at 1/2^n unit horizontal locations and 1/2^n unit vertical locations (where n is an integer greater than zero).
The term 'dynamic range' refers to the range of values which the sub-pixel values and the weighted sums can take.
Preferably changing the dynamic range, whether by extending it or reducing it, means changing the number of bits which are used to represent the dynamic range.
Preferably the value of a sub-pixel located at unit horizontal and half unit vertical location is interpolated by:

(i) calculating the third weighted sum for the sub-pixel by using an n-tap filter which interpolates based upon values of n pixels at unit horizontal location and unit vertical location;

(ii) if the third result having the reduced dynamic range exceeds the specified dynamic range, clipping the third result having the reduced dynamic range to produce a value for the sub-pixel located at unit horizontal location and half unit vertical location such that it has the specified dynamic range.
Preferably the value of a sub-pixel located at half unit horizontal and half unit vertical location is interpolated by:

(i) calculating the third weighted sum for the sub-pixel by using an n-tap filter which interpolates based upon values of n sub-pixels located at half unit horizontal and unit vertical location calculated in step (b)(ii);

(ii) if the third result having the reduced dynamic range exceeds the specified dynamic range, clipping the third result having the reduced dynamic range to produce a value for the sub-pixel located at half unit horizontal and half unit vertical location such that it has the specified dynamic range.
Preferably the value of a sub-pixel located at quarter unit horizontal and half unit vertical location is interpolated by:

(i) calculating the third weighted sum for the sub-pixel by using an n-tap filter which interpolates based upon the values of n sub-pixels at quarter unit horizontal and unit vertical location calculated according to step (a)(iii);

(ii) if the third result having the reduced dynamic range exceeds the specified dynamic range, clipping the third result having the reduced dynamic range to produce a value for the sub-pixel located at quarter unit horizontal and half unit vertical location such that it has the specified dynamic range.
Preferably the value of a sub-pixel located at unit horizontal and quarter unit vertical location is interpolated by taking the average of the values of a first pixel located at unit horizontal and unit vertical location and a second pixel located at unit horizontal and half unit vertical location calculated according to step (b) (ii).
Preferably the value of a sub-pixel located at half unit horizontal and quarter unit vertical location is interpolated by taking the average of the values of a first sub-pixel located at half unit horizontal and unit vertical location calculated according to step (a)(ii) and a second sub-pixel located at half unit horizontal and half unit vertical location calculated according to step (b)(ii).

Preferably the value of a sub-pixel located at quarter unit horizontal and quarter unit vertical location is interpolated by taking the average of the values of a first sub-pixel located at quarter unit horizontal and unit vertical location calculated according to step (a)(iii) and a second sub-pixel located at quarter unit horizontal and half unit vertical location calculated according to step (b)(ii).
In an embodiment of the invention, the method is applied to an image that is sub-divided into a number of image blocks. Preferably each image block comprises four corners, each corner being defined by a pixel located at a unit horizontal and unit vertical location. Preferably the method is applied to each image block as the block becomes available for sub-pixel value interpolation. Alternatively, sub-pixel value interpolation according to the method of the invention is performed once all image blocks of an image have become available for sub-pixel value interpolation.
Preferably the method is used in video encoding. Preferably the method is used in video decoding.
In one embodiment of the invention, when used in encoding, the method is carried out as before-hand interpolation, in which values for all sub-pixels at half unit locations and values for all sub-pixels at quarter unit locations are calculated and stored before being subsequently used in the determination of a prediction frame during motion predictive coding. In alternative embodiments, the method is carried out as a combination of before-hand and subsequent interpolation. In this case, a certain proportion or category of sub-pixel values is calculated and stored before being used in the determination of a prediction frame and certain other sub-pixel values are calculated only when required during motion predictive coding.
Preferably, when the method is used in decoding, sub-pixels are only interpolated when their need is indicated by a motion vector.
Preferably the method step (c) is used to calculate a first row of sub-pixels at quarter unit vertical location between an upper row at unit vertical location and a lower row at half vertical location and a second row of sub-pixels at quarter unit vertical location between an upper row at half vertical location and a lower row at unit vertical location.
Preferably the method also comprises the step of calculating a sub-pixel at a quarter unit horizontal and a quarter unit vertical location between an upper row at half vertical location and a lower row at unit vertical location by taking the average of the values of the nearest four pixels located at unit horizontal and unit vertical locations.
Preferably the method comprises the step of dividing the weighted sums by the sum of the respective sets of weighting factors. Preferably this is followed by the step of rounding the result of the dividing step to a nearest integer value. Preferably the nearest integer value is the nearest smaller integer value. Alternatively, the nearest integer value is the nearest larger integer value. Additionally, the step of rounding can be followed by a step of clipping.
According to a second aspect of the invention there is provided a method of interpolation in video coding in which an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows residing at unit horizontal locations and the pixels in the columns residing at unit vertical locations, is interpolated to generate values for sub-pixels located at fractional horizontal and vertical locations, the method comprising:

(a) when forming values of interpolated sub-pixels for a first row located at unit vertical location:

(i) calculating a first weighted sum by using an n-tap filter which interpolates based upon values of n pixels at unit horizontal and unit vertical locations, the first weighted sum having an extended dynamic range exceeding the specified dynamic range;

(ii) dividing the first weighted sum calculated in step (a)(i) to produce a value of a sub-pixel located at half unit horizontal and unit vertical location and adjusting the value, if necessary, such that the value of the sub-pixel has the specified dynamic range;

(iii) when a value of a sub-pixel located at quarter unit horizontal and unit vertical location is required, dividing the first weighted sum calculated in step (a)(i) to produce the value of the sub-pixel and adjusting the value, if necessary, such that the value of the sub-pixel has the specified dynamic range;

(b) when the value of a sub-pixel located in a second row located at half unit vertical location is required:

(i) calculating a further weighted sum by using an n-tap filter which interpolates based upon values of n pixels or sub-pixels at unit vertical locations and at a horizontal location corresponding to the horizontal location of the sub-pixel being calculated, the further weighted sum having an extended dynamic range exceeding the specified dynamic range;

(ii) dividing the further weighted sum calculated in step (b)(i) to produce a value of a sub-pixel and adjusting the value, if necessary, such that it has the specified dynamic range;

(c) when the value of a sub-pixel located in a third row located at quarter unit vertical location is required, taking the average of the value of a first pixel or sub-pixel located at unit vertical location and the value of a second pixel or sub-pixel located at half unit vertical location, the first and second pixels or sub-pixels being at a horizontal location corresponding to the horizontal location of the sub-pixel being calculated.
It should be appreciated that in the method according to the invention, the way in which values of sub-pixels at quarter unit horizontal and unit vertical locations (sub-pixels c) are calculated differs from the way in which sub-pixels at quarter unit vertical locations (sub-pixels e) are calculated. Specifically, sub-pixels at quarter unit horizontal and unit vertical locations can be considered as being calculated directly from values of pixels at unit horizontal and unit vertical locations. This is because the sub-pixel values in question are derived from an intermediate value (the first weighted sum determined in step (a)(i)) which does not undergo rounding or clipping operations. On the other hand, all but one of the sub-pixels at quarter unit vertical locations (the exception being sub-pixel f) are interpolated linearly from pixels at unit horizontal and unit vertical locations or from sub-pixels interpolated by a previous step or steps, the sub-pixels in question having undergone rounding and possibly also clipping operations. An advantage of direct calculation (even if it is done in two steps) is that fewer calculation steps are required.
If reference is made to Figure 9, it can be seen that the majority of sub-pixels in the image block defined by pixels A1, A2, A3 and A4 are sub-pixels at quarter unit horizontal locations or quarter unit vertical locations, that is sub-pixels c, d2, d4, e1, e2, e3, e4 and f. If the invention is applied to the block of Figure 9, in one embodiment, the majority of sub-pixels at quarter unit vertical locations, that is sub-pixels e1, e2, e3 and e4, are calculated by linear interpolation. Sub-pixels e1, e2, e3 and e4 are calculated, at least in part, from other sub-pixels, for example from sub-pixels b, c, d2 and d4, which have undergone rounding and clipping. However, sub-pixels c are effectively calculated directly from pixels at unit horizontal and unit vertical locations as they are determined from an intermediate sub-pixel value which has not undergone rounding or clipping.
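To see why this distinction matters, consider a small numerical sketch (the numbers are invented for illustration, and the cascaded-averaging formula shown for comparison is an assumption about how a rounded two-stage calculation might be performed):

```c
#include <stdio.h>

int main(void)
{
    /* Assume a pixel A = 100 and an intermediate 6-tap sum b_int = 3216,
       i.e. an exact half-pixel value of 3216/32 = 100.5. */
    int A = 100, b_int = 3216;

    int b = (b_int + 16) / 32;                   /* rounded sub-pixel b: 101 */
    int c_direct   = (32 * A + b_int + 32) / 64; /* from unrounded sum: 100  */
    int c_cascaded = (A + b + 1) / 2;            /* from rounded b:     101  */

    printf("direct c = %d, cascaded c = %d\n", c_direct, c_cascaded);
    return 0;
}
```

The direct form works on the extended-dynamic-range intermediate value, so no rounding error from the half-pixel stage propagates into c.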
According to a third aspect of the invention there is provided a video coder for coding an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows residing at unit horizontal locations and the pixels in the columns residing at unit vertical locations, the coder comprising an interpolator adapted to generate values for sub-pixels located at fractional horizontal and vertical locations, the interpolator being adapted to form values of interpolated sub-pixels for a first row located at unit vertical locations, a second row located at half unit vertical locations and a third row located at quarter unit vertical locations:
(a) the interpolator being adapted to form values of interpolated sub-pixels for the first row by:
(i) calculating a first weighted sum for a sub-pixel located at half unit horizontal and unit vertical location by using an n-tap filter which interpolates based upon values of n pixels at unit horizontal and unit vertical locations, the first weighted sum having an extended dynamic range exceeding the specified dynamic range and being dependent upon a first set of weighting factors;
(ii) dividing the first weighted sum by a first divisor which is dependent on the first set of weighting factors to produce a first result having a reduced dynamic range compared to the extended dynamic range and, if the first result exceeds the specified dynamic range, clipping it to produce a value for the sub-pixel located at half unit horizontal and unit vertical location such that it has the specified dynamic range; and
(iii) when the value of a sub-pixel located at quarter unit horizontal and unit vertical location is required, calculating a second weighted sum by using the first weighted sum calculated according to step (a)(i) and the value of a pixel located at unit horizontal and unit vertical location, and dividing the second weighted sum by a second divisor which is dependent on the first set of weighting factors to produce a second result having a reduced dynamic range compared to the extended dynamic range and, if the second result exceeds the specified dynamic range, clipping it to produce a value for the sub-pixel located at quarter unit horizontal and unit vertical location such that it has the specified dynamic range;
(b) the interpolator being adapted to form a value of an interpolated sub-pixel located in the second row when such a sub-pixel is required by:
(i) calculating a third weighted sum for the sub-pixel by using an n-tap filter which interpolates based upon values of n pixels or sub-pixels at a horizontal location corresponding to the horizontal location of the sub-pixel being calculated and unit vertical locations, the third weighted sum having an extended dynamic range exceeding the specified dynamic range and being dependent upon a second set of weighting factors; and
(ii) dividing the third weighted sum by a third divisor which is dependent on the second set of weighting factors to produce a third result having a reduced dynamic range compared to the extended dynamic range and, if the third result exceeds the specified dynamic range, clipping it to produce a value for the sub-pixel located at half unit vertical location such that it has the specified dynamic range;
(c) the interpolator being adapted to form a value of an interpolated sub-pixel located in the third row when such a sub-pixel is required by taking the average of the values of a first pixel or sub-pixel located at a horizontal location corresponding to the horizontal location of the sub-pixel being calculated and at a unit vertical location and a second pixel or sub-pixel located at a horizontal location corresponding to the horizontal location of the sub-pixel being calculated and at a half unit vertical location calculated according to step (b)(ii).
According to a fourth aspect of the invention there is provided a video coder for coding an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows residing at unit horizontal locations and the pixels in the columns residing at unit vertical locations, the coder comprising an interpolator adapted to generate values for sub-pixels located at fractional horizontal and vertical locations, the interpolator being adapted to form values of interpolated sub-pixels for a first row located at unit vertical locations, a second row located at half unit vertical locations, and a third row located at quarter unit vertical locations:
(a) the interpolator being adapted to form values of interpolated sub-pixels for the first row located at unit vertical location by:
(i) calculating a first weighted sum by using an n-tap filter which interpolates based upon values of n pixels at unit horizontal and unit vertical locations, the first weighted sum having an extended dynamic range exceeding the specified dynamic range;
(ii) dividing and clipping the first weighted sum calculated in step (a)(i) to produce the value of a sub-pixel located at half unit horizontal and unit vertical location such that the value of the sub-pixel has the specified dynamic range;
(iii) when the value of a sub-pixel located at quarter unit horizontal and unit vertical location is required, dividing and clipping the first weighted sum calculated in step (a)(i) to produce the value of the sub-pixel such that the value of the sub-pixel has the specified dynamic range;
(b) the interpolator being adapted to form a value of an interpolated sub-pixel located in the second row when such a sub-pixel is required by:
(i) calculating a further weighted sum by using an n-tap filter which interpolates based upon values of n pixels or sub-pixels at unit vertical locations and at a horizontal location corresponding to the horizontal location of the sub-pixel being calculated, the further weighted sum having an extended dynamic range exceeding the specified dynamic range;
(ii) dividing and clipping the further weighted sum calculated in step (b)(i) to produce a value of a sub-pixel such that it has the specified dynamic range;
(c) the interpolator being adapted to form a value of an interpolated sub-pixel located in the third row when such a sub-pixel is required by taking the average of the value of a first pixel or sub-pixel located at unit vertical location and the value of a second pixel or sub-pixel located at half unit vertical location, the first and second pixels or sub-pixels being at a horizontal location corresponding to the horizontal location of the sub-pixel being calculated.
Preferably the video coder is a video encoder. Alternatively, the video coder is a decoder.
According to a fifth aspect of the invention, there is provided a telecommunications system comprising a terminal having at least one of an encoder and a decoder according to at least one of the third and fourth aspects of the invention and a network, the terminal and the network being connected by a communications link over which coded video can be transmitted.
Alternatively, the at least one encoder or decoder is located in the network rather than in the terminal.
According to a sixth aspect of the invention there is provided a communications terminal comprising a user interface, a processor and at least one of a transmitting block and a receiving block, and a video coder according to at least one of the third and fourth aspects of the invention.
Preferably the processor controls the operation of the transmitting block and/or the receiving block and the video coder.
Preferably the network enables the terminal to communicate with other terminals connected to the network over communications links between the other terminals and the network.
Preferably the telecommunications system is a mobile telecommunications system comprising mobile terminals and a fixed network, the connection between the mobile terminals and the fixed network being formed by a radio link.
Sub-pixels at quarter unit horizontal locations are to be interpreted as being sub-pixels having as their left-hand nearest neighbour a pixel or sub-pixel at unit horizontal location and as their right-hand nearest neighbour a sub-pixel at half unit horizontal location as well as sub-pixels having as their left-hand nearest neighbour a sub-pixel at half unit horizontal location and as their right-hand nearest neighbour a pixel or sub-pixel at unit horizontal location.
Correspondingly, sub-pixels at quarter unit vertical location are to be interpreted as being sub-pixels having as their upper nearest neighbour a pixel or sub-pixel at unit vertical location and as their lower nearest neighbour a sub-pixel at half unit vertical location as well as sub-pixels having as their upper nearest neighbour a sub-pixel at half unit vertical location and as their lower nearest neighbour a pixel or sub-pixel at unit vertical location.
In the context of the invention, the terms unit horizontal location, unit vertical location, half unit horizontal location, half unit vertical location, quarter unit horizontal location, and quarter unit vertical location are relative terms. Accordingly, if the invention is applied to interpolate between pixels at unit horizontal and unit vertical locations, it interpolates sub-pixels at half unit horizontal and half unit vertical locations and sub-pixels at quarter unit horizontal and quarter unit vertical locations. If the invention is applied to interpolate between sub-pixels at half unit horizontal and half unit vertical locations and pixels at unit horizontal and unit vertical locations (whether these have been interpolated according to the invention or by another interpolation method), it interpolates sub-pixels at quarter unit horizontal and quarter unit vertical locations and sub-pixels at eighth unit horizontal and eighth unit vertical locations. The invention can be applied to all degrees of interpolation, to produce sub-pixels at 1/2^n unit horizontal locations and 1/2^n unit vertical locations (where n is an integer greater than zero).
Brief Description of the Figures
An embodiment of the invention will now be described by way of example only with reference to the accompanying drawings, in which:
Figure 1 shows a video encoder according to the prior art;
Figure 2 shows a video decoder according to the prior art;
Figure 3 shows the types of frames used in video encoding;
Figures 4a, 4b, and 4c show steps in block-matching;
Figure 5 illustrates the process of motion estimation to sub-pixel resolution;
Figure 6 shows a terminal device comprising video encoding and decoding equipment in which the method of the invention may be implemented;
Figure 7 shows a video encoder according to an embodiment of the present invention;
Figure 8 shows a video decoder according to an embodiment of the present invention;
Figure 9 shows the nomenclature used to describe the arrangement of pixels and sub-pixels;
Figures 10a and 10b show interpolated sub-pixels; and
Figure 11 shows a schematic diagram of a mobile telecommunications network according to an embodiment of the present invention.
Detailed Description
Figures 1 to 5 and 9, 10a, and 10b have been described in the foregoing.
Figure 6 presents a terminal device comprising video encoding and decoding equipment which may be adapted to operate in accordance with the present invention. More precisely, the figure illustrates a multimedia terminal 60 implemented according to ITU-T recommendation H.324. The terminal can be regarded as a multimedia transceiver device. It includes elements that capture, encode and multiplex multimedia data streams for transmission via a communications network, as well as elements that receive, de-multiplex, decode and display received multimedia content. ITU-T recommendation H.324 defines the overall operation of the terminal and refers to other recommendations that govern the operation of its various constituent parts. This kind of multimedia terminal can be used in real-time applications such as conversational videotelephony, or non-real-time applications such as the retrieval/streaming of video clips, for example from a multimedia content server in the Internet.
In the context of the present invention, it should be appreciated that the H.324 terminal shown in Figure 6 is only one of a number of alternative multimedia terminal implementations suited to application of the inventive method. It should also be noted that a number of alternatives exist relating to the location and implementation of the terminal equipment. As illustrated in Figure 6, the multimedia terminal may be located in communications equipment connected to a fixed-line telephone network such as an analogue PSTN (Public Switched Telephone Network). In this case the multimedia terminal is equipped with a modem 71, compliant with ITU-T recommendations V.8, V.34 and optionally V.8bis. Alternatively, the multimedia terminal may be connected to an external modem. The modem enables conversion of the multiplexed digital data and control signals produced by the multimedia terminal into an analogue form suitable for transmission over the PSTN. It further enables the multimedia terminal to receive data and control signals in analogue form from the PSTN and to convert them into a digital data stream that can be demultiplexed and processed in an appropriate manner by the terminal.
An H.324 multimedia terminal may also be implemented in such a way that it can be connected directly to a digital fixed-line network, such as an ISDN (Integrated Services Digital Network). In this case the modem 71 is replaced with an ISDN user-network interface. In Figure 6, this ISDN user-network interface is represented by alternative block 72.
H.324 multimedia terminals may also be adapted for use in mobile communication applications. If used with a wireless communication link, the modem 71 can be replaced with any appropriate wireless interface, as represented by alternative block 73 in Figure 6. For example, an H.324/M multimedia terminal can include a radio transceiver enabling connection to the current 2nd generation GSM mobile telephone network, or the proposed 3rd generation UMTS (Universal Mobile Telephone System).
It should be noted that in multimedia terminals designed for two-way communication, that is for transmission and reception of video data, it is advantageous to provide both a video encoder and video decoder implemented according to the present invention. Such an encoder and decoder pair is often implemented as a single combined functional unit, referred to as a 'codec'.
Because a video encoder according to the invention performs motion compensated video encoding to sub-pixel resolution using a specific interpolation scheme and a particular combination of before-hand and subsequent sub-pixel value interpolation, it is generally necessary for a video decoder of a receiving terminal to be implemented in a manner compatible with the encoder of the transmitting terminal which formed the compressed video data stream. Failure to ensure this compatibility may have an adverse effect on the quality of the motion compensation and the accuracy of reconstructed video frames.
A typical H.324 multimedia terminal will now be described in further detail with reference to Figure 6.
The multimedia terminal 60 includes a variety of elements referred to as 'terminal equipment'. This includes video, audio and telematic devices, denoted generically by reference numbers 61, 62 and 63, respectively. The video equipment 61 may include, for example, a video camera for capturing video images, a monitor for displaying received video content and optional video processing equipment. The audio equipment 62 typically includes a microphone, for example for capturing spoken messages, and a loudspeaker for reproducing received audio content. The audio equipment may also include additional audio processing units. The telematic equipment 63 may include a data terminal, keyboard, electronic whiteboard or a still image transceiver, such as a fax unit.
The video equipment 61 is coupled to a video codec 65. The video codec 65 comprises a video encoder and a corresponding video decoder, both implemented according to the invention. Such an encoder and a decoder will be described in the following. The video codec 65 is responsible for encoding captured video data in an appropriate form for further transmission over a communications link and decoding compressed video content received from the communications network. In the example illustrated in Figure 6, the video codec is implemented according to ITU-T recommendation H.263, with appropriate modifications to implement the sub-pixel value interpolation method according to the invention in both the encoder and the decoder of the video codec.
Similarly, the terminal's audio equipment is coupled to an audio codec, denoted in Figure 6 by reference number 66. Like the video codec, the audio codec comprises an encoder/decoder pair. It converts audio data captured by the terminal's audio equipment into a form suitable for transmission over the communications link and transforms encoded audio data received from the network back into a form suitable for reproduction, for example on the terminal's loudspeaker. The output of the audio codec is passed to a delay block 67. This compensates for the delays introduced by the video coding process and thus ensures synchronisation of audio and video content.
The system control block 64 of the multimedia terminal controls end-to-network signalling using an appropriate control protocol (signalling block 68) to establish a common mode of operation between a transmitting and a receiving terminal. The signalling block 68 exchanges information about the encoding and decoding capabilities of the transmitting and receiving terminals and can be used to enable the various coding modes of the video encoder. The system control block 64 also controls the use of data encryption. Information regarding the type of encryption to be used in data transmission is passed from encryption block 69 to the multiplexer/de-multiplexer (MUX/DMUX unit) 70.
During data transmission from the multimedia terminal, the MUX/DMUX unit 70 combines encoded and synchronised video and audio streams with data input from the telematic equipment 63 and possible control data, to form a single bit-stream. Information concerning the type of data encryption (if any) to be applied to the bit-stream, provided by encryption block 69, is used to select an encryption mode. Correspondingly, when a multiplexed and possibly encrypted multimedia bit-stream is being received, MUX/DMUX unit 70 is responsible for decrypting the bit-stream, dividing it into its constituent multimedia components and passing those components to the appropriate codec(s) and/or terminal equipment for decoding and reproduction.
It should be noted that the functional elements of the multimedia terminal, video encoder, decoder and video codec according to the invention can be implemented as software or dedicated hardware, or a combination of the two. The video encoding and decoding methods according to the invention are particularly suited for implementation in the form of a computer program comprising machine-readable instructions for performing the functional steps of the invention. As such, the encoder and decoder according to the invention may be implemented as software code stored on a storage medium and executed in a computer, such as a personal desktop computer, in order to provide that computer with video encoding and/or decoding functionality.
Figure 7 shows a video encoder 700 according to an embodiment of the invention. Figure 8 shows a video decoder 800 according to an embodiment of the invention.
The encoder 700 comprises an input 701 for receiving a video signal from a camera or other video source (not shown). It further comprises a DCT transformer 705, a quantiser 706, an inverse quantiser 709, an inverse DCT transformer 710, combiners 712 and 716, a before-hand sub-pixel interpolation block 730, a frame store 740 and a subsequent sub-pixel interpolation block 750, implemented in combination with motion estimation block 760. The encoder also comprises a motion field coding block 770 and a motion compensated prediction block 780. Switches 702 and 714 are operated co-operatively by a control manager 720 to switch the encoder between an INTRA mode of video encoding and an INTER mode of video encoding. The encoder 700 also comprises a multiplexer unit (MUX/DMUX) 790 which forms a single bit-stream from the various types of information produced by the encoder 700, for further transmission to a remote receiving terminal or, for example, for storage on a mass storage medium such as a computer hard drive (not shown).
It should be noted that the presence of before-hand sub-pixel interpolation block 730 and subsequent sub-pixel value interpolation block 750 in the encoder architecture depends on the way in which the sub-pixel interpolation method according to the invention is applied. In embodiments of the invention in which before-hand sub-pixel value interpolation is not performed, encoder 700 does not comprise before-hand sub-pixel value interpolation block 730. In other embodiments of the invention, only before-hand sub-pixel interpolation is performed and thus the encoder does not include subsequent sub-pixel value interpolation block 750. In embodiments in which both before-hand and subsequent sub-pixel value interpolation are performed, both blocks 730 and 750 are present in the encoder 700.
Operation of the encoder 700 according to the invention will now be described in detail. In the description, it will be assumed that each frame of uncompressed video, received from the video source at the input 701, is received and processed on a macroblock-by-macroblock basis, preferably in raster-scan order. It will further be assumed that when the encoding of a new video sequence starts, the first frame of the sequence is encoded in INTRA mode. Subsequently, the encoder is programmed to code each frame in INTER format, unless one of the following conditions is met: 1) it is judged that the current frame being coded is so dissimilar from the reference frame used in its prediction that excessive prediction error information is produced; 2) a predefined INTRA frame repetition interval has expired; or 3) feedback is received from a receiving terminal indicating a request for a frame to be coded in INTRA format.
The occurrence of condition 1) is detected by monitoring the output of the combiner 716. The combiner 716 forms a difference between the current macroblock of the frame being coded and its prediction, produced in the motion compensated prediction block 780. If a measure of this difference (for example a sum of absolute differences of pixel values) exceeds a predetermined threshold, the combiner 716 informs the control manager 720 via a control line 717 and the control manager 720 operates the switches 702 and 714 so as to switch the encoder 700 into INTRA coding mode. Occurrence of condition 2) is monitored by means of a timer or frame counter implemented in the control manager 720, in such a way that if the timer expires, or the frame counter reaches a predetermined number of frames, the control manager 720 operates the switches 702 and 714 to switch the encoder into INTRA coding mode. Condition 3) is triggered if the control manager 720 receives a feedback signal from, for example, a receiving terminal, via control line 718 indicating that an INTRA frame refresh is required by the receiving terminal. Such a condition might arise, for example, if a previously transmitted frame were badly corrupted by interference during its transmission, rendering it impossible to decode at the receiver. In this situation, the receiver would issue a request for the next frame to be encoded in INTRA format, thus re-initialising the coding sequence.
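The three conditions can be summarised in a short sketch (a simplified illustration with invented names; the actual thresholds and signalling are implementation details):

```c
#include <stdbool.h>

typedef enum { MODE_INTRA, MODE_INTER } coding_mode;

coding_mode select_coding_mode(int prediction_error, int error_threshold,
                               int frames_since_intra, int intra_interval,
                               bool intra_refresh_requested)
{
    if (prediction_error > error_threshold)    /* condition 1: poor prediction  */
        return MODE_INTRA;
    if (frames_since_intra >= intra_interval)  /* condition 2: refresh interval */
        return MODE_INTRA;
    if (intra_refresh_requested)               /* condition 3: receiver request */
        return MODE_INTRA;
    return MODE_INTER;
}
```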
It will further be assumed that the encoder and decoder are implemented in such a way as to allow the determination of motion vectors with a spatial resolution of up to quarter-pixel resolution.
Operation of the encoder 700 in INTRA coding mode will now be described. In INTRA mode, the control manager 720 operates the switch 702 to accept video input from input line 719. The video signal input is received macroblock by macroblock from input 701 via the input line 719 and each macroblock of original image pixels is transformed into DCT coefficients by the DCT transformer 705. The DCT coefficients are then passed to the quantiser 706, where they are quantised using a quantisation parameter QP. Selection of the quantisation parameter QP is controlled by the control manager 720 via control line 722. Each DCT-transformed and quantised macroblock that makes up the INTRA coded image information 723 of the frame is passed from the quantiser 706 to the MUX/DMUX 790. The MUX/DMUX 790 combines the INTRA coded image information with possible control information (for example header data, quantisation parameter information, error correction data etc.) to form a single bit-stream of coded image information 725. Variable length coding (VLC) is used to reduce redundancy of the compressed video bit-stream, as is known to those skilled in the art.
A locally decoded picture is formed in the encoder 700 by passing the data output by the quantiser 706 through the inverse quantiser 709 and applying an inverse DCT transform to the inverse-quantised data in block 710. The resulting data is then input to the combiner 712. In INTRA mode, switch 714 is set so that the input to the combiner 712 via the switch 714 is zero. In this way, the operation performed by the combiner 712 is equivalent to passing the decoded image data formed by the inverse quantiser 709 and the inverse DCT transformer 710 unaltered.
In embodiments of the invention in which before-hand sub-pixel value interpolation is performed, the output from combiner 712 is applied to before-hand sub-pixel interpolation block 730. The input to the before-hand sub-pixel value interpolation block 730 takes the form of decoded image blocks. In the before-hand sub-pixel value interpolation block 730, each decoded macroblock is subjected to sub-pixel interpolation in such a way that a predetermined sub-set of sub-pixel values is calculated according to the interpolation method of the invention and is stored together with the decoded pixel values in frame store 740.
In embodiments in which before-hand sub-pixel interpolation is not performed, the before-hand sub-pixel interpolation block is not present in the encoder architecture and the output from combiner 712, comprising decoded image blocks, is applied directly to frame store 740.
As subsequent macroblocks of the current frame are received and undergo the previously described coding and decoding steps in blocks 705, 706, 709, 710 and 712, a decoded version of the INTRA frame is built up in the frame store 740. When the last macroblock of the current frame has been INTRA coded and subsequently decoded, the frame store 740 contains a completely decoded frame, available for use as a prediction reference frame in coding a subsequently received video frame in INTER format. In embodiments of the invention in which before-hand sub-pixel value interpolation is performed, the reference frame held in frame store 740 is at least partially interpolated to sub-pixel resolution.
Operation of the encoder 700 in INTER coding mode will now be described. In INTER coding mode, the control manager 720 operates switch 702 to receive its input from line 721, which comprises the output of the combiner 716. The combiner 716 forms prediction error information representing the difference between the current macroblock of the frame being coded and its prediction, produced in the motion compensated prediction block 780. The prediction error information is DCT-transformed in block 705 and quantised in block 706 to form a macroblock of DCT-transformed and quantised prediction error information. Each macroblock of DCT-transformed and quantised prediction error information is passed from the quantiser 706 to the MUX/DMUX 790. The MUX/DMUX 790 combines the prediction error information 723 with motion coefficients 724 (described in the following) and control information (for example header data, quantisation parameter information, error correction data etc.) to form a single bit-stream of coded image information 725.
Locally decoded prediction error information for each macroblock of the INTER coded frame is then formed in the encoder 700 by passing the encoded prediction error information 723 output by the quantiser 706 through the inverse quantiser 709 and applying an inverse DCT transform in block 710. The resulting locally decoded macroblock of prediction error information is then input to combiner 712. In INTER mode, switch 714 is set so that the combiner 712 also receives motion-predicted macroblocks for the current INTER frame, produced in the motion compensated prediction block 780. The combiner 712 combines these two pieces of information to produce reconstructed image blocks for the current INTER frame.
As described above when considering INTRA coded frames, in embodiments of the invention in which before-hand sub-pixel value interpolation is performed, the output from combiner 712 is applied to the before-hand sub-pixel interpolation block 730. Thus, the input to the before-hand sub-pixel value interpolation block 730 in INTER coding mode also takes the form of decoded image blocks. In the before-hand sub-pixel value interpolation block 730, each decoded macroblock is subjected to sub-pixel interpolation in such a way that a predetermined sub-set of sub-pixel values is calculated according to the interpolation method of the invention and is stored together with the decoded pixel values in frame store 740. In embodiments in which before-hand sub-pixel interpolation is not performed, the before-hand sub-pixel interpolation block is not present in the encoder architecture and the output from combiner 712, comprising decoded image blocks, is applied directly to frame store 740.
As subsequent macroblocks of the video signal are received from the video source and undergo the previously described coding and decoding steps in blocks 705, 706, 709, 710 and 712, a decoded version of the INTER frame is built up in the frame store 740. When the last macroblock of the frame has been INTER coded and subsequently decoded, the frame store 740 contains a completely decoded frame, available for use as a prediction reference frame in encoding a subsequently received video frame in INTER format. In embodiments of the invention in which before-hand sub-pixel value interpolation is performed, the reference frame held in frame store 740 is at least partially interpolated to sub-pixel resolution.
Formation of a prediction for a macroblock of the current frame will now be described.
Any frame encoded in INTER format requires a reference frame for motion compensated prediction. This means, inter alia, that when encoding a video sequence, the first frame to be encoded, whether it is the first frame in the sequence, or some other frame, must be encoded in INTRA format. This, in turn, means that when the video encoder 700 is switched into INTER coding mode by control manager 720, a complete reference frame, formed by locally decoding a previously encoded frame, will already be available in the frame store 740 of the encoder. In general, the reference frame is formed by locally decoding either an INTRA coded frame or an INTER coded frame.
The first step in forming a prediction for a macroblock of the current frame is performed by motion estimation block 760. The motion estimation block 760 receives the current macroblock of the frame being coded via line 727 and performs a block-matching operation in order to identify a region in the reference frame which corresponds substantially with the current macroblock. According to the invention, the block-matching process is performed to quarter-pixel resolution in a manner that depends on the implementation of the encoder 700 and the degree of before-hand sub-pixel interpolation performed. However, the basic principle behind the block-matching process is similar in all cases. Specifically, motion estimation block 760 performs block-matching by calculating difference values (e.g. sums of absolute differences) representing the difference in pixel values between the macroblock of the current frame under examination and candidate best-matching regions of pixels/sub-pixels in the reference frame. A difference value is produced for all possible offsets (e.g. quarter-pixel precision x, y displacements) between the macroblock of the current frame and a candidate test region within a predefined search region of the reference frame, and motion estimation block 760 determines the smallest calculated difference value. The offset between the macroblock in the current frame and the candidate test region of pixel/sub-pixel values in the reference frame that yields the smallest difference value defines the motion vector for the macroblock in question. In certain embodiments of the invention, an initial estimate for the motion vector having unit pixel precision is first determined and then refined to quarter-pixel precision, as described in the foregoing.
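A simplified sketch of such an exhaustive block-matching search over a reference frame interpolated to quarter-pixel resolution follows (this is an illustration, not the patent's literal implementation; it assumes the interpolated frame is four times wider and taller than the original, that offsets are expressed in quarter-pixel units, and that the caller guarantees the search window stays within the frame):

```c
#include <limits.h>
#include <stdint.h>

/* Find the quarter-pixel motion vector minimising the sum of absolute
   differences (SAD). cur points to the top-left of the current block;
   interp_ref points to the corresponding position in the interpolated
   reference frame. */
int find_motion_vector(const uint8_t *cur, int cur_stride,
                       const uint8_t *interp_ref, int ref_stride,
                       int block_size, int search_range,
                       int *best_dx, int *best_dy)
{
    int best_sad = INT_MAX;
    for (int dy = -search_range; dy <= search_range; dy++) {
        for (int dx = -search_range; dx <= search_range; dx++) {
            int sad = 0;
            for (int y = 0; y < block_size; y++) {
                for (int x = 0; x < block_size; x++) {
                    /* a step of 4 in the interpolated frame is one full pixel */
                    int r = interp_ref[(4 * y + dy) * ref_stride + (4 * x + dx)];
                    int d = cur[y * cur_stride + x] - r;
                    sad += d < 0 ? -d : d;
                }
            }
            if (sad < best_sad) {
                best_sad = sad;
                *best_dx = dx;
                *best_dy = dy;
            }
        }
    }
    return best_sad;
}
```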
In embodiments of the encoder in which before-hand sub-pixel value interpolation is not performed, all sub-pixel values required in the block-matching process are calculated in subsequent sub-pixel value interpolation block 750. Motion estimation block 760 controls subsequent sub-pixel value interpolation block 750 to calculate each sub-pixel value needed in the block-matching process in an on-demand fashion, as and when it is required. In this case, motion estimation block 760 may be implemented so as to perform block-matching as a one-step process, in which case a quarter-pixel resolution motion vector is sought directly, or it may be implemented so as to perform block-matching as a two-step process. If the two-step process is adopted, the first step may comprise a search for a full- or half-pixel resolution motion vector and the second step is performed in order to refine the resolution of the motion vector to quarter-pixel resolution. As block-matching is an exhaustive process, in which blocks of n x n pixels in the current frame are compared one-by-one with blocks of n x n pixels or sub-pixels in the interpolated reference frame, it should be appreciated that a sub-pixel calculated in an on-demand fashion by the subsequent sub-pixel value interpolation block 750 may need to be calculated multiple times as successive difference values are determined. In a video encoder, this approach is therefore not the most efficient possible in terms of computational complexity.
In embodiments of the encoder which use only before-hand sub-pixel value interpolation, block-matching may be performed as a one step process, as all sub-pixel values of the reference frame required to determine a motion vector with quarter-pixel resolution are calculated before-hand in block 730 and stored in frame store 740. Thus, they are directly available for use in the block-matching process and can be retrieved as required from frame store 740 by motion estimation block 760. However, even in the case where all quarter-pixel resolution sub-pixel values are available from frame store 740, it is still more computationally efficient to perform block-matching as a two-step process, as fewer difference calculations are required. It should be appreciated that while full before-hand sub-pixel value interpolation reduces computational complexity in the encoder, it is not the most efficient approach in terms of memory consumption.
In embodiments of the encoder in which both before-hand and subsequent sub-pixel value interpolation are used, motion estimation block 760 is implemented in such a way that it can retrieve sub-pixel values previously calculated in before-hand sub-pixel value interpolation block 730 and stored in frame store 740 and further control subsequent sub-pixel value interpolation block 750 to calculate any additional sub-pixel values that may be required. The block-matching process may be performed as a one-step or a two-step process. If a two-step implementation is used, before-hand calculated sub-pixel values retrieved from frame store 740 may be used in the first step of the process and the second step may be implemented so as to use sub-pixel values calculated by subsequent sub-pixel value interpolation block 750. In this case, certain sub-pixel values used in the second step of the block matching process may need to be calculated multiple times as successive comparisons are made, but the number of such duplicate calculations is significantly less than if before-hand sub-pixel value calculation is not used. Furthermore, memory consumption is reduced with respect to embodiments in which only before-hand sub-pixel value interpolation is used.
Once the motion estimation block 760 has produced a motion vector for the macroblock of the current frame under examination, it outputs the motion vector to the motion field coding block 770. Motion field coding block 770 then approximates the motion vector received from motion estimation block 760 using a motion model. The motion model generally comprises a set of basis functions. More specifically, the motion field coding block 770 represents the motion vector as a set of coefficient values (known as motion coefficients) which, when multiplied by the basis functions, form an approximation of the motion vector. The motion coefficients 724 are passed from motion field coding block 770 to motion compensated prediction block 780. Motion compensated prediction block 780 also receives the pixel/sub-pixel values of the best-matching candidate test region of the reference frame identified by motion estimation block 760. In Figure 7, these values are shown to be passed via line 729 from subsequent sub-pixel interpolation block 750. In alternative embodiments of the invention, the pixel values in question are provided from the motion estimation block 760 itself.
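As an illustration, for the simplest (purely translational) motion model the basis functions are constants and the motion coefficients are just the horizontal and vertical displacements themselves; a minimal sketch with invented names, noting that the patent does not restrict the motion model to this case:

```c
/* Two motion coefficients c0, c1; for a translational model the basis
   function is the constant 1, so the approximated motion vector is
   (dx, dy) = (c0 * 1, c1 * 1) at every pixel of the macroblock. */
typedef struct {
    float c0;
    float c1;
} motion_coeffs;

void approximate_motion_vector(motion_coeffs m, float *dx, float *dy)
{
    *dx = m.c0;
    *dy = m.c1;
}
```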
Using the approximate representation of the motion vector generated by motion field coding block 770 and the pixel/sub-pixel values of the best-matching candidate test region, motion compensated prediction block 780 produces a macroblock of predicted pixel values. The macroblock of predicted pixel values represents a prediction for the pixel values of the current macroblock, generated from the interpolated reference frame. The macroblock of predicted pixel values is passed to the combiner 716, where it is subtracted from the new current frame in order to produce prediction error information 723 for the macroblock, as described in the foregoing.
The motion coefficients 724 formed by motion field coding block 770 are also passed to the MUX/DMUX unit 790, where they are combined with prediction error information 723 for the macroblock in question and possible control information from control manager 720 to form an encoded video stream 725 for transmission to a receiving terminal.
The sub-pixel interpolation method according to the invention which is used in the encoder 700 will now be described in detail.
Values of sub-pixels at half unit horizontal and unit vertical locations are determined as follows. Referring to Figure 10a, sub-pixels at half unit horizontal and unit vertical locations, such as sub-pixel b, are calculated using a 6-tap filter. The filter interpolates an intermediate value b based upon the values of the 6 pixels in a row at unit horizontal and unit vertical locations disposed symmetrically about b, according to the formula b = (A1 - 5A2 + 20A3 + 20A4 - 5A5 + A6). The values of sub-pixels b and c are calculated from the intermediate value b according to the formulae b = (b + 16)/32 and c = (32A + b + 32)/64, where A is the value of the pixel at the unit horizontal and unit vertical location nearest to the sub-pixel c being calculated; the values of sub-pixels b and c are truncated to the nearest integer and clipped to the range 0 to 255, if necessary.
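A sketch of these two formulas in C follows (assuming 8-bit pixel values; A[0]..A[5] are the six row pixels disposed symmetrically about b, so that A[2] is taken here as the unit-location pixel nearest to c, which is an assumption of the example; clip_to_range() is as sketched earlier):

```c
/* Interpolate the half-pixel value b and the quarter-pixel value c from
   six pixels A[0..5] in one row. The intermediate sum b_int is kept at
   extended dynamic range; c is derived from it without intermediate
   rounding or clipping. */
void interpolate_b_c(const int A[6], int *b, int *c)
{
    int b_int = A[0] - 5 * A[1] + 20 * A[2] + 20 * A[3] - 5 * A[4] + A[5];

    *b = clip_to_range((b_int + 16) / 32);
    *c = clip_to_range((32 * A[2] + b_int + 32) / 64);
}
```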
Values of sub-pixels at half unit vertical locations are determined as follows. Referring to Figure 10b, sub-pixels at half unit vertical locations, such as sub-pixels d1, d2, d3, and d4, are calculated using a 6-tap filter. The filter interpolates a value for sub-pixel d1 based upon the values of the 6 pixels in a column at unit horizontal and unit vertical locations disposed symmetrically about d1, according to the formula d1 = (A1 - 5A2 + 20A3 + 20A4 - 5A5 + A6 + 16)/32. Similarly, values of sub-pixels at half unit horizontal and half unit vertical locations (sub-pixels d3) are calculated according to the relationship d3 = (b1 - 5b2 + 20b3 + 20b4 - 5b5 + b6 + 16)/32, and values of sub-pixels at quarter unit horizontal and half unit vertical locations (sub-pixels d2 and d4) are calculated according to the relationship d2 or d4 = (c1 - 5c2 + 20c3 + 20c4 - 5c5 + c6 + 16)/32, where the cn sub-pixels are located in the same column as the d sub-pixel being calculated and are disposed at unit vertical locations symmetrically on either side of the sub-pixel being interpolated. The values of sub-pixels d1, d2, d3, and d4 are truncated to the nearest integer and clipped to the range 0 to 255, if necessary.
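The same 6-tap filtering applies column-wise; a short sketch (col[0..5] holds the six values from one column, symmetric about the sub-pixel being interpolated: pixel values A for d1, sub-pixel values b for d3, or sub-pixel values c for d2 and d4; clip_to_range() is as sketched earlier):

```c
/* Interpolate a half-unit-vertical sub-pixel (d1, d2, d3 or d4) from six
   values in one column. */
int interpolate_d(const int col[6])
{
    int sum = col[0] - 5 * col[1] + 20 * col[2] + 20 * col[3]
            - 5 * col[4] + col[5];
    return clip_to_range((sum + 16) / 32);
}
```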
Values of sub-pixels at quarter unit vertical locations are determined as follows. Referring to Figure 10b, values of sub-pixels at quarter unit vertical locations (sub-pixels e) are calculated as an average of the nearest pixel and/or sub-pixel values in the same column located above and below the sub-pixel being calculated. That is, values of sub-pixels e1, e2, e3, and e4 are calculated as (A3 + d1)/2, (c + d2)/2, (b3 + d3)/2, and (c + d4)/2, respectively, and truncated to the nearest integer value. Values of sub-pixels e1, e2, and e3 occupying the same row as sub-pixel f are calculated as (d1 + A4)/2, (d2 + c)/2, and (d3 + b4)/2, respectively, and truncated to the nearest integer value. The sub-pixel c is taken from the row occupied by sub-pixel b4. The value of sub-pixel f is calculated by averaging the values of the four closest pixels at unit horizontal and unit vertical locations, according to f = (A1 + A2 + A3 + A4 + 2)/4, where the locations of pixels A1, A2, A3 and A4 are defined in Figure 9. The value 2 is added to the sum in order to control rounding effects in such a way that f is always rounded to the nearest integer value.
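These averaging operations reduce to the following sketch (function names invented; truncation is obtained from C integer division for the non-negative values involved):

```c
/* Average of the two nearest pixel/sub-pixel values above and below (or,
   for the row containing f, to the left and right of) the e sub-pixel. */
int interpolate_e(int first, int second)
{
    return (first + second) / 2;
}

/* Average of the four nearest unit-location pixels, with an offset of 2
   so that the truncating division rounds to the nearest integer. */
int interpolate_f(int A1, int A2, int A3, int A4)
{
    return (A1 + A2 + A3 + A4 + 2) / 4;
}
```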
It should be noted that in the foregoing, sub-pixel value interpolation has only been described in the context of calculating values for sub-pixels that reside at fractional locations within the confines of a group of 4 original image pixels (as shown in Figure 9). Typically, however, sub-pixel value interpolation will be applied to a certain extent to all such groups of 4 pixels within a video frame. However, interpolation of one sub-pixel value often requires the use of pixel values or sub-pixel values within other groups of 4 pixels (see Figures 10a and 10b, for example). Thus, a block-by-block approach to sub-pixel value interpolation does not necessarily treat individual blocks of 4 pixels in isolation. In fact, sub-pixel value interpolation, whether it is performed before-hand or subsequently, will usually be performed on a macroblock-by-macroblock basis, as described earlier in the text.
It should further be appreciated that in practical implementations, the extent to which frames are before-hand sub-pixel interpolated, and thus the amount by which they need to be subsequently sub-pixel interpolated, can be chosen according to, or dictated by, the hardware implementation of the video encoder 700, or the environment in which it is intended to be used. For example, if the memory available to the video encoder is limited, or memory must be reserved for other functions, it is appropriate to limit the amount of before-hand sub-pixel value interpolation that is performed. In other cases, where the microprocessor performing the video encoding operation has limited processing capacity, e.g. the number of operations per second that can be executed is comparatively low, it is more appropriate to restrict the amount of subsequent sub-pixel value interpolation that is performed. In a mobile communications environment, for example, when video encoding and decoding functionality is incorporated in a mobile telephone or similar wireless terminal for communication with a mobile telephone network, both memory and processing power may be limited. In this case a combination of before-hand and subsequent sub-pixel value interpolation may be the best choice to obtain an efficient implementation. In video decoder 800, use of before-hand sub-pixel value interpolation is generally not preferred, as it typically results in the calculation of many sub-pixel values that are not actually used in the decoding process. However, it should be appreciated that although different amounts of before-hand and subsequent interpolation can be used in the encoder and decoder in order to optimise the operation of each, both encoder and decoder can be implemented so as to use the same division between before-hand and subsequent sub-pixel value interpolation.
According to an embodiment of the invention in which quarter-pixel motion vector resolution is used, four interpolation methods are proposed, each method having different complexity and memory usage characteristics.
1) In the first method, only before-hand interpolation is used. All of the sub-pixels, that is all sub-pixels at half- and quarter-pixel locations, are pre-calculated. Thus, the amount of memory required to store one interpolated image is 16 times that required to store the original image (the arithmetic is sketched after this list).
2) In the second method, both before-hand and subsequent interpolation are used. All of the sub-pixels situated at unit vertical locations (sub-pixels b and c) and half unit vertical locations (sub-pixels d1, d2, d3, d4) are pre-calculated. Sub-pixels at quarter unit vertical locations (that is, sub-pixels e1, e2, e3, e4 and f) are calculated when they are required during the generation of motion vectors. In this case, the amount of memory required to store one interpolated image is 8 times that required for the original image. This is the preferred method in the encoder.
3) In the third method, both before-hand and subsequent interpolation are also used, but with a lesser degree of before-hand interpolation than in method 2). All of the sub-pixels situated at unit horizontal and half unit vertical locations (sub-pixels d1), half unit horizontal and unit vertical locations (sub-pixels b) and half unit horizontal and half unit vertical locations (sub-pixels d3) are pre-calculated. Sub-pixels at quarter unit horizontal locations (sub-pixels c, d2 and d4) and sub-pixels at quarter unit vertical locations (sub-pixels e1, e2, e3, e4 and f) are calculated when they are required during the generation of motion vectors, i.e. they are subsequently interpolated. In this case, the amount of memory required to store one interpolated image is 4 times that required for the original image.
4) In the fourth method, no before-hand interpolation is used.
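The memory figures quoted for the four methods follow from the density of the grid that is stored; a rough sketch of the arithmetic for an 8-bit width x height image (the function is an illustration, not part of the patent):

```c
#include <stddef.h>

/* Bytes needed to store one interpolated 8-bit image under each method.
   The full quarter-pixel grid is 4x denser in each dimension (16x); the
   second method stores fully interpolated rows at unit and half unit
   vertical locations only (8x); the third stores only the half-pixel
   grid, 2x denser in each dimension (4x). */
size_t interpolated_frame_bytes(size_t width, size_t height, int method)
{
    switch (method) {
    case 1:  return 16 * width * height;
    case 2:  return  8 * width * height;
    case 3:  return  4 * width * height;
    default: return      width * height;  /* method 4: no pre-calculation */
    }
}
```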
Operation of a video decoder 800 according to the invention will now be described. Referring to Figure 8, the decoder 800 comprises a demultiplexing unit (MUX/DMUX) 810, which receives the encoded video stream 725 from the encoder 700 and demultiplexes it, an inverse quantiser 820, an inverse DCT transformer 830, a motion compensated prediction block 840, a frame store 850, a combiner 860, a control manager 870, an output 880, a before-hand sub-pixel value interpolation block 845 and a subsequent sub-pixel interpolation block 890 associated with the motion compensated prediction block 840. In practice, the control manager 870 of the decoder 800 and the control manager 720 of the encoder 700 may be the same processor. This may be the case if the encoder 700 and decoder 800 are part of the same video codec.
Figure 8 shows an embodiment in which a combination of before-hand and subsequent sub-pixel value interpolation is used in the decoder. In other embodiments, only before-hand sub-pixel value interpolation is used, in which case decoder 800 does not include subsequent sub-pixel value interpolation block 890. In a preferred embodiment of the invention, no before-hand sub-pixel value interpolation is used in the decoder and therefore before-hand sub-pixel value interpolation block 845 is omitted from the decoder architecture. If both before-hand and subsequent sub-pixel value interpolation are performed, the decoder comprises both blocks 845 and 890.
The control manager 870 controls the operation of the decoder 800 in response to whether an INTRA or an INTER frame is being decoded. An INTRA/INTER trigger control signal, which causes the decoder to switch between decoding modes, is derived, for example, from picture type information provided in the header portion of each compressed video frame received from the encoder. The INTRA/INTER trigger control signal is passed to control manager 870 via control line 815, together with other video codec control signals demultiplexed from the encoded video stream 725 by the MUX/DMUX unit 810.
When an INTRA frame is decoded, the encoded video stream 725 is demultiplexed into INTRA coded macroblocks and control information. No motion vectors are included in the encoded video stream 725 for an INTRA coded frame. The decoding process is performed macroblock-by-macroblock. When the encoded information 723 for a macroblock is extracted from video stream 725 by MUX/DMUX unit 810, it is passed to inverse quantiser 820. The control manager 870 controls inverse quantiser 820 to apply a suitable level of inverse quantisation to the macroblock of encoded information, according to control information provided in video stream 725. The inverse quantised macroblock is then inversely transformed in the inverse DCT transformer 830 to form a decoded block of image information. Control manager 870 controls combiner 860 to prevent any reference information being used in the decoding of the INTRA coded macroblock. The decoded block of image information is passed to the video output 880 of the decoder.
In embodiments of the decoder which employ before-hand sub-pixel value interpolation, the decoded block of image information (i.e. pixel values) produced as a result of the inverse quantisation and inverse transform operations performed in blocks 820 and 830 is passed to before-hand sub-pixel value interpolation block 845. Here, sub-pixel value interpolation is performed according to the method of the invention, the degree of before-hand sub-pixel value interpolation applied being determined by the details of the decoder implementation. In embodiments of the invention in which subsequent sub-pixel value interpolation is not performed, before-hand sub-pixel value interpolation block 845 interpolates all sub-pixels at half- and quarter-pixel locations. In embodiments that use a combination of before-hand and subsequent sub-pixel value interpolation, before-hand sub-pixel value interpolation block 845 interpolates a certain sub-set of sub-pixel values. This may comprise all sub-pixels at half-pixel locations, or a combination of sub-pixels at half-pixel and quarter-pixel locations. In any case, after before-hand sub-pixel value interpolation, the interpolated sub-pixel values are stored in frame store 850, together with the original decoded pixel values. As subsequent macroblocks are decoded, before-hand interpolated and stored, a decoded frame, at least partially interpolated to sub-pixel resolution, is progressively assembled in the frame store 850 and becomes available for use as a reference frame for motion compensated prediction.
In embodiments of the decoder which do not employ before-hand sub-pixel value interpolation, the decoded block of image information (i.e. pixel values) produced as a result of the inverse quantisation and inverse transform operations performed on the macroblock in blocks 820 and 830 is passed directly to frame store 850. As subsequent macroblocks are decoded and stored, a decoded frame having unit pixel resolution is progressively assembled in the frame store 850 and becomes available for use as a reference frame for motion compensated prediction.
When an INTER frame is decoded, the encoded video stream 725 is demultiplexed into encoded prediction error information 723 for each macroblock of the frame, associated motion coefficients 724 and control information. Again, the decoding process is performed macroblock-by-macroblock. When the encoded prediction error information 723 for a macroblock is extracted from the video stream 725 by MUX/DMUX unit 810, it is passed to inverse quantiser 820. Control manager 870 controls inverse quantiser 820 to apply a suitable level of inverse quantisation to the macroblock of encoded prediction error information, according to control information received in video stream 725. The inverse quantised macroblock of prediction error information is then inversely transformed in the inverse DCT transformer 830 to yield decoded prediction error information for the macroblock.
The motion coefficients 724 associated with the macroblock in question are extracted from the video stream 725 by MUX/DMUX unit 810 and passed to motion compensated prediction block 840, which reconstructs a motion vector for the macroblock using the same motion model as that used to encode the INTER-coded macroblock in encoder 700. The reconstructed motion vector approximates the motion vector originally determined by motion estimation block 760 of the encoder. The motion compensated prediction block 840 of the decoder uses the reconstructed motion vector to identify the location of a block of pixel/sub-pixel values in a prediction reference frame stored in frame store 850. The reference frame may be, for example, a previously decoded INTRA frame or a previously decoded INTER frame. In either case, the block of pixel/sub-pixel values indicated by the reconstructed motion vector represents the prediction of the macroblock in question.
The reconstructed motion vector may point to any pixel or sub-pixel. If the motion vector indicates that the prediction for the current macroblock is formed from pixel values (i.e. the values of pixels at unit pixel locations), these can simply be retrieved from frame store 850, as the values in question are obtained directly during the decoding of each frame. If the motion vector indicates that the prediction for the current macroblock is formed from sub-pixel values, these must either be retrieved from frame store 850 or calculated in subsequent sub-pixel value interpolation block 890. Whether sub-pixel values must be calculated, or can simply be retrieved from the frame store, depends on the degree of before-hand sub-pixel value interpolation used in the decoder.
In embodiments of the decoder that do not employ before-hand sub-pixel value interpolation, the required sub-pixel values must all be calculated in subsequent sub-pixel value interpolation block 890. On the other hand, in embodiments in which all sub-pixel values are interpolated before-hand, motion compensated prediction block 840 can retrieve the required sub-pixel values directly from the frame store 850. In embodiments that use a combination of before-hand and subsequent sub-pixel value interpolation, the action required in order to obtain the required sub-pixel values depends on which sub-pixel values are interpolated before-hand. Taking as an example an embodiment in which all sub-pixel values at half-pixel locations are calculated before-hand, it is evident that if a reconstructed motion vector for a macroblock points to a pixel at unit location or a sub-pixel at half-pixel location, all the pixel or sub-pixel values required to form the prediction for the macroblock are present in the frame store 850 and can be retrieved from there by motion compensated prediction block 840. If, however, the motion vector indicates a sub-pixel at a quarter-pixel location, the sub-pixels required to form the prediction for the macroblock are not present in frame store 850 and must therefore be calculated in subsequent sub-pixel value interpolation block 890. Whenever sub-pixel values must be interpolated subsequently, subsequent sub-pixel value interpolation block 890 retrieves any pixel or sub-pixel values required to perform the interpolation from frame store 850 and applies the interpolation method described below. Sub-pixel values calculated in subsequent sub-pixel value interpolation block 890 are passed to motion compensated prediction block 840.
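The retrieve-or-interpolate decision can be expressed compactly. The following sketch (Python; the function name and the representation of motion vectors in quarter-pixel units are assumptions made for illustration) implements the half-pixel-only example from the preceding paragraph.

```python
def prediction_source(mv_x: int, mv_y: int) -> str:
    """mv_x, mv_y: motion vector components in quarter-pixel units."""
    frac_x, frac_y = mv_x % 4, mv_y % 4
    if frac_x % 2 == 0 and frac_y % 2 == 0:
        # Integer (0) or half-pixel (2) offset in both directions:
        # these values were interpolated before-hand and stored.
        return "frame store 850"
    # A quarter-pixel offset in either direction: values must be
    # computed in subsequent sub-pixel value interpolation block 890.
    return "interpolation block 890"

assert prediction_source(4, 6) == "frame store 850"          # offset (0, 1/2)
assert prediction_source(5, 2) == "interpolation block 890"  # offset (1/4, 1/2)
```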
Once a prediction for a macroblock has been obtained, the prediction (that is, a macroblock of predicted pixel values) is passed from motion compensated prediction block 840 to combiner 860, where it is combined with the decoded prediction error information for the macroblock to form a reconstructed image block which, in turn, is passed to the video output 880 of the decoder.
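In code, this combining step amounts to a clipped element-wise addition; the helper below is a minimal sketch (Python, with illustrative naming), not the codec's actual routine.

```python
import numpy as np

def combine(prediction: np.ndarray, error: np.ndarray) -> np.ndarray:
    """Sketch of combiner 860: add the decoded prediction error to the
    motion compensated prediction and clip to the 8-bit pixel range."""
    return np.clip(prediction.astype(np.int16) + error, 0, 255).astype(np.uint8)
```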
The sub-pixel interpolation method according to the invention, as used in the decoder 800, will now be described with reference to Figures 9, 10a, and 10b. In a preferred embodiment of the invention, in the decoder 800 a particular sub-pixel is interpolated only when it is needed. The value of such a sub-pixel is calculated as follows. Referring to Figure 10a, values of sub-pixels at unit vertical locations (sub-pixels b and c) are calculated using a 6-tap filter operating on values of pixels at unit horizontal and unit vertical locations. The filter interpolates sub-pixel b based upon the values of the 6 pixels in a row at unit horizontal and unit vertical locations symmetrically disposed about b, according to the formula b = (A1 - 5A2 + 20A3 + 20A4 - 5A5 + A6 + 16)/32. The filter interpolates sub-pixel c (between A3 and b) based upon the values of the 6 pixels in a row at unit horizontal and unit vertical locations, three of such pixels on either side of c, according to the formula c = (A1 - 5A2 + 52A3 + 20A4 - 5A5 + A6 + 32)/64. The filter interpolates sub-pixel c (between b and A4) based upon the values of the 6 pixels in a row at unit horizontal and unit vertical locations, three of such pixels on either side of c, according to the formula c = (A1 - 5A2 + 20A3 + 52A4 - 5A5 + A6 + 32)/64. The values of sub-pixels b and c are truncated to the nearest integer and clipped to the range 0 to 255, if necessary. Unlike the formula for calculating sub-pixels b, the formulae for calculating the values of sub-pixels c are asymmetrical about sub-pixels c. This is because sub-pixels b are at half horizontal locations and sub-pixels c are at quarter horizontal locations. It should be noted that the procedure used in the decoder is not exactly the same as that used in the encoder, but does produce identical results.
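These horizontal filters translate directly into integer arithmetic. The sketch below (Python; helper names are illustrative) transcribes the three formulae, using truncating division and the clipping to 0-255 described above.

```python
def clip255(v: int) -> int:
    return max(0, min(255, v))

# Note: Python's // floors rather than truncates, but any negative
# intermediate result is clipped to 0 anyway, so the output matches.
def half_b(A1, A2, A3, A4, A5, A6):
    """Half-pixel b: symmetric 6-tap filter over the row neighbours."""
    return clip255((A1 - 5*A2 + 20*A3 + 20*A4 - 5*A5 + A6 + 16) // 32)

def quarter_c_left(A1, A2, A3, A4, A5, A6):
    """Quarter-pixel c between A3 and b: weight 52 on the nearer pixel."""
    return clip255((A1 - 5*A2 + 52*A3 + 20*A4 - 5*A5 + A6 + 32) // 64)

def quarter_c_right(A1, A2, A3, A4, A5, A6):
    """Quarter-pixel c between b and A4: mirrored weights."""
    return clip255((A1 - 5*A2 + 20*A3 + 52*A4 - 5*A5 + A6 + 32) // 64)

# On a flat area every interpolated value reproduces the pixel value:
assert half_b(10, 10, 10, 10, 10, 10) == 10
assert quarter_c_left(10, 10, 10, 10, 10, 10) == 10
```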
Values of sub-pixels f are calculated by averaging the values of the four closest pixels at unit horizontal and vertical locations, according to f = (A1 + A2 + A3 + A4 + 2)/4, where the locations of pixels A1, A2, A3 and A4 are defined in Figure 9. The value 2 is added to the sum in order to control rounding effects in such a way that f is always rounded to the nearest integer value.
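The +2 term makes the truncating division round to nearest, as the short sketch below verifies (Python, illustrative naming).

```python
def centre_f(A1: int, A2: int, A3: int, A4: int) -> int:
    """Centre sub-pixel f: rounded average of the four unit-location
    neighbours; the +2 rounds the truncating division to nearest."""
    return (A1 + A2 + A3 + A4 + 2) // 4

assert centre_f(10, 10, 10, 11) == 10   # 41/4 = 10.25 rounds down
assert centre_f(10, 10, 11, 11) == 11   # 42/4 = 10.5 rounds up
```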
Values of sub-pixels at half vertical locations are determined as follows. Referring to Figure 10b, sub-pixels at half vertical locations, such as sub-pixels d1, d2, d3, and d4, are calculated using a 6-tap filter. The filter interpolates a value for sub-pixel d1 based upon the values of the 6 pixels in a column at unit horizontal and unit vertical locations disposed symmetrically about d1, according to the formula d1 = (A1 - 5A2 + 20A3 + 20A4 - 5A5 + A6 + 16)/32. Similarly, values of sub-pixels at half horizontal and half vertical locations (sub-pixels d3) are calculated according to the relationship d3 = (b1 - 5b2 + 20b3 + 20b4 - 5b5 + b6 + 16)/32, and values of sub-pixels at quarter horizontal and half vertical locations (sub-pixels d2 and d4) are calculated according to the relationship d2 or d4 = (c1 - 5c2 + 20c3 + 20c4 - 5c5 + c6 + 16)/32, where the sub-pixels cn are located in the same column as the sub-pixel d being calculated and are disposed at unit vertical locations symmetrically on either side of it. The values of sub-pixels d1, d2, d3, and d4 are truncated to the nearest integer and clipped to the range 0 to 255, if necessary.
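Since the same 6-tap kernel is applied in all three cases, a single helper suffices. In the sketch below (Python, illustrative naming), the six column values are pixels A for d1, sub-pixels b for d3, or sub-pixels c for d2 and d4.

```python
def clip255(v: int) -> int:
    return max(0, min(255, v))

def half_vertical(p1, p2, p3, p4, p5, p6):
    """6-tap half-position kernel applied down a column: inputs are
    pixels A (for d1), sub-pixels b (for d3) or sub-pixels c (for
    d2/d4); truncating division then clipping to 0-255."""
    return clip255((p1 - 5*p2 + 20*p3 + 20*p4 - 5*p5 + p6 + 16) // 32)

assert half_vertical(10, 10, 10, 10, 10, 10) == 10  # flat input preserved
```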
Values of sub-pixels at quarter vertical locations are determined as follows. Referring to Figure 10b, values of sub-pixels at quarter vertical locations (sub-pixels e) are calculated as an average of the nearest pixel or sub-pixel values in the same column located above and below the sub-pixel being calculated. That is, values of sub-pixels e1, e2, e3, and e4 are calculated as (A3 + d1)/2, (c + d2)/2, (b + d3)/2, and (c + d4)/2 respectively, and values of sub-pixels e1, e2, and e3 occupying the same row as sub-pixel f are calculated as (d1 + A4)/2, (d2 + c)/2, and (d3 + b4)/2 respectively, and truncated to the nearest integer value.
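The quarter vertical positions therefore need no filtering at all, only a truncated two-value average, as the sketch below illustrates (Python, illustrative naming).

```python
def quarter_e(above: int, below: int) -> int:
    """Quarter vertical sub-pixel e: truncated average of the nearest
    values above and below in the same column."""
    return (above + below) // 2

assert quarter_e(100, 103) == 101   # e.g. e1 between pixel A3 and d1
```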
The values of the pixels and sub-pixels are generally represented by a certain number of bits. In one embodiment, 8 bits are used. In this case, the maximum value that the pixels and sub-pixels can take is 255 (in the range 0 to 255). It is desirable to set a maximum value for the pixels and sub-pixels. In a preferred embodiment of the invention, whether 8 bits or some other number of bits is used, it is convenient to allow the maximum value of the pixels and sub-pixels to be the greatest number that can be represented by the number of bits assigned to represent the value. However, in other embodiments of the invention, it may be desired to allow the maximum value of the pixels and sub-pixels to be a number less than the greatest number that can be represented by the number of bits assigned to represent the value. In either case, the maximum value of the pixels or sub-pixels defines the dynamic range of the pixel or sub-pixel values.
In connection with the foregoing paragraph, it should be noted that according to the embodiments of the invention described, the multi-tap interpolation filters calculate sub-pixel values according to a general formula involving a weighted sum of pixel or sub-pixel values. An example of such a formula is (b1 - 5b2 + 20b3 + 20b4 - 5b5 + b6 + 16)/32, where b1, b2, b3, b4, b5, and b6 are values of sub-pixels b shown in Figure 10b. It can be seen that if the values b1 = x, b2 = 0, b3 = x, b4 = x, b5 = 0, and b6 = x are substituted into the formula, the result is 21x/16 + 1/2. If x is the maximum value in the dynamic range allowed for the pixel or sub-pixel values, then the result obtained would exceed the maximum value. Whenever the result of the formula exceeds the maximum value in the dynamic range allowed for the pixel or sub-pixel values, the result is clipped to the maximum value. Accordingly, the clipping operations described in the foregoing are carried out whenever the need arises.
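The overflow case can be checked numerically. Assuming the 8-bit dynamic range (maximum 255), substituting x = 255 into the pattern above gives a filter output of 335, which the clipping step brings back to 255:

```python
# Weighted sum with b1 = b3 = b4 = b6 = 255 and b2 = b5 = 0,
# i.e. the pattern discussed in the text with x = 255:
b = [255, 0, 255, 255, 0, 255]
raw = (b[0] - 5*b[1] + 20*b[2] + 20*b[3] - 5*b[4] + b[5] + 16) // 32
print(raw)                      # 335 -- exceeds the 8-bit maximum of 255
print(max(0, min(255, raw)))    # 255 after clipping
```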
Although the foregoing description does not describe the construction of bidirectionally predicted frames (B-frames) in the encoder 700 and the decoder 800, it should be understood that in embodiments of the invention, such a capability may be provided. Provision of such capability is considered within the ability of one skilled in the art.
An encoder or a decoder according to the invention can be realised using hardware or software, or using a suitable combination of both. An encoder or decoder implemented in software may be, for example, a separate program or a software building block that can be used by various programs. In the above description and in the drawings, the functional blocks are represented as separate units, but the functionality of these blocks can be implemented, for example, in one software program unit.
The encoder 700 and decoder 800 are combined in order to form the codec 65. In addition to being implemented in a multimedia terminal, such a codec may also be implemented in a network. A codec according to the invention may be a computer program or a computer program element, or it may be implemented at least partly using hardware.
If the multimedia terminal 60 is a mobile terminal, that is, if it is equipped with a radio transceiver 73, it will be understood by those skilled in the art that it may also comprise additional elements. In one embodiment it comprises a user interface having a display and a keyboard, which enables operation of the multimedia terminal 60 by a user, together with necessary functional blocks including a central processing unit, such as a microprocessor, which controls the blocks responsible for different functions of the multimedia terminal, a random access memory RAM, a read-only memory ROM, and a digital camera. The microprocessor's operating instructions, that is, program code corresponding to the basic functions of the multimedia terminal 60, are stored in the read-only memory ROM and can be executed as required by the microprocessor, for example under control of the user. In accordance with the program code, the microprocessor uses the radio transceiver 73 to form a connection with a mobile communication network, enabling the multimedia terminal 60 to transmit information to and receive information from the mobile communication network over a radio path.
The microprocessor monitors the state of the user interface and controls the digital camera. In response to a user command, the microprocessor instructs the camera to record digital images into the RAM. Once an image is captured, or alternatively during the capturing process, the microprocessor segments the image into image segments (for example macroblocks) and uses the encoder to perform motion compensated encoding for the segments in order to generate a compressed image sequence as explained in the foregoing description. A user may command the multimedia terminal 60 to display the captured images on its display or to send the compressed image sequence using the radio transceiver 73 to another multimedia terminal, a video telephone connected to a fixed line network (PSTN) or some other telecommunications device. In a preferred embodiment, transmission of image data is started as soon as the first segment is encoded so that the recipient can start a corresponding decoding process with a minimum delay.
Figure 11 is a schematic diagram of a mobile telecommunications network according to an embodiment of the invention. Multimedia terminals MS are in communication with base stations BTS by means of a radio link. The base stations BTS are further connected, through a so-called Abis interface, to a base station controller BSC, which controls and manages several base stations.
The entity formed by a number of base stations BTS (typically, by a few tens of base stations) and a single base station controller BSC, controlling the base stations, is called a base station subsystem BSS. In particular, the base station controller BSC manages radio communication channels and handovers. The base station controller BSC is also connected, through a so-called A interface, to a mobile services switching centre MSC, which coordinates the formation of connections to and from mobile stations. A further connection is made, through the mobile services switching centre MSC, to outside the mobile communications network. Outside the mobile communications network there may further reside other network(s) connected to the mobile communications network by gateway(s) GTW, for example the Internet or a Public Switched Telephone Network (PSTN). In such an external network, or within the telecommunications network, there may be located video decoding or encoding stations, such as computers PC. In an embodiment of the invention, the mobile telecommunications network comprises a video server VSRVR to provide video data to an MS subscribing to such a service. The video data is compressed using the motion compensated video compression method described in the foregoing. The video server may function as a gateway to an online video source or it may comprise previously recorded video clips. Typical videotelephony applications may involve, for example, two mobile stations, or one mobile station MS and a videotelephone connected to the PSTN, a PC connected to the Internet, or an H.261 compatible terminal connected either to the Internet or to the PSTN.
In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention. While a number of preferred embodiments of the invention have been described in detail, it should be apparent that many modifications and variations thereto are possible, all of which fall within the true spirit and scope of the invention.

Claims (32)

1. A method of interpolation in video coding in which an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows residing at unit horizontal locations and the pixels in the columns residing at unit vertical locations, is interpolated to generate values for sub-pixels at fractional horizontal and vertical locations, the method comprising:
a) interpolating values for sub-pixels at half unit horizontal and unit vertical locations, unit horizontal and half unit vertical locations, and quarter unit horizontal and unit vertical locations directly using weighted sums of pixels residing at unit horizontal and unit vertical locations when such values for sub-pixels are required;
b) interpolating values for sub-pixels at half unit horizontal and half unit vertical locations using a weighted sum of values for sub-pixels residing at half unit horizontal and unit vertical locations calculated according to step (a) when such values for sub-pixels are required;
c) interpolating values for sub-pixels at quarter unit horizontal and half unit vertical locations using a weighted sum of values for sub-pixels residing at quarter unit horizontal and unit vertical locations calculated according to step (a) when such values for sub-pixels are required; and
d) interpolating values for sub-pixels at quarter unit vertical locations when such values for sub-pixels are required by taking the average of the values of a first pixel or sub-pixel located at a horizontal location corresponding to that of the sub-pixel being calculated and unit vertical location and a second pixel or sub-pixel located at a horizontal location corresponding to that of the sub-pixel being calculated and half unit vertical location.
2. A method according to claim 1 in which, when a value for a sub-pixel located at half unit horizontal and unit vertical location is required, it is interpolated by:
(i) calculating a first weighted sum for the sub-pixel located at half unit horizontal and unit vertical location by using an n-tap filter which interpolates based upon values of n pixels at unit horizontal and unit vertical locations, the first weighted sum having an extended dynamic range exceeding the specified dynamic range and being dependent upon a first set of weighting factors; and
(ii) dividing the first weighted sum calculated in step (i) by a first divisor which is dependent on the first set of weighting factors to produce a first result having a reduced dynamic range compared to the extended dynamic range and, if the first result having the reduced dynamic range exceeds the specified dynamic range, clipping the first result having the reduced dynamic range to produce a value for the sub-pixel located at half unit horizontal and unit vertical location such that it has the specified dynamic range.
3. A method according to claim 2 in which, when a value for a sub-pixel located at quarter unit horizontal and unit vertical location is required, a second weighted sum is calculated by using the first weighted sum and the value of a pixel located at unit horizontal and unit vertical location, and the second weighted sum is divided by a second divisor which is dependent on the first set of weighting factors to produce a second result having a reduced dynamic range compared to the extended dynamic range and, if the second result having the reduced dynamic range exceeds the specified dynamic range, the second result having the reduced dynamic range is clipped to produce a value of the sub-pixel located at quarter unit horizontal and unit vertical location such that it has the specified dynamic range.
4. A method according to any preceding claim in which, when a value for a sub-pixel located at half unit vertical location is required, it is interpolated by:
(i) calculating a weighted sum for the sub-pixel by using an n-tap filter which interpolates based upon values of n pixels or sub-pixels at a horizontal location corresponding to the horizontal location of the sub-pixel being calculated and at unit vertical locations, the weighted sum having an extended dynamic range exceeding the specified dynamic range and being dependent upon a set of weighting factors; and
(ii) dividing the weighted sum calculated in step (i) by a divisor which is dependent on the set of weighting factors of step (i) to produce a result having a reduced dynamic range compared to the extended dynamic range and, if the result having the reduced dynamic range exceeds the specified dynamic range, clipping the result having the reduced dynamic range to produce the value for the sub-pixel located at half unit vertical location such that it has the specified dynamic range.
5. A method according to any of claims 2 to 4 in which extending or reducing the dynamic range involves changing the number of bits which are used to represent the dynamic range.
6. A method according to any preceding claim, in which a value for a sub-pixel at quarter unit horizontal location and quarter unit vertical location is interpolated as an average of the four nearest pixels at unit horizontal and unit vertical locations.
7. A method according to any preceding claim, in which values for all sub-pixels at half unit locations and values for all sub-pixels at quarter unit locations are calculated as they are required in the determination of a prediction frame during motion predictive coding.
8. A method according to any preceding claim which is used during video decoding.
9. A method according to claim 8 in which sub-pixels are only interpolated when their need is indicated by a motion vector.
10. A method according to any of claims 1 to 6 in which values for all sub-pixels at half unit locations and values for all sub-pixels at quarter unit locations are calculated and stored before being subsequently used in the determination of a prediction frame during motion predictive coding or decoding.
11. A method according to any of claims 1 to 6, in which some values for sub-pixels are calculated and stored before being subsequently used in the determination of a prediction frame during motion predictive coding and some values for sub-pixels are calculated as they are required in the determination of a prediction frame during motion predictive coding or decoding.
12. A method according to claim 11 in which values of sub-pixels at half unit horizontal and unit vertical locations, unit horizontal and half unit vertical locations, and half unit horizontal and half unit vertical locations are calculated and stored before being subsequently used in the determination of a prediction frame during motion predictive coding or decoding.
13. A method according to claim 12 in which values of sub-pixels at quarter unit horizontal and unit vertical locations and quarter unit horizontal and half unit vertical locations are calculated and stored before being subsequently used in the determination of a prediction frame during motion predictive coding or decoding.
14. A method according to claim 12 in which values of sub-pixels at unit horizontal and quarter unit vertical locations and half unit horizontal and quarter unit vertical locations are calculated and stored before being subsequently used in the determination of a prediction frame during motion predictive coding or decoding.
15. A method according to any of claims 10 to 14 which is used during video encoding.
16. A method substantially as described herein with reference to the accompanying drawings.
17. A video coder for coding an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows residing at unit horizontal locations and the pixels in the columns residing at unit vertical locations, the coder comprising an interpolator adapted to generate values for sub-pixels located at fractional horizontal and vertical locations, the interpolator being adapted to:
a) interpolate values for sub-pixels at half unit horizontal and unit vertical locations, unit horizontal and half unit vertical locations, and quarter unit horizontal and unit vertical locations directly using weighted sums of pixels residing at unit horizontal and unit vertical locations when such values for sub-pixels are required;
b) interpolate values for sub-pixels at half unit horizontal and half unit vertical locations using a weighted sum of values for sub-pixels residing at half unit horizontal and unit vertical locations calculated according to step (a) when such values for sub-pixels are required;
c) interpolate values for sub-pixels at quarter unit horizontal and half unit vertical locations using a weighted sum of values for sub-pixels residing at quarter unit horizontal and unit vertical locations calculated according to step (a) when such values for sub-pixels are required; and
d) interpolate values for sub-pixels at quarter unit vertical locations when such values for sub-pixels are required by taking the average of the values of a first pixel or sub-pixel located at a horizontal location corresponding to that of the sub-pixel being calculated and unit vertical location and a second pixel or sub-pixel located at a horizontal location corresponding to that of the sub-pixel being calculated and half unit vertical location.
18. A video coder according to claim 17 comprising a video encoder.
19. A video coder according to claim 17 comprising a video decoder.
20. A codec comprising the video encoder of claim 18 and the video decoder of claim 19.
21. A codec substantially as described herein with reference to the accompanying drawings.
22. A communications terminal comprising a video coder for coding an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows residing at unit horizontal locations and the pixels in the columns residing at unit vertical locations, the coder comprising an interpolator adapted to generate values for sub-pixels located at fractional horizontal and vertical locations, the interpolator being adapted to:
a) interpolate values for sub-pixels at half unit horizontal and unit vertical locations, unit horizontal and half unit vertical locations, and quarter unit horizontal and unit vertical locations directly using weighted sums of pixels residing at unit horizontal and unit vertical locations when such values for sub-pixels are required;
b) interpolate values for sub-pixels at half unit horizontal and half unit vertical locations using a weighted sum of values for sub-pixels residing at half unit horizontal and unit vertical locations calculated according to step (a) when such values for sub-pixels are required;
c) interpolate values for sub-pixels at quarter unit horizontal and half unit vertical locations using a weighted sum of values for sub-pixels residing at quarter unit horizontal and unit vertical locations calculated according to step (a) when such values for sub-pixels are required; and
d) interpolate values for sub-pixels at quarter unit vertical locations when such values for sub-pixels are required by taking the average of the values of a first pixel or sub-pixel located at a horizontal location corresponding to that of the sub-pixel being calculated and unit vertical location and a second pixel or sub-pixel located at a horizontal location corresponding to that of the sub-pixel being calculated and half unit vertical location.
23. A communications terminal according to claim 22 comprising a video encoder.
24. A communications terminal according to claim 22 comprising a video decoder.
25. A communications terminal according to any of claims 22 to 24 having a codec comprising a video encoder and a video decoder.
26. A communications terminal according to any of claims 22 to 25 comprising a user interface, a processor and at least one of a transmitting block and a receiving block.
27. A communications terminal according to claim 26 in which the processor controls the operation of the transmitting block and/or the receiving block and the video coder.
28. A communications terminal substantially as described herein with reference to the accompanying drawings.
29. A telecommunications system comprising a communications terminal and a network, the communications terminal and the network being connected by a communications link over which coded video can be transmitted, the communications terminal comprising a video coder for coding an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows residing at unit horizontal locations and the pixels in the columns residing at unit vertical locations, the coder comprising an interpolator adapted to generate values for sub-pixels located at fractional horizontal and vertical locations, the interpolator being adapted to:
a) interpolate values for sub-pixels at half unit horizontal and unit vertical locations, unit horizontal and half unit vertical locations, and quarter unit horizontal and unit vertical locations directly using weighted sums of pixels residing at unit horizontal and unit vertical locations when such values for sub-pixels are required;
b) interpolate values for sub-pixels at half unit horizontal and half unit vertical locations using a weighted sum of values for sub-pixels residing at half unit horizontal and unit vertical locations calculated according to step (a) when such values for sub-pixels are required;
c) interpolate values for sub-pixels at quarter unit horizontal and half unit vertical locations using a weighted sum of values for sub-pixels residing at quarter unit horizontal and unit vertical locations calculated according to step (a) when such values for sub-pixels are required; and
d) interpolate values for sub-pixels at quarter unit vertical locations when such values for sub-pixels are required by taking the average of the values of a first pixel or sub-pixel located at a horizontal location corresponding to that of the sub-pixel being calculated and unit vertical location and a second pixel or sub-pixel located at a horizontal location corresponding to that of the sub-pixel being calculated and half unit vertical location.
30. A telecommunications system according to claim 29 which is a mobile telecommunications system comprising mobile communications terminals and a wireless network, the connection between the mobile communications terminals and the wireless network being formed by a radio link.
31. A telecommunications system according to claim 29 or claim 30 in which the network enables the communications terminal to communicate with other communications terminals connected to the network over communications links between the other communications terminals and the network.
32. A telecommunications system comprising a communications terminal and a network, the communications terminal and the network being connected by a communications link over which coded video can be transmitted, the network comprising a video coder for coding an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows residing at unit horizontal locations and the pixels in the columns residing at unit vertical locations, the coder comprising an interpolator adapted to generate values for sub-pixels located at fractional horizontal and vertical locations, the interpolator being adapted to:
a) interpolate values for sub-pixels at half unit horizontal and unit vertical locations, unit horizontal and half unit vertical locations, and quarter unit horizontal and unit vertical locations directly using weighted sums of pixels residing at unit horizontal and unit vertical locations when such values for sub-pixels are required;
b) interpolate values for sub-pixels at half unit horizontal and half unit vertical locations using a weighted sum of values for sub-pixels residing at half unit horizontal and unit vertical locations calculated according to step (a) when such values for sub-pixels are required;
c) interpolate values for sub-pixels at quarter unit horizontal and half unit vertical locations using a weighted sum of values for sub-pixels residing at quarter unit horizontal and unit vertical locations calculated according to step (a) when such values for sub-pixels are required; and
d) interpolate values for sub-pixels at quarter unit vertical locations when such values for sub-pixels are required by taking the average of the values of a first pixel or sub-pixel located at a horizontal location corresponding to that of the sub-pixel being calculated and unit vertical location and a second pixel or sub-pixel located at a horizontal location corresponding to that of the sub-pixel being calculated and half unit vertical location.
GB0122396A 2001-09-17 2001-09-17 Interpolating values for sub-pixels Withdrawn GB2379820A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB0122396A GB2379820A (en) 2001-09-17 2001-09-17 Interpolating values for sub-pixels

Publications (2)

Publication Number Publication Date
GB0122396D0 GB0122396D0 (en) 2001-11-07
GB2379820A true GB2379820A (en) 2003-03-19

Family

ID=9922203

Country Status (1)

Country Link
GB (1) GB2379820A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0536717A2 (en) * 1991-10-10 1993-04-14 Salora Oy A method to double the sample density of an orthogonally sampled picture
EP0859242A1 (en) * 1997-02-13 1998-08-19 ATL Ultrasound, Inc. High resolution ultrasonic imaging through interpolation of received scanline data

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8971412B2 (en) 2008-04-10 2015-03-03 Qualcomm Incorporated Advanced interpolation techniques for motion compensation in video coding
US10440388B2 (en) 2008-04-10 2019-10-08 Qualcomm Incorporated Rate-distortion defined interpolation for video coding based on fixed filter or adaptive filter
US11683519B2 (en) 2008-04-10 2023-06-20 Qualcomm Incorporated Rate-distortion defined interpolation for video coding based on fixed filter or adaptive filter
US10045046B2 (en) 2010-12-10 2018-08-07 Qualcomm Incorporated Adaptive support for interpolating values of sub-pixels for video coding
US10462480B2 (en) 2014-12-31 2019-10-29 Microsoft Technology Licensing, Llc Computationally efficient motion estimation

Also Published As

Publication number Publication date
GB0122396D0 (en) 2001-11-07

Similar Documents

Publication Publication Date Title
CA2452632C (en) Method for sub-pixel value interpolation
EP1466477B1 (en) Coding dynamic filters
AU2002324085A1 (en) Method for sub-pixel value interpolation
KR20150034699A (en) Method and apparatus for image interpolation having quarter pixel accuracy using intra prediction modes
GB2379820A (en) Interpolating values for sub-pixels
AU2007237319B2 (en) Method for sub-pixel value interpolation

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)