WO2004015997A1 - Object-based scalable video transmissions - Google Patents

Object-based scalable video transmissions

Info

Publication number
WO2004015997A1
WO2004015997A1 (PCT/EP2003/005080)
Authority
WO
WIPO (PCT)
Prior art keywords
video
frame
enhancement
enhancement layer
scalable video
Prior art date
Application number
PCT/EP2003/005080
Other languages
French (fr)
Inventor
Jonathan Soon Yew Teh
Angus Reid
Original Assignee
Motorola Inc
Priority date
Filing date
Publication date
Application filed by Motorola Inc filed Critical Motorola Inc
Priority to AU2003240646A priority Critical patent/AU2003240646A1/en
Publication of WO2004015997A1 publication Critical patent/WO2004015997A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 Image coding
    • G06T 9/20 Contour coding, e.g. using detection of edges
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/20 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H04N 19/29 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding involving scalability at the object level, e.g. video object layer [VOL]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N 19/31 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the temporal domain

Abstract

A method (800) for improving a quality of MPEG-4, or similar, object-based scalable video coding in a video communication system (700) employing one or more enhancement layers and one or more base layers. The method includes the steps of decoding the one or more enhancement layers and one or more base layers of a received scalable video sequence into a plurality of image blocks having one or more predetermined boundaries and identifying at least one object of the decoded enhancement layer as a forward and/or backward shape. The method further includes the step of padding (810) the forward and/or backward shape of the object to one block boundary of the enhancement layer frame. In this manner, the occurrence of artifacts in the decoded object in the enhancement layer of the received frame is reduced. Thus, the subjective quality of the video image is improved.

Description

Object-Based Scalable Video Transmissions
Field of the Invention
This invention relates to video transmission units and systems employing and supporting video encoding/decoding techniques. The invention is applicable to, but not limited to, an object-based video compression system where the video has been compressed using a scalable compression technique.
Background of the Invention
In the field of video technology, it is known that video is transmitted as a series of still images/pictures. Since the quality of a video signal can be affected during coding or compression of the video signal, it is known to include additional information or "layers" based on the difference between the video signal and the encoded video bit stream. The inclusion of additional layers enables the quality of the received signal, following decoding and/or decompression, to be enhanced. Hence, a hierarchy of base pictures and enhancement pictures, partitioned into one or more layers, is used to produce a layered video bit stream.
A scalable video bit-stream refers to the ability to transmit and receive video signals of more than one resolution and/or quality simultaneously. A scalable video bit-stream is one that may be decoded at different rates, according to the bandwidth available at the decoder. This enables the user with access to a higher bandwidth channel to decode high quality video, whilst a lower bandwidth user is still able to view the same video, albeit at a lower quality. A primary application for scalable video transmissions is for systems where multiple decoders with access to differing bandwidths are receiving images from a single encoder.
In a layered (scalable) video bit stream, enhancements to the video signal may be added to a base layer either by: (i) Increasing the resolution of the picture (spatial scalability) ;
(ii) Including error information to improve the Signal to Noise Ratio of the picture (SNR scalability) ;
(iii) Including extra pictures to increase the frame rate (temporal scalability) ; or
(iv) Providing a continuous enhancement that may be truncated at any chosen bit rate (Fine Granular Scalability) .
Such enhancements may be applied to the whole picture or to an arbitrarily shaped object within the picture, which is termed object-based scalability. In order to preserve the disposable nature of the temporal enhancement layer, the ITU-T H.263+ [ITU-T Recommendation H.263, "Video Coding for Low Bit Rate Communication"] and MPEG-4 standards dictate that pictures included in the temporal scalability mode should be bi-directionally predicted (B) pictures, as shown in the video stream of FIG. 1.
FIG. 1 shows a schematic illustration of a scalable video arrangement 100 illustrating B picture prediction dependencies, as known in the field of video coding techniques. An initial intra-coded frame (I1) 110 is followed by a bi-directionally predicted frame (B2) 120. This, in turn, is followed by a (uni-directional) predicted frame (P3) 130, and again followed by a second bi-directionally predicted frame (B4) 140. This again, in turn, is followed by a (uni-directional) predicted frame (P5) 150, and so on.
As an enhancement to the arrangement of FIG. 1, a layered video bit stream may be used. FIG. 2 is a schematic illustration of a layered video arrangement, known in the field of video coding techniques. A layered video bit stream includes a base layer 205 and one or more enhancement layers 235.
The base layer (layer-1) includes one or more intra-coded pictures (I pictures) 210 sampled, coded and/or compressed from the original video signal pictures. Furthermore, the base layer will include a plurality of subsequent predicted inter-coded pictures (P pictures) 220, 230, predicted from the intra-coded picture(s) 210.
In the enhancement layers (layer-2, layer-3 or higher layer(s)) 235, three types of picture may be used: (i) bi-directionally predicted (B) pictures (not shown); (ii) enhanced intra-coded (EI) pictures 240, based on the intra-coded picture(s) 210 of the base layer 205; and (iii) enhanced predicted (EP) pictures 250, 260, based on the inter-coded predicted pictures 220, 230 of the base layer 205.
The vertical arrows from the lower, base layer illustrate that the picture in the enhancement layer is predicted from a reconstructed approximation of that picture in the reference (lower) layer.
If prediction is only formed from the lower layer, then the enhancement layer picture is referred to as an EI picture. It is possible, however, to create a modified bi-directionally predicted picture using both a prior enhancement layer picture and a temporally simultaneous lower layer reference picture. This type of picture is referred to as an EP picture or "Enhancement" P-picture. The prediction flow for EI and EP pictures is shown in FIG. 2. (Although not specifically shown in FIG. 2, an EI picture in an enhancement layer may have a P picture as its lower-layer reference picture, and an EP picture may have an I picture as its lower-layer reference picture.)
For both EI and EP pictures, the prediction from the reference layer uses no motion vectors. However, as with normal P pictures, EP pictures use motion vectors when predicting from their temporally prior reference picture in the same layer.
Current standards incorporating the aforementioned scalability techniques include MPEG-4 and H.263. However, MPEG-4 extends temporal scalability such that the pictures or Video Object Planes (VOPs) of the enhancement layer can be predicted from each other. These standards create highly compressed bit-streams, which represent the coded video. However, due to this high compression, the bit-streams are very prone to corruption by network errors as they are transmitted.
Object based coding segments a video sequence into two or more objects, for example, a first object may be designated as a person in the foreground of an image, whereas a second object may be designated as the background scenery. In this way, different numbers of bits can be allocated to each object, so that more important objects are encoded at a higher quality and less important objects (such as the background) are encoded with fewer bits resulting in lower subjective quality. As mentioned above, temporal scalability offers scalability of the temporal resolution, that is, the bit stream can be decoded at different frame rates. Object based temporal scalability means that an enhancement layer increases the frame rate of one of the arbitrary shaped objects in the base layer.
There are two types of temporally scalable enhancement structure defined by the MPEG-4 standard. FIG. 3 illustrates an example of Type-1 temporal scalability, which is a focus of the present invention. In FIG. 3, Video Object Layer '0' (VOL '0') 330 consists of an entire frame 310, both the foreground object 321 and the background 332, whilst Video Object Layer '1' (VOL '1') 320 consists of only the foreground object 322 of VOL '0'. VOL '0' 330 is coded with a low frame rate and VOL '1' 320 is coded with a higher frame rate. The subsequent background for frame-2 and frame-4 is formed using a background composition process. The foreground objects 322 and 324 are overlaid onto the background in frame-2 and frame-4 respectively. In FIG. 3, forward prediction is used to create enhancement layer P-VOPs, whereas FIG. 4 illustrates a similar Type-1 structure using B-VOPs.
Therefore, referring now to FIG. 4, the foreground object of frame-2 422 of the enhancement layer is formed by combining the foreground objects of two base layer frames frame-0 and frame-6, followed by overlapping the object of the enhancement layer onto the combined frame.
With Type-1 temporal scalability, the enhancement layer does not consist of a whole frame, but only a foreground object. It is therefore necessary to generate the background of the image frame from the previous and subsequent base layer VOPs. The enhancement layer object can then be "pasted" onto the re-generated background object.
Type-2 temporal scalability deals with a sequence of an entire frame, which only contains a background and does not have a scalability layer. VideoObject0 (VO0) only contains the entire background of a frame, whereas VideoObject1 (VO1) is a sequence of a particular object with scalability layers VOL '0' and VOL '1', known as the base and enhancement layers respectively.
Note that VO0 may not have the same frame rate as other VOs. Hence, in Type-2 temporal scalability there is no need for background composition (as VO0 only contains the entire background, which is not the case with Type-1 temporal scalability, where VOL '0' also contains the foreground object).
The closest known technology is the background composition method described in the standard [MPEG-4 Video Verification Model version 17.0, ISO/IEC JTC1/SC29/WG11 N3515, July 2000, Beijing, Section 6.5.2, p. 232].
Referring now to FIG. 5, an example 500 of how MPEG-4 deals with background composition for frames in the enhancement layer is illustrated. The example details the frame transitions for a ball moving from the lower-left corner of a first base layer frame 510 to the upper-right corner of a subsequent frame 540. The base layer frame 510 is frame-0, which contains the background and represents the shape of the selected foreground object 512, a black ball (termed the "forward shape").
As the foreground object moves towards the upper-right corner in the second frame 520, the object's shape in the base layer of the next frame 520 is represented by a vertically striped object 522 (termed the "forward shape"). For the background region outside these shapes, the pixel value from the nearest frame (frame-0) at the base layer is used for the composed frame. These areas are shown in white in FIG. 5.
The object area 522 is now the forward shape and hence obtains (composes) its background from the subsequent base layer frame, frame-6. The object ball 524 is now obtained from the enhancement layer.
In frame-4 530, the foreground object 536 continues to move towards the upper-right corner of the frame. The pixel values of the background are now generated from frame-6 540, as this frame is closer in time to the subsequent frame in the base layer. The horizontally striped area 538 is the backward shape and hence obtains its background from the previous base layer frame, frame-0. The object ball 536 is obtained from the enhancement layer.
Finally, in frame-6 540, the foreground object 548 is now in the upper-right corner of the frame 540. This is again just a base layer frame with both background and foreground.
Whilst this approach is simple, it produces some very annoying and obvious artefacts, due to the spatial discontinuity caused when most of the background is filled in using the previous or next (as in frame-4) base layer picture, whilst some of the background has to be filled in using the next or previous (as in frame-4) base layer picture. The filling-in operation of the background using the subsequent base layer, whilst filling in the backward shape using the previous base layer, results in undesirable artefacts. Such artefacts generally occur as a result of insufficient bandwidth in the base layer, thereby causing a blurring of the foreground object. This problem is shown in FIG. 6, which illustrates the problem using the previous example of the moving ball.
Again, the first frame 610 is frame-0, which is a base layer frame that contains the background and the foreground object, a black ball 612. Note, however, that the edges of the black ball will not be well defined due to the high compression ratios in the base layer. As a result, the object ball 612 will be 'spread' throughout the (background) blocks that contain it. These are indicated by the grey-coloured blocks 602, 606, 608. (This 'spreading' effect will affect block-based image compression methods such as the hybrid-DCT method used in MPEG-4. It is due to the effect of discarding higher-frequency components of the DCT. The term commonly used for this effect is the "blocking artefact".)
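The spreading effect described above can be illustrated with a minimal sketch (not part of the patent): a one-dimensional 8-sample "block" containing a sharp object/background edge is transformed with the DCT-II, the higher-frequency coefficients are discarded, and the inverse transform is taken. The pixel values (50 for background, 200 for object) and the number of retained coefficients are arbitrary illustrative assumptions.

```python
import math

def dct(block):
    # 1-D orthonormal DCT-II of an N-sample block
    N = len(block)
    out = []
    for k in range(N):
        c = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        s = sum(x * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for n, x in enumerate(block))
        out.append(c * s)
    return out

def idct(coeffs):
    # Inverse of the orthonormal DCT-II above
    N = len(coeffs)
    out = []
    for n in range(N):
        s = 0.0
        for k, X in enumerate(coeffs):
            c = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
            s += c * X * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
        out.append(s)
    return out

# A sharp edge: background pixels (value 50) next to object pixels (value 200)
edge = [50, 50, 50, 50, 200, 200, 200, 200]
coeffs = dct(edge)
coeffs[3:] = [0.0] * 5          # discard the higher-frequency coefficients
smeared = [round(x) for x in idct(coeffs)]
print(smeared)
```

After truncation the reconstruction is no longer a clean step: object energy leaks into the background samples and the background undershoots near the block edge, which is exactly the blurring of the ball into its surrounding blocks described above.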
In frame-2 620, the forward object (indicated as vertical stripes) 622 will compose its background from frame-6 640. The rest of the background will be composed from frame-0 610. Whilst most of the forward object 622 would contain the true background (as in frame-6 640, the foreground object is not contained by the lower-left block 606), the three grey background blocks 602, 606, 608 would contain part of the foreground object 612 from frame-0 610.
In frame-4 630, the backward object (indicated in horizontal stripes) 638 will compose its background from frame-0 610, whilst the rest of the background is from frame-6 640. Again, the same problem arises. Most of the backward object 638 would contain the true background (as in frame-0 610 the foreground object 612 is not contained by the upper-right block 604) , the three grey blocks 602, 604, 608 would contain part of the foreground object 648 from frame-6 640. This leads to a shadowing effect.
In frame-6 640, the foreground object 648 is now predominantly located in the upper-right corner (block) 604 of the frame 640. This is again just a base layer frame with both background and foreground.
The inventors of the present invention have appreciated a number of problems in this field, for example, in accurately generating background frames using only the data available at the decoder: that is, generating the background using the previous and subsequent base layer pictures, producing as high a subjective quality as possible, whilst using as simple an algorithm as possible to limit complexity at the decoder.
In summary, there exists a need in the field of video communications, and in particular in temporally scalable video communications, for an apparatus and a method to improve a quality of object-based images generated from the base layer, where the abovementioned disadvantages with prior art arrangements may be alleviated.
Statement of Invention
In accordance with a first aspect of the present invention, there is provided a method for improving a quality of MPEG-4, or similar, object-based scalable video coding process in a video communication system, as claimed in Claim 1.
In accordance with a second aspect of the present invention, there is provided a video communication system, as claimed in Claim 6.
In accordance with a third aspect of the present invention, there is provided a video communication unit, as claimed in Claim 8.
In accordance with a fourth aspect of the present invention, there is provided a video communication unit, as claimed in Claim 9.
In accordance with a fifth aspect of the present invention, there is provided a video encoder, as claimed in Claim 10.
In accordance with a sixth aspect of the present invention, there is provided a video decoder, as claimed in Claim 11.
In accordance with a seventh aspect of the present invention, there is provided a mobile radio device, as claimed in Claim 12.
In accordance with an eighth aspect of the present invention, there is provided a storage medium storing processor-implementable instructions, as claimed in Claim 14.
Further aspects of the present invention are as claimed in the dependent Claims.
In summary, an apparatus and a method for improving a quality of MPEG-4, or similar, object-based scalable video coding process in a video communication system are described. This invention provides apparatus and a method by which forward and/or backward shapes of an object are padded to one block boundary of the enhancement layer frame, to reduce artefacts in the decoded object. In this manner, the subjective quality of the video image is improved.
Brief Description of the Drawings
FIG. 1 is a schematic illustration of a video coding arrangement showing picture prediction dependencies, as known in the field of video coding techniques;
FIG. 2 is a schematic illustration of a known layered video coding arrangement;
FIG. 3 illustrates a known example of Type-1 scalability using predictive video objects;
FIG. 4 illustrates a known example of Type-1 scalability using bi-predictive video objects;
FIG. 5 illustrates an example of the known MPEG-4 background composition technique; and
FIG. 6 illustrates a problem with the known MPEG-4 background composition technique.
Exemplary embodiments of the present invention will now be described, with reference to the accompanying drawings, in which:
FIG. 7 is a schematic representation of a scalable video communication system adapted to compensate for background composition artefacts in accordance with the preferred embodiment of the present invention; and
FIG. 8 illustrates a flowchart of a background composition scheme in accordance with the preferred embodiment of the present invention.
Description of Preferred Embodiments
This invention applies to any object-based coding scheme, particularly to a Type-1 temporally scalable object-based encoded video as defined in the MPEG-4 standard. However, it is envisaged that the inventive concepts described herein can be applied to any other subsequently developed or standardised types of object-based coding schemes.
Referring first to FIG. 7, a schematic representation of a video communication system 700, including video encoder 715 and video decoder 725, adapted to incorporate the preferred embodiment of the present invention, is shown.
In FIG. 7, a video picture F0 is compressed 710 in a video encoder 715 to produce the base layer bit stream signal to be transmitted at a rate r1 kilobits per second (kbps). This signal is decompressed 720 at a video decoder 725 to produce the reconstructed base layer picture F0'.
The compressed base layer bit stream is also decompressed at 730 in the video encoder 715 and compared with the original picture F0 at 740 to produce a difference signal 750. This difference signal is compressed at 760 and transmitted as the enhancement layer bit stream at a rate r2 kbps to a video decoder.
The transmission from the encoder to the decoder preferably occurs over a wireless or wired transmission medium supporting one or more communication links.
The decoder includes a decompression function 720 to decompress the base layer bit stream received at a rate r1 kilobits per second (kbps) to reconstruct a base layer picture F0'.
In accordance with the preferred embodiment of the present invention, the enhancement layer functionality within the decoder has been adapted to include an additional background composition padding function 775. The enhancement layer bit stream is decompressed at 770 in the video decoder 725 to produce an enhancement layer picture, upon which the background composition padding function 775 is applied to the selected foreground object. The resultant output is the enhancement layer picture F0'', which is added to the reconstructed base layer picture F0' at 780 to produce the final reconstructed picture F0'''. The padding is performed to the nearest block boundary. Block boundaries result from tessellating the image into regular non-overlapping square blocks for the purposes of motion estimation and compensation. MPEG uses 16x16 pixel blocks. Hence, a typical quarter common intermediate format (QCIF) (176x144) image would have 11x9 blocks of 16x16 pixels. Each pixel in the image would then 'belong' to one and only one particular block.
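The block tessellation described above can be sketched as follows (an illustrative sketch; the function name `block_coords` is not from the patent):

```python
BLOCK = 16  # MPEG motion-compensation block size in pixels

def block_coords(x, y, block=BLOCK):
    # Map pixel co-ordinates to the co-ordinates of the block containing
    # them: integer division by the block size.
    return (x // block, y // block)

# A QCIF (176x144) image tessellates into 11x9 non-overlapping 16x16 blocks
width, height = 176, 144
print(width // BLOCK, height // BLOCK)  # → 11 9
```

Because the division is integer, every pixel maps to exactly one block, which is the "belongs to one and only one particular block" property used by the padding operation.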
In the context of the present invention, the expression 'padding' substantially incorporates the following operations. A shape mask refers to a binary image that is transmitted with every frame, where a '1' is used to indicate that the pixel is in the foreground object and a '0' is used to indicate that the pixel is in the background. Thus, to determine whether a pixel is in the background, foreground object, forward shape, backward shape, or both forward and backward shapes, we need the shape masks from the two nearest base layer frames and the current enhancement layer frame.
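One way such a per-pixel classification could be derived from the three shape masks is sketched below (an illustrative assumption; the function name, mask parameter names and the integer class encoding are not from the patent):

```python
# Pixel classes used during background composition (illustrative encoding)
BACKGROUND, FORWARD, BACKWARD, BOTH, FOREGROUND = range(5)

def classify(fwd_mask, bwd_mask, enh_mask):
    # fwd_mask: shape mask of the previous base layer VOP (forward shape)
    # bwd_mask: shape mask of the subsequent base layer VOP (backward shape)
    # enh_mask: shape mask of the current enhancement layer VOP
    # Each mask is a 2-D list of 0/1 values of identical dimensions.
    h, w = len(fwd_mask), len(fwd_mask[0])
    out = [[BACKGROUND] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if enh_mask[y][x]:
                out[y][x] = FOREGROUND
            elif fwd_mask[y][x] and bwd_mask[y][x]:
                out[y][x] = BOTH
            elif fwd_mask[y][x]:
                out[y][x] = FORWARD
            elif bwd_mask[y][x]:
                out[y][x] = BACKWARD
    return out
```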
We assume that there exists an image that already classifies each pixel in an image according to the classifications given in the previous paragraph.
The following algorithm is one method of performing padding; other methods are also possible, e.g. using a recursive function. The expression '4-neighbour' refers to the neighbours above, below, to the left and to the right (contrast with 8-neighbours, which also include the diagonal neighbours).

Do {
    all_pixels_assigned = true
    For each pixel in image {
        If current_pixel is in backward shape {
            If (a 4-neighbour pixel is in the same block as current_pixel)
              and (that 4-neighbour pixel is assigned to the background) {
                Assign that 4-neighbour pixel to the backward shape
                all_pixels_assigned = false
            }
        }
    }
} Until all_pixels_assigned == true
This algorithm works by going through every pixel in the image (usually in raster order) and then checking whether it is in the backward shape, based on the location of the pixel. If it is, then it checks each of its 4-neighbour pixels that are still within the same block. If such a 4-neighbour pixel is currently assigned to the background, it assigns it to the backward shape. It then moves on to the next 4-neighbour pixel, and, when that is done, it moves on to the next pixel. The process is repeated until no more pixel reassignments (from background to backward shape) are possible.
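The padding loop described above can be rendered as runnable Python (a sketch under the assumption that the pixel classes are held in a 2-D list, with 0 denoting background; the names `pad_shape` and `classes` are illustrative, not from the patent):

```python
def pad_shape(classes, target, block=16):
    # Grow every region labelled `target` (e.g. the backward shape) outwards
    # into neighbouring background pixels, but never across a block boundary.
    # `classes` is a 2-D list of pixel classifications; 0 means background.
    BACKGROUND = 0
    h, w = len(classes), len(classes[0])
    changed = True
    while changed:                       # repeat until no reassignment occurs
        changed = False
        for y in range(h):
            for x in range(w):
                if classes[y][x] != target:
                    continue
                # visit the 4-neighbours (left, right, above, below)
                for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
                    if not (0 <= nx < w and 0 <= ny < h):
                        continue
                    same_block = (nx // block == x // block and
                                  ny // block == y // block)
                    if same_block and classes[ny][nx] == BACKGROUND:
                        classes[ny][nx] = target
                        changed = True
    return classes
```

Because reassignment is blocked at block boundaries, the shape floods outwards only as far as the edges of the 16x16 blocks it already touches, which is precisely "padding to one block boundary".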
Two pixels belong to the same block if they have the same block co-ordinates. To obtain the block co-ordinates from pixel co-ordinates, simply divide each of the pixel x and y co-ordinates by the block size, in this case 16.
It is within the contemplation of the invention that alternative encoding and decoding configurations could be adapted to use the background composition padding function 775 within the enhancement layer bit-stream. As a result, the inventive concepts hereinafter described should not be viewed as being limited to the example configuration provided in FIG. 7.
As discussed above, in prior art arrangements, an MPEG-4 enhancement layer bit-stream does not include the increased enhancement layer functionality. The inventors of the present invention have recognised that prior art decoders of object-based temporally scalable video have been found to introduce artefacts.
In an additional, or alternative, embodiment of the present invention, the use of a background composition padding function may be applied to the forward shape, padding the forward shape to the block boundaries. In this manner, when applying the decoder of FIG. 7 to the example sequence of FIG. 6, the visible blocking artefacts in the background surrounding the forward (and/or backward) shape are prevented. Advantageously, this results in just the grey 'background' blocks being visible, without any blurring of the object.
In general, the video encoding and decoding functions are performed in signal processing units within the video communication units. Therefore, it is envisaged that the aforementioned adaptation of a decoding operation in a video communication unit may be implemented in the respective video communication unit in any suitable manner. For example, new apparatus may be added to a conventional video communication unit, or alternatively existing parts of a conventional video communication unit may be adapted, for example by reprogramming one or more processors therein. As such, the required adaptation may be implemented in the form of processor-implementable instructions stored on a storage medium, such as a floppy disk, hard disk, programmable read-only memory (PROM), random access memory (RAM) or any combination of these or other storage media.
It is also within the contemplation of the invention that such adaptation of a video decoding operation may be facilitated by any video communication unit operating in a video communication system, for example user equipment such as mobile or portable radios or telephones, or wireless or wired serving communication units such as a base transceiver station.
Referring now to FIG. 8, a flowchart 800 illustrates the background composition process, in accordance with the preferred embodiment of the present invention. The background composition process starts with the novel step of padding backward and/or forward shapes of an enhancement layer frame to the block boundaries, as shown in step 810. The process moves on to the next pixel in the image, in step 812. The flowchart then follows substantially the known MPEG-4 decoding operation of the enhancement layer video stream with regard to determining whether the current pixel is in a forward or backward shape, as shown in step 815. It is noteworthy that the process of padding the forward and backward shapes up to the block boundaries is performed first, before the background composition itself. Thus, the aforementioned padding process, which requires repeating for each pixel in a video frame, is performed, and then the background composition process is performed. Effectively, step 810 and steps 812 to 830 form two separate processes: padding, followed by background composition.
If the current pixel is not in a forward or backward shape in step 815, a determination is made as to whether the enhancement layer VOP is closer in time to the previous base layer VOP, in step 820. If the enhancement layer VOP is closer in time, in step 820, the corresponding pixel from the previous base layer VOP is used, in step 825. The processing of the current pixel is then complete, in step 830. If the enhancement layer VOP is not closer in time, in step 820, the corresponding pixel from the subsequent base layer VOP is used in the background composition generation, in step 835. The processing of the current pixel is then complete, in step 830.
If the current pixel is in a forward and/or backward shape in step 815, a determination is made as to whether the current pixel is in a forward shape, in step 840. If the current pixel is not in a forward shape, i.e. it is in a backward shape only, the corresponding pixel from the previous base layer VOP is used in the background composition generation process, in step 825. The process for the current pixel is then complete, in step 830. If the current pixel is determined as being in a forward shape, in step 840, a determination is made as to whether it is also in a backward (overlapping) shape, in step 845. If the current pixel is determined as not being in a backward shape, in step 845, i.e. it is in a forward shape only, the corresponding pixel from the subsequent base layer VOP is used in the background composition generation process, in step 835. The process for the current pixel is then complete, in step 830.
If the current pixel is determined as being in both a backward and a forward shape, from the determinations of steps 815, 840 and 845, the closest available output pixel value is used as padding, in step 850. The process for the current pixel is then complete and a determination is made as to whether all of the pixels have been visited, in step 855. If all pixels have been visited in step 855, the process is complete, in step 830. If all pixels have not been visited in step 855, the process returns to step 812 and the background composition process repeats with the next pixel.
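The per-pixel decision logic of steps 815 to 850 can be summarised in a short sketch (illustrative only; the class constants are assumptions, and the overlapping case, which the flowchart resolves with the closest available output pixel, is simplified here to the nearest base layer VOP):

```python
# Pixel classes, matching the branches of the flowchart (illustrative)
BACKGROUND, FORWARD, BACKWARD, BOTH = range(4)

def compose_pixel(cls, x, y, prev_vop, next_vop, enh_closer_to_prev):
    # prev_vop / next_vop: the two nearest base layer VOPs as 2-D pixel arrays
    if cls == BACKGROUND:
        # steps 820/825/835: copy from the base layer VOP nearest in time
        src = prev_vop if enh_closer_to_prev else next_vop
    elif cls == BACKWARD:
        src = prev_vop   # step 825: backward shape only -> previous base VOP
    elif cls == FORWARD:
        src = next_vop   # step 835: forward shape only -> subsequent base VOP
    else:
        # step 850: the shapes overlap, so neither base VOP holds the true
        # background here; the flowchart pads with the closest available
        # output pixel, simplified in this sketch to the nearest base VOP
        src = prev_vop if enh_closer_to_prev else next_vop
    return src[y][x]
```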
It is envisaged that the padding operation would be used when the base layer has been encoded with a block-based encoder, such as the hybrid-DCT system used in MPEG-4 at a low bit-rate. In the experiments conducted by the inventors of the present invention, a QCIF frame size (176x144) at 32 kbps was used for the base layer. This led to heavy blocking artefacts in the base layer. There would preferably be a predetermined bit-rate threshold for the base layer whereby the padding operation would be activated. This threshold may vary depending on the type of sequence (i.e. high or low texture, fast or slow motion); for example, for QCIF videos, at or below 64 kbps for the base layer.
It is within the contemplation of the present invention that the aforementioned inventive concepts may be applied to any video communication unit and/or video communication system. In particular, the inventive concepts find particular use in wireless (radio) devices such as mobile telephones/mobile radio units and associated wireless communication systems supporting image/video communication. Such wireless communication units may include a portable or mobile PMR radio, a personal digital assistant, a laptop computer or a wirelessly networked PC.
Although the preferred embodiment of the present invention has been described with reference to the MPEG-4 standard, scalable video system technology may be implemented in the 3rd generation (3G) of digital cellular telephones, commonly referred to as the Universal Mobile Telecommunications Standard (UMTS). Scalable video system technology may also find applicability in the packet-data variants of both the current 2nd generation of cellular telephones, commonly referred to as the general packet-data radio system (GPRS), and the TErrestrial Trunked RAdio (TETRA) standard for digital private and public mobile radio systems. Furthermore, scalable video system technology may also be utilised on the Internet.
The aforementioned inventive concepts will therefore find applicability in, and thereby benefit, all these emerging technologies.
It is envisaged that the benefits of the aforementioned inventive concepts may be keenly appreciated where Type 1 temporal scalability is of most use, i.e. in applications using fast moving video images where the user will benefit from a higher frame rate for objects of importance in the enhancement layer. An example is a sports application where the foreground object is the sportsperson or ball, which is updated more frequently than the background. This would allow, for example, the base layer to be distributed for free, whilst the customer may be encouraged to pay for the enhancement layer in order to follow the game properly. It is also noteworthy that scalability offers an advantage in conditions where the end-to-end channel bandwidth is uncertain, such as the Internet or mobile networks, as it offers additional flexibility over single layer coding.
It will be understood that the video transmission arrangement described above provides at least the following advantages: (i) Minimal additional complexity is needed, as the only additional processing of decoded enhancement layer bits involves padding the forward and/or backward shape.
(ii) Higher quality video images are obtained when such an adapted video decoder is used to decode any object-based video stream, for example, an MPEG-4 standard compatible object-based temporally scalable video bit-stream. The inventive concepts find particular applicability in the video processing of sports and other fast-motion video applications, which are predicted to be a large market in both the wireless and wired domains.
(iii) All additional processing is at the decoder. Hence, this requires no modifications to the encoder or encoded bit-stream. In particular, this means that the shape mask is unchanged; one possible method of improving the background composition quality would be to use a larger shape mask in order to include some of the original background in the enhancement layer as well.
Method of the invention:
In summary, a method for improving a quality of MPEG-4, or similar, object-based scalable video coding in a video communication system employing one or more enhancement layers and one or more base layers is described. The method includes the steps of decoding the one or more enhancement layers and one or more base layers of a received scalable video sequence into a plurality of image blocks having one or more predetermined boundaries; and identifying at least one object of the decoded enhancement layer as a forward and/or backward shape. The method further includes the step of padding the forward and/or backward shape of the object to one block boundary of the enhancement layer frame.
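The padding step summarised above can be sketched as follows. This is a hedged illustration, not the patented decoder: the 16x16 block size and all names are assumptions, and the sketch simply extends a binary shape mask so that every block the shape touches is wholly covered.

```python
BLOCK = 16  # block size assumed for illustration (e.g. an MPEG-4 macroblock)


def pad_shape_to_block_boundary(mask, block=BLOCK):
    """Pad a binary 0/1 shape mask outward to the block boundaries.

    Every block that contains at least one shape pixel is filled
    entirely, so block-edge artefacts from the base layer fall inside
    the padded shape rather than across the object's edge.
    """
    h, w = len(mask), len(mask[0])
    padded = [[0] * w for _ in range(h)]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            ys = range(by, min(by + block, h))
            xs = range(bx, min(bx + block, w))
            # Does this block contain any shape pixel?
            if any(mask[y][x] for y in ys for x in xs):
                for y in ys:
                    for x in xs:
                        padded[y][x] = 1
    return padded
```

For a 32x32 mask containing a single shape pixel at (5, 5), the sketch fills the whole top-left 16x16 block and leaves the other three blocks empty.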
Apparatus of the invention: Furthermore, a video communication system has been described that includes a video encoder including a processor for encoding an object-based scalable video sequence into an MPEG-4, or similar, bitstream having a plurality of enhancement layers and a transmitter for transmitting the object-based scalable video sequence. A video decoder includes a receiver for receiving the object-based scalable video sequence containing the plurality of enhancement layers from the video encoder; and a processor operably coupled to the receiver for decoding the one or more enhancement layers and one or more base layers of a received scalable video sequence into a plurality of image blocks having one or more predetermined boundaries, identifying at least one object of the decoded enhancement layer as a forward and/or backward shape, and padding the forward and/or backward shape of the object to one block boundary of the enhancement layer frame.
A video communication unit has been described that includes a decoder comprising a receiver for receiving an object-based scalable video sequence from an encoding video communication unit wherein the object-based scalable video sequence comprises a plurality of enhancement layers and one or more base layers. A processor is operably coupled to the receiver, and decodes the one or more enhancement layers and one or more base layers into a plurality of image blocks having one or more predetermined boundaries. The processor identifies at least one object of a decoded enhancement layer as a forward and/or backward shape, and pads the forward and/or backward shape of the object to one block boundary of the enhancement layer frame.
In this manner, the occurrence of artefacts in the decoded object in the enhancement layer of the received frame is reduced. Thus, the subjective quality of the video image is improved.
Generally, the inventive concepts contained herein are equally applicable to any suitable video or image transmission system. Whilst specific, and preferred, implementations of the present invention are described above, it is clear that one skilled in the art could readily apply variations and modifications of such inventive concepts.
Thus, an improved apparatus and method for improving the quality of object-based images generated from a base layer have been described.

Claims

1. A method (800) for improving a quality of MPEG-4, or similar, object-based scalable video coding in a video communication system employing one or more enhancement layers and one or more base layers, the method comprising the steps of: decoding said one or more enhancement layers and one or more base layers of a received scalable video sequence into a plurality of image blocks having one or more predetermined boundaries; and identifying at least one object of said decoded enhancement layer as a forward and/or backward shape; the method characterised by the step of: padding (810) said forward and/or backward shape of said object to one block boundary of said enhancement layer frame.
2. The method for improving a quality of MPEG-4, or similar, object-based scalable video coding according to Claim 1, wherein said scalable video coding is a Type-1 temporally scalable coding mechanism.
3. The method for improving a quality of MPEG-4, or similar, object-based scalable video coding according to Claim 1 or Claim 2, wherein said step of padding is followed by the step of: performing a background composition process, and repeating said process for substantially each pixel identified as belonging to either the forward or backward shape in an enhancement layer of a video frame.
4. The method for improving a quality of MPEG-4, or similar, object-based scalable video coding according to any preceding Claim, the method further characterised by the steps of: determining whether an enhancement layer predicted video object is close in time (820) with a corresponding base layer video object; and generating a background composition of said enhancement frame, in response to said determination, based on a corresponding pixel from the subsequent or previous base layer predicted video object.
5. The method for improving a quality of MPEG-4, or similar, object-based scalable video coding according to Claim 4, the method further characterised by the steps of determining when said object in two adjacent base layers is close, and performing said step of padding in response to said determination.
6. A video communication system (700) comprising: a) a video encoder (715) , comprising a processor for encoding an object-based scalable video sequence into an MPEG-4, or similar, bitstream having a plurality of enhancement layers; b) a wireless or wired transmission medium supporting one or more communication links between said video encoder and a video decoder; and c) a transmitter for transmitting said object-based scalable video sequence; and d) a video decoder (725) comprising: a receiver for receiving said object-based scalable video sequence containing said plurality of enhancement layers from said video encoder; and a processor (775) operably coupled to said receiver for decoding said one or more enhancement layers and one or more base layers of a received scalable video sequence into a plurality of image blocks having one or more predetermined boundaries, identifying at least one object of said decoded enhancement layer as a forward and/or backward shape, and padding said forward and/or backward shape of said object to one block boundary of said decoded enhancement layer.
7. The video communication system (700) according to Claim 6, wherein said video communication system employs a Type-1 temporally-scalable object-based scalable video coding mechanism.
8. A video communication unit includes a decoder (725) comprising: a receiver for receiving an object-based scalable video sequence from an encoding video communication unit wherein said object-based scalable video sequence comprises a plurality of enhancement layers and one or more base layers; and a processor (775) , operably coupled to said receiver, for decoding said one or more enhancement layers and one or more base layers into a plurality of image blocks having one or more predetermined boundaries, identifying at least one object of a decoded enhancement layer as a forward and/or backward shape, and padding said forward and/or backward shape of said object to one block boundary of said enhancement layer frame.
9. A video decoder (725) adapted for use in the method of any of Claims 1 to 5 or adapted for use in the communication system of Claim 6 or Claim 7 so as to implement background composition padding (775) .
10. A video communication unit (715, 725) adapted for use in the method of any of Claims 1 to 5 or adapted for use in the communication system of Claim 6 or Claim 7.
11. A mobile radio device comprising a video decoder in accordance with Claim 9 or a video communication unit in accordance with Claim 10.
12. The mobile radio device of Claim 11, wherein the mobile radio device is a mobile phone, a portable or mobile PMR radio, a personal digital assistant, a laptop computer or a wirelessly networked PC.
13. A storage medium storing processor-implementable instructions for controlling a processor to carry out the method of any of Claims 1 to 5.
PCT/EP2003/005080 2002-07-31 2003-05-12 Object-based scalable video transmissions WO2004015997A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2003240646A AU2003240646A1 (en) 2002-07-31 2003-05-12 Object-based scalable video transmissions

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0217761A GB2391413B (en) 2002-07-31 2002-07-31 Object-based scalable video transmissions
GB0217761.6 2002-07-31


Publications (1)

Publication Number Publication Date
WO2004015997A1 true WO2004015997A1 (en) 2004-02-19

Family

ID=9941467

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2003/005080 WO2004015997A1 (en) 2002-07-31 2003-05-12 Object-based scalable video transmissions

Country Status (4)

Country Link
AU (1) AU2003240646A1 (en)
GB (1) GB2391413B (en)
HK (1) HK1062759A1 (en)
WO (1) WO2004015997A1 (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8315307B2 (en) 2004-04-07 2012-11-20 Qualcomm Incorporated Method and apparatus for frame prediction in hybrid video compression to enable temporal scalability
EP2146343A1 (en) 2008-07-16 2010-01-20 Deutsche Thomson OHG Method and apparatus for synchronizing highly compressed enhancement layer data


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0921684B1 (en) * 1997-12-02 2005-11-02 Daewoo Electronics Corporation Method and apparatus for encoding object information of a video object plane
JP2000209580A (en) * 1999-01-13 2000-07-28 Canon Inc Picture processor and its method

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6167158A (en) * 1994-12-20 2000-12-26 Matsushita Electric Industrial Co., Ltd. Object-based digital image predictive method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ISO/IEC JTC1/SC29/WG11: "N2202 - Information Technology - Coding of audio-visual objects: Visual ISO/IEC 14496-2 Committee Draft", INTERNATIONAL ORGANIZATION FOR STANDARDIZATION, March 1998 (1998-03-01), Tokyo, pages 225 - 226, XP002265859 *
JIE CHEN ET AL: "Transform domain motion estimation without macroblock-based repetitive padding for MPEG-4 video", CIRCUITS AND SYSTEMS, 1998. ISCAS '98. PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL SYMPOSIUM ON MONTEREY, CA, USA 31 MAY-3 JUNE 1998, NEW YORK, NY, USA,IEEE, US, 31 May 1998 (1998-05-31), pages 130 - 133, XP010289513, ISBN: 0-7803-4455-3 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140003504A1 (en) * 2012-07-02 2014-01-02 Nokia Corporation Apparatus, a Method and a Computer Program for Video Coding and Decoding
WO2014006267A1 (en) * 2012-07-02 2014-01-09 Nokia Corporation An apparatus, a method and a computer program for video coding and decoding
KR20150036299A (en) * 2012-07-02 2015-04-07 노키아 코포레이션 An apparatus, a method and a computer program for video coding and decoding
CN104604223A (en) * 2012-07-02 2015-05-06 诺基亚公司 An apparatus, a method and a computer program for video coding and decoding
EP2868091A4 (en) * 2012-07-02 2016-02-24 Nokia Technologies Oy An apparatus, a method and a computer program for video coding and decoding
KR101713005B1 (en) * 2012-07-02 2017-03-07 노키아 테크놀로지스 오와이 An apparatus, a method and a computer program for video coding and decoding

Also Published As

Publication number Publication date
GB2391413B (en) 2004-08-25
GB0217761D0 (en) 2002-09-11
GB2391413A (en) 2004-02-04
AU2003240646A1 (en) 2004-02-25
HK1062759A1 (en) 2004-11-19

Similar Documents

Publication Publication Date Title
US7924917B2 (en) Method for encoding and decoding video signals
KR100888962B1 (en) Method for encoding and decoding video signal
US7203235B2 (en) Architecture and method for fine granularity scalable video coding
KR100886191B1 (en) Method for decoding an image block
US6795501B1 (en) Multi-layer coder/decoder for producing quantization error signal samples
US20070009039A1 (en) Video encoding and decoding methods and apparatuses
US20060133482A1 (en) Method for scalably encoding and decoding video signal
US20050163211A1 (en) Scalable video transmission
KR20060063614A (en) Method for scalably encoding and decoding video signal
EP1977612A2 (en) Error resilient mode decision in scalable video coding
JP2007174568A (en) Encoding method
JP2007235314A (en) Coding method
US20040062304A1 (en) Spatial quality of coded pictures using layered scalable video bit streams
Kim et al. Multiple description motion coding algorithm for robust video transmission
Hannuksela et al. Sub-picture: ROI coding and unequal error protection
GB2364841A (en) Method and apparatus for video encoding
WO2004015997A1 (en) Object-based scalable video transmissions
Benierbah et al. A new technique for quality scalable video coding with H. 264
GB2381980A (en) Error concealment in scalable video transmissions
Yu et al. SNR scalable transcoding for video over wireless channels
CN109361924B (en) Image coding method and device
Schäfer et al. Improving image compression—Is it worth the effort?
EP0892559A1 (en) Padding of object border blocks for motion estimation and transform coding in an object-oriented video coder
KR100559713B1 (en) Color information encoding / decoding device for parallel scan and its method
Thornton et al. Video over TETRA

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP