WO2006112620A1

WO2006112620A1 - Hierarchical video encoding/decoding method for complete spatial scalability and apparatus thereof

Info

Publication number: WO2006112620A1
Application number: PCT/KR2006/001097
Authority: WO
Inventors: Jung-Won Kang; Hae-Chul Choi; Jae-Gon Kim; Jin-Woo Hong; Yong-Man Ro; Tae-Meon Bae; Yong-Ju Jung; Cong-Thang Truong
Original assignee: Electronics And Telecommunications Research Institute; Research And Industrial Cooperation Group
Priority date: 2005-03-25
Filing date: 2006-03-24
Publication date: 2006-10-26
Also published as: EP1862010A4; EP1862010A1; KR100728222B1; KR20060103226A

Abstract

Provided are a hierarchical video encoding/decoding method for complete spatial scalability and an apparatus thereof. The apparatus for encoding a video image includes: an overlapped region (OR) detector for receiving coding region information about a plurality of regions of interest (ROIs) in the video image to be encoded and detecting overlapped regions (ORs) among the ROIs; a region arranger for arranging the video image, the regions of interest and the detected overlapped regions into a plurality of layers according to resolution; and a region encoder for encoding the video image, the regions of interest and the detected overlapped regions according to the resolution of the corresponding layer arranged by the region arranger. The coding region information may include information about the locations of the regions of interest in the video image and the coding resolution of the regions of interest. A video encoding/decoding apparatus according to the present invention provides complete scalability in the spatial domain by defining a region of interest (ROI) in a video image. Also, the video encoding/decoding apparatus according to the present invention provides an improved coding rate by encoding the video image in consideration of spatial redundancy among a plurality of regions of interest.

Description

HIERARCHICAL VIDEO ENCODING/DECODING METHOD FOR COMPLETE SPATIAL SCALABILITY AND APPARATUS THEREOF

Technical Field

The present invention relates to a scalable video encoding/decoding method and an apparatus thereof; and more particularly, to a method of hierarchically encoding and decoding multi-regions having various locations and different resolutions according to a resolution for providing complete scalability of a spatial domain.

Background Art

The JPEG 2000 standard [ISO/IEC JTC1 15444-2: JPEG 2000 Image Coding System; Extension, 2004] allows different regions in an image to be encoded at various bit rates. That is, a predetermined region in an image may be encoded at a higher bit rate than other regions in the image. Such an encoding scheme has been a major topic over the last several years. In particular, the JPEG 2000 standard allows a decoder to independently decode a region of interest (ROI) with spatial scalability by defining a predetermined region in an image as the ROI and encoding the ROI using a scaling-based method before encoding other regions. However, the encoding method introduced in the JPEG 2000 standard cannot be applied to an MPEG-based video encoding scheme because it is designed for encoding still images.

An MPEG-based video encoding method is a general method for encoding a video image. In particular, the object-based encoding scheme introduced in MPEG-4 [ISO/IEC JTC1 14496-2: Coding of Audio-Visual Objects, Part 2, 1998] provides a coding method that satisfies the condition of independently coding regions in an image having more than one region. Each of the regions in the image is a two-dimensional region having a predetermined shape. Such regions are encoded independently or encoded through motion estimation, residual coding or shape-adaptive DCT (SA-DCT). However, the MPEG-4 object-based encoding method cannot be used for an image having a plurality of regions with different resolutions.

Recently, much research has been in progress to develop a scalable video coding scheme that decodes a video signal at various resolutions and frame rates, without separately given coded signals, by extracting a video signal having a lower resolution or a different frame rate from a coded video signal. H.262/MPEG-2 Visual [ISO/IEC JTC1 13818-2: Generic Coding of Moving Pictures and Associated Audio Information, Part 2; Video, 1994], H.263 [ITU-T: Video Coding for Low Bit-rate Communication, 1995 (version 1), 1998 (version 2)], and MPEG-4 Visual [ISO/IEC JTC1 14496-2: Coding of Audio-Visual Objects, Part 2, 1998] allow the same image to be decoded into images with different resolutions using a layer-based coding method in order to achieve spatial-resolution-adaptive encoding. Such a layer-based coding method encodes an image produced at a lower resolution than the original image into a base layer and encodes an image produced at a higher spatial resolution into the next layer, using information about the already coded image.

Recently, demand for various motion picture services has increased with the introduction of diverse video reproducing terminals such as a personal digital assistant (PDA), a mobile phone, a digital multimedia broadcasting (DMB) phone, a standard definition (SD) TV and a high definition (HD) TV. Accordingly, there is great demand for a video coding method capable of decoding an image having a plurality of regions with not only various resolutions but also different locations. However, it is difficult to apply the conventional layer-based coding methods if an image includes regions having different resolutions or various locations. If a conventional layer-based encoding method is used for encoding an image having multiple regions with different resolutions, the coding rate is significantly reduced in a scalable video coding scheme because overlapped regions are encoded repeatedly.

Disclosure

Technical Problem

It is, therefore, an object of the present invention to provide an encoding/decoding apparatus for providing a complete scalability on a spatial domain to independently decode regions having various locations and different resolutions in an image, and a method thereof.

It is another object of the present invention to provide an effective hierarchical encoding/decoding apparatus for encoding regions having various locations and different resolutions in consideration of spatial redundancy between layers, and a method thereof.

It is still another object of the present invention to provide a scalable encoding/decoding apparatus for allowing a user to reproduce a video signal by selecting a predetermined region with a desired resolution.

Technical Solution

In accordance with one aspect of the present invention, there is provided an encoder for encoding a video image including: an overlapped region (OR) detector for receiving coding region information about a plurality of regions of interest (ROIs) in the video image to be encoded and detecting overlapped regions (ORs) among the ROIs; a region arranger for arranging the video image, the regions of interest and the detected overlapped regions into a plurality of layers according to resolution; and a region encoder for encoding the video image, the regions of interest and the detected overlapped regions according to the resolution of the corresponding layer arranged by the region arranger. The coding region information may include information about the locations of the regions of interest in the video image and the coding resolution of the regions of interest. The region encoder may perform an inter-layer coding that encodes regions in units of blocks using a region arranged in a lower layer. If the inter-layer coding requires an interpolation that uses a pixel outside a region of interest, the interpolation may be performed after the pixel value outside the region of interest is decided through an extrapolation. A flag denoting that a region of interest is present in the video image may be added to the coded stream. An intra-layer coding may be performed, without performing the interpolation, when the interpolation requires a pixel outside a region of interest.

In accordance with another aspect of the present invention, there is provided a decoder including: a decoding region extractor for receiving selection information for selecting a region to decode, and extracting region information required for decoding a region of interest (ROI) corresponding to the selected region from a coded stream including coding information about a plurality of regions of interest; and a region decoder for receiving the extracted region information and recovering an image signal of a region of interest in a video image by performing a decoding based on the received region information. The selection information may include information about the location of the decoding region in the video image and a decoding resolution. The decoding region extractor may extract a coded stream of a region of interest corresponding to the selected region from the coded stream, and extract information of related regions in a lower layer from the extracted coded stream for decoding the region of interest. An inter-layer decoding may be performed in units of blocks using a region of a lower layer having regions of a lower resolution. An intra-layer decoding may be performed on a block located at a region of interest boundary in the inter-layer decoding. Each of the regions of interest may be composed of small rectangular regions for supporting an interactive decoding.

In accordance with still another aspect of the present invention, there is provided an encoding method for providing a spatial scalability including the steps of: a) receiving information about locations and resolutions of a plurality of regions of interest for encoding an input video image; b) arranging the regions of interest into corresponding layers according to resolution; and c) encoding the arranged regions of interest in units of blocks, wherein an intra-layer coding is performed on a block that requires an interpolation using a pixel outside a region of interest when an inter-layer coding is performed using region of interest information in a lower layer. The motion information of a block that requires an interpolation using a pixel outside a region of interest may be predicted in integer pixel units when the motion prediction/compensation coding uses ROI information in the same layer having a temporal correlation. An interpolation may be performed after deciding an external pixel value for a block that requires an interpolation using a pixel outside a region of interest when a motion prediction/compensation coding is performed using region of interest information of the same layer having a temporal correlation. The region of interest location information may be expressed as scan numbers assigned to macroblocks of a video image.
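The boundary restriction described above can be illustrated with a minimal sketch. It only tests whether a block's interpolation would reach past the ROI rectangle and, if so, rules out inter-layer coding for that block. The rectangle convention `(x0, y0, x1, y1)` with exclusive upper bounds, the function names, and the `margin` parameter (how many pixels the interpolation filter reaches beyond the block) are all assumptions made for illustration; the text does not specify them.

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), x1/y1 exclusive (assumed convention)


def needs_pixels_outside_roi(block: Rect, roi: Rect, margin: int) -> bool:
    """True if interpolating the block would read pixels outside the ROI.

    `margin` is a hypothetical filter reach in pixels on each side of the block.
    """
    bx0, by0, bx1, by1 = block
    rx0, ry0, rx1, ry1 = roi
    return (bx0 - margin < rx0 or by0 - margin < ry0
            or bx1 + margin > rx1 or by1 + margin > ry1)


def inter_layer_allowed(block: Rect, roi: Rect, margin: int = 2) -> bool:
    """Inter-layer coding is excluded for blocks whose interpolation leaves the ROI."""
    return not needs_pixels_outside_roi(block, roi, margin)


roi = (0, 0, 64, 64)
print(inter_layer_allowed((0, 0, 16, 16), roi))    # block touching the ROI boundary
print(inter_layer_allowed((16, 16, 32, 32), roi))  # block well inside the ROI
```

A block for which `inter_layer_allowed` returns `False` would fall back to intra-layer coding (or, under the alternative described in the text, to extrapolating the missing pixels first).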

In accordance with still another aspect of the present invention, there is provided a method for decoding including the steps of: a) receiving information about a location and a resolution of a decoding region in an input video image; b) extracting region of interest information corresponding to the decoding region from a transmitted coded stream; and c) decoding the region of interest using the extracted information. Intra-layer decoding is performed on a block that requires an interpolation using a pixel outside a region of interest when an inter-layer decoding is performed using region of interest information of a lower layer. The motion information of a block that requires an interpolation using a pixel outside a region of interest may be predicted in integer pixel units when a motion prediction/compensation decoding is performed using region of interest information of the same layer having a temporal correlation. An interpolation may be performed after deciding an external pixel value through an extrapolation for a block that requires an interpolation using a pixel outside a region of interest when an inter-layer decoding is performed using region of interest information of a lower layer. An interpolation may likewise be performed after deciding an external pixel value through an extrapolation for a block that requires an interpolation using a pixel outside a region of interest when a motion prediction/compensation decoding is performed using region of interest information of the same layer having a temporal correlation. The decoding method may further include determining whether a region of interest exists in a video image of a coded stream by means of a flag, included in the coded stream, that denotes the existence of a region of interest. The location information of the region of interest may be expressed as scan numbers assigned to macroblocks in a video image.
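As an illustration of expressing a region of interest location as macroblock scan numbers, the sketch below lists the raster-scan numbers of the macroblocks covered by a rectangular ROI. It is one possible reading only: the 16x16 macroblock size, raster-scan numbering starting at 0, and the function name are assumptions that the text does not state.

```python
from typing import List

MB_SIZE = 16  # assumed macroblock size in pixels (not given in the text)


def roi_to_mb_scan_numbers(x: int, y: int, w: int, h: int,
                           frame_width: int) -> List[int]:
    """Return raster-scan macroblock numbers covered by the ROI rectangle.

    (x, y) is the top-left pixel of the ROI; w and h are its size in pixels.
    """
    mbs_per_row = (frame_width + MB_SIZE - 1) // MB_SIZE  # ceil division
    first_col, last_col = x // MB_SIZE, (x + w - 1) // MB_SIZE
    first_row, last_row = y // MB_SIZE, (y + h - 1) // MB_SIZE
    return [row * mbs_per_row + col
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


# A 32x16 ROI at pixel (16, 16) in a 64-pixel-wide frame (4 macroblocks per row).
print(roi_to_mb_scan_numbers(16, 16, 32, 16, frame_width=64))  # [5, 6]
```

A more compact signalling would carry only the scan numbers of the top-left and bottom-right macroblocks; which form the coded stream uses is not settled by this passage.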

Advantageous Effects

A video encoding/decoding apparatus according to the present invention provides complete scalability in the spatial domain by defining a region of interest (ROI) in a video image. Also, the video encoding/decoding apparatus according to the present invention provides an improved coding rate by encoding the video image in consideration of spatial redundancy among a plurality of regions of interest. The encoding/decoding apparatus and method thereof according to the present invention easily obtain region information for decoding regions of interest from a coded stream whose syntax structure is designed to effectively transmit information related to a plurality of regions of interest. Furthermore, a method of adding a ROI flag to the syntax structure of the coded stream is disclosed in the present invention. Therefore, a decoder is enabled to recognize the existence of an independently decodable ROI and to activate a related function thereof.

Moreover, the present invention discloses a method of processing a region at a ROI boundary when the region needs to reference information of other regions. Therefore, the decoder is allowed to independently decode a ROI according to the present invention. In particular, the present invention enables the decoder to easily decode a ROI without requiring additional elements by adding a restriction condition that prevents an interpolation from being performed when the interpolation requires a pixel outside the ROI.

Description of Drawings

The above and other objects and features of the present invention will become apparent from the following description of the preferred embodiments given in conjunction with the accompanying drawings, in which:

Fig. 1 is a block diagram illustrating an image processing system for providing a spatial scalability in accordance with a preferred embodiment of the present invention;

Fig. 2 is a block diagram illustrating the encoder 110 of Fig. 1 for encoding an input video image and regions in the image;

Fig. 3 is a flowchart for describing an operation of an overlapped region detector 210 in accordance with a preferred embodiment of the present invention;

Fig. 4 is a flowchart for describing an operation for detecting overlapped regions in all of regions at the step S310 in Fig. 3;

Fig. 5 is a diagram showing arrangement of regions such as a video image, a region and an overlapped region according to layers when the multi-layer based coding method is employed in accordance with a preferred embodiment of the present invention;

Fig. 6 is a diagram illustrating arrangement of regions such as a video image, a region and an overlapped region according to layers when the one-layer based coding method is used in accordance with a preferred embodiment of the present invention;

Fig. 7 is a flowchart illustrating a method of arranging regions into corresponding layers in a multi-layer based encoding method in accordance with a preferred embodiment of the present invention;

Fig. 8 is a flowchart of a method of arranging regions into corresponding layers in a one-layer based encoding method in accordance with a preferred embodiment of the present invention;

Fig. 9 is a block diagram of a region encoder 230 employing a multi-layer encoding method in accordance with a preferred embodiment of the present invention;

Fig. 10 is a block diagram of a region encoder 230 employing a one-layer based coding method in accordance with a preferred embodiment of the present invention;

Fig. 11 is a flowchart for describing an operation of a region encoder 230 employing a multi-layer based encoding method in accordance with a preferred embodiment of the present invention;

Fig. 12 is a flowchart showing a method of deciding a coding mode for encoding a block of a selected region when a multi-layer based encoding method is employed in accordance with a preferred embodiment of the present invention;

Fig. 13 is a flowchart showing a method of deciding a coding mode for encoding a block of a selected region when a one-layer based encoding method is employed in accordance with a preferred embodiment of the present invention;

Fig. 14 is a flowchart of a method of deciding a coding mode for encoding a block of a selected region when a one-layer based coding method is employed in accordance with a preferred embodiment of the present invention;

Fig. 15 is a view illustrating a method of interpolation used in an inter-layer coding mode and a motion prediction/ compensation mode in accordance with a preferred embodiment of the present invention;

Fig. 16 shows a SVC bit stream having a flag denoting existence of a ROI in an image in accordance with a first embodiment of the present invention;

Fig. 17 shows a SVC bit stream having a flag denoting existence of a ROI in an image in accordance with a second embodiment of the present invention;

Fig. 18 shows a SVC bit stream having a flag denoting existence of a ROI in an image in accordance with a third embodiment of the present invention;

Fig. 19 is a block diagram showing coding dependency between regions in the inter-layer coding when a multi-layer based coding method is employed;

Fig. 20 is a block diagram showing coding dependency between regions in the inter-layer coding when a one-layer based coding method is employed;

Fig. 21 shows a syntax structure of a coded stream including information about related regions in a lower layer in accordance with a preferred embodiment of the present invention;

Fig. 22 is a block diagram of the decoder 120 of Fig. 1 for decoding a coded stream in accordance with a preferred embodiment of the present invention;

Fig. 23 is a flowchart showing operations of the region decoder employing a multi-layer based decoding method in accordance with a preferred embodiment of the present invention;

Fig. 24 is a flowchart showing a method of decoding a coded stream for blocks of a selected region when a multilayer based decoding method is applied in accordance with a preferred embodiment of the present invention;

Fig. 25 is a flowchart showing operations of the region decoder employing a one-layer based decoding method in accordance with a preferred embodiment of the present invention; and

Fig. 26 is a flowchart showing a decoding of a coded stream of a block of a selected region when a one-layer based decoding is applied in accordance with a preferred embodiment of the present invention.

Best Mode for the Invention

Other objects and aspects of the invention will become apparent from the following description of the embodiments with reference to the accompanying drawings, which is set forth hereinafter.

Fig. 1 is a block diagram illustrating an image processing system for providing a spatial scalability in accordance with a preferred embodiment of the present invention. As shown in Fig. 1, the image processing system according to the present embodiment includes an encoder 110, a decoder 120, a user interface 130 and a display 140.

The encoder 110 receives video images and region of interest (ROI) information and outputs coded bit-streams by performing an encoding operation according to the present invention. Herein, the region of interest (ROI) denotes a desired region in a video image to encode, and the ROI information denotes information about the ROI. The ROI information includes information about a location of a ROI and a coding resolution. The information may additionally include a region index of a ROI. The coded bit-streams outputted from the encoder 110 are transmitted to the decoder 120 through a channel. The decoder 120 receives decoding region information from the user interface unit 130. The decoding region information denotes information about a decoding region selected by a user through the user interface 130, and the decoding region is a target region to decode. The decoder 120 restores an image signal of a decoding region by performing a decoding operation according to the present invention using the received decoding region information. The decoding region information includes information about a location of a decoding region. The decoding region information may additionally include information about a decoding resolution which is a desired resolution to decode. The display 140 receives the restored image signal from the decoder 120 and displays an image of the region selected by the user.

Fig. 2 is a block diagram illustrating the encoder 110 of Fig. 1 for encoding an input video image and regions in the image. As shown in Fig. 2, the encoder 110 according to the present embodiment includes an overlapped region (OR) detector 210 for receiving the video image and the ROI information and detecting an overlapped region (OR) in the video image and the regions, a region arranging unit 220 for arranging regions of interest in an image including the detected OR into corresponding layers and a region encoder 230 for generating a coded stream by encoding the input video image and regions. Hereinafter, configurations and operations of the OR detector 210, the region arranging unit 220 and the region encoder 230 in the encoder 110 will be described with reference to accompanying drawings.

The overlapped region (OR) detector 210 detects an overlapped region (OR) among the regions using information about the locations of the regions of interest and defines a new region index for the detected overlapped region. Referring to Fig. 5, the OR detector 210 defines an overlapped region detected between a first region 1 and a second region 2 as OR1. An overlapped region detected between the second region 2 and a third region 3 is defined as OR2. As shown in Fig. 5, the overlapped region between the third region 3 and a fourth region 4 also includes a region overlapped with the second region 2. In this case, the overlapped region detector 210 first detects the overlapped region of the second region 2 and the third region 3, which have the same spatial resolution, and defines it as OR2. Then, the overlapped region detector 210 takes the overlapped region detected between the region 3 and the region 4, which have different spatial resolutions, excludes the part already defined as OR2, and defines the remaining overlapped region as OR3. That is, the overlapped region detected between the region 3 and the region 4 is composed of OR2 and OR3.

Fig. 3 is a flowchart for describing an operation of an overlapped region detector 210 in accordance with a preferred embodiment of the present invention.

As shown in Fig. 3, the overlapped region detector 210 detects overlapped regions among all of the regions in an image using the ROI information at step S310. The overlapped region detector 210 also detects overlapped regions among the detected overlapped regions at step S320. Then, the overlapped region detector 210 defines a new region index for each detected overlapped region at step S330. As described above, an overlapped region detected between regions having different spatial resolutions may include an overlapped region detected between regions having the same spatial resolution, such as OR2. In this case, the overlapped region with different spatial resolutions detected between the region 3 and the region 4 is selected by excluding the overlapped region of regions having the same spatial resolution, such as OR2. Then, the selected overlapped region is given a new region index such as OR3.

Fig. 4 is a flowchart for describing a method for detecting overlapped regions among all of the regions at step S310 in Fig. 3. The method of Fig. 4 applies equally to step S320 for detecting an overlapped region among the detected overlapped regions. First, two regions are selected using the ROI information, which denotes information about the regions to encode, at step S410. Then, it is determined whether the two selected regions overlap at step S420. If the two selected regions overlap, it is determined whether they have the same spatial resolution at step S430. If the two selected regions have the same spatial resolution, the overlapped region is defined as an OR of the two selected regions at step S460. In contrast, if the two selected regions have different spatial resolutions, it is determined whether the region having the higher spatial resolution completely includes the region having the lower spatial resolution at step S440. If not, the part of the region having the lower spatial resolution that overlaps the region having the higher spatial resolution is defined as the OR at step S450. Then, another pair of regions is selected from the not-yet-selected regions in the image, and steps S420 to S460 are repeated.
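The pairwise procedure of Fig. 4 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes rectangular regions given in the coordinates of the full video image, uses an integer `resolution` rank (larger means higher spatial resolution), and covers only the first pass (step S310), not the second pass over detected ORs or the exclusion that yields OR3. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Region:
    index: str
    x0: int
    y0: int
    x1: int          # exclusive
    y1: int          # exclusive
    resolution: int  # hypothetical rank: larger = higher spatial resolution


def intersect(a: Region, b: Region) -> Optional[Tuple[int, int, int, int]]:
    """Return the overlapping rectangle of a and b, or None if they do not overlap."""
    x0, y0 = max(a.x0, b.x0), max(a.y0, b.y0)
    x1, y1 = min(a.x1, b.x1), min(a.y1, b.y1)
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None


def contains(outer: Region, inner: Region) -> bool:
    return (outer.x0 <= inner.x0 and outer.y0 <= inner.y0
            and outer.x1 >= inner.x1 and outer.y1 >= inner.y1)


def detect_overlaps(regions: List[Region]) -> List[Region]:
    found: List[Region] = []
    for a, b in combinations(regions, 2):      # S410: pick a pair
        box = intersect(a, b)                  # S420: do they overlap?
        if box is None:
            continue
        if a.resolution == b.resolution:       # S430: same resolution?
            res = a.resolution                 # S460: overlap becomes an OR
        else:
            high, low = (a, b) if a.resolution > b.resolution else (b, a)
            if contains(high, low):            # S440: full inclusion?
                continue  # the text does not say what happens here; the sketch records nothing
            res = low.resolution               # S450: OR taken at the lower resolution
        found.append(Region(f"OR{len(found) + 1}", *box, res))
    return found


regions = [Region("2", 0, 0, 10, 10, 1), Region("3", 5, 0, 15, 10, 1)]
print(detect_overlaps(regions))
```

Numbering the ORs in detection order is a convenience of the sketch; the text only requires that each detected overlapped region receive a new region index.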

The region arranging unit 220 arranges regions and overlapped regions according to resolution. The region arranging unit 220 treats the input video image as one region having the highest resolution.

The present invention is divided into two different methods according to how the overlapped region (OR) is processed. Depending on the method, the encoder 110, the decoder 120 and the display 140 perform different operations in the image processing system of Fig. 1. The two methods are a multi-layer based coding method and a one-layer based coding method. The multi-layer based coding method composes a new layer when there is an overlapped region, arranges the overlapped region at the new layer and uses information about a plurality of related lower layers to encode one layer. The one-layer based coding method does not compose a new layer for the overlapped region and uses information about the one-level lower layer to encode a layer.

Fig. 5 is a diagram showing arrangement of regions such as a video image, a region and an overlapped region according to layers when the multi-layer based coding method is employed in accordance with a preferred embodiment of the present invention, and Fig. 6 is a diagram illustrating arrangement of regions such as a video image, a region and an overlapped region according to layers when the one-layer based coding method is used in accordance with a preferred embodiment of the present invention.

Referring to Figs. 5 and 6, a second region 2 and a third region 3 have the same resolution, and a fourth region 4 has a higher resolution than the second region 2 and the third region 3. Also, a first region 1 has a higher resolution than the fourth region 4. The input video image is treated as a region having the highest resolution. A first layer 1, a second layer 2, a third layer 3, a fourth layer 4 and a fifth layer 5 have the corresponding layer indexes 1 to 5, respectively. A layer having a higher layer index holds regions having a higher resolution.

A method of arranging regions such as a video image, a region and an overlapped region according to a layer will be described with reference to Figs. 5 and 6.

Referring to Figs. 5 and 6, a region having a higher resolution is arranged to an upper layer, and a region having a lower resolution is arranged at a lower layer. Also, regions having a same resolution are arranged at a same layer. The input video image is treated as one region having the highest resolution and is arranged at the highest layer.

The method of arranging the overlapped region differs between the multi-layer based encoding method and the one-layer based encoding method. In the case of the multi-layer based encoding method shown in Fig. 5, an OR layer is newly defined in order to arrange the overlapped region between the layer where regions having the same resolution are arranged and the one-level lower layer thereof. Then, the overlapped region is arranged at the OR layer. In contrast, in the one-layer based encoding method shown in Fig. 6, the overlapped region is arranged at the layer holding the regions that have the same resolution as the overlapped region.

Fig. 7 is a flowchart illustrating a method of arranging regions into corresponding layers in a multi-layer based encoding method in accordance with a preferred embodiment of the present invention. Referring to Fig. 7, a region having the lowest spatial resolution is selected among the regions of interest, which are the coding regions including the input video image and the overlapped regions, at step S710, and the index of the current layer is initialized to 1 at step S720. Then, it is determined whether any overlapped region having the same spatial resolution as the currently selected region exists at step S730. If such an overlapped region exists, the overlapped region is arranged at the current layer and the regions which are not overlapped regions are arranged at the one-level upper layer at steps S750 and S760. In contrast, if no overlapped region exists, the regions having the same resolution are arranged at the current layer at step S760. Then, it is determined whether there is a region having a higher spatial resolution than the currently selected region at step S770. If there is a region having a higher spatial resolution, the region having the lowest resolution among the un-arranged regions is selected at step S780, and the layer index of the selected region is increased by one at step S790. Then, the steps of arranging the regions are repeated.
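The multi-layer arrangement of Fig. 7 can be sketched as a loop over resolutions in ascending order. This is an illustrative reading under stated assumptions: each region is described by a hypothetical tuple `(name, resolution_rank, is_overlapped_region)`, and when an OR layer is inserted the layer counter is advanced past both the OR layer and the regular layer, which is how this sketch interprets the index update at step S790.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (region name, resolution rank, is_overlapped_region); all fields are hypothetical
RegionInfo = Tuple[str, int, bool]


def arrange_multi_layer(regions: List[RegionInfo]) -> Dict[int, List[str]]:
    """Assign regions to layers; ORs get their own layer just below same-resolution regions."""
    by_res: Dict[int, List[RegionInfo]] = defaultdict(list)
    for info in regions:
        by_res[info[1]].append(info)

    layers: Dict[int, List[str]] = {}
    layer = 1                                   # S720: start at layer index 1
    for res in sorted(by_res):                  # S710/S780: lowest resolution first
        ors = [name for name, _, is_or in by_res[res] if is_or]
        plain = [name for name, _, is_or in by_res[res] if not is_or]
        if ors:                                 # S730: an OR of this resolution exists
            layers[layer] = ors                 # OR layer
            layer += 1
        if plain:
            layers[layer] = plain               # regions of the same resolution
            layer += 1                          # S790: move to the next layer
    return layers


# Resolution ranks follow the description of Figs. 5 and 6:
# regions 2 and 3 < region 4 < region 1 < input video image.
example = [("image", 4, False), ("1", 3, False), ("4", 2, False),
           ("2", 1, False), ("3", 1, False),
           ("OR1", 1, True), ("OR2", 1, True), ("OR3", 1, True)]
print(arrange_multi_layer(example))
```

With this input the sketch yields five layers, which is consistent with the five layer indexes mentioned for Figs. 5 and 6, although the exact contents of the figures are not reproduced here.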

Fig. 8 is a flowchart of a method of arranging regions into corresponding layers in a one-layer based encoding method in accordance with a preferred embodiment of the present invention. Referring to Fig. 8, a region having the lowest spatial resolution is selected among the regions of interest, which are the coding regions including the input video image and the overlapped regions, at step S810, and the index of the current layer is initialized to one at step S820. The regions having the same spatial resolution are arranged at the current layer at step S830. Then, the layer index is increased by one at step S840, and it is determined whether there are any regions having a higher spatial resolution than the currently selected region at step S850. If a region having a higher spatial resolution exists, the region having the lowest resolution among the un-arranged regions is selected at step S860. Then, the region arranging steps are repeated.
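For comparison, the one-layer arrangement of Fig. 8 reduces to grouping all regions, overlapped or not, by resolution. The sketch below assumes each region is a hypothetical `(name, resolution_rank)` pair; it illustrates the flow and is not the patented implementation.

```python
from typing import Dict, List, Tuple

RegionInfo = Tuple[str, int]  # (region name, resolution rank); hypothetical fields


def arrange_one_layer(regions: List[RegionInfo]) -> Dict[int, List[str]]:
    """Place every region (including ORs) at the layer of its own resolution."""
    layers: Dict[int, List[str]] = {}
    resolutions = sorted({res for _, res in regions})         # S810/S860: ascending
    for layer, res in enumerate(resolutions, start=1):        # S820/S840: index from 1
        layers[layer] = [name for name, r in regions if r == res]  # S830
    return layers


example = [("image", 4), ("1", 3), ("4", 2),
           ("2", 1), ("3", 1), ("OR1", 1), ("OR2", 1), ("OR3", 1)]
print(arrange_one_layer(example))
```

Because no separate OR layer is created, the overlapped regions share a layer with the ordinary regions of the same resolution.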

The region encoder 230 generates a coded stream by encoding the input video image and regions. The region encoder 230 has a different configuration and performs different operations according to the method of processing the overlapped region (OR). Therefore, the region encoder 230 will be described separately for the multi-layer encoding method and the one-layer encoding method.

Fig. 9 is a block diagram of a region encoder 230 employing a multi-layer encoding method in accordance with a preferred embodiment of the present invention. When the video image and region information are received from the region arranging unit 220, the region encoder 230 performs a coding operation on the regions in each layer according to the methods shown in Figs. 11 and 12. Herein, the video image is treated as one region for the coding operation. The region information includes information about the regions to encode, including the overlapped regions, and information about the arrangement of the regions into the layers. As shown in Fig. 9, the region encoder 230 includes down-sampling units 910 and 930, a layer1 encoder 920, a layer2 encoder 940 and a layer3 encoder 950 for encoding regions of the corresponding layers, and a stream MUX 960. The layer1, layer2 and layer3 encoders 920, 940 and 950 receive the region information and decide a coding mode for encoding a block of the selected region according to the method shown in Fig. 12. After encoding the block of the selected region according to the decided coding mode, the encoders 920, 940 and 950 output coded streams to the stream MUX 960. In Fig. 9, since there is no region overlapped with the block, the layer1 encoder does not select an inter-layer coding mode. That is, the layer1 encoder receives a video image which is down-sampled by the down-sampling unit 910 to a resolution suitable for the first layer 1, selects one of a motion prediction/compensation mode and an intra mode, and outputs the coded stream by encoding the block of the selected region according to the selected mode. When an encoder performs an inter-layer coding, the encoder up-samples motion information, texture information and motion compensated residual information of a region overlapped with the block among the regions of a lower layer and uses the up-sampled information. Meanwhile, the stream MUX 960 receives the coded streams from the encoders 920, 940 and 950 and multiplexes the received coded streams.

Fig. 10 is a block diagram of a region encoder 230 employing a one-layer based coding method in accordance with a preferred embodiment of the present invention. Referring to Fig. 10, when the region encoder 230 receives a video image and region information from the region arranging unit 220, the region encoder 230 performs a coding operation on the regions of each layer according to the methods shown in Figs. 13 and 14. The inputted region information includes information about the coding regions, including the overlapped region, and information about the arrangement of the regions into corresponding layers. Layer encoders 1020, 1040 and 1050 receive the region information and decide a coding mode for a block of a selected region according to the method shown in Fig. 14. Then, the layer encoders 1020, 1040 and 1050 encode the block of the selected region according to the selected mode and output the coded streams to a stream MUX 1060. In Fig. 10, the layer 1 encoder 1020, which is the encoder for the first layer, operates identically to that in Fig. 9. When the layer 2 encoder 1040 and the layer 3 encoder 1050 perform inter-layer coding, they up-sample motion information, texture information and motion compensated residual information of a region of the one-level lower layer and use the up-sampled information, unlike the encoders in Fig. 9, which use region information from a plurality of lower layers. Herein, if there is no overlapped region in the one-level lower layer, the inter-layer coding is performed using an intermediate region. Meanwhile, the stream MUX 1060 receives the coded streams outputted from the layer encoders 1020, 1040 and 1050 and multiplexes the received coded streams.

Fig. 11 is a flowchart for describing an operation of a region encoder 230 employing a multi-layer based encoding method in accordance with a preferred embodiment of the present invention. Referring to Fig. 11, when information about the arrangement of the regions, including the video image and the overlapped region, into corresponding layers is inputted from the region arranging unit 220, a region of the lowest layer is selected and coded at steps S1110 and S1120. After encoding, it is determined at step S1130 whether other regions exist in the same layer. If other regions exist, they are selected and encoded at step S1140. After encoding, regions in an upper layer are selected and encoded at steps S1150 and S1160. The coding operation in the step S1120 follows a block-based video coding scheme and uses one of an inter-layer coding using a lower layer's region information and an intra-layer coding using the same layer's region information. The inter-layer coding is a coding method introduced in the MPEG-4 standard [ISO/IEC 14496-2 (1998)]. Such an inter-layer coding up-samples motion information, texture information and motion compensated residual information of a lower layer and uses the up-sampled information. Also, the intra-layer coding is a coding method introduced in MPEG-4 AVC [ISO/IEC 14496-10: Advanced Video Coding, 2003] and is classified into a motion prediction/compensation mode and an intra mode. That is, the region encoder 230 encodes a block of a selected region using one of the inter-layer coding mode, the motion prediction/compensation mode and the intra mode and outputs a coded stream.

Fig. 12 is a flowchart showing a method of deciding a coding mode for encoding a block of a selected region when a multi-layer based encoding method is employed in accordance with a preferred embodiment of the present invention. As described above, the coding of the selected region in the step S1120 follows a block-based video coding scheme, and the block of the selected region is encoded according to one of the inter-layer coding mode, the motion prediction/compensation mode and the intra mode. In order to apply the inter-layer coding mode, it must first be decided which region in a lower layer is used for each block. Therefore, it is determined at step S1210 whether the block of the selected region overlaps any region of a lower layer, using the region arrangement information transferred from the region arranging unit 220. If there is an overlap, the region in the layer having the highest layer index among the overlapping regions is selected at step S1220. Then, it is determined at step S1230 whether the selected region is defined as an overlapped region (OR). If the selected region is defined as the OR, an inter-layer coding is performed using the defined OR at step S1240. On the contrary, if the selected region is not defined as the OR, an inter-layer coding is performed at step S1250 using the region in the layer having the highest layer index among the overlapping regions. Meanwhile, an intra-layer coding in the motion prediction/compensation mode and in the intra mode is performed on the block of the selected region at step S1260, regardless of whether the block overlaps regions of the lower layer. Using the encoding results of the three modes, the mode that minimizes the bit rate is selected from the three coding modes at step S1270. Herein, if there is no overlapped region at the step S1210, the inter-layer coding mode is excluded when one of the coding modes is selected.
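The decision flow of Fig. 12 may be illustrated with the following minimal sketch. The dictionary layout, the mode names and the bit costs are hypothetical and are chosen only to make the steps concrete; the actual rate measurement is not modeled.

```python
def select_reference(overlapping):
    """S1210-S1250: choose the lower-layer region used for inter-layer coding.

    overlapping: list of dicts {"name", "layer", "is_or"} describing the
    lower-layer regions that overlap the current block (assumed layout).
    Returns None when nothing overlaps, otherwise the region in the highest
    layer; whether it is a defined OR (S1230) is kept in its "is_or" field.
    """
    if not overlapping:
        return None
    return max(overlapping, key=lambda region: region["layer"])

def select_mode(bit_cost, reference):
    """S1260-S1270: pick the mode with the smallest bit cost.

    bit_cost: dict mapping a mode name to the bits produced by that mode.
    The inter-layer mode is excluded when no reference region exists.
    """
    candidates = dict(bit_cost)
    if reference is None:
        candidates.pop("inter_layer", None)
    return min(candidates, key=candidates.get)

# Hypothetical block overlapping the image (layer 1) and OR1 (layer 2).
overlapping = [
    {"name": "image", "layer": 1, "is_or": False},
    {"name": "OR1", "layer": 2, "is_or": True},
]
costs = {"inter_layer": 100, "motion": 120, "intra": 300}
reference = select_reference(overlapping)
mode = select_mode(costs, reference)
```

In this assumed case OR1 is chosen as the reference and the inter-layer mode wins; with an empty overlap list the choice falls back to the cheaper of the two intra-layer modes.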

Fig. 13 is a flowchart showing an operation of a region encoder 230 employing a one-layer based encoding method in accordance with a preferred embodiment of the present invention. When information about the arrangement of the regions, including the video image and the overlapped region, into corresponding layers is inputted from the region arranging unit 220, regions of the lowest layer are selected and encoded at steps S1310 and S1320. After encoding, it is determined at step S1330 whether other regions exist in the same layer. If other regions exist, they are selected and encoded at step S1340. If no other regions exist, it is determined at step S1350 whether any region of the current layer is not overlapped with the regions of the one-level lower layer. If there is such a non-overlapped region, an intermediate region is composed at step S1360 for the region of the current layer that is not spatially matched with a region of the one-level lower layer. The intermediate region serves as a reference region when the inter-layer coding is performed at an upper layer. Meanwhile, if there is no non-overlapped region at step S1350, a region in an upper layer is selected and encoded at steps S1370 and S1380. The intermediate region composed in the step S1360 is configured from motion information, texture information and motion compensated residual information of regions of the one-level lower layer, adapted to the spatial resolution of the current layer. Therefore, no coding is performed on the intermediate region itself; it is used only when the inter-layer coding is performed at the upper layer. Since the coding of the present invention follows the block-based video coding scheme, block-based motion information, texture information and motion compensated residual information are required to configure the intermediate region. The motion information, texture information and motion compensated residual information, which are up-sampled to the spatial resolution of the current layer through interpolation, may be used to compose a new intermediate region at the upper layer through the same interpolation if the upper layer does not include a region matched with the intermediate region. Meanwhile, the motion information includes a motion vector and the corresponding block's coding mode, such as the inter-layer coding mode, the motion prediction/compensation mode or the intra mode.

The encoding in the step S1320 follows a block-based video coding scheme, identically to the encoding employing the multi-layer based coding method, and uses one of the inter-layer coding using a lower layer's region information and the intra-layer coding using the same layer's region information. The inter-layer coding is a coding method introduced in the MPEG-4 standard [ISO/IEC 14496-2 (1998)], and it up-samples motion information, texture information and motion compensated residual information of a lower layer and uses the up-sampled information. Also, the intra-layer coding is an encoding method introduced in MPEG-4 AVC [ISO/IEC 14496-10: Advanced Video Coding, 2003] and is classified into a motion prediction/compensation mode and an intra mode. Finally, the region encoder 230 encodes a block of a selected region based on one of the inter-layer coding mode, the motion prediction/compensation mode and the intra mode and outputs the coded stream.

Fig. 14 is a flowchart of a method of deciding a coding mode for encoding a block of a selected region when a one-layer based coding method is employed in accordance with a preferred embodiment of the present invention. As described above, the coding performed on the selected region in the step S1320 follows a block-based video coding scheme and outputs a coded stream that is coded based on one of the inter-layer coding mode, the motion prediction/compensation mode and the intra mode. In order to apply the inter-layer coding mode, it must be decided which region in a lower layer is used for each block. Therefore, it is determined at step S1410 whether the block of the selected region overlaps any region in the one-level lower layer, using the information about the arrangement of regions into corresponding layers transferred from the region arranging unit 220. If there is a region of the one-level lower layer overlapped with the block of the selected region, the inter-layer coding is performed using that region at step S1420. If there is no such region, it is determined whether the block of the selected region overlaps an intermediate region of the one-level lower layer, and if so, the inter-layer coding is performed using the intermediate region at step S1450. At step S1460, the block is encoded by an intra-layer coding in the motion prediction/compensation mode and in the intra mode, regardless of whether the block of the selected region overlaps the regions of the lower layer. Finally, the mode that minimizes the bit rate is selected at step S1470 using the coding results of the three modes. Herein, if the block of the selected region does not overlap any region in the lower layer, the inter-layer coding mode is excluded when one of the modes is selected at the step S1430.
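The choice of the inter-layer reference in Fig. 14 can be sketched as below. The sketch assumes that every rectangle is given as `(x, y, width, height)` in the coordinate system of the current layer; the function names and the example rectangles are hypothetical.

```python
def overlaps(a, b):
    """Return True when two (x, y, width, height) rectangles intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def one_layer_reference(block, lower_regions, lower_intermediate):
    """Pick the reference used for inter-layer coding of one block.

    lower_regions / lower_intermediate: dicts mapping a name to the rectangle
    of a coded region or of an intermediate region in the one-level lower
    layer (assumed layout). Returns (kind, name) or None.
    """
    # S1410/S1420: a coded region of the one-level lower layer is tried first.
    for name, rect in lower_regions.items():
        if overlaps(block, rect):
            return ("region", name)
    # S1450: otherwise fall back to an intermediate region, if one overlaps.
    for name, rect in lower_intermediate.items():
        if overlaps(block, rect):
            return ("intermediate", name)
    # No reference: the inter-layer mode is left out of the mode decision.
    return None

lower_regions = {"ROI1": (0, 0, 64, 64)}
lower_intermediate = {"IR1": (64, 0, 64, 64)}
ref_a = one_layer_reference((16, 16, 16, 16), lower_regions, lower_intermediate)
ref_b = one_layer_reference((80, 16, 16, 16), lower_regions, lower_intermediate)
```

Only the one-level lower layer is consulted, which is the difference from the multi-layer decision of Fig. 12.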

Fig. 15 is a view illustrating the interpolation used for the inter-layer coding mode performed in the steps S1240, S1250, S1420 and S1450 and for the motion prediction/compensation mode performed in the steps S1260 and S1460 in accordance with a preferred embodiment of the present invention. Referring to Fig. 15, a small gray rectangular region denotes a pixel of an original image. In order to double the resolution of the original image, a half-pixel value between the pixels is generated through interpolation. In MPEG-4 AVC (Advanced Video Coding) and SVC (Scalable Video Coding), a FIR filter having six taps is used for half-pixel motion estimation. The same filter may be used for the interpolation in the present invention. Eq. 1 is a filter equation used for the interpolation of the motion prediction/compensation mode and the inter-layer coding mode in the present invention. The interpolation is basically a method of expanding the resolution of an image through a half pixel interpolation. Eq. 1 is one of the filter equations applicable to the interpolation of the present invention in view of computation amount and performance.

Eq. 1

In Eq. 1, I(x,y) denotes the pixel value at the (x,y) coordinate in an input image, and O(x,y) denotes the pixel value at the (x,y) coordinate in an output image after the interpolation is performed.
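As an illustration of a six-tap half-pixel filter of the kind referred to above, the following sketch applies the kernel (1, -5, 20, 20, -5, 1)/32 that MPEG-4 AVC uses for half-sample luma interpolation. It is an assumption that Eq. 1 has this form; the function name `half_pel` and the one-dimensional simplification are chosen only for illustration.

```python
# Six-tap kernel of MPEG-4 AVC half-sample interpolation; the weights sum to 32.
TAPS = (1, -5, 20, 20, -5, 1)

def half_pel(row, i):
    """Half-pixel value between row[i] and row[i + 1] (1-D sketch).

    row: list of integer pixel values I(x) along one line.
    Requires 2 <= i <= len(row) - 4 so that all six taps fall inside row.
    """
    acc = sum(tap * row[i - 2 + k] for k, tap in enumerate(TAPS))
    # Round, divide by 32 and clip to the 8-bit range.
    return min(255, max(0, (acc + 16) >> 5))

ramp = [0, 10, 20, 30, 40, 50, 60, 70]
value = half_pel(ramp, 3)  # halfway between 30 and 40
```

On the linear ramp above the filter returns 35, the midpoint of the two neighbouring pixels, and on a constant line it returns the constant itself.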

When such an interpolation is performed, the region of interest (ROI) must be decodable without using information about other regions. Therefore, pixel values outside the ROI must not be used when the interpolation is performed for a ROI boundary region in the region encoder 230. In order not to use the pixel values outside the ROI, two methods are disclosed in the present invention.

As a first method, before the interpolation is performed on the ROI boundary, the pixel values outside the boundary region are decided through an extrapolation, and the half pixel interpolation is then performed. The extrapolation may be either a zero order extrapolation or a substitution of a predetermined constant. Eq. 2 is a filter equation employing the zero order extrapolation with regard to the ROI boundary.

Eq. 2 (filter equations for the horizontal interpolation, the vertical interpolation and the diagonal interpolation)
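A zero order extrapolation at the ROI boundary can be sketched by replacing every sample outside the ROI with the nearest sample inside it before filtering. The sketch below is one-dimensional and assumes the six-tap kernel (1, -5, 20, 20, -5, 1)/32 of MPEG-4 AVC; the function name and the parameters `roi_start` and `roi_end` are hypothetical.

```python
TAPS = (1, -5, 20, 20, -5, 1)

def half_pel_in_roi(row, i, roi_start, roi_end):
    """Half-pixel value between row[i] and row[i + 1] using ROI samples only.

    roi_start, roi_end: first and last sample index of the ROI (inclusive).
    Samples outside the ROI are never read; the nearest ROI sample is
    repeated instead (zero order extrapolation).
    """
    def sample(j):
        # Clamp the index into the ROI so outside pixels are not referenced.
        return row[min(max(j, roi_start), roi_end)]

    acc = sum(tap * sample(i - 2 + k) for k, tap in enumerate(TAPS))
    return min(255, max(0, (acc + 16) >> 5))

inside = [0, 0, 100, 100, 100, 100, 0, 0]      # ROI covers indices 2..5
changed = [255, 255, 100, 100, 100, 100, 9, 9]  # same ROI, different outside
a = half_pel_in_roi(inside, 2, 2, 5)
b = half_pel_in_roi(changed, 2, 2, 5)
```

Both calls give the same value because the result depends only on pixels inside the ROI, which is the property needed for independent decoding of the ROI.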

As described above, when the half pixel interpolation is performed after the pixel values outside the boundary region are decided through the extrapolation at the ROI boundary, a decoder must know that the image signal transmitted through a channel contains the ROI. Therefore, a coded stream outputted from the region encoder must include a flag such as roi_flag (roi_enable, boundary_handling, or multiple_roi_flag) in order to signal that the transmitted bit stream includes a ROI that can be independently decoded. After receiving a bit stream having such a flag, the decoder determines from the value of the flag whether a ROI is defined. If the ROI is defined, the decoder enables a function of decoding a bit stream that is encoded with regard to the ROI boundary region. Figs. 16 to 18 are diagrams for describing the addition of the flag denoting the existence of a ROI in an image to a coded stream in accordance with a preferred embodiment of the present invention. First, Fig. 16 shows an SVC bit stream having a roi_flag (roi_enable, boundary_handling, or multiple_roi_flag) 1610 when a ROI is generated in slice group map type 2 (slice_group_map_type 2). Fig. 17 shows enabling of a roi_flag (roi_enable, boundary_handling, or multiple_roi_flag) 1720 by generating a new slice group map type (slice_group_map_type) for the ROI. That is, the new slice group map type is generated as an integer greater than 6 by extending the current slice group map type (slice_group_map_type), and the roi_flag (roi_enable, boundary_handling, or multiple_roi_flag) is disabled (1710) before the slice group map type is determined. Then, the flag is enabled (1720) in the new slice group map type for the ROI. The generation of the slice group map is identical to that of the conventional slice group map type (slice_group_map_type) 2. Fig. 18 shows the generation of a ROI as slice group map type 2 (slice_group_map_type 2) (1820), as in Fig. 16. In Fig. 18, if a roi_flag (roi_enable, boundary_handling, or multiple_roi_flag) is enabled, geometric information such as the number of ROIs and their coordinates is read through the variables num_rois_minus1, roi_top_left and roi_bottom_right, and an interpolation, that is, an up-sampling, is performed with regard to the ROI boundary for a slice group boundary matched to the boundary of the ROI. Instead of using coordinates, the location of the ROI may be expressed by assigning raster scan numbers to the macro blocks of the entire image and using the number of the macro block where the ROI begins and the number of the macro block where the ROI ends.
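The relation between raster scan macro block numbers and pixel coordinates of a ROI can be illustrated as follows. The sketch assumes 16x16 macro blocks, as in MPEG-4 AVC, and numbering from zero; the helper names and the example picture width are hypothetical.

```python
MB_SIZE = 16  # macro block size in pixels, as in MPEG-4 AVC

def mb_number(mb_x, mb_y, width_in_mbs):
    """Raster scan number of the macro block in column mb_x, row mb_y."""
    return mb_y * width_in_mbs + mb_x

def roi_from_mb_numbers(first_mb, last_mb, width_in_mbs):
    """Convert the first/last macro block numbers of a rectangular ROI into
    pixel coordinates (left, top, right, bottom), all inclusive."""
    top, left = divmod(first_mb, width_in_mbs)
    bottom, right = divmod(last_mb, width_in_mbs)
    return (left * MB_SIZE, top * MB_SIZE,
            (right + 1) * MB_SIZE - 1, (bottom + 1) * MB_SIZE - 1)

# Hypothetical 352x288 picture: 22 macro blocks per row.
first = mb_number(3, 2, 22)   # ROI begins at column 3, row 2
last = mb_number(8, 5, 22)    # ROI ends at column 8, row 5
rect = roi_from_mb_numbers(first, last, 22)
```

For the assumed picture the two numbers are 47 and 118, and they map back to the pixel rectangle (48, 32, 143, 95).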

Meanwhile, a second method for allowing the decoder to decode a ROI without using information of other regions is to add a restriction to the motion prediction/compensation mode and to the inter-layer coding mode. In more detail, in the motion prediction/compensation mode performed in the steps S1260 and S1460, the interpolation is not performed if it would require reference to pixels outside the ROI boundary. Therefore, motion information referring to the ROI boundary is predicted in units of integer pixels. Also, in the inter-layer coding mode performed in the steps S1240, S1250, S1420 and S1450, the inter-layer prediction is not performed on a block located at the ROI boundary. Therefore, a block located at the ROI boundary is encoded using a coding mode that uses information of the current layer, not of the lower layer. That is, the motion prediction/compensation mode or the intra mode is used.
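The restriction on the motion prediction/compensation mode can be sketched as a simple test of whether half-pixel interpolation for a reference block would read samples outside the ROI. The sketch is one-dimensional and assumes a six-tap filter that needs two samples to the left and three to the right of each half-pixel position; the function name and parameters are hypothetical.

```python
def half_pel_allowed(ref_x0, block_w, roi_x0, roi_x1):
    """Return True if half-pixel interpolation stays inside the ROI (1-D).

    ref_x0: integer sample just left of the first half-pixel position.
    block_w: width of the block in pixels.
    roi_x0, roi_x1: first and last sample index of the ROI (inclusive).
    """
    leftmost = ref_x0 - 2                   # two taps to the left
    rightmost = ref_x0 + block_w - 1 + 3    # three taps to the right
    return leftmost >= roi_x0 and rightmost <= roi_x1

def motion_precision(ref_x0, block_w, roi_x0, roi_x1):
    """Precision permitted for a motion vector pointing at ref_x0."""
    if half_pel_allowed(ref_x0, block_w, roi_x0, roi_x1):
        return "half"
    return "integer"

# Hypothetical ROI spanning samples 0..63 and a 16-pixel-wide block.
near_edge = motion_precision(0, 16, 0, 63)
interior = motion_precision(20, 16, 0, 63)
```

A reference position at the ROI edge is limited to integer-pixel motion, while an interior position may still use half-pixel motion.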

Figs. 19 and 20 are block diagrams showing the coding dependency between regions in the inter-layer coding when a multi-layer based coding method is employed and when a one-layer based coding method is employed, respectively. In Figs. 19 and 20, the region where an arrow begins is required in order to encode the region to which the arrow points. The coding dependency under the multi-layer encoding method will be described with reference to Figs. 19 and 12. Blocks in a region 1910 that does not overlap regions in a lower layer are encoded in the mode minimizing the bit rate between the motion prediction/compensation mode and the intra mode. Blocks in a region 1920 that overlaps regions in the lower layer are encoded in the mode minimizing the bit rate among the inter-layer coding mode, the motion prediction/compensation mode and the intra mode. Herein, since a first region 1 overlaps a second region 2 and the overlapped region is defined as the OR1, the blocks in the region overlapping regions in the lower layer are encoded using the OR1 when the inter-layer coding mode is applied. The coding dependency under the one-layer encoding method will be described with reference to Figs. 20 and 14. In encoding of a region 1, blocks in a region 2010 that does not overlap regions in the lower layer are coded in the mode minimizing the bit rate between the motion prediction/compensation mode and the intra mode, as in the case of the multi-layer encoding method. Meanwhile, blocks in a region 2020 that overlaps a region in the lower layer are encoded in the mode minimizing the bit rate among the inter-layer coding mode, the motion prediction/compensation mode and the intra mode. Herein, if a first region 1 overlaps a second region 2 when the inter-layer coding mode is applied, the overlapped region is defined as the OR1 in the two-level lower layer, that is, the first layer (layer 1). However, since the one-layer based coding method performs the inter-layer coding using a region of the one-level lower layer, an intermediate region 2030 is composed in the second layer (layer 2) by up-sampling the motion information, texture information and motion compensated residual information of the OR1. Then, the blocks in the region 2020, which is in the third layer (layer 3) and overlaps a region of a lower layer, are encoded using the information of the intermediate region 2030.

Fig. 21 shows a syntax structure of a coded stream including information about related regions in a lower layer in accordance with a preferred embodiment of the present invention. In the step S1120 of Fig. 11 and the step S1320 of Fig. 13, information about the regions in a lower layer used for coding a selected region is added to the coded stream. Referring to Fig. 21, the coded stream of a selected region includes a layer index 2120 of the layer having the selected region, a number of related regions 2130 used for coding the selected region, region information 2140 about the regions used for the encoding, and a video signal 2150 which is the coded region video signal. The region information 2140 includes information on each of the related regions 2160, 2170 and 2180, one entry per related region used for the encoding. Each entry of the region information 2140 includes a region index 2171 of the region used for the encoding, a layer index 2172 of the layer having that region, a horizontal axis location (H_org) 2173 and a vertical axis location (V_org) 2174 of that region, and a horizontal length (width) 2175 and a vertical length (height) 2176 of that region. A macro block number may be used to express the location of the region instead of coordinates. That is, after numbers are assigned to the macro blocks of the entire image in raster scan order, the location of the region may be expressed by the number of the macro block where the region begins and the number of the macro block where the region ends. Meanwhile, the same region index is assigned to regions having the same location in the input video image even though their spatial resolutions are different. Accordingly, the decoder may decode a predetermined region at various spatial resolutions by extracting the coded streams of the regions having the same region index in each layer.
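The fields of Fig. 21 can be pictured as a simple record, as in the sketch below. The class names, the Python types and the presence of the stream's own `region_index` field are assumptions made for illustration; the field names `h_org`, `v_org`, `width` and `height` follow the labels used in the description of Fig. 21.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RelatedRegion:
    """One entry of the region information 2140 (items 2171 to 2176)."""
    region_index: int
    layer_index: int
    h_org: int
    v_org: int
    width: int
    height: int

@dataclass
class RegionStream:
    """Coded stream of one selected region (hypothetical container)."""
    region_index: int              # assumed: index of the region itself
    layer_index: int               # item 2120
    related: List[RelatedRegion]   # item 2140
    video: bytes                   # item 2150

    @property
    def num_related(self) -> int:  # item 2130
        return len(self.related)

def streams_for_region(streams, region_index):
    """Collect the streams sharing one region index, lowest layer first."""
    picked = [s for s in streams if s.region_index == region_index]
    return sorted(picked, key=lambda s: s.layer_index)

streams = [
    RegionStream(1, 3, [RelatedRegion(1, 2, 0, 0, 64, 64)], b"hi"),
    RegionStream(1, 2, [], b"mid"),
    RegionStream(2, 2, [], b"other"),
]
same_region = streams_for_region(streams, 1)
```

Selecting by region index returns the layer 2 and layer 3 versions of the same region, which corresponds to decoding one region at several spatial resolutions.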

Fig. 22 is a block diagram of the decoder 120 of Fig. 1 for decoding a coded stream in accordance with a preferred embodiment of the present invention.

Referring to Fig. 22, the decoder 120 receives decoding region information selected by a user from a user interface unit 130 and recovers an image signal by performing a decoding according to the present invention on a coded stream transmitted from the encoder 110. The decoding region information includes information about a location of a region selected by a user to decode. In order to decode the coded stream, the decoder 120 includes a decoding region extractor 2210 and a region decoder 2220.

The decoding region extractor 2210 receives the decoding region information from the user interface unit 130 and extracts or composes, from the coded stream transmitted through a channel, the information for decoding the decoding region. That is, the decoding region extractor 2210 reads from the coded stream the number of regions 2130 in a lower layer needed for decoding the ROI having the selected region and the related region information 2140, and composes the indexes of the regions required for decoding the ROI. Also, the decoding region extractor 2210 composes information about each of the related regions by extracting the regions required for decoding the ROI from the coded stream. Herein, an interactive decoding, which allows the decoder 120 to select regions and decode the selected regions, can be embodied if the encoder 110 encodes the entire image by dividing it into a plurality of small regions that do not overlap one another. Such an encoding may be performed on the entire image in a plurality of layers. In more detail, the interactive decoding may be embodied as follows. On the coding side, a region is composed of very small rectangular regions through slice group map type 2 in the current scalable video coding (SVC). On the decoding side, the small rectangular regions corresponding to a region selected by a user are decoded. In this case, if the user selects a predetermined region to decode through the user interface unit 130, the decoding region extractor 2210 composes a ROI using the regions corresponding to the region selected by the user. Then, the decoding region extractor 2210 composes the indexes of the regions required for decoding by reading the number of related regions 2130 in the lower layer for decoding the ROI and the related region information 2140 from the coded stream. Then, the decoding region extractor 2210 extracts the region information necessary for decoding the ROI from the coded stream.
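One way to compose the indexes of the regions required for decoding is to follow the related region entries from the selected region downward. The sketch below assumes that the related regions of a related region must also be decoded, so that the whole chain is collected; that assumption, the function name and the `(region_index, layer_index)` keys are illustrative only.

```python
def required_regions(target, related):
    """Collect every region needed to decode target (hypothetical helper).

    target: (region_index, layer_index) of the region selected for decoding.
    related: dict mapping a region key to the list of lower-layer region
             keys named in its related region information.
    Returns the needed regions ordered from the lowest layer upward.
    """
    seen = set()
    stack = [target]
    while stack:
        region = stack.pop()
        if region in seen:
            continue
        seen.add(region)
        # Follow the related regions listed for this region, if any.
        stack.extend(related.get(region, []))
    # Order by layer index first, then by region index.
    return sorted(seen, key=lambda key: (key[1], key[0]))

# Hypothetical dependencies: region 2 in layer 3 uses region 3 in layer 2,
# which in turn uses region 0 (the whole image) in layer 1.
related = {(2, 3): [(3, 2)], (3, 2): [(0, 1)], (1, 2): [(0, 1)]}
order = required_regions((2, 3), related)
```

The returned order, lowest layer first, matches the order in which the region decoder processes the layers.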

The region decoder 2220 receives the region information for decoding the ROI from the decoding region extractor 2210 and performs the decoding to recover the image signal. Hereinafter, operations of the region decoder 2220 will be described for a multi-layer based coding method and for a one-layer based coding method.

Fig. 23 is a flowchart showing operations of the region decoder employing a multi-layer based decoding method in accordance with a preferred embodiment of the present invention. When the region information for decoding is inputted from the decoding region extractor 2210, regions of the lowest layer are selected among the related regions for decoding, and the selected regions are decoded at steps S2310 and S2320. After decoding, it is determined at step S2330 whether any other regions are in the same layer. If there is another region in the same layer, the region is selected and decoded at step S2340. After the regions in the lowest layer are decoded, regions in an upper layer are selected and decoded at steps S2350, S2360 and S2320.

The decoding at the step S2320 follows a block-based video coding scheme and uses a decoding method corresponding to one of the coding methods, namely an inter-layer coding using region information of a lower layer and an intra-layer coding using region information of the same layer. The intra-layer coding is divided into a motion prediction/compensation mode and an intra mode. That is, the region decoder 2220 decodes each block of the selected region based on one of the inter-layer decoding, the motion prediction/compensation decoding and the intra mode decoding. The motion prediction/compensation decoding and the intra mode decoding are introduced in MPEG-4 AVC [ISO/IEC 14496-10: Advanced Video Coding, 2003]. The inter-layer decoding is introduced in the MPEG-4 standard [ISO/IEC 14496-2 (1998)], and it up-samples motion information, texture information and motion compensated residual information of the lower layer and uses the up-sampled information.

Fig. 24 is a flowchart showing a method of decoding a coded stream for blocks of a selected region when a multi-layer based decoding method is applied in accordance with a preferred embodiment of the present invention. As described above, the decoding of the selected region performed in the step S2320 follows a block-based decoding scheme, and the coded stream of each block is decoded based on one of the inter-layer decoding, the motion prediction/compensation decoding and the intra mode decoding. Referring to Fig. 24, the coding mode of a decoding block is determined at steps S2410 and S2440. If the coding mode of the decoding block is the inter-layer coding mode, the region in the layer having the highest layer index among the regions of lower layers overlapped with the current decoding block is selected at step S2420, and an inter-layer decoding is performed using the selected region at step S2430. If the coding mode of the decoding block is the motion prediction/compensation mode, a motion prediction/compensation decoding is performed on the block. If the coding mode of the decoding block is neither the motion prediction/compensation mode nor the inter-layer coding mode, the intra mode decoding is performed on the block.

Fig. 25 is a flowchart showing operations of the region decoder employing a one-layer based decoding method in accordance with a preferred embodiment of the present invention.

When the region information for decoding is inputted from the decoding region extractor 2210, regions of the lowest layer are selected among the related regions for decoding at step S2510, and the selected region is decoded at step S2550 if it is in the highest layer. If the selected region is not in the highest layer, it is determined at step S2530 whether any regions in the current layer are not overlapped with regions in the one-level lower layer. If there is no such region, the selected region is decoded at step S2550. If there is such a region, an intermediate region is composed for the regions in the lower layer not matched with the regions in the current layer, and the decoding is performed at steps S2540 and S2550. After decoding, it is determined at step S2560 whether other regions are in the same layer. If there is another region in the same layer, the region is selected and decoded at step S2570. After the regions in the same layer are decoded, regions in an upper layer are selected and decoded at steps S2580, S2590 and S2520.

The intermediate region composed in the step S2540 is a region composed by performing interpolation on the motion information, texture information and motion compensated residual information of the regions in a one-level lower layer. The intermediate region is used to perform the inter-layer decoding at the upper layer. Since the decoding according to the present invention is block based, the intermediate region is configured in units of blocks. The motion information includes motion vectors and information about a coding mode such as the inter-layer coding mode, the motion prediction/compensation mode or the intra mode.

Since the decoding at the step S2550 is performed on a block basis, identically to the case of the multi-layer based decoding, each block of the selected region is decoded based on one of the inter-layer decoding, the motion prediction/compensation decoding and the intra mode decoding. The motion prediction/compensation decoding and the intra mode decoding are introduced in MPEG-4 AVC [ISO/IEC 14496-10: Advanced Video Coding, 2003]. The inter-layer decoding is introduced in the MPEG-4 standard [ISO/IEC 14496-2 (1998)]. Such inter-layer decoding up-samples the motion information, the texture information and the motion compensated residual information of the lower layer and uses the up-sampled information.

Fig. 26 is a flowchart showing a decoding of a coded stream of a block of a selected region when a one-layer based decoding is applied in accordance with a preferred embodiment of the present invention. The decoding employing the one-layer based decoding is basically identical to the decoding with the multi-based decoding. However, the one-layer based decoding is different from the multi-layer based decoding when the inter-layer decoding is performed. In this case, only one-level lower layer is referenced. If there is no overlapped region in a one- level lower layer in the one-layer based decoding, an intermediate region is composed as shown in the step S2540 in order to reference only one-level lower layer. After composing the intermediate region, the decoding is performed. Referring to Fig. 26, a coding mode of a decoding block is determined at steps S2610 and S2640. If the coding mode of the decoding block is the inter-layer coding mode, the inter-layer decoding is performed at steps S2620 and S2630 using only a one-level lower layer having regions overlapped with the current decoding block or the intermediate region composed at the step S2540. If the coding mode of the decoding block is the motion prediction/ compensation mode, the motion prediction/compensation decoding is performed on the decoding block. If the coding mode of the decoding block is not either of the motion prediction/compensation mode and the inter-layer coding mode, the intra-mode decoding is performed on the block. The method of performing a half pixel interpolation for the motion estimation decoding performed at the steps S2450 and S2650 and the method of up-sampling in the inter- layer decoding performed at steps S2430 and S2630 are identical to those in the steps S1260, S1460, S1240, S1250, S1420 and S1450 described with reference to Fig. 15. 
As described in the steps S1260, S1460, S1240, S1250, S1420 and S1450, the encoder 110 uses one of two processing methods for the ROI boundary region, and the decoder 120 performs different operations according to the processing method used in the encoder 110.
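The block-level mode selection of Fig. 26, together with the difference between multi-layer based and one-layer based decoding, can be illustrated with a minimal Python sketch. It is not part of the disclosed apparatus: the names `CodingMode` and `decoding_path`, the returned tuple, and the assumption that layers are indexed from 0 (lowest resolution) upward are all hypothetical and chosen only for illustration.

```python
from enum import Enum, auto


class CodingMode(Enum):
    """Coding modes of a block (hypothetical names for illustration)."""
    INTER_LAYER = auto()
    MOTION_COMPENSATION = auto()
    INTRA = auto()


def decoding_path(mode, current_layer, overlapping_lower_layers, one_layer_based):
    """Return (decoder_name, reference_layer, needs_intermediate_region).

    Assumption: layers are numbered 0, 1, 2, ... from the lowest resolution,
    and overlapping_lower_layers lists the layers that hold a region
    overlapped with the block being decoded.
    """
    if mode is CodingMode.MOTION_COMPENSATION:
        return ("motion_compensation", None, False)
    if mode is not CodingMode.INTER_LAYER:
        # Neither inter-layer nor motion prediction/compensation: intra mode.
        return ("intra", None, False)

    lower = [layer for layer in overlapping_lower_layers if layer < current_layer]
    if one_layer_based:
        if current_layer == 0:
            raise ValueError("the lowest layer has no lower layer to reference")
        # Only the one-level lower layer is referenced. If it holds no
        # overlapped region, an intermediate region must be composed first.
        reference = current_layer - 1
        return ("inter_layer", reference, reference not in lower)

    if not lower:
        raise ValueError("no lower layer overlaps the block")
    # Multi-layer based: use the highest lower layer that overlaps the block.
    return ("inter_layer", max(lower), False)
```

For a block in layer 3 whose overlapped regions lie in layers 0 and 1, the multi-layer path references layer 1 directly, whereas the one-layer path references layer 2 and reports that an intermediate region has to be composed there.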

If the encoder 110 performs the half pixel interpolation or the up-sampling after deciding, through extrapolation, the value of a pixel outside the ROI boundary region, the decoder 120 determines whether or not an independently decodable ROI is defined through a flag such as roi_flag (roi_enable, boundary_handling or multiple_roi_flag) included in the coded stream transmitted through a channel. If the independently decodable ROI is defined, the region decoder 220 enables a function for decoding the bit stream coded with regard to the ROI boundary region, for example, a filter shown in Eq. 2.
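The extrapolate-then-interpolate handling of the ROI boundary can be sketched in Python as follows. This is a one-dimensional illustration only: the two-tap rounded average stands in for the actual interpolation filter (the filter of Eq. 2 is not reproduced here), and the function names `roi_pixel` and `half_pel`, the `mode` strings and the default constant 128 are assumptions made for the example.

```python
def roi_pixel(row, x, mode="zero_order", constant=128):
    """Return the pixel at integer position x of one ROI row.

    Positions outside the ROI are filled by extrapolation:
    "zero_order" repeats the nearest boundary pixel, and
    "constant" substitutes a predetermined constant.
    """
    if 0 <= x < len(row):
        return row[x]
    if mode == "zero_order":
        return row[0] if x < 0 else row[-1]
    if mode == "constant":
        return constant
    raise ValueError("unknown extrapolation mode: " + mode)


def half_pel(row, x, mode="zero_order", constant=128):
    """Half-pixel sample midway between integer positions x and x + 1.

    Assumption: a simple two-tap average with rounding is used in place of
    the codec's real interpolation filter.
    """
    a = roi_pixel(row, x, mode, constant)
    b = roi_pixel(row, x + 1, mode, constant)
    return (a + b + 1) // 2
```

Because the outside pixel is derived only from data inside the ROI (or from a fixed constant), an encoder and a decoder that apply the same rule obtain the same interpolated value without access to the rest of the picture.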

Meanwhile, in order to prevent a pixel outside the ROI boundary from being referenced, the encoder 110 may add a restriction condition so that a block at a ROI boundary is encoded in the motion prediction/compensation mode or the intra mode, which use only information about the current layer, without performing an interpolation on the boundary region with reference to a pixel outside the ROI boundary region. When the encoder 110 uses such a restriction condition, the region decoder 220 estimates the motion information in an integer pixel unit with reference to the ROI boundary in the motion prediction/compensation decoding, and uses a decoding method that uses only information of the current layer, such as the motion prediction/compensation decoding or the intra mode decoding, instead of the inter-layer decoding.
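One way to picture this restriction is a check that rejects any motion vector whose reference area, including the extra pixels read by half pixel interpolation, leaves the ROI. The Python sketch below is hypothetical: it assumes motion vectors in half-pixel units, a two-tap interpolation that reads one additional pixel when a component is fractional, and an ROI given as inclusive bounds (left, top, right, bottom). The names `reference_span` and `mv_allowed` do not come from the specification.

```python
def reference_span(pos, size, mv_half):
    """Inclusive range of integer pixels read along one axis.

    pos is the block origin, size the block length, and mv_half the
    motion vector component in half-pixel units (assumed convention).
    """
    start = pos + mv_half // 2      # floor division also handles negative vectors
    end = start + size - 1
    if mv_half % 2:                 # fractional component: one more pixel is read
        end += 1
    return start, end


def mv_allowed(block_x, block_y, width, height, mv, roi):
    """True if every pixel needed for prediction lies inside the ROI."""
    left, top, right, bottom = roi
    x0, x1 = reference_span(block_x, width, mv[0])
    y0, y1 = reference_span(block_y, height, mv[1])
    return left <= x0 and x1 <= right and top <= y0 and y1 <= bottom
```

With a 16x16 ROI and an 8x8 block at (8, 8), the zero vector is accepted because it reads columns 8 to 15, while the half-pixel vector (1, 0) is rejected because interpolation would read column 16. A search that tests only accepted vectors therefore ends up with integer-pixel motion at the ROI boundary.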

The above-described method according to the present invention can be embodied as a program and stored on a computer readable recording medium. The computer readable recording medium is any data storage device that can store data which can thereafter be read by a computer system. Examples of the computer readable recording medium include a read-only memory (ROM), a random-access memory (RAM), a CD-ROM, a floppy disk, a hard disk and a magneto-optical disk.

While the present invention has been described with respect to certain preferred embodiments, it will be apparent to those skilled in the art that various changes and modifications may be made without departing from the scope of the invention as defined in the following claims.

Claims

What is claimed is:

1. An encoder for encoding a video image comprising: an overlapped region (OR) detector for receiving coding region information about a plurality of regions of interest (ROI) in the video image to encode and detecting overlapped regions (OR) in the ROI regions; a region arranging unit for arranging the video image, the regions of interest and the detected overlapped regions into a plurality of layers according to a resolution; and a region encoder for encoding the video image, the regions of interest and the detected overlapped regions according to a resolution of a corresponding layer arranged at the region arranging unit.

2. The encoder as recited in claim 1, wherein the coding region information includes information about locations of the regions of interest in the video image and a coding resolution of the regions of interest.

3. The encoder as recited in claim 1, wherein the OR detector detects an overlapped region (OR) from a region of interest (ROI) having a lower resolution if the region of interest (ROI) having the lower resolution is not spatially included in a region of interest (ROI) having a higher resolution.

4. The encoder as recited in claim 3, wherein the OR detector does not detect an overlapped region if a region of interest (ROI) having a lower resolution is spatially included in a region of interest (ROI) having a higher resolution.

5. The encoder as recited in claim 3, wherein the OR detector detects regions overlapped in regions of interest having different resolutions by excluding regions overlapped in the regions of interest having a same resolution as an OR if an overlapped region between the regions of interest having different resolutions includes an overlapped region between regions of interest having a same resolution.

6. The encoder as recited in claim 3, wherein the region arranging unit arranges a region having a higher resolution among the video image, the region of interest and the overlapped region into an upper layer and arranges a region having a lower resolution into a lower layer where the video image is treated as a region.

7. The encoder as recited in claim 6, wherein the region arranging unit generates an OR layer below a layer having regions of interest having a resolution identical to the OR and arranges the OR spatially overlapped with the regions of interest into the generated OR layer.

8. The encoder as recited in claim 6, wherein the region arranging unit arranges an OR spatially overlapped with the region of interest into a layer having a region of interest that has a resolution identical to the OR.

9. The encoder as recited in claim 6, wherein the region encoder performs an inter-layer coding that encodes regions in a unit of a block using a region arranged in a lower layer.

10. The encoder as recited in claim 6, wherein the region encoder performs an intra-layer coding that performs a motion prediction/compensation in a unit of a block using a region in a same layer.

11. The encoder as recited in claim 7, wherein the region encoder performs an inter-layer coding on a block of a selected coding region using a region arranged in a highest layer among regions of lower layers that have regions overlapped with the block of the selected coding region.

12. The encoder as recited in claim 11, wherein the region encoder performs an inter-layer coding on the block using a detected OR when an OR is detected from the regions overlapped with the block of the selected coding region.

13. The encoder as recited in claim 11, wherein the region encoder outputs a coded stream for a block of a selected coding region by performing an encoding based on one minimizing a bit rate among an inter-layer coding mode, a motion prediction/compensation mode and an intra mode.

14. The encoder as recited in claim 8, wherein the region encoder performs an inter-layer coding on a block of a coding region using a region of a one-level lower layer having a region overlapped with the block.

15. The encoder as recited in claim 14, wherein the region encoder composes an intermediate region by up-sampling motion information and texture information of a region of a lower layer in order to reference the intermediate region when an inter-layer coding is performed.

16. The encoder as recited in claim 15, wherein the region encoder outputs a coded stream for a block of a coding region by performing an encoding based on one minimizing a bit rate among an inter-layer coding mode, a motion prediction/compensation coding mode and an intra mode.

17. The encoder as recited in claim 9, wherein the region encoder performs an interpolation after deciding a pixel value in an outside of a region of interest through an extrapolation if an interpolation is required to perform using a pixel in an outside of a region of interest in the inter-layer coding.

18. The encoder as recited in claim 10, wherein the region encoder performs an interpolation after deciding a value of a pixel in an outside of a region of interest through extrapolation if an interpolation is required to perform using a pixel in the outside of the region of interest in an intra-layer coding that performs the motion prediction/compensation.

19. The encoder as recited in one of claims 17 and 18, wherein the extrapolation uses a zero order extrapolation or a method of substitution with a predetermined constant.

20. The encoder as recited in one of claims 17 and 18, wherein the region encoder adds a flag into a coded stream where the flag denotes that a region of interest is in the video image.

21. The encoder as recited in claim 20, wherein the flag is one of a roi_flag, a roi_enable, a boundary_handling and a multiple_roi_flag.

22. The encoder as recited in claim 20, wherein the region of interest is generated as a slice group map type 2 (slice_group_map_type 2).

23. The encoder as recited in claim 20, wherein the region of interest is generated as a slice group map type (slice_group_map_type) having a type identification higher than 6.

24. The encoder as recited in claim 20, wherein the region encoder adds geometric information of a region of interest of the video image into a coded stream.

25. The encoder as recited in claim 24, wherein the geometric information is expressed through variables num_rois_minus1, roi_top_left and roi_bottom_right.

26. The encoder as recited in claim 24, wherein the geometric information of the region of interest is expressed as scan numbers of macro blocks that compose the region of interest.

27. The encoder as recited in claim 26, wherein the scan number is a number generated through a raster scan on macro blocks of the video image.

28. The encoder as recited in claim 9, wherein the region encoder does not perform an interpolation when the interpolation requires a pixel in an outside of a region of interest to use in the inter-layer coding.

29. The encoder as recited in claim 9, wherein an inter-layer coding is performed on a block located at a region of interest boundary in the inter-layer coding.

30. The encoder as recited in claim 10, wherein the region encoder does not perform an interpolation when the interpolation requires a pixel in an outside of a region of interest to use in an inter-layer coding that performs the motion prediction/compensation.

31. The encoder as recited in claim 10, wherein motion information is estimated in a unit of an integer pixel when the interpolation is required to perform using a pixel in an outside of a region of interest in an intra-layer coding that performs the motion prediction/compensation.

32. The encoder as recited in claim 9, wherein the region encoder outputs a coded stream including information about regions in lower layers required for encoding a region.

33. The encoder as recited in claim 32, wherein the coded stream includes a region index of a coding region, a layer index of the coding region, the number of regions related to the coding and information about regions related to the coding.

34. The encoder as recited in claim 33, wherein the information about regions related to the coding includes a region index of the related region, a layer index and location information.

35. The encoder as recited in claim 34, wherein the location information of the related region is expressed as a location of a horizontal axis, a location of a vertical axis, a horizontal length and a vertical length with respect to the coding region.

36. The encoder as recited in claim 34, wherein the location information of the related region is expressed through raster scan numbers assigned to macro blocks of a video image.

37. A decoder comprising: a decoding region extractor for receiving selection information for selecting a region to decode, and extracting region information required for decoding a region of interest (ROI) corresponding to the selected region from a coded stream including coding information about a plurality of regions of interest; and a region decoder for receiving the extracted region information and recovering an image signal of a region of interest in a video image by performing a decoding based on the received region information.

38. The decoder as recited in claim 37, wherein the selection information includes information about location of the decoding region in the video image and a decoding resolution.

39. The decoder as recited in claim 37, wherein the decoding region extractor extracts a coded stream of a region of interest corresponding to the selected region from the coded stream, and extracts information of related regions in a lower layer from the extracted coded stream for decoding the region of interest.

40. The decoder as recited in claim 39, wherein the extracted coded stream includes a region index of the region of interest, a layer index of the region of interest, the number of the related regions and information about the related regions.

41. The decoder as recited in claim 39, wherein the related region information includes a region index, a layer index and location information of the related region.

42. The decoder as recited in claim 41, wherein the location information of the related region is expressed as a horizontal axis, a vertical axis, a horizontal length and a vertical length with respect to the region of interest.

43. The decoder as recited in claim 41, wherein the location information of the related region is expressed through raster scan numbers assigned to macro blocks of a video image.

44. The decoder as recited in claim 39, wherein the region decoder performs an inter-layer decoding in a unit of a block using a region of a lower layer having regions of a lower resolution.

45. The decoder as recited in claim 39, wherein the region decoder performs an intra-layer decoding that performs a motion prediction/compensation in a unit of a block using regions in a same layer that has regions of a same resolution.

46. The decoder as recited in claim 44, wherein the region decoder does not perform an interpolation if the interpolation requires a pixel in an outside of a region of interest to use in the inter-layer decoding.

47. The decoder as recited in claim 44, wherein an intra-layer decoding is performed on a block located at a region of interest boundary in the inter-layer decoding.

48. The decoder as recited in claim 45, wherein the region decoder does not perform an interpolation when the interpolation requires a pixel in an outside of a region of interest to use in the intra-layer decoding that performs the motion prediction/compensation.

49. The decoder as recited in claim 45, wherein the motion information is estimated in a unit of an integer pixel if the interpolation is required to perform using a pixel in a region in an outside of a region of interest in the intra-layer decoding that performs the motion prediction/compensation.

50. The decoder as recited in one of claims 44 and 45, wherein the region decoder analyzes geometric information of the region of interest included in the coded stream when a flag denoting existence of a region of interest in the coded stream is enabled, and performs an interpolation after deciding a pixel value in the outside of the region of interest through an extrapolation.

51. The decoder as recited in claim 39, wherein the region decoder performs an inter-layer decoding on a block of a decoding region using a region arranged in a highest layer among related regions of a lower layer having a region overlapped with the block when a coding mode of a block of a decoding region is an inter-layer coding mode.

52. The decoder as recited in claim 39, wherein the region decoder composes an intermediate region to reference when an inter-layer decoding is performed on an upper layer by up-sampling motion information and texture information of regions in a lower layer.

53. The decoder as recited in claim 52, wherein the region decoder performs an inter-layer decoding on a block of a decoding region using a region of a one-level lower layer having a region overlapped with the block if a coding mode of the block is an inter-layer coding mode.

54. The decoder as recited in claim 37, wherein each of the regions of interest is configured of small rectangular regions for supporting an interactive decoding.

55. A video encoding apparatus comprising: means for receiving coding region information about a plurality of regions of interest for encoding an input video image; and means for encoding the regions of interest, wherein an interpolation is not performed if the interpolation is required to reference to a pixel in an outside of the regions of interest in encoding the regions of interest.

56. The video encoding apparatus as recited in claim 55, wherein the encoding means performs a motion prediction/compensation coding using a temporal correlation.

57. The video encoding apparatus as recited in claim 56, wherein the encoding means predicts motion information in a unit of an integer pixel if an interpolation is required to perform with a reference to a pixel in the outside of the regions of interest for performing the motion prediction/compensation coding.

58. The video encoding apparatus as recited in claim 55, wherein the coding region information includes information about location of the regions of interest in a video image and coding resolution of the regions of interest.

59. The video encoding apparatus as recited in one of claims 55 and 58, further comprising: means for arranging the regions of interest according to a coding resolution.

60. The video encoding apparatus as recited in claim 59, wherein the encoding means performs an inter-layer coding using regions arranged at a lower layer of a lower resolution to encode a region of interest arranged at an upper layer of a higher resolution.

61. The video encoding apparatus as recited in claim 60, wherein the encoding means performs an intra-layer coding to encode a block that requires an interpolation to perform using a pixel in an outside of a region of interest for performing an inter-layer coding on the region of interest in a unit block.

62. The video encoding apparatus as recited in claim 55, wherein each of the regions of interest is configured of small rectangular regions to support an interactive decoding.

63. The video encoding apparatus as recited in claim 55, wherein region of interest (ROI) location information included in a coded stream outputted from the encoding means is expressed as coordinates.

64. The video encoding apparatus as recited in claim 55, wherein the region of interest location information included in a coded stream outputted from the encoding means is expressed as scan numbers assigned to macro blocks in an entire video image.

65. The video encoding apparatus as recited in claim 60, wherein the inter-layer coding uses a region in a lower layer overlapped with a region of interest to encode.

66. The video encoding apparatus as recited in claim 65, wherein the region overlapped with the region of interest is defined as an OR in a layer newly generated below the lower layer.

67. The video encoding apparatus as recited in claim 65, wherein the region overlapped with the region of interest is defined as an OR in the lower layer.

68. The video encoding apparatus as recited in claim 67, wherein the encoding means composes an intermediate region suitable to a resolution of an upper layer by up-sampling texture information of a region of interest or an OR in a lower layer if there is no spatially overlapped region of interest in an upper layer of a region of interest or an OR.

69. The video encoding apparatus as recited in claim 68, wherein the encoding means performs an inter-layer coding using a region in a one-level lower layer.

70. A video decoding apparatus comprising: means for receiving selection information for selecting a region to decode in an input video image; and means for decoding a region of interest corresponding to the decoding region, wherein an interpolation is not performed if the interpolation is required to reference a pixel in the outside of the region of interest for decoding the region of interest.

71. The video decoding apparatus as recited in claim 70, wherein the decoding means performs a motion prediction/compensation decoding using a temporal correlation.

72. The video decoding apparatus as recited in claim 71, wherein the decoding means predicts motion information in a unit of an integer pixel if an interpolation is required to perform with reference to a pixel in the outside of the region of interest for performing the motion prediction/compensation decoding.

73. The video decoding apparatus as recited in claim 70, wherein the selection information includes information about location of the decoding region in the video image and a decoding resolution.

74. The video decoding apparatus as recited in claim 70, wherein the decoding means performs an inter-layer decoding using regions arranged at a lower layer of a lower resolution to decode a region of interest arranged at an upper layer of a higher resolution.

75. The video decoding apparatus as recited in claim 74, wherein the decoding means performs an intra-layer coding on a block that is required to perform an interpolation using a pixel in an outside of a region of interest for performing an inter-layer decoding in a unit block for the region of interest.

76. The video decoding apparatus as recited in claim 70, wherein a plurality of small rectangular regions of interest is decoded for the decoding region.

77. The video decoding apparatus as recited in claim 70, wherein the region of interest location information is expressed as scan numbers assigned to macro blocks in an entire video image.

78. The video decoding apparatus as recited in claim 74, wherein the inter-layer decoding uses an OR in a lower layer that is overlapped with a region of interest to decode.

79. The video decoding apparatus as recited in claim 78, wherein the decoding means composes an intermediate region suitable to an upper layer by up-sampling texture information of a region of interest or an OR in a lower layer when there is no region of interest spatially overlapped with a region of interest or an OR in an upper layer.

80. The video decoding apparatus as recited in claim 79, wherein the decoding means performs an inter-layer decoding using a region of a one-level lower layer.

81. An encoding method for providing a spatial scalability comprising the steps of: a) receiving information about locations and resolutions of a plurality of regions of interest for encoding an input video image; b) arranging the regions of interest into corresponding layers according to a resolution; and c) encoding the arranged region of interest in a block unit.

82. The encoding method as recited in claim 81, wherein in the step c), an intra-layer coding is performed on a block that requires an interpolation to perform using a pixel in an outside of a region of interest when an inter-layer coding is performed using region of interest information in a lower layer.

83. The encoding method as recited in claim 81, wherein in the step c), motion information of a block that requires an interpolation using a pixel in an outside of a region of interest is predicted in an integer pixel unit when the motion prediction/compensation coding is performed using ROI information in a same layer having a temporal correlation.

84. The encoding method as recited in claim 81, wherein in the step c), an interpolation is performed on a block requiring an interpolation to perform using a pixel in an outside of a region of interest after deciding an external pixel value through an extrapolation when an inter-layer coding is performed using region of interest information of a lower layer.

85. The encoding method as recited in claim 81, wherein in the step c), an interpolation is performed on a block requiring an interpolation to perform using a pixel in an outside of a region of interest after deciding an external pixel value when a motion prediction/compensation coding is performed using region of interest information of a same layer having a temporal correlation.

86. The encoding method as recited in one of claims 84 and 85, wherein the step c) further includes adding a flag denoting an existence of a region of interest of a video image into a coded stream.

87. The encoding method as recited in claim 81, wherein the region of interest location information is expressed as scan numbers assigned to macro blocks for a video image.

88. The encoding method as recited in claim 81, further comprising detecting an overlapped region (OR) between regions of interest, wherein in the step c), an inter-layer coding is performed for a region of interest using the OR.

89. The encoding method as recited in claim 88, further comprising composing an intermediate region by up-sampling for a region not spatially overlapped with the region of interest or the OR in an upper layer, wherein in the step c), an inter-layer coding is performed using information of regions arranged at a one-level lower layer.

90. A decoding method comprising the steps of: a) receiving information about a location and a resolution of a decoding region in an input video image; b) extracting region of interest information corresponding to the decoding region from a transmitted coded stream; and c) decoding the region of interest using the extracted information.

91. The decoding method as recited in claim 90, wherein in the step c), an intra-layer decoding is performed on a block requiring an interpolation to perform using a pixel in an outside of a region of interest when an inter-layer decoding is performed using region of interest information of a lower layer.

92. The decoding method as recited in claim 90, wherein in the step c), motion information of a block that requires an interpolation to perform using a pixel in an outside of a region of interest is predicted in a unit of an integer pixel when a motion prediction/compensation decoding is performed using region of interest information of a same layer having a temporal correlation.

93. The decoding method as recited in claim 90, wherein in the step c), an interpolation is performed after deciding an external pixel value through an extrapolation for a block that requires an interpolation to perform using a pixel in an outside of a region of interest when an inter-layer decoding is performed using region of interest information of a lower layer.

94. The decoding method as recited in claim 90, wherein in the step c), an interpolation is performed after deciding an external pixel value through an extrapolation for a block that requires an interpolation using a pixel in an outside of a region of interest when a motion prediction/compensation decoding is performed using region of interest information of a same layer having a temporal correlation.

95. The decoding method as recited in one of claims 93 and 94, wherein the step c) further includes determining whether a region of interest of a video image in a coded stream exists through a flag denoting existence of a region of interest included in the coded stream.

96. The decoding method as recited in claim 90, wherein location information of the region of interest is expressed as scan numbers assigned to macro blocks in a video image.

97. The decoding method as recited in claim 90, wherein in the step c), an inter-layer decoding is performed for a region of interest using an overlapped region OR between the regions of interest.

98. The decoding method as recited in claim 97, further comprising: composing an intermediate region by up-sampling a region not spatially overlapped with the region of interest or the OR of an upper layer, wherein in the step c), an inter-layer decoding is performed using information of regions arranged at a one-level lower layer.