WO2009131287A1 - Method for encoding and decoding image of ftv - Google Patents

Method for encoding and decoding image of FTV

Info

Publication number
WO2009131287A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
information
block
ftv
view
Application number
PCT/KR2008/006830
Other languages
French (fr)
Inventor
Jung Eun Lim
Jin Seok Im
Seung Jong Choi
Jong Chan Kim
Original Assignee
Lg Electronics Inc.
Application filed by Lg Electronics Inc. filed Critical Lg Electronics Inc.
Publication of WO2009131287A1 publication Critical patent/WO2009131287A1/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/85: using pre-processing or post-processing specially adapted for video compression
    • H04N 19/86: pre- or post-processing involving reduction of coding artifacts, e.g. of blockiness
    • H04N 19/10: using adaptive coding
    • H04N 19/169: adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/17: the coding unit being an image region, e.g. an object
    • H04N 19/176: the coding unit being a block, e.g. a macroblock
    • H04N 19/30: using hierarchical techniques, e.g. scalability
    • H04N 19/46: embedding additional information in the video signal during the compression process
    • H04N 19/50: using predictive coding
    • H04N 19/597: predictive coding specially adapted for multi-view video sequence encoding
    • H04N 19/60: using transform coding
    • H04N 19/61: transform coding in combination with predictive coding

Definitions

  • In the method for decoding an FTV image of FIG. 12, information indicating whether or not an image is an FTV image is first decoded (S1210), and, in the case of an FTV image, at least one of view information, base view information and anchor picture information of the depth image is decoded. A step of decoding at least one of the SPS, the PPS and the SEI may be further included.
  • A step of decoding the color image and the depth image of the FTV image may be further included. The decoding of the color image and the depth image was described with reference to FIG. 11, and a description thereof will be omitted herein.
  • The existing MVC decoding apparatus skips the NAL units of the depth image, which are not defined in the MVC standard, among the NAL units of the FTV image, without decoding them, and decodes only the color image of the FTV image.
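As a loose illustration of this backward compatibility (the type value 24 and the helper names are hypothetical assumptions, not from any real codec API), a legacy decoder can simply ignore NAL unit types it does not recognize:

```python
# Hypothetical NAL unit type reserved for FTV depth images (the patent only
# says an undefined value in the 0..31 range may be chosen; 24 is made up).
NAL_UNIT_TYPE_FTV_DEPTH = 24
KNOWN_MVC_TYPES = {1, 5, 7, 8, 14, 20}      # slices, SPS, PPS, prefix/extension

def decode_bitstream(nal_units, is_ftv_decoder=False):
    """Dispatch NAL units; a legacy MVC decoder skips the depth units."""
    for header_byte, payload in nal_units:
        nal_unit_type = header_byte & 0x1F
        if nal_unit_type == NAL_UNIT_TYPE_FTV_DEPTH:
            if is_ftv_decoder:
                decode_depth_nal(payload)    # auxiliary layer (depth image)
            # else: silently skipped, keeping MVC compatibility
        elif nal_unit_type in KNOWN_MVC_TYPES:
            decode_color_nal(payload)        # base layer (color image)

def decode_depth_nal(payload): pass          # placeholders for this sketch
def decode_color_nal(payload): pass
```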
  • FIGS. 13 and 14 are views illustrating a process of generating an FTV image.
  • <112> In order to generate a third view image based on a first view image 1301 and a second view image 1302, a 3D warping method is used. Accordingly, a first view modified image 1304 and a second view modified image 1303 are generated, and the third view image is finally generated therefrom.
  • Regions which are not filled are generated in the first view modified image 1304 and the second view modified image 1303, which serve as the reference of the third view image 1305. Accordingly, a region which is not filled is generated in the third view image 1305.
  • The region which is not filled in the third view image 1305 is defined and used as a hole.
  • In FIG. 14, a first view image 1401 and a second view image 1402 are aligned using an epipolar line 1415 so as to generate a first view modified image 1403 and a second view modified image 1405, and a third view image 1404 is finally generated therefrom.
  • <116> When an image is photographed, if the first view image 1401 and the second view image 1402 are photographed along the epipolar line 1415, the third view image 1404 may be directly generated without generating the first view modified image 1403 and the second view modified image 1405. A sketch of the warping step follows.
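To make the warping step concrete, here is a minimal sketch of disparity-based forward warping under strong assumptions (rectified, epipolar-aligned views, a depth map normalized to [0, 1], and a made-up disparity scale); pixels that receive no sample remain holes:

```python
import numpy as np

def warp_view(color, depth, alpha, max_disp=16.0):
    """Forward-warp one rectified view toward an intermediate viewpoint.

    With rectified (epipolar-aligned) cameras a pixel shifts horizontally
    by an amount proportional to its disparity, here derived from depth.
    `alpha` is the fractional baseline position of the virtual view.
    Pixels that receive no source sample stay at -1 and mark the holes.
    """
    h, w = depth.shape
    warped = -np.ones_like(color)
    for y in range(h):
        for x in range(w):
            xt = int(round(x + alpha * max_disp * depth[y, x]))
            if 0 <= xt < w:
                warped[y, xt] = color[y, x]
    return warped

# Third (virtual) view from a first and a second view: warp both toward the
# middle and merge; entries still equal to -1 are the holes to be corrected.
first_c, first_d = np.random.rand(64, 64), np.random.rand(64, 64)
second_c, second_d = np.random.rand(64, 64), np.random.rand(64, 64)
modified1 = warp_view(first_c, first_d, alpha=0.5)     # first view modified image
modified2 = warp_view(second_c, second_d, alpha=-0.5)  # second view modified image
third = np.where(modified1 >= 0, modified1, modified2)
```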
  • FIGS. 15 and 16 are views showing an example of a method for correcting an FTV image according to the present invention.
  • In the correction method, a reference block including a first block in which holes are generated and a second block which is adjacent to the first block and in which holes are not generated is detected in the third view image; the second block of the reference block is compared with a predetermined block of at least one of the first view image and the second view image, and the first block is corrected using a block adjacent to the predetermined block.
  • FIG. 15 shows a reference block 1502 including a first block 1503 and a second block 1504, which are adjacent to each other in a vertical direction, in a third view image 1501.
  • Although the second block 1504 is disposed near the lower side of the first block 1503 in the drawing, the second block 1504 may instead be disposed near the upper side of the first block 1503.
  • Each of the first block 1503 and the second block 1504 may be a 4x4 block, but the present invention is not limited thereto.
  • FIG. 16 shows a reference block 1602 including a first block 1603 and a second block 1604, which are adjacent to each other in a horizontal direction, in a third view image 1601.
  • Although the second block 1604 is disposed near the right side of the first block 1603 in the drawing, the second block 1604 may instead be disposed near the left side of the first block 1603.
  • Each of the first block 1603 and the second block 1604 may be a 4x4 block, but the present invention is not limited thereto.
  • Each of the reference blocks 1502 and 1602 is compared with at least one predetermined block of the first view image and the second view image, and the first block is replaced so as to fill the holes, as follows.
  • The second block, in which the holes are not formed, and at least one predetermined block of the first view image and the second view image are compared in view of at least one of an average of the depth image, an average of the color image and a variance of the color image, and it is determined whether a difference therebetween is a predetermined value or less.
  • <123> If the difference is the predetermined value or less, the first block is replaced with a block adjacent to the predetermined block so as to fill the holes.
  • In detail, the blocks in the first view image and the second view image are first compared with the average of the depth image of the second block so as to detect a block whose difference in average value is a predetermined value or less.
  • Then, the average of the color image and the variance of the color image of the detected block are compared with those of the second block. If a difference therebetween is a predetermined value or less, this block is selected as a matched block.
  • It is preferable that the block with which the first block is replaced be a block located at a location corresponding to the second block.
  • The variance of the color image is used as a comparison reference because a change in value is not large in the depth image, so a desired result can be obtained simply by comparing averages, whereas a change in value is large in the color image, so the average and the variance should both be compared in order to find an accurate matched block. A sketch of this search follows.
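The matching criterion can be sketched as follows; the thresholds, block size and array layout are illustrative assumptions, not values from the patent:

```python
import numpy as np

def find_matched_block(ref_depth_blk, ref_color_blk, src_depth, src_color,
                       block=4, t_depth=4.0, t_avg=8.0, t_var=64.0):
    """Find a source-view block matching the hole-free second block.

    Screening follows the text above: the depth image changes little, so
    its average alone is compared first; the color image varies more, so
    both its average and its variance must match. Thresholds are made up.
    Returns the top-left corner of the matched block, or None.
    """
    d_avg = ref_depth_blk.mean()
    c_avg, c_var = ref_color_blk.mean(), ref_color_blk.var()
    h, w = src_depth.shape
    for y in range(h - block + 1):
        for x in range(w - block + 1):
            if abs(src_depth[y:y+block, x:x+block].mean() - d_avg) > t_depth:
                continue                      # depth: the average is enough
            cand = src_color[y:y+block, x:x+block]
            if abs(cand.mean() - c_avg) <= t_avg and abs(cand.var() - c_var) <= t_var:
                return y, x                   # color: average AND variance match
    return None
```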
  • FIGS. 17 and 18 are views illustrating a hole filling process.
  • Hole filling, by the detection of the reference block, the comparison with the predetermined block and the replacement with the block adjacent thereto, is preferably performed from the center of the third view image 1701 toward the edge thereof.
  • If the vertical reference block 1502 of FIG. 15 is used, hole filling is preferably performed in order of (1), (2) and (3), as shown. If the horizontal reference block 1602 of FIG. 16 is used, hole filling is preferably performed in order of (1), (2), (3) and (4) based on a horizontal line 1702 and a vertical line 1703 of FIG. 17.
  • Hole filling may be performed earlier in the horizontal direction than in the vertical direction. However, due to the change in value of the depth image in the vertical direction, the number of holes is generally large in the vertical direction; accordingly, it is preferable that hole filling be first performed in the vertical direction. If hole filling is performed in the vertical direction, hole filling is mostly finished; if, exceptionally, it is not finished, hole filling may then be performed in the horizontal direction. The center-outward processing order is sketched below.
  • As shown in FIG. 18, hole filling in the reference block 1805 of the third view image 1802 may be completed using the blocks 1804 and 1805 derived from the first view image 1801 and the second view image 1803.
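A sketch of the preferred center-to-edge processing order for block columns (purely illustrative):

```python
def fill_order(num_block_columns, center):
    """Block-column processing order, from the image center toward the edges.

    Blocks nearer the center are corrected first, alternating left/right,
    so already-filled blocks can serve as the hole-free second block for
    their neighbors farther out.
    """
    order = [center]
    for step in range(1, num_block_columns):
        for idx in (center - step, center + step):
            if 0 <= idx < num_block_columns:
                order.append(idx)
    return order

print(fill_order(8, 4))   # [4, 3, 5, 2, 6, 1, 7, 0]
```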
  • The method for encoding and decoding the FTV image and the method for correcting the FTV image according to the present invention can also be embodied as processor readable codes on a processor readable recording medium included in the apparatus for encoding and decoding the image.
  • The processor readable recording medium is any data storage device that can store data which can thereafter be read by a processor. Examples of the processor readable recording medium include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, and carrier waves such as data transmission through the Internet.
  • The processor readable recording medium can also be distributed over network coupled computer systems so that the processor readable code is stored and executed in a distributed fashion.
  • A method for encoding and decoding an FTV image according to the present invention may be used for dividing an image into a plurality of images and encoding the images.

Abstract

A method for encoding and decoding a Free-viewpoint Television (FTV) image including a color image and a depth image is provided. The method for encoding the FTV image includes encoding information indicating whether or not an image is an FTV image, encoding at least one of view information, base view information and anchor picture information of a depth image if the image is an FTV image, and encoding the depth image using predictive information of a color image. Accordingly, it is possible to encode or decode the color image and the depth image of the FTV image in a layer structure.

Description

[DESCRIPTION]
[Invention Title]
METHOD FOR ENCODING AND DECODING IMAGE OF FTV
[Technical Field]
<1> The present invention relates to a method for encoding and decoding a Free-viewpoint Television (FTV) image, and more particularly, to a method for encoding and decoding an FTV image, which is capable of encoding and decoding a depth image and a color image of the FTV image in a layer structure.
<2>
[Background Art]
<3> In a three-dimensional TV broadcast, there are a stereo image based on binocular disparity, a multiview image which is acquired from several angles, and a Free-viewpoint Television (FTV) image including a multiview image and a depth image.
<4> In the existing standard, a Moving Picture Expert Group-2 (MPEG-2) multiview profile encodes and decodes a three-dimensional (3D) TV image using temporal scalability. This standard is suitable for a stereo moving image by introducing a disparity predicting method. However, a method for encoding and decoding a multiview moving image with a large number of views is not suggested.
<5> Recently, the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Joint Video Team (JVT) has finished Multiview Video Coding (MVC) standardization for compressing a multiview moving image using an H.264/AVC amendment. However, since the amount of image data to be transmitted is large, an increase in bandwidth is inevitable, and a view location for allowing a viewer to view a 3D moving image is restricted.
<6>
[Disclosure]
[Technical Problem]
<7> Therefore, the present invention has been made in view of the above problems, and it is an object of the present invention to provide a method for encoding and decoding a Free-viewpoint Television (FTV) image, which is capable of encoding and decoding a depth image and a color image of the FTV image in a layer structure.
<8>
[Technical Solution]
<9> In accordance with an aspect of the present invention, the above and other objects can be accomplished by the provision of a method for encoding a Free-viewpoint Television (FTV) image, the method including: encoding information indicating whether or not an image is an FTV image; and encoding at least one of view information, base view information and anchor picture information of a depth image if the image is an FTV image.
<10> In accordance with another aspect of the present invention, there is provided a method for decoding a Free-viewpoint Television (FTV) image, the method including: decoding information indicating whether or not an image is an FTV image; and decoding at least one of view information, base view information and anchor picture information of a depth image if the image is an FTV image.
<11> In accordance with another aspect of the present invention, there is provided a method for correcting a Free-viewpoint Television (FTV) image, the method including: generating a third view image based on a first view image and a second view image; detecting a reference block including a first block in which holes are generated and a second block which is adjacent to the first block and in which holes are not generated, in the third view image; and comparing the second block in the reference block with a predetermined block of at least one of the first view image and the second view image and replacing the first block with a block adjacent to the predetermined block if a difference therebetween is a predetermined value or less.
[Advantageous Effects]
<14> According to the present invention, since a color image and a depth image of a Free-viewpoint Television (FTV) image are encoded in a layer structure, it is possible to remove repeated information due to similarity between the color image and the depth image so as to increase compression efficiency. That is, it is possible to remove repeated similar information such as an intra block prediction mode, motion compensation information or disparity compensation information so as to increase compression efficiency, when the depth image of an auxiliary layer is compressed.
<15> In addition, holes generated when the color image and the depth image of a predetermined view are restored based on a view of a referred image received from bitstreams can be simply corrected with limited calculation.
<16> Such a method can minimize, in view of the video standard, changes to a previously configured system caused by the addition of the syntax, and can support the use of information about peripheral division images in the video compression standard.
<17>
[Description of Drawings]
<18> The above and other objects, features and other advantages of the present invention will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
<19> FIG. 1 is a view illustrating the encoding and decoding of a Free-viewpoint Television (FTV) image according to the present invention;
<20> FIGS. 2 to 4 are views illustrating a method for predicting a multiview image;
<21> FIG. 5 is a flowchart illustrating a method for encoding an FTV image according to an embodiment of the present invention;
<22> FIG. 6 is a view illustrating a bitstream structure according to the encoding method of FIG. 5;
<23> FIG. 7 is a view showing an apparatus for encoding an FTV image according to an embodiment of the present invention;
<24> FIG. 8 is a view used in the description of FIG. 7;
<25> FIG. 9 is a view showing a relationship between a color image and a depth image of the FTV image;
<26> FIG. 10 is a view illustrating a bitstream data structure associated with FIG. 9;
<27> FIG. 11 is a block diagram showing an apparatus for decoding an FTV image according to an embodiment of the present invention;
<28> FIG. 12 is a flowchart illustrating a method for decoding an FTV image according to an embodiment of the present invention;
<29> FIGS. 13 and 14 are views illustrating a process of generating an FTV image;
<30> FIGS. 15 and 16 are views showing an example of a method for correcting an FTV image according to the present invention; and
<31> FIGS. 17 and 18 are views illustrating a hole filling process.
<32>
[Best Mode]
<33> Hereinafter, preferred embodiments of the present invention will be described with reference to the accompanying drawings.
<34> FIG. 1 is a view illustrating the encoding and decoding of a Free-viewpoint Television (FTV) image according to the present invention.
<35> As shown, a system 100 for encoding and decoding an FTV image includes an apparatus 105 for encoding the FTV image and an apparatus 135 for decoding the FTV image. The FTV image of the present invention includes a color image and a depth image.
<36> First, the apparatus 105 for encoding the FTV image includes a first encoder 110 for encoding the color image, a second encoder 120 for encoding the depth image, and a multiplexer 130 for multiplexing the encoded color and depth images.
<37> The first encoder 110 is a base layer encoder, which encodes a multiview moving image of the color image. The encoding method of the first encoder 110 may be performed according to a Multiview Video Coding (MVC) protocol.
<38> The second encoder 120 is an auxiliary layer encoder, which encodes the depth image using supplementary information encoded by the first encoder 110. At this time, the second encoder 120 preferably performs compression by removing information repeated due to similarity between the depth image and the color image.
<39> The multiplexer 130 multiplexes the bitstreams of the color image from the first encoder 110 and the bitstreams of the depth image from the second encoder 120.
<40> The detailed operation of the first encoder 110 or the second encoder 120 will be described later with reference to FIG. 7.
<41> Next, the apparatus 135 for decoding the FTV image includes a demultiplexer 140 for demultiplexing input bitstreams, a first decoder 160 for decoding the bitstreams of the color image, and a second decoder 150 for decoding the bitstreams of the depth image.
<42> The demultiplexer 140 demultiplexes the input bitstreams and extracts the bitstreams of the color image and the bitstreams of the depth image.
<43> The first decoder 160 is a base layer decoder, which decodes the bitstreams of the color image so as to restore the color image. The decoding method of the first decoder 160 may be performed according to the MVC protocol.
<44> The second decoder 150 is an auxiliary layer decoder, which decodes the depth image using a supplementary image decoded by the first decoder 160. At this time, the second decoder 150 may perform restoration using information repeated due to the similarity between the depth image and the color image.
<45> FIGS. 2 to 4 are views illustrating a method for predicting a multiview image.
<46> First, FIG. 2 shows a predicting method in the MVC. FIG. 2 shows time-directional prediction 210 using motion information and view-directional prediction 220 using disparity information.
<47> An image in a reference view (view 0, 230) uses only the image in the reference view as a referred image. An anchor 240 refers to only an image of the same time.
<48> Each picture is encoded to one of three picture types, I, P and B. The I-picture does not use motion or disparity information, a macroblock of the P-picture is an intra macroblock or has one piece of motion information or disparity information in each block, and a macroblock of the B-picture is an intra macroblock or has a maximum of two pieces of motion information or disparity information in each block.
<49> In Scalable Video Coding (SVC), an inter-image layer structure is established and an image of a lower layer is used as a referred image, for a temporal and spatial advantage and the improvement of image quality. That is, a pixel value in a block of the lower layer is used for prediction of a pixel value of a target block.
<50> However, in the method for encoding the FTV image according to the present invention, if encoding is performed using the depth image as the auxiliary layer and the color image as the base layer, since the depth image and the color image are different from each other in terms of the characteristics thereof, the pixel value of the base layer, that is, the color image, is not referred to, but only the auxiliary layer, that is, the depth image is used as the referred image.
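The layering rule above can be sketched as follows (the picture records and field names are illustrative assumptions): depth blocks draw their reference pictures from the depth layer only, even though side information may still come from the color layer:

```python
# Unlike SVC, the depth (auxiliary) layer never predicts from color pixels:
# its reference pictures are depth pictures only, while side information
# such as modes and motion vectors may be inherited from the color layer.
def reference_pictures(layer, decoded_pictures):
    """Admissible reference pictures for a block of the given layer."""
    return [p for p in decoded_pictures if p["layer"] == layer]

decoded = [{"layer": "color", "poc": 0}, {"layer": "depth", "poc": 0},
           {"layer": "color", "poc": 1}, {"layer": "depth", "poc": 1}]
depth_refs = reference_pictures("depth", decoded)   # depth pictures only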
<51> Next, FIG. 3 shows a predicting method of the FTV image. The Group of Picture (GOP) structure of FIG. 3 is equal to that of FIG. 2. In addition, the scheme of encoding the image in the I-, P-, and B-pictures may be equally performed at the same view and the same time zone in the color image and the depth image.
<52> The depth image encoding or decoding sequence may be equal to that of the color image.
<53> It can be seen from FIG. 4 that the picture encoding or decoding sequences of the color image 410 and the depth image 420 are equal.
<54> The predictive structure of the depth moving image may be implemented by a Memory Management Control Operation (MMCO) defined in the existing moving image standard.
<55> FIG. 5 is a flowchart illustrating a method for encoding an FTV image according to an embodiment of the present invention, and FIG. 6 is a view illustrating a bitstream structure according to the encoding method of FIG. 5.
<56> As shown, first, information indicating whether or not an image is an FTV image is encoded (S510).
<57> As described above, the FTV image includes the color image and the depth image. For the encoding of the FTV image and, more particularly, the encoding of the depth image, similar to H.264/AVC and MVC which are the existing moving image compression standards, Network Abstraction Layer (NAL) units of the depth image may be used. FIGS. 6A and 6B show the NAL units of the depth image. Each of the NAL units of the depth image may include an NAL header and a Raw Byte Sequence Payload (RBSP).
<58> The NAL header includes "forbidden_zero_bit", "nal_ref_idc" which is a flag indicating whether or not a picture is a referred picture, and "nal_unit_type" which is an identifier for identifying the type of each of the NAL units. The NAL header may further include supplementary information according to the NAL type.
<59> In the present invention, information indicating whether or not the image is an FTV image is defined in "nal_unit_type". This information may directly indicate that the image is an FTV image, or it may include information indicating the depth image; that is, it is possible to deduce whether or not the image is an FTV image from the information indicating the depth image. If "nal_unit_type" is in a range of 0 to 31, an undefined value may be defined as the information indicating the FTV image and, more particularly, the information indicating the depth image.
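As an illustration, the one-byte H.264-style NAL header can be unpacked as below; the depth type value 24 is a hypothetical stand-in for whichever undefined value in the 0 to 31 range is chosen:

```python
NAL_UNIT_TYPE_FTV_DEPTH = 24   # hypothetical undefined value chosen in 0..31

def parse_nal_header(first_byte):
    """Split the one-byte H.264-style NAL header into its three fields."""
    forbidden_zero_bit = (first_byte >> 7) & 0x1   # must be 0
    nal_ref_idc = (first_byte >> 5) & 0x3          # nonzero: referred picture
    nal_unit_type = first_byte & 0x1F              # identifies the unit type
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type

_, ref_idc, ntype = parse_nal_header(0x78)         # example byte: type 24
is_ftv_depth = (ntype == NAL_UNIT_TYPE_FTV_DEPTH)  # deduce "FTV image" from it
```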
<60> The information indicating the FTV image and the information indicating the depth image of the FTV image may be separately defined in "nal_unit_type".
<61> Next, in the case of the FTV image, at least one of view information, base view information and anchor picture information of the depth image is encoded (S520).
<62> The NAL header may include at least one of the view information, the base view information and the anchor picture information of the depth image. Such information may be added as "nal_unit_header_ftv_extension()" as shown. "nal_unit_header_ftv_extension()" may further include information such as "dependency_id".
<63> Among the NAL units, non-Video Coding Layer (VCL) NAL units including a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS) and Supplemental Enhancement Information (SEI) referred to by the VCL NAL units of the color image base layer and the depth image auxiliary layer of the FTV image may be shared in the embodiment.
<64> Although not shown, a step of encoding at least one of the SPS, the PPS and the SEI may further be included. In addition, a step of encoding the color image and the depth image of the FTV image may further be included. The encoding of the color image and the depth image will be described later with reference to FIG. 7.
<65> In the present invention, the file structure of the bitstreams of the FTV image may be designed using the layer structure of the existing SVC standard. That is, if "nal_unit_type" has a value of "14" or "20", "nal_unit_header_svc_extension()" may be used.
<66> According to the method for encoding the FTV image defined as described above, the existing MVC decoding apparatus skips the NAL units of the depth image, which are not defined in the MVC standard, among the NAL units of the FTV image without decoding them, and decodes only the color image of the FTV image.
<67> FIG. 7 is a view showing an apparatus for encoding an FTV image according to an embodiment of the present invention, and FIG. 8 is a view used in the description of FIG. 7.
<68> As shown, the apparatus 700 for encoding the FTV image shown in FIG. 7 includes a transform and quantization unit 710, an entropy encoding unit 715, a motion estimation unit 720, a motion compensation unit 725, an intra prediction unit 730, an inverse quantization and inverse transform unit 745, a filter unit 750, and a memory unit 755. The apparatus for encoding the FTV image shown in FIG. 7 may be similar to an existing H.264 encoding apparatus.
<69> The transform and quantization unit 710 transforms an input image or a residual signal, which is a difference between the input image and a predictive image, to frequency-domain data and quantizes the transformed frequency-domain data.
<70> The entropy encoding unit 715 encodes the output of the transform and quantization unit and supplementary information (a motion vector or the like).
<7i> The motion estimation unit 720 compares a referred image and the input image, estimates motion, and calculates a motion vector. The motion compensation unit 725 calculates a predictive image obtained by compensating for the referred image based on the calculated motion vector. Inter prediction is performed using the motion estimation unit 720 and the motion compensation unit 725. The intra prediction unit 730 performs intra prediction and calculates a predictive image.
<72> The inverse quantization and inverse transform unit 745 inversely quantizes and inversely transforms the output of the transform and quantization unit 710, so as to reconstruct the residual signal. The reconstructed residual signal is summed with the predictive image from the motion compensation unit 725 or the intra prediction unit 730, and the summed image is subjected to deblocking filtering by the filter unit 750. The filtered value is stored in the memory unit 755 and is used as the referred image upon inter prediction.
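The closed-loop structure can be sketched as follows for a single block; a 2D FFT stands in for the real integer transform and the quantizer is simplified, so this is only the shape of the FIG. 7 data flow, not an H.264 implementation:

```python
import numpy as np

def encode_block(block, prediction, q_step=8.0):
    """One closed-loop pass for a single block (toy stand-ins throughout).

    The residual (input minus prediction) is transformed and quantized for
    entropy coding; it is then inverse-quantized and inverse-transformed so
    the encoder reconstructs exactly what the decoder will, and that
    reconstruction is what goes on to deblocking and the reference memory.
    """
    residual = block - prediction
    coeffs = np.fft.fft2(residual)                # stand-in for the transform
    quantized = np.round(coeffs / q_step)         # the lossy step
    recon_residual = np.real(np.fft.ifft2(quantized * q_step))
    reconstruction = prediction + recon_residual  # to filter unit / memory
    return quantized, reconstruction
```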
<73> The apparatus 700 for encoding the FTV image shown in FIG. 7 is applicable to both the color image and the depth image.
<74> Although the transform and quantization operations are performed by the transform and quantization unit 710 in the drawing, these operations may be performed by a transform unit and a quantization unit, respectively. The inverse transform operation and the inverse quantization operations may be performed by an inverse transform unit and an inverse quantization unit, respectively, unlike in the drawing.
<75> The apparatus 700 for encoding the FTV image shown in FIG. 7 may perform the deblocking filtering method of the filter unit 750, the intra prediction method of the intra prediction unit 730, the interpolation method of the motion compensation unit 725, and the transform and quantization methods of the transform and quantization unit 710, which are different from the methods of the H.264/AVC or MVC encoding apparatus, for the encoding of the depth image.
<76> The color image of the base layer and depth image of the auxiliary layer acquired at the same view and the same time are different from each other in terms of information such as a distribution of pixel values in the image, complexity, a boundary thereof or the like. However, an intra block prediction mode based on similarity between pixels in the picture, motion information and referred image information of time-directional prediction, and disparity information and referred image information of view-directional prediction have similarities. An embodiment thereof will be described in detail with reference to FIG. 8.
<77> It can be seen from FIG. 8 that color image blocks 830a and 830b at specific locations 820a and 820b in the FTV image 810 and depth image blocks 840a and 840b having the same time, the same view and the same location are similar in view of spatial distributions of pixel values. Accordingly, the optimal prediction type of the color image in the base layer either is equal to the optimal prediction type of the depth image or, when applied to the depth image of the auxiliary layer, hardly changes a criterion such as a Sum of Absolute Difference (SAD) or a Mean Square Error (MSE).
<78> A base layer block 860 at another specific location 850 has a very complicated shape, but the depth image has similar pixel values. Accordingly, if the criterion such as the SAD or the MSE of the optimal prediction type of the depth information is compared with that of the other prediction types, the difference therebetween is not large. Thus, even when the prediction type of the color image of the base layer is applied to the depth image, the criterion is hardly changed.
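A sketch of this criterion check (block shapes and the mode dictionary are illustrative assumptions): compute the SAD cost of every candidate prediction for a depth block and see how little is lost by inheriting the mode chosen for the co-located color block:

```python
import numpy as np

def sad(a, b):
    """Sum of Absolute Differences between a block and its prediction."""
    return float(np.abs(a - b).sum())

def mse(a, b):
    """Mean Square Error between a block and its prediction."""
    return float(((a - b) ** 2).mean())

def cost_of_inheriting(depth_block, predictions_by_mode, color_best_mode):
    """Extra SAD incurred by reusing the color image's prediction type.

    `predictions_by_mode` maps each candidate mode (intra / motion /
    disparity compensation) to the depth block it predicts. A result near
    zero means the color mode can be inherited without signaling a new one.
    """
    costs = {m: sad(depth_block, p) for m, p in predictions_by_mode.items()}
    return costs[color_best_mode] - min(costs.values())
```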
<79> The prediction type includes one mode selected from among three modes, an intra prediction mode, a motion compensation mode and a disparity compensation mode, together with the associated motion information and disparity information.
<80> In the present invention, when the depth image of the auxiliary layer is compressed based on the characteristics of the FTV image, the repeated similar information such as the intra block prediction mode, the motion compensation information, the disparity compensation information or the like based on similarity between peripheral blocks or referred images is removed, thereby improving compression efficiency.
<81> For example, since neighboring pixels have similar characteristics in the pixel value of the depth image, dc prediction may be performed using the similar characteristics. The apparatus 700 for encoding the FTV image shown in FIG. 7 may further include a dc prediction unit. The dc prediction unit may perform prediction by referring to the dc values of neighboring blocks or using a plurality of representative dc values. At this time, a dc table may be used.
<82> The dc table may store a plurality of dc values: frequently used dc values of the depth image and indexes corresponding to those dc values. The plurality of dc values and indexes of the dc table may be encoded in a first syntax level, e.g., a picture layer level. If the dc table is encoded in the first syntax level, only the indexes may be encoded in a second syntax level lower than the first syntax level, e.g., a macroblock layer level or a block layer level.
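A sketch of the two-level signaling (the table contents are made up): the table of frequent depth dc values would be sent once at the picture level, and each block then sends only a short index:

```python
dc_table = [16, 32, 48, 64, 96, 128, 160, 192]   # sent once at picture level

def encode_block_dc(block_dc):
    """Index of the closest table entry; only this index is sent per block."""
    return min(range(len(dc_table)), key=lambda i: abs(dc_table[i] - block_dc))

def decode_block_dc(index):
    """The decoder recovers the dc prediction from the shared table."""
    return dc_table[index]

idx = encode_block_dc(130)          # -> 5 (128 is the closest entry)
assert decode_block_dc(idx) == 128
```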
<83> The apparatus 700 for encoding the FTV image shown in FIG. 7 may further include a motion vector storage unit for storing the motion vector of the color image. Accordingly, the motion estimation unit 720 may compare the motion vector of the color image with the motion vector of the depth image so as to perform motion estimation with an optimal motion vector.
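A sketch of how the stored color-image motion vector could seed the depth-image search (here the vector is simplified to the predictor's top-left position in the reference array, and the search window is an illustrative assumption):

```python
import numpy as np

def depth_motion_search(cur, ref, color_mv, search=2):
    """Refine the depth motion vector around the stored color-image vector.

    `color_mv` is simplified to the predictor's top-left position in `ref`.
    Only a small window around it is examined, and the position with the
    lowest SAD wins, so a full-range search is avoided for depth blocks.
    """
    h, w = cur.shape
    best, best_cost = color_mv, float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = color_mv[0] + dy, color_mv[1] + dx
            if 0 <= y <= ref.shape[0] - h and 0 <= x <= ref.shape[1] - w:
                cost = np.abs(cur - ref[y:y+h, x:x+w]).sum()
                if cost < best_cost:
                    best, best_cost = (y, x), cost
    return best
```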
<84> FIG. 9 is a view showing a relationship between the color image and the depth image of the FTV image.
<85> FIG. 9 shows information about the base layer which may be referred to by the block of the same view, the same time, and the same location between a multiview color image 910 and a depth image 920 corresponding thereto.
<86> In the case of an intra block, an intra prediction mode 930 of the color data is used as the prediction mode 940 of a current block 925.
<87> In the case of an inter block, information 950 about the referred image index, the motion information or the disparity information, the sub-block structure, and the prediction mode (forward, backward, direct or the like) is used as information 960 of the current block 945.
<88> FIG. 10 is a view illustrating a bitstream data structure associated with FIG. 9.
<89> In the case of the type which may use the information described with reference to FIG. 9, information indicating whether or not the predictive information of the color image is available for the encoding of the depth image is added and encoded. This information may be called "base_mode_flag". If the predictive information of the color image is used, "base_mode_flag" is encoded to "1". If "base_mode_flag" is "1", the block type of the depth image is adaptively determined according to the block type of the color image.
<90> If the color image is of the intra block type, the depth image is processed as an intra block having the same intra prediction mode. In contrast, if the base layer color image block is of the inter block type, the same referred image index, motion vector or disparity vector, prediction direction, sub-block partition size or the like may be used.
<91> In contrast, if "base_mode_flag" is "0", that is, if the base layer color image information is not used, the type information ("mb_type") of the block or the like is subsequently transmitted, similarly to an existing block.
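By way of a sketch only, the signalling described above may be structured as follows; the data structure and the single-bit stand-in for "mb_type" are assumptions for illustration, not the normative bitstream syntax:

    # Illustrative sketch of "base_mode_flag" signalling; all names except
    # base_mode_flag and mb_type themselves are hypothetical.
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class BlockInfo:
        is_intra: bool
        intra_mode: int = 0               # intra prediction mode
        ref_index: int = 0                # referred image index
        vector: Tuple[int, int] = (0, 0)  # motion or disparity vector
        partition: int = 16               # sub-block partition size

    def write_depth_block(bits: List[int], color: BlockInfo, depth: BlockInfo,
                          reuse_color_info: bool) -> None:
        bits.append(1 if reuse_color_info else 0)  # "base_mode_flag"
        if reuse_color_info:
            # Inherit everything from the co-located color block.
            depth.is_intra = color.is_intra
            depth.intra_mode = color.intra_mode
            depth.ref_index = color.ref_index
            depth.vector = color.vector
            depth.partition = color.partition
        else:
            bits.append(0 if depth.is_intra else 1)  # stand-in for "mb_type"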
<92> FIG. 11 is a block diagram showing an apparatus for decoding an FTV image according to an embodiment of the present invention.
<93> As shown, the apparatus 1100 for decoding the FTV image shown in FIG. 11 includes an entropy decoding unit 1110, an inverse quantization and inverse transform unit 1115, a filter unit 1120, a memory unit 1125, a motion compensation unit 1130, and an intra prediction unit 1135.
<94> The entropy decoding unit 1110 entropy-decodes input bitstreams and outputs the decoded data. The entropy decoding unit 1110 may output transform and quantization coefficients of a residual signal, and supplementary information (a motion vector or the like).
<95> The inverse quantization and inverse transform unit 1115 inversely quantizes and inversely transforms the output of the entropy decoding unit 1110. The output of the entropy decoding unit 1110 may be an encoded difference signal or an encoded motion vector. The encoded difference signal may be a difference signal due to an intra prediction mode or an inter prediction mode, or a difference signal due to a dc prediction mode according to the present invention.
<96> The motion compensation unit 1130 calculates a predictive image obtained by compensating for a referred image based on the received motion vector, and the intra prediction unit 1135 performs intra prediction and calculates a predictive image.
<97> The predictive images calculated by the motion compensation unit 1130 and the intra prediction unit 1135 are combined with the residual signal inversely quantized and inversely transformed by the inverse quantization and inverse transform unit 1115 and the combined signal is subjected to deblocking filtering by the filter unit 1120. The filtered value is stored in the memory unit 1125 and is used as a referred image upon inter prediction.
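The reconstruction path may be illustrated with a toy one-dimensional example; the sample values and the simplistic smoothing filter below are assumptions, not the actual deblocking filter:

    # Toy illustration of reconstruction: prediction + inverse-transformed
    # residual, followed by an (oversimplified) in-loop filter.
    def reconstruct(prediction, residual):
        return [p + r for p, r in zip(prediction, residual)]

    def deblock(samples):
        # Trivial 1-D smoothing as a stand-in for the deblocking filter.
        out = samples[:]
        for i in range(1, len(samples) - 1):
            out[i] = (samples[i - 1] + 2 * samples[i] + samples[i + 1]) // 4
        return out

    prediction = [100, 102, 104, 104, 90, 90, 91, 92]   # from intra/motion comp.
    residual   = [  1,  -2,   0,   1,  3,  -1,  0,  1]  # from inv. transform
    decoded = deblock(reconstruct(prediction, residual))
    # 'decoded' would be stored in the memory unit 1125 as a referred image.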
<98> For the encoding and the decoding of the FTV image, the apparatus 1100 for decoding the FTV image shown in FIG. 11 may further include a dc prediction unit. The dc prediction unit may perform prediction by referring to the dc values of neighboring blocks or by using a plurality of representative dc values. At this time, a dc table may be used.
<99> If the motion vector of the color image is commonly used, the motion compensation unit 1130 may perform motion compensation using the motion vector of the color image when decoding the depth image.
<100> The decoding apparatus 1100 can synthesize a moving image at a view which is not transmitted using view synthesis, rendering, a depth-image-based rendering (DIBR) method or the like so as to support free view movement, which will be described later with reference to FIG. 13.
<101> FIG. 12 is a flowchart illustrating a method for decoding an FTV image according to an embodiment of the present invention.
<102> First, information indicating whether or not an image is an FTV image is decoded (S1210).
<103> The FTV image includes the color image and the depth image as described above. FIGS. 6A and 6B show the NAL units of the depth image. Each of the NAL units of the depth image may include an NAL header and an RBSP.
<104> The NAL header includes "forbidden_zero_bit", "nal_ref_idc" and "nal_unit_type".
<105> Information indicating whether or not the image is an FTV image is defined in "nal_unit_type" as described above. Since "nal_unit_type" is in a range of 0 to 31, an undefined value may be defined and used as the information indicating the FTV image and, more particularly, the information indicating the depth image.
<106> The information indicating the FTV image and the information indicating the depth image of the FTV image may be separately defined in "nal_unit_type".
<107> Next, in the case of the FTV image, at least one of view information, base view information and anchor picture information of the depth image is decoded (S1220). Such information may be added as "nal_unit_header_ftv_extension()" as shown in FIG. 6B.
<108> Although not shown, a step of decoding at least one of the SPS, the PPS and the SEI may be further included. In addition, a step of decoding the color image and the depth image of the FTV image may be further included. The decoding of the color image and the depth image has been described with reference to FIG. 11 and a description thereof will be omitted herein.
<109> If the FTV image is encoded using the SVC layer structure, decoding may be performed using "nal_unit_header_svc_extension()" of FIG. 6B.
<110> An existing MVC decoding apparatus skips, without decoding, the NAL units of the depth image among the NAL units of the FTV image, which are not defined in the MVC standard, and decodes only the color image of the FTV image.
<111> FIGS. 13 and 14 are views illustrating a process of generating an FTV image.
<112> In order to generate a third view image 1305 based on a first view image 1301 and a second view image 1302, a 3D warping method is used. Accordingly, a first view modified image 1304 and a second view modified image 1303 are generated, and the third view image 1305 is finally generated therefrom.
<113> However, as shown, regions which are not filled are generated in the first view modified image 1304 and the second view modified image 1303 generated as the references of the third view image 1305. Accordingly, a region which is not filled is also generated in the third view image 1305.
<114> The region which is not filled in the third view image 1305 is defined and used as a hole.
<115> In FIG. 14, a first view image 1401 and a second view image 1402 are aligned using an epipolar line 1415 so as to generate a first view modified image 1403 and a second view modified image 1405, and a third view image 1404 is finally generated therefrom.
<116> If the first view image 1401 and the second view image 1402 are photographed in alignment with the epipolar line 1415, the third view image 1404 may be generated directly, without generating the first view modified image 1403 and the second view modified image 1405.
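A minimal one-dimensional sketch of such warping is given below; the linear depth-to-disparity model is a simplifying assumption used only to show how holes arise:

    # Toy 1-D depth-image-based rendering: shift each pixel by a disparity
    # derived from its depth; unmapped target positions remain holes.
    HOLE = None

    def warp_view(colors, depths, scale=4, max_depth=255):
        target = [HOLE] * len(colors)
        for x, (color, depth) in enumerate(zip(colors, depths)):
            disparity = round(scale * depth / max_depth)  # assumed linear model
            nx = x + disparity
            if 0 <= nx < len(target):
                target[nx] = color  # a real renderer keeps the nearest pixel
        return target

    modified = warp_view([10, 20, 30, 40, 50], [0, 0, 255, 255, 0])
    # Positions still equal to HOLE correspond to the holes of FIG. 13;
    # merging two such modified images still leaves holes in the third view.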
<117> FIGS. 15 and 16 are views showing an example of a method for correcting an FTV image according to the present invention.
<118> As described above, in the third view image generated based on the first view image and the second view image, a reference block is detected which includes a first block in which holes are generated and a second block which is adjacent to the first block and in which holes are not generated. The detected reference block is compared with a predetermined block of at least one of the first view image and the second view image, and the first block in the reference block is corrected using a block adjacent to the predetermined block.
<119> FIG. 15 shows a reference block 1502 including a first block 1503 and a second block 1504, which are adjacent to each other in a vertical direction, in a third view image 1501. Although the second block 1504 is disposed near the lower side of the first block 1503 in the drawing, the second block 1504 may be disposed near the upper side of the first block 1503. For hole processing efficiency, each of the first block 1503 and the second block 1504 may be a 4x4 block, but the present invention is not limited thereto.
<120> FIG. 16 shows a reference block 1602 including a first block 1603 and a second block 1604, which are adjacent to each other in a horizontal direction, in a third view image 1601. Although the second block 1604 is disposed near the right side of the first block 1603 in the drawing, the second block 1604 may be disposed near the left side of the first block 1603. For hole processing efficiency, each of the first block 1603 and the second block 1604 may be a 4x4 block, but the present invention is not limited thereto.
<121> Each of the reference blocks 1502 and 1602 is compared with at least one predetermined block of the first view image and the second view image, and the first block is replaced with a block adjacent to the matched predetermined block so as to fill the holes.
<122> In this comparison process, the second block, in which the holes are not formed, is compared with at least one predetermined block of the first view image and the second view image in terms of at least one of the average of the depth image, the average of the color image and the variance of the color image, and it is determined whether a difference therebetween is a predetermined value or less.
<123> If the difference is the predetermined value or less, the first block is replaced with a block adjacent to the predetermined block so as to fill the holes.
<124> For example, the blocks in the first view image and the second view image are compared with the average of the depth image of the second block so as to detect a block whose difference in average value is a predetermined value or less. The average of the color image and the variance of the color image of the detected block are then compared with those of the second block. If the differences therebetween are a predetermined value or less, this block is selected as a matched block. In practice, it is preferable that the block used to replace the first block be a block located at a position corresponding to the second block.
<125> Unlike for the depth image, the variance of the color image is also used as a comparison reference because the change in value of the depth image is not large, so a desired result can be obtained simply by comparing the averages, whereas the change in value of the color image is large, so both the average and the variance should be compared in order to find an accurate matched block.
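The comparison just described may be sketched as follows; the threshold values are illustrative assumptions, not values given in this disclosure:

    # Sketch of the matching test: depth is compared by average only, color by
    # both average and variance. Threshold values are assumed for illustration.
    def mean(values):
        return sum(values) / len(values)

    def variance(values):
        m = mean(values)
        return sum((v - m) ** 2 for v in values) / len(values)

    def is_match(second_depth, second_color, cand_depth, cand_color,
                 depth_thr=4.0, color_avg_thr=8.0, color_var_thr=16.0):
        if abs(mean(second_depth) - mean(cand_depth)) > depth_thr:
            return False   # depth varies little: the average alone suffices
        if abs(mean(second_color) - mean(cand_color)) > color_avg_thr:
            return False   # color varies widely: compare the average ...
        # ... and also the variance
        return abs(variance(second_color) - variance(cand_color)) <= color_var_thr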
<126> FIGS. 17 and 18 are views illustrating a hole filling process.
<127> As shown, hole filling by the detection of the reference block, the comparison with the predetermined block and the replacement with a block adjacent to the predetermined block is preferably performed from the center of the third view image 1701 toward the edge thereof.
<128> If the vertical reference block 1502 of FIG. 15 is used, hole filling is preferably performed in the order of (1), (2) and (3), as shown. If the horizontal reference block 1602 of FIG. 16 is used, hole filling is preferably performed in the order of (1), (2), (3) and (4) based on a horizontal line 1702 and a vertical line 1703 of FIG. 17.
<129> Hole filling is performed from the center of the third view image 1701 toward the edge thereof because a large number of holes is present in the vicinity of the edge, and hole filling is performed more accurately by proceeding from a portion in which the number of holes is small to a portion in which the number of holes is large.
<130> Hole filling may be performed earlier in the vertical direction than in the horizontal direction. Due to the change in value of the depth image in the vertical direction, the number of holes is generally large in the vertical direction. Accordingly, it is preferable that hole filling be first performed in the vertical direction. Once hole filling has been performed in the vertical direction, hole filling is mostly finished. If, exceptionally, hole filling is not finished at that point, hole filling may additionally be performed in the horizontal direction.
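The center-to-edge visiting order may be generated, for example, as follows (an illustrative helper, not part of this disclosure):

    # Runnable sketch of a center-to-edge visiting order for hole filling.
    def center_out(n):
        # Return indices 0..n-1 starting at the center and moving outward,
        # so sparsely-holed central blocks are filled before edge blocks.
        mid = n // 2
        order = [mid]
        for step in range(1, n):
            if mid - step >= 0:
                order.append(mid - step)
            if mid + step < n:
                order.append(mid + step)
        return order

    print(center_out(5))  # [2, 1, 3, 0, 4]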
<131> As a result, as shown in FIG. 18, hole filling in the reference block 1805 of the third view image 1802 may be completed using the blocks 1804 and 1805 derived from the first view image 1801 and the second view image 1803.
<132> The method for encoding and decoding the FTV image and the method for correcting the FTV image according to the present invention can also be embodied as processor readable codes on a processor readable recording medium included in the apparatus for encoding and decoding the image. The processor readable recording medium is any data storage device that can store data which can thereafter be read by a processor. Examples of the processor readable recording medium include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, and carrier waves such as data transmission through the Internet. The processor readable recording medium can also be distributed over network coupled computer systems so that the processor readable code is stored and executed in a distributed fashion. <133>
[Industrial Applicability]
<134> As described above, a method for encoding and decoding an FTV image according to the present invention may be used for dividing an image into a plurality of images and encoding the images.
<135> Although the preferred embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible, without departing from the scope and spirit of the invention as disclosed in the accompanying claims.
<136> <137>

Claims

[CLAIMS]
[Claim 1]
<139> A method for encoding a Free-viewpoint Television (FTV) image, the method comprising:
<140> encoding information indicating whether or not an image is the FTV image; and
<141> encoding at least one of view information, base view information and anchor picture information of a depth image if the image is the FTV image.
<142>
[Claim 2]
<143> The method according to claim 1, wherein the information is included in a bitstream header.
<144>
[Claim 3]
<145> The method according to claim 1, wherein the view information, the base view information and the anchor picture information of the depth image are included when an identifier for identifying the type of a bitstream unit indicates the FTV image.
<146>
[Claim 4]
<147> The method according to claim 1, wherein the information indicating whether or not the image is the FTV image includes information indicating whether or not the image is the depth image.
<148>
[Claim 5]
<149> The method according to claim 1, further comprising encoding at least one of a Picture Parameter Set (PPS) and Supplemental Enhancement Information (SEI) of the FTV image.
<150>
[Claim 6]
<151> The method according to claim 1, further comprising encoding a color image and the depth image of the FTV image.
<152>
[Claim 7]
<153> The method according to claim 6, further comprising encoding information indicating whether or not predictive information of the color image is available for the encoding of the depth image.
<154>
[Claim 8]
<155> The method according to claim 7, wherein:
<156> the predictive information includes an intra prediction mode, in the case of an intra block, and
<157> the predictive information includes information about at least one of a referred image management method, an index, motion information or disparity information of a referred image, a sub-block structure, and a prediction mode, in the case of an inter block.
<158>
[Claim 9]
<159> A method for decoding a Free-viewpoint Television (FTV) image, the method comprising:
<160> decoding information indicating whether or not an image is the FTV image; and
<161> decoding at least one of view information, base view information and anchor picture information of a depth image if the image is the FTV image.
<162>
[Claim 10]
<163> The method according to claim 9, wherein the information is included in a bitstream header.
<164>
[Claim 11]
<165> The method according to claim 9, wherein the view information, the base view information and the anchor picture information of the depth image are included when an identifier for identifying the type of a bitstream unit indicates the FTV image.
<166>
[Claim 12]
<167> The method according to claim 9, wherein the information indicating whether or not the image is the FTV image includes information indicating whether or not the image is the depth image.
<168>
[Claim 13]
<169> The method according to claim 9, further comprising decoding at least one of a Picture Parameter Set (PPS) and Supplemental Enhancement Information (SEI) of the FTV image.
<170>
[Claim 14]
<171> The method according to claim 9, further comprising decoding a color image and the depth image of the FTV image.
<172>
[Claim 15]
<173> The method according to claim 14, further comprising decoding information indicating whether or not predictive information of the color image is available for the decoding of the depth image.
<174>
[Claim 16]
<175> The method according to claim 15, wherein:
<176> the predictive information includes an intra prediction mode, in the case of an intra block, and
<177> the predictive information includes information about at least one of a referred image management method, an index, motion information or disparity information of a referred image, a sub-block structure and a prediction mode, in the case of an inter block.
<178>
[Claim 17]
<179> A method for correcting a Free-viewpoint Television (FTV) image, the method comprising:
<180> generating a third view image based on a first view image and a second view image;
<181> detecting a reference block including a first block in which holes are generated and a second block which is adjacent to the first block and in which holes are not generated, in the third view image; and
<182> comparing the second block in the reference block with a predetermined block of at least one of the first view image and the second view image and replacing the first block with a block adjacent to the predetermined block if a difference therebetween is a predetermined value or less.
<183>
[Claim 18]
<184> The method according to claim 17, wherein the first block and the second block are adjacent to each other in a horizontal direction or a vertical direction.
<185>
[Claim 19]
<186> The method according to claim 17, wherein the replacing of the first block is performed from the center of the third view image to the edge thereof.
<187>
[Claim 20]
<188> The method according to claim 17, wherein the replacing of the first block is performed earlier in the vertical direction of the third view image than in the horizontal direction thereof.
<189>
[Claim 21]
<190> The method according to claim 17, wherein the difference uses at least one of a difference in an average of a depth image, a difference in an average of a color image, and a difference in a variance of the color image, between the second block and the predetermined block.
<191>
[Claim 22]
<192> The method according to claim 17, wherein each of the first block and the second block is a 4x4 block.
<193>
PCT/KR2008/006830 2008-04-23 2008-11-20 Method for encoding and decoding image of ftv WO2009131287A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR20080037772 2008-04-23
KR10-2008-0037772 2008-04-23

Publications (1)

Publication Number Publication Date
WO2009131287A1 (en) 2009-10-29

Family

ID=41216996

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2008/006830 WO2009131287A1 (en) 2008-04-23 2008-11-20 Method for encoding and decoding image of ftv

Country Status (1)

Country Link
WO (1) WO2009131287A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001008232A (en) * 1999-06-25 2001-01-12 Matsushita Electric Ind Co Ltd Omnidirectional video output method and apparatus
KR20040013540A (en) * 2002-08-07 2004-02-14 한국전자통신연구원 The multiplexing method and its device according to user's request for multi-view 3D video
KR20050122717A (en) * 2004-06-25 2005-12-29 학교법인연세대학교 Method for coding/decoding for multiview sequence where view selection is possible

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100208994A1 (en) * 2009-02-11 2010-08-19 Ning Yao Filling holes in depth maps
US8774512B2 (en) * 2009-02-11 2014-07-08 Thomson Licensing Filling holes in depth maps
US20110298895A1 (en) * 2009-02-19 2011-12-08 Dong Tian 3d video formats
EP2375746A1 (en) 2010-03-31 2011-10-12 Deutsche Telekom AG Method for encoding texture data of free viewpoint television signals, corresponding method for decoding and texture encoder and decoder
US20140044347A1 (en) * 2011-04-25 2014-02-13 Sharp Kabushiki Kaisha Image coding apparatus, image coding method, image coding program, image decoding apparatus, image decoding method, and image decoding program
JP2014132721A (en) * 2013-01-07 2014-07-17 National Institute Of Information & Communication Technology Stereoscopic video encoding device, stereoscopic video decoding device, stereoscopic video encoding method, stereoscopic video decoding method, stereoscopic video encoding program, and stereoscopic video decoding program
CN103873867A (en) * 2014-03-31 2014-06-18 清华大学深圳研究生院 Free viewpoint video depth map distortion prediction method and free viewpoint video depth map coding method
CN103873867B (en) * 2014-03-31 2017-01-25 清华大学深圳研究生院 Free viewpoint video depth map distortion prediction method and free viewpoint video depth map coding method

Similar Documents

Publication Publication Date Title
JP6884598B2 (en) Valid predictions using partition coding
KR101619450B1 (en) Video signal processing method and apparatus using depth information
US8115804B2 (en) Processing multiview video
JP2021022947A (en) Effective partition encoding with high degree of freedom of partition
US8139150B2 (en) Method and apparatus for encoding and decoding multi-view video signal, and related computer programs
KR101619451B1 (en) Method and apparatus for processing a multiview video signal
CN109068143B (en) Video data decoding method and video data decoding apparatus
US20070147502A1 (en) Method and apparatus for encoding and decoding picture signal, and related computer programs
EP2538674A1 (en) Apparatus for universal coding for multi-view video
EP1793611A2 (en) Method and system for synthesizing multiview videos
EP2348733A2 (en) Virtual view image synthesis method and apparatus
KR101737595B1 (en) Image encoding method, image decoding method, image encoding device, image decoding device, image encoding program, and image decoding program
WO2008084996A1 (en) Method and apparatus for deblocking-filtering video data
WO2007081104A1 (en) Adaptive motion estimation/compensation device for mb-based illumination change and method thereof
US20160065983A1 (en) Method and apparatus for encoding multi layer video and method and apparatus for decoding multilayer video
WO2009131287A1 (en) Method for encoding and decoding image of ftv
Merkle et al. Coding of depth signals for 3D video using wedgelet block segmentation with residual adaptation
KR20210003809A (en) Multi-view video decoding method and apparatus and image processing method and apparatus
AU2014205860A1 (en) Method and apparatus for processing video signal
Tao et al. Joint texture and depth map video coding based on the scalable extension of H. 264/AVC
US20170078698A1 (en) Method and device for deriving inter-view motion merging candidate
Mora et al. Modification of the disparity vector derivation process in 3D-HEVC
Samelak et al. Adaptation of the 3D-HEVC coding tools to arbitrary locations of cameras
KR101841914B1 (en) Method of efficient CODEC for multi-view color and depth videos, and apparatus thereof
Diaz-Honrubia et al. Using bayesian classifiers for low complexity multiview h. 264/avc and hevc hybrid architecture

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08874063

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08874063

Country of ref document: EP

Kind code of ref document: A1