WO2009131287A1 - Method for encoding and decoding image of ftv - Google Patents

Method for encoding and decoding image of FTV

Info

Publication number
WO2009131287A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
information
block
ftv
view
Application number
PCT/KR2008/006830
Other languages
French (fr)
Inventor
Jung Eun Lim
Jin Seok Im
Seung Jong Choi
Jong Chan Kim
Original Assignee
Lg Electronics Inc.
Application filed by Lg Electronics Inc. filed Critical Lg Electronics Inc.
Publication of WO2009131287A1 publication Critical patent/WO2009131287A1/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/85: using pre-processing or post-processing specially adapted for video compression
    • H04N 19/86: pre- or post-processing involving reduction of coding artifacts, e.g. of blockiness
    • H04N 19/10: using adaptive coding
    • H04N 19/169: adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/17: the coding unit being an image region, e.g. an object
    • H04N 19/176: the coding unit being a block, e.g. a macroblock
    • H04N 19/30: using hierarchical techniques, e.g. scalability
    • H04N 19/46: embedding additional information in the video signal during the compression process
    • H04N 19/50: using predictive coding
    • H04N 19/597: predictive coding specially adapted for multi-view video sequence encoding
    • H04N 19/60: using transform coding
    • H04N 19/61: transform coding in combination with predictive coding

Definitions

  • In the method for decoding an FTV image of FIG. 12, information indicating whether or not an image is an FTV image is first decoded (S1210), and, in the case of an FTV image, at least one of view information, base view information and anchor picture information of the depth image is decoded. A step of decoding at least one of the SPS, the PPS and the SEI may be further included.
  • A step of decoding the color image and the depth image of the FTV image may be further included. The decoding of the color image and the depth image was described with reference to FIG. 11, and a description thereof will be omitted herein.
  • The existing MVC decoding apparatus skips the NAL units of the depth image, which are not defined in the MVC standard, among the NAL units of the FTV image, without decoding them, and decodes only the color image of the FTV image.
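As a loose illustration of this backward compatibility (the type value 24 and the helper names are hypothetical assumptions, not from any real codec API), a legacy decoder can simply ignore NAL unit types it does not recognize:

```python
# Hypothetical NAL unit type reserved for FTV depth images (the patent only
# says an undefined value in the 0..31 range may be chosen; 24 is made up).
NAL_UNIT_TYPE_FTV_DEPTH = 24
KNOWN_MVC_TYPES = {1, 5, 7, 8, 14, 20}      # slices, SPS, PPS, prefix/extension

def decode_bitstream(nal_units, is_ftv_decoder=False):
    """Dispatch NAL units; a legacy MVC decoder skips the depth units."""
    for header_byte, payload in nal_units:
        nal_unit_type = header_byte & 0x1F
        if nal_unit_type == NAL_UNIT_TYPE_FTV_DEPTH:
            if is_ftv_decoder:
                decode_depth_nal(payload)    # auxiliary layer (depth image)
            # else: silently skipped, keeping MVC compatibility
        elif nal_unit_type in KNOWN_MVC_TYPES:
            decode_color_nal(payload)        # base layer (color image)

def decode_depth_nal(payload): pass          # placeholders for this sketch
def decode_color_nal(payload): pass
```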
  • FIGS. 13 and 14 are views illustrating a process of generating an FTV image.
  • <112> In order to generate a third view image based on a first view image 1301 and a second view image 1302, a 3D warping method is used. Accordingly, a first view modified image 1304 and a second view modified image 1303 are generated, and the third view image is finally generated therefrom.
  • Regions which are not filled are generated in the first view modified image 1304 and the second view modified image 1303, which serve as the reference of the third view image 1305. Accordingly, a region which is not filled is generated in the third view image 1305.
  • The region which is not filled in the third view image 1305 is defined and used as a hole.
  • In FIG. 14, a first view image 1401 and a second view image 1402 are aligned using an epipolar line 1415 so as to generate a first view modified image 1403 and a second view modified image 1405, and a third view image 1404 is finally generated therefrom.
  • <116> When an image is photographed, if the first view image 1401 and the second view image 1402 are photographed along the epipolar line 1415, the third view image 1404 may be directly generated without generating the first view modified image 1403 and the second view modified image 1405. A sketch of the warping step follows.
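To make the warping step concrete, here is a minimal sketch of disparity-based forward warping under strong assumptions (rectified, epipolar-aligned views, a depth map normalized to [0, 1], and a made-up disparity scale); pixels that receive no sample remain holes:

```python
import numpy as np

def warp_view(color, depth, alpha, max_disp=16.0):
    """Forward-warp one rectified view toward an intermediate viewpoint.

    With rectified (epipolar-aligned) cameras a pixel shifts horizontally
    by an amount proportional to its disparity, here derived from depth.
    `alpha` is the fractional baseline position of the virtual view.
    Pixels that receive no source sample stay at -1 and mark the holes.
    """
    h, w = depth.shape
    warped = -np.ones_like(color)
    for y in range(h):
        for x in range(w):
            xt = int(round(x + alpha * max_disp * depth[y, x]))
            if 0 <= xt < w:
                warped[y, xt] = color[y, x]
    return warped

# Third (virtual) view from a first and a second view: warp both toward the
# middle and merge; entries still equal to -1 are the holes to be corrected.
first_c, first_d = np.random.rand(64, 64), np.random.rand(64, 64)
second_c, second_d = np.random.rand(64, 64), np.random.rand(64, 64)
modified1 = warp_view(first_c, first_d, alpha=0.5)     # first view modified image
modified2 = warp_view(second_c, second_d, alpha=-0.5)  # second view modified image
third = np.where(modified1 >= 0, modified1, modified2)
```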
  • FIGS. 15 and 16 are views showing an example of a method for correcting an FTV image according to the present invention.
  • In the correction method, a reference block including a first block in which holes are generated and a second block which is adjacent to the first block and in which holes are not generated is detected in the third view image; the second block of the reference block is compared with a predetermined block of at least one of the first view image and the second view image, and the first block is corrected using a block adjacent to the predetermined block.
  • FIG. 15 shows a reference block 1502 including a first block 1503 and a second block 1504, which are adjacent to each other in a vertical direction, in a third view image 1501.
  • Although the second block 1504 is disposed near the lower side of the first block 1503 in the drawing, the second block 1504 may instead be disposed near the upper side of the first block 1503.
  • Each of the first block 1503 and the second block 1504 may be a 4x4 block, but the present invention is not limited thereto.
  • FIG. 16 shows a reference block 1602 including a first block 1603 and a second block 1604, which are adjacent to each other in a horizontal direction, in a third view image 1601.
  • Although the second block 1604 is disposed near the right side of the first block 1603 in the drawing, the second block 1604 may instead be disposed near the left side of the first block 1603.
  • Each of the first block 1603 and the second block 1604 may be a 4x4 block, but the present invention is not limited thereto.
  • Each of the reference blocks 1502 and 1602 is compared with at least one predetermined block of the first view image and the second view image, and the first block is replaced so as to fill the holes, as follows.
  • The second block, in which the holes are not formed, and at least one predetermined block of the first view image and the second view image are compared in view of at least one of an average of the depth image, an average of the color image and a variance of the color image, and it is determined whether a difference therebetween is a predetermined value or less.
  • <123> If the difference is the predetermined value or less, the first block is replaced with a block adjacent to the predetermined block so as to fill the holes.
  • In detail, the blocks in the first view image and the second view image are first compared with the average of the depth image of the second block so as to detect a block whose difference in average value is a predetermined value or less.
  • Then, the average of the color image and the variance of the color image of the detected block are compared with those of the second block. If a difference therebetween is a predetermined value or less, this block is selected as a matched block.
  • It is preferable that the block with which the first block is replaced be a block located at a location corresponding to the second block.
  • The variance of the color image is used as a comparison reference because a change in value is not large in the depth image, so a desired result can be obtained simply by comparing averages, whereas a change in value is large in the color image, so the average and the variance should both be compared in order to find an accurate matched block. A sketch of this search follows.
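The matching criterion can be sketched as follows; the thresholds, block size and array layout are illustrative assumptions, not values from the patent:

```python
import numpy as np

def find_matched_block(ref_depth_blk, ref_color_blk, src_depth, src_color,
                       block=4, t_depth=4.0, t_avg=8.0, t_var=64.0):
    """Find a source-view block matching the hole-free second block.

    Screening follows the text above: the depth image changes little, so
    its average alone is compared first; the color image varies more, so
    both its average and its variance must match. Thresholds are made up.
    Returns the top-left corner of the matched block, or None.
    """
    d_avg = ref_depth_blk.mean()
    c_avg, c_var = ref_color_blk.mean(), ref_color_blk.var()
    h, w = src_depth.shape
    for y in range(h - block + 1):
        for x in range(w - block + 1):
            if abs(src_depth[y:y+block, x:x+block].mean() - d_avg) > t_depth:
                continue                      # depth: the average is enough
            cand = src_color[y:y+block, x:x+block]
            if abs(cand.mean() - c_avg) <= t_avg and abs(cand.var() - c_var) <= t_var:
                return y, x                   # color: average AND variance match
    return None
```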
  • FIGS. 17 and 18 are views illustrating a hole filling process.
  • Hole filling, by the detection of the reference block, the comparison with the predetermined block and the replacement with the block adjacent thereto, is preferably performed from the center of the third view image 1701 toward the edge thereof.
  • If the vertical reference block 1502 of FIG. 15 is used, hole filling is preferably performed in order of (1), (2) and (3), as shown. If the horizontal reference block 1602 of FIG. 16 is used, hole filling is preferably performed in order of (1), (2), (3) and (4) based on a horizontal line 1702 and a vertical line 1703 of FIG. 17.
  • Hole filling may be performed earlier in the horizontal direction than in the vertical direction. However, due to the change in value of the depth image in the vertical direction, the number of holes is generally large in the vertical direction; accordingly, it is preferable that hole filling be first performed in the vertical direction. If hole filling is performed in the vertical direction, hole filling is mostly finished; if, exceptionally, it is not finished, hole filling may then be performed in the horizontal direction. The center-outward processing order is sketched below.
  • As shown in FIG. 18, hole filling in the reference block 1805 of the third view image 1802 may be completed using the blocks 1804 and 1805 derived from the first view image 1801 and the second view image 1803.
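A sketch of the preferred center-to-edge processing order for block columns (purely illustrative):

```python
def fill_order(num_block_columns, center):
    """Block-column processing order, from the image center toward the edges.

    Blocks nearer the center are corrected first, alternating left/right,
    so already-filled blocks can serve as the hole-free second block for
    their neighbors farther out.
    """
    order = [center]
    for step in range(1, num_block_columns):
        for idx in (center - step, center + step):
            if 0 <= idx < num_block_columns:
                order.append(idx)
    return order

print(fill_order(8, 4))   # [4, 3, 5, 2, 6, 1, 7, 0]
```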
  • The method for encoding and decoding the FTV image and the method for correcting the FTV image according to the present invention can also be embodied as processor readable codes on a processor readable recording medium included in the apparatus for encoding and decoding the image.
  • The processor readable recording medium is any data storage device that can store data which can thereafter be read by a processor. Examples of the processor readable recording medium include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, and carrier waves such as data transmission through the Internet.
  • The processor readable recording medium can also be distributed over network coupled computer systems so that the processor readable code is stored and executed in a distributed fashion.
  • A method for encoding and decoding an FTV image according to the present invention may be used for dividing an image into a plurality of images and encoding the images.

Abstract

A method for encoding and decoding a Free-viewpoint Television (FTV) image including a color image and a depth image is provided. The method for encoding the FTV image includes encoding information indicating whether or not an image is an FTV image, encoding at least one of view information, base view information and anchor picture information of a depth image if the image is an FTV image, and encoding the depth image using predictive information of a color image. Accordingly, it is possible to encode or decode the color image and the depth image of the FTV image in a layer structure.

Description

[DESCRIPTION]
[Invention Title]
METHOD FOR ENCODING AND DECODING IMAGE OF FTV
[Technical Field]
<1> The present invention relates to a method for encoding and decoding a Free-viewpoint Television (FTV) image, and more particularly, to a method for encoding and decoding an FTV image, which is capable of encoding and decoding a depth image and a color image of the FTV image in a layer structure.
<2>
[Background Art]
<3> In a three-dimensional TV broadcast, there are a stereo image based on binocular disparity, a multiview image which is acquired from several angles, and a Free-viewpoint Television (FTV) image including a multiview image and a depth image.
<4> In the existing standard, a Moving Picture Expert Group-2 (MPEG-2) multiview profile encodes and decodes a three-dimensional (3D) TV image using temporal scalability. This standard is suitable for a stereo moving image by introducing a disparity predicting method. However, a method for encoding and decoding a multiview moving image with a large number of views is not suggested.
<5> Recently, the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Joint Video Team (JVT) has finished Multiview Video Coding (MVC) standardization for compressing a multiview moving image using an H.264/AVC amendment. However, since the amount of image data to be transmitted is large, an increase in bandwidth is inevitable, and a view location for allowing a viewer to view a 3D moving image is restricted.
<6>
[Disclosure]
[Technical Problem]
<7> Therefore, the present invention has been made in view of the above problems, and it is an object of the present invention to provide a method for encoding and decoding a Free-viewpoint Television (FTV) image, which is capable of encoding and decoding a depth image and a color image of the FTV image in a layer structure.
<8>
[Technical Solution]
<9> In accordance with an aspect of the present invention, the above and other objects can be accomplished by the provision of a method for encoding a Free-viewpoint Television (FTV) image, the method including: encoding information indicating whether or not an image is an FTV image; and encoding at least one of view information, base view information and anchor picture information of a depth image if the image is an FTV image.
<10> In accordance with another aspect of the present invention, there is provided a method for decoding a Free-viewpoint Television (FTV) image, the method including: decoding information indicating whether or not an image is an FTV image; and decoding at least one of view information, base view information and anchor picture information of a depth image if the image is an FTV image.
<11> In accordance with another aspect of the present invention, there is provided a method for correcting a Free-viewpoint Television (FTV) image, the method including: generating a third view image based on a first view image and a second view image; detecting a reference block including a first block in which holes are generated and a second block which is adjacent to the first block and in which holes are not generated, in the third view image; and comparing the second block in the reference block with a predetermined block of at least one of the first view image and the second view image and replacing the first block with a block adjacent to the predetermined block if a difference therebetween is a predetermined value or less.
[Advantageous Effects]
<14> According to the present invention, since a color image and a depth image of a Free-viewpoint Television (FTV) image are encoded in a layer structure, it is possible to remove repeated information due to similarity between the color image and the depth image so as to increase compression efficiency. That is, it is possible to remove repeated similar information such as an intra block prediction mode, motion compensation information or disparity compensation information so as to increase compression efficiency, when the depth image of an auxiliary layer is compressed.
<15> In addition, holes generated when the color image and the depth image of a predetermined view are restored based on a view of a referred image received from bitstreams can be simply corrected with limited calculation.
<16> Such a method can minimize, in view of the video standard, changes to a previously configured system caused by the addition of the syntax, and can support the use of information about peripheral division images in the video compression standard.
<17>
[Description of Drawings]
<18> The above and other objects, features and other advantages of the present invention will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
<19> FIG. 1 is a view illustrating the encoding and decoding of a Free-viewpoint Television (FTV) image according to the present invention;
<20> FIGS. 2 to 4 are views illustrating a method for predicting a multiview image;
<21> FIG. 5 is a flowchart illustrating a method for encoding an FTV image according to an embodiment of the present invention;
<22> FIG. 6 is a view illustrating a bitstream structure according to the encoding method of FIG. 5;
<23> FIG. 7 is a view showing an apparatus for encoding an FTV image according to an embodiment of the present invention;
<24> FIG. 8 is a view used in the description of FIG. 7;
<25> FIG. 9 is a view showing a relationship between a color image and a depth image of the FTV image;
<26> FIG. 10 is a view illustrating a bitstream data structure associated with FIG. 9;
<27> FIG. 11 is a block diagram showing an apparatus for decoding an FTV image according to an embodiment of the present invention;
<28> FIG. 12 is a flowchart illustrating a method for decoding an FTV image according to an embodiment of the present invention;
<29> FIGS. 13 and 14 are views illustrating a process of generating an FTV image;
<30> FIGS. 15 and 16 are views showing an example of a method for correcting an FTV image according to the present invention; and
<31> FIGS. 17 and 18 are views illustrating a hole filling process.
<32>
[Best Mode]
<33> Hereinafter, preferred embodiments of the present invention will be described with reference to the accompanying drawings.
<34> FIG. 1 is a view illustrating the encoding and decoding of a Free-viewpoint Television (FTV) image according to the present invention.
<35> As shown, a system 100 for encoding and decoding an FTV image includes an apparatus 105 for encoding the FTV image and an apparatus 135 for decoding the FTV image. The FTV image of the present invention includes a color image and a depth image.
<36> First, the apparatus 105 for encoding the FTV image includes a first encoder 110 for encoding the color image, a second encoder 120 for encoding the depth image, and a multiplexer 130 for multiplexing the encoded color and depth images.
<37> The first encoder 110 is a base layer encoder, which encodes a multiview moving image of the color image. The encoding method of the first encoder 110 may be performed according to a Multiview Video Coding (MVC) protocol.
<38> The second encoder 120 is an auxiliary layer encoder, which encodes the depth image using supplementary information encoded by the first encoder 110. At this time, the second encoder 120 preferably performs compression by removing information repeated due to similarity between the depth image and the color image.
<39> The multiplexer 130 multiplexes the bitstreams of the color image from the first encoder 110 and the bitstreams of the depth image from the second encoder 120.
<40> The detailed operation of the first encoder 110 or the second encoder 120 will be described later with reference to FIG. 7.
<41> Next, the apparatus 135 for decoding the FTV image includes a demultiplexer 140 for demultiplexing input bitstreams, a first decoder 160 for decoding the bitstreams of the color image, and a second decoder 150 for decoding the bitstreams of the depth image.
<42> The demultiplexer 140 demultiplexes the input bitstreams and extracts the bitstreams of the color image and the bitstreams of the depth image.
<43> The first decoder 160 is a base layer decoder, which decodes the bitstreams of the color image so as to restore the color image. The decoding method of the first decoder 160 may be performed according to the MVC protocol.
<44> The second decoder 150 is an auxiliary layer decoder, which decodes the depth image using a supplementary image decoded by the first decoder 160. At this time, the second decoder 150 may perform restoration using information repeated due to the similarity between the depth image and the color image.
<45> FIGS. 2 to 4 are views illustrating a method for predicting a multiview image.
<46> First, FIG. 2 shows a predicting method in the MVC. FIG. 2 shows time-directional prediction 210 using motion information and view-directional prediction 220 using disparity information.
<47> An image in a reference view (view 0, 230) uses only the image in the reference view as a referred image. An anchor 240 refers to only an image of the same time.
<48> Each picture is encoded to one of three picture types, I, P and B. The I-picture does not use motion or disparity information, a macroblock of the P-picture is an intra macroblock or has one piece of motion information or disparity information in each block, and a macroblock of the B-picture is an intra macroblock or has a maximum of two pieces of motion information or disparity information in each block.
<49> In Scalable Video Coding (SVC), an inter-image layer structure is established and an image of a lower layer is used as a referred image, for a temporal and spatial advantage and the improvement of image quality. That is, a pixel value in a block of the lower layer is used for prediction of a pixel value of a target block.
<50> However, in the method for encoding the FTV image according to the present invention, if encoding is performed using the depth image as the auxiliary layer and the color image as the base layer, since the depth image and the color image are different from each other in terms of the characteristics thereof, the pixel value of the base layer, that is, the color image, is not referred to, but only the auxiliary layer, that is, the depth image is used as the referred image.
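The layering rule above can be sketched as follows (the picture records and field names are illustrative assumptions): depth blocks draw their reference pictures from the depth layer only, even though side information may still come from the color layer:

```python
# Unlike SVC, the depth (auxiliary) layer never predicts from color pixels:
# its reference pictures are depth pictures only, while side information
# such as modes and motion vectors may be inherited from the color layer.
def reference_pictures(layer, decoded_pictures):
    """Admissible reference pictures for a block of the given layer."""
    return [p for p in decoded_pictures if p["layer"] == layer]

decoded = [{"layer": "color", "poc": 0}, {"layer": "depth", "poc": 0},
           {"layer": "color", "poc": 1}, {"layer": "depth", "poc": 1}]
depth_refs = reference_pictures("depth", decoded)   # depth pictures only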
<51> Next, FIG. 3 shows a predicting method of the FTV image. The Group of Picture (GOP) structure of FIG. 3 is equal to that of FIG. 2. In addition, the scheme of encoding the image in the I-, P-, and B-pictures may be equally performed at the same view and the same time zone in the color image and the depth image.
<52> The depth image encoding or decoding sequence may be equal to that of the color image.
<53> It can be seen from FIG. 4 that the picture encoding or decoding sequences of the color image 410 and the depth image 420 are equal.
<54> The predictive structure of the depth moving image may be implemented by a Memory Management Control Operation (MMCO) defined in the existing moving image standard.
<55> FIG. 5 is a flowchart illustrating a method for encoding an FTV image according to an embodiment of the present invention, and FIG. 6 is a view illustrating a bitstream structure according to the encoding method of FIG. 5.
<56> As shown, first, information indicating whether or not an image is an FTV image is encoded (S510).
<57> As described above, the FTV image includes the color image and the depth image. For the encoding of the FTV image and, more particularly, the encoding of the depth image, similar to H.264/AVC and MVC which are the existing moving image compression standards, Network Abstraction Layer (NAL) units of the depth image may be used. FIGS. 6A and 6B show the NAL units of the depth image. Each of the NAL units of the depth image may include an NAL header and a Raw Byte Sequence Payload (RBSP).
<58> The NAL header includes "forbidden_zero_bit", "nal_ref_idc" which is a flag indicating whether or not a picture is a referred picture, and "nal_unit_type" which is an identifier for identifying the type of each of the NAL units. The NAL header may further include supplementary information according to the NAL type.
<59> In the present invention, information indicating whether or not the image is an FTV image is defined in "nal_unit_type". This information may directly indicate that the image is an FTV image, or it may include information indicating the depth image; that is, it is possible to deduce whether or not the image is an FTV image from the information indicating the depth image. If "nal_unit_type" is in a range of 0 to 31, an undefined value may be defined as the information indicating the FTV image and, more particularly, the information indicating the depth image.
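As an illustration, the one-byte H.264-style NAL header can be unpacked as below; the depth type value 24 is a hypothetical stand-in for whichever undefined value in the 0 to 31 range is chosen:

```python
NAL_UNIT_TYPE_FTV_DEPTH = 24   # hypothetical undefined value chosen in 0..31

def parse_nal_header(first_byte):
    """Split the one-byte H.264-style NAL header into its three fields."""
    forbidden_zero_bit = (first_byte >> 7) & 0x1   # must be 0
    nal_ref_idc = (first_byte >> 5) & 0x3          # nonzero: referred picture
    nal_unit_type = first_byte & 0x1F              # identifies the unit type
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type

_, ref_idc, ntype = parse_nal_header(0x78)         # example byte: type 24
is_ftv_depth = (ntype == NAL_UNIT_TYPE_FTV_DEPTH)  # deduce "FTV image" from it
```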
<60> The information indicating the FTV image and the information indicating the depth image of the FTV image may be separately defined in "nal_unit_type".
<61> Next, in the case of the FTV image, at least one of view information, base view information and anchor picture information of the depth image is encoded (S520).
<62> The NAL header may include at least one of the view information, the base view information and the anchor picture information of the depth image. Such information may be added as "nal_unit_header_ftv_extension()" as shown. "nal_unit_header_ftv_extension()" may further include information such as "dependency_id".
<63> Among the NAL units, non-Video Coding Layer (VCL) NAL units including a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS) and Supplemental Enhancement Information (SEI) referred to by the VCL NAL units of the color image base layer and the depth image auxiliary layer of the FTV image may be shared in the embodiment.
<64> Although not shown, a step of encoding at least one of the SPS, the PPS and the SEI may further be included. In addition, a step of encoding the color image and the depth image of the FTV image may further be included. The encoding of the color image and the depth image will be described later with reference to FIG. 7.
<65> In the present invention, the file structure of the bitstreams of the FTV image may be designed using the layer structure of the existing SVC standard. That is, if "nal_unit_type" has a value of "14" or "20", "nal_unit_header_svc_extension()" may be used.
<66> According to the method for encoding the FTV image defined as described above, the existing MVC decoding apparatus skips the NAL units of the depth image, which are not defined in the MVC standard, among the NAL units of the FTV image without decoding them, and decodes only the color image of the FTV image.
<67> FIG. 7 is a view showing an apparatus for encoding an FTV image according to an embodiment of the present invention, and FIG. 8 is a view used in the description of FIG. 7.
<68> As shown, the apparatus 700 for encoding the FTV image shown in FIG. 7 includes a transform and quantization unit 710, an entropy encoding unit 715, a motion estimation unit 720, a motion compensation unit 725, an intra prediction unit 730, an inverse quantization and inverse transform unit 745, a filter unit 750, and a memory unit 755. The apparatus for encoding the FTV image shown in FIG. 7 may be similar to an existing H.264 encoding apparatus.
<69> The transform and quantization unit 710 transforms an input image or a residual signal, which is a difference between the input image and a predictive image, to frequency-domain data and quantizes the transformed frequency-domain data.
<70> The entropy encoding unit 715 encodes the output of the transform and quantization unit and supplementary information (a motion vector or the like).
<7i> The motion estimation unit 720 compares a referred image and the input image, estimates motion, and calculates a motion vector. The motion compensation unit 725 calculates a predictive image obtained by compensating for the referred image based on the calculated motion vector. Inter prediction is performed using the motion estimation unit 720 and the motion compensation unit 725. The intra prediction unit 730 performs intra prediction and calculates a predictive image.
<72> The inverse quantization and inverse transform unit 745 inversely quantizes and inversely transforms the output of the transform and quantization unit 710, so as to reconstruct the residual signal. The reconstructed residual signal is summed with the predictive image from the motion compensation unit 725 or the intra prediction unit 730, and the summed image is subjected to deblocking filtering by the filter unit 750. The filtered value is stored in the memory unit 755 and is used as the referred image upon inter prediction.
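The closed-loop structure can be sketched as follows for a single block; a 2D FFT stands in for the real integer transform and the quantizer is simplified, so this is only the shape of the FIG. 7 data flow, not an H.264 implementation:

```python
import numpy as np

def encode_block(block, prediction, q_step=8.0):
    """One closed-loop pass for a single block (toy stand-ins throughout).

    The residual (input minus prediction) is transformed and quantized for
    entropy coding; it is then inverse-quantized and inverse-transformed so
    the encoder reconstructs exactly what the decoder will, and that
    reconstruction is what goes on to deblocking and the reference memory.
    """
    residual = block - prediction
    coeffs = np.fft.fft2(residual)                # stand-in for the transform
    quantized = np.round(coeffs / q_step)         # the lossy step
    recon_residual = np.real(np.fft.ifft2(quantized * q_step))
    reconstruction = prediction + recon_residual  # to filter unit / memory
    return quantized, reconstruction
```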
<73> The apparatus 700 for encoding the FTV image shown in FIG. 7 is applicable to both the color image and the depth image.
<74> Although the transform and quantization operations are performed by the transform and quantization unit 710 in the drawing, these operations may be performed by a transform unit and a quantization unit, respectively. The inverse transform operation and the inverse quantization operations may be performed by an inverse transform unit and an inverse quantization unit, respectively, unlike in the drawing.
<75> The apparatus 700 for encoding the FTV image shown in FIG. 7 may perform the deblocking filtering method of the filter unit 750, the intra prediction method of the intra prediction unit 730, the interpolation method of the motion compensation unit 725, and the transform and quantization methods of the transform and quantization unit 710, which are different from the methods of the H.264/AVC or MVC encoding apparatus, for the encoding of the depth image.
<76> The color image of the base layer and depth image of the auxiliary layer acquired at the same view and the same time are different from each other in terms of information such as a distribution of pixel values in the image, complexity, a boundary thereof or the like. However, an intra block prediction mode based on similarity between pixels in the picture, motion information and referred image information of time-directional prediction, and disparity information and referred image information of view-directional prediction have similarities. An embodiment thereof will be described in detail with reference to FIG. 8.
<77> It can be seen from FIG. 8 that color image blocks 830a and 830b at specific locations 820a and 820b in the FTV image 810 and depth image blocks 840a and 840b having the same time, the same view and the same location are similar in view of spatial distributions of pixel values. Accordingly, the optimal prediction type of the color image in the base layer either is equal to the optimal prediction type of the depth image or, when applied to the depth image of the auxiliary layer, hardly changes a criterion such as a Sum of Absolute Difference (SAD) or a Mean Square Error (MSE).
<78> A base layer block 860 at another specific location 850 has a very complicated shape, but the depth image has similar pixel values. Accordingly, if the criterion such as the SAD or the MSE of the optimal prediction type of the depth information is compared with that of the other prediction types, the difference therebetween is not large. Thus, even when the prediction type of the color image of the base layer is applied to the depth image, the criterion is hardly changed.
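A sketch of this criterion check (block shapes and the mode dictionary are illustrative assumptions): compute the SAD cost of every candidate prediction for a depth block and see how little is lost by inheriting the mode chosen for the co-located color block:

```python
import numpy as np

def sad(a, b):
    """Sum of Absolute Differences between a block and its prediction."""
    return float(np.abs(a - b).sum())

def mse(a, b):
    """Mean Square Error between a block and its prediction."""
    return float(((a - b) ** 2).mean())

def cost_of_inheriting(depth_block, predictions_by_mode, color_best_mode):
    """Extra SAD incurred by reusing the color image's prediction type.

    `predictions_by_mode` maps each candidate mode (intra / motion /
    disparity compensation) to the depth block it predicts. A result near
    zero means the color mode can be inherited without signaling a new one.
    """
    costs = {m: sad(depth_block, p) for m, p in predictions_by_mode.items()}
    return costs[color_best_mode] - min(costs.values())
```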
<79> The prediction type includes one mode selected from among three modes, an intra prediction mode, a motion compensation mode and a disparity compensation mode, together with the associated motion information and disparity information.
<80> In the present invention, when the depth image of the auxiliary layer is compressed based on the characteristics of the FTV image, the repeated similar information such as the intra block prediction mode, the motion compensation information, the disparity compensation information or the like based on similarity between peripheral blocks or referred images is removed, thereby improving compression efficiency.
<81> For example, since neighboring pixels have similar characteristics in the pixel value of the depth image, dc prediction may be performed using the similar characteristics. The apparatus 700 for encoding the FTV image shown in FIG. 7 may further include a dc prediction unit. The dc prediction unit may perform prediction by referring to the dc values of neighboring blocks or using a plurality of representative dc values. At this time, a dc table may be used.
<82> The dc table may store a plurality of dc values: frequently used dc values of the depth image and indexes corresponding to those dc values. The plurality of dc values and indexes of the dc table may be encoded in a first syntax level, e.g., a picture layer level. If the dc table is encoded in the first syntax level, only the indexes may be encoded in a second syntax level lower than the first syntax level, e.g., a macroblock layer level or a block layer level.
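A sketch of the two-level signaling (the table contents are made up): the table of frequent depth dc values would be sent once at the picture level, and each block then sends only a short index:

```python
dc_table = [16, 32, 48, 64, 96, 128, 160, 192]   # sent once at picture level

def encode_block_dc(block_dc):
    """Index of the closest table entry; only this index is sent per block."""
    return min(range(len(dc_table)), key=lambda i: abs(dc_table[i] - block_dc))

def decode_block_dc(index):
    """The decoder recovers the dc prediction from the shared table."""
    return dc_table[index]

idx = encode_block_dc(130)          # -> 5 (128 is the closest entry)
assert decode_block_dc(idx) == 128
```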
<83> The apparatus 700 for encoding the FTV image shown in FIG. 7 may further include a motion vector storage unit for storing the motion vector of the color image. Accordingly, the motion estimation unit 720 may compare the motion vector of the color image with the motion vector of the depth image so as to perform motion estimation with an optimal motion vector.
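A sketch of how the stored color-image motion vector could seed the depth-image search (here the vector is simplified to the predictor's top-left position in the reference array, and the search window is an illustrative assumption):

```python
import numpy as np

def depth_motion_search(cur, ref, color_mv, search=2):
    """Refine the depth motion vector around the stored color-image vector.

    `color_mv` is simplified to the predictor's top-left position in `ref`.
    Only a small window around it is examined, and the position with the
    lowest SAD wins, so a full-range search is avoided for depth blocks.
    """
    h, w = cur.shape
    best, best_cost = color_mv, float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = color_mv[0] + dy, color_mv[1] + dx
            if 0 <= y <= ref.shape[0] - h and 0 <= x <= ref.shape[1] - w:
                cost = np.abs(cur - ref[y:y+h, x:x+w]).sum()
                if cost < best_cost:
                    best, best_cost = (y, x), cost
    return best
```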
<84> FIG. 9 is a view showing a relationship between the color image and the depth image of the FTV image.
<85> FIG. 9 shows information about the base layer which may be referred to by the block of the same view, the same time, and the same location between a multiview color image 910 and a depth image 920 corresponding thereto.
<86> In the case of an intra block, an intra prediction mode 930 of the color data is used as the prediction mode 940 of a current block 925.
<87> In the case of an inter block, information 950 about the referred image index, the motion information or the disparity information, the sub-block structure, and the prediction mode (forward, backward, direct or the like) is used as information 960 of the current block 945.
<88> FIG. 10 is a view illustrating a bitstream data structure associated with FIG. 9.
<89> In the case of the type which may use the information described with reference to FIG. 9, information indicating whether or not the predictive information of the color image is available for the encoding of the depth image is added and encoded. This information may be called "base_mode_flag". If the predictive information of the color image is used, "base_mode_flag" is encoded to "1". If "base_mode_flag" is "1", the block type of the depth image is adaptively determined according to the block type of the color image.
<90> If the color image is of the intra block type, the depth image is processed as an intra block having the same intra prediction mode. In contrast, if the base layer color image block is of the inter block type, the same referred image index, motion vector or disparity vector, prediction direction, sub-block partition size or the like may be used.
<91> In contrast, if "base_mode_flag" is "0", that is, if the base layer color image information is not used, the type information ("mb_type") of the block or the like is subsequently transmitted, similarly to an existing block.
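By way of a sketch only, the signalling described above may be structured as follows; the data structure and the single-bit stand-in for "mb_type" are assumptions for illustration, not the normative bitstream syntax:

    # Illustrative sketch of "base_mode_flag" signalling; all names except
    # base_mode_flag and mb_type themselves are hypothetical.
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class BlockInfo:
        is_intra: bool
        intra_mode: int = 0               # intra prediction mode
        ref_index: int = 0                # referred image index
        vector: Tuple[int, int] = (0, 0)  # motion or disparity vector
        partition: int = 16               # sub-block partition size

    def write_depth_block(bits: List[int], color: BlockInfo, depth: BlockInfo,
                          reuse_color_info: bool) -> None:
        bits.append(1 if reuse_color_info else 0)  # "base_mode_flag"
        if reuse_color_info:
            # Inherit everything from the co-located color block.
            depth.is_intra = color.is_intra
            depth.intra_mode = color.intra_mode
            depth.ref_index = color.ref_index
            depth.vector = color.vector
            depth.partition = color.partition
        else:
            bits.append(0 if depth.is_intra else 1)  # stand-in for "mb_type"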
<92> FIG. 11 is a block diagram showing an apparatus for decoding an FTV image according to an embodiment of the present invention.
<93> As shown, the apparatus 1100 for decoding the FTV image shown in FIG. 11 includes an entropy decoding unit 1110, an inverse quantization and inverse transform unit 1115, a filter unit 1120, a memory unit 1125, a motion compensation unit 1130, and an intra prediction unit 1135.
<94> The entropy decoding unit 1110 entropy-decodes input bitstreams and outputs the decoded data. The entropy decoding unit 1110 may output transform and quantization coefficients of a residual signal, and supplementary information (a motion vector or the like).
<95> The inverse quantization and inverse transform unit 1115 inversely quantizes and inversely transforms the output of the entropy decoding unit 1110. The output of the entropy decoding unit 1110 may be an encoded difference signal or an encoded motion vector. The encoded difference signal may be a difference signal due to an intra prediction mode or an inter prediction mode, or a difference signal due to a dc prediction mode according to the present invention.
<96> The motion compensation unit 1130 calculates a predictive image obtained by compensating for a referred image based on the received motion vector, and the intra prediction unit 1135 performs intra prediction and calculates a predictive image.
<97> The predictive images calculated by the motion compensation unit 1130 and the intra prediction unit 1135 are combined with the residual signal inversely quantized and inversely transformed by the inverse quantization and inverse transform unit 1115 and the combined signal is subjected to deblocking filtering by the filter unit 1120. The filtered value is stored in the memory unit 1125 and is used as a referred image upon inter prediction.
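The reconstruction path may be illustrated with a toy one-dimensional example; the sample values and the simplistic smoothing filter below are assumptions, not the actual deblocking filter:

    # Toy illustration of reconstruction: prediction + inverse-transformed
    # residual, followed by an (oversimplified) in-loop filter.
    def reconstruct(prediction, residual):
        return [p + r for p, r in zip(prediction, residual)]

    def deblock(samples):
        # Trivial 1-D smoothing as a stand-in for the deblocking filter.
        out = samples[:]
        for i in range(1, len(samples) - 1):
            out[i] = (samples[i - 1] + 2 * samples[i] + samples[i + 1]) // 4
        return out

    prediction = [100, 102, 104, 104, 90, 90, 91, 92]   # from intra/motion comp.
    residual   = [  1,  -2,   0,   1,  3,  -1,  0,  1]  # from inv. transform
    decoded = deblock(reconstruct(prediction, residual))
    # 'decoded' would be stored in the memory unit 1125 as a referred image.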
<98> For the encoding and the decoding of the FTV image, the apparatus 1100 for decoding the FTV image shown in FIG. 11 may further include a dc prediction unit. The dc prediction unit may perform prediction by referring to the dc values of neighboring blocks or by using a plurality of representative dc values. At this time, a dc table may be used.
<99> If the motion vector of the color image is commonly used, the motion compensation unit 1130 may perform motion compensation using the motion vector of the color image when decoding the depth image.
<100> The decoding apparatus 1100 can synthesize a moving image at a view which is not transmitted using view synthesis, rendering, a depth-image-based rendering (DIBR) method or the like so as to support free view movement, which will be described later with reference to FIG. 13.
<101> FIG. 12 is a flowchart illustrating a method for decoding an FTV image according to an embodiment of the present invention.
<102> First, information indicating whether or not an image is an FTV image is decoded (S1210).
<103> The FTV image includes the color image and the depth image as described above. FIGS. 6A and 6B show the NAL units of the depth image. Each of the NAL units of the depth image may include an NAL header and an RBSP.
<104> The NAL header includes "forbidden_zero_bit", "nal_ref_idc" and "nal_unit_type".
<105> Information indicating whether or not the image is an FTV image is defined in "nal_unit_type" as described above. Since "nal_unit_type" is in a range of 0 to 31, an undefined value may be defined and used as the information indicating the FTV image and, more particularly, the information indicating the depth image.
<106> The information indicating the FTV image and the information indicating the depth image of the FTV image may be separately defined in "nal_unit_type".
<107> Next, in the case of the FTV image, at least one of view information, base view information and anchor picture information of the depth image is decoded (S1220). Such information may be added as "nal_unit_header_ftv_extension()" as shown in FIG. 6B.
<108> Although not shown, a step of decoding at least one of the SPS, the PPS and the SEI may be further included. In addition, a step of decoding the color image and the depth image of the FTV image may be further included. The decoding of the color image and the depth image has been described with reference to FIG. 11 and a description thereof will be omitted herein.
<109> If the FTV image is encoded using the SVC layer structure, decoding may be performed using "nal_unit_header_svc_extension()" of FIG. 6B.
<110> An existing MVC decoding apparatus skips, without decoding, the NAL units of the depth image among the NAL units of the FTV image, which are not defined in the MVC standard, and decodes only the color image of the FTV image.
<111> FIGS. 13 and 14 are views illustrating a process of generating an FTV image.
<112> In order to generate a third view image 1305 based on a first view image 1301 and a second view image 1302, a 3D warping method is used. Accordingly, a first view modified image 1304 and a second view modified image 1303 are generated, and the third view image 1305 is finally generated therefrom.
<113> However, as shown, regions which are not filled are generated in the first view modified image 1304 and the second view modified image 1303 generated as the references of the third view image 1305. Accordingly, a region which is not filled is also generated in the third view image 1305.
<114> The region which is not filled in the third view image 1305 is defined and used as a hole.
<115> In FIG. 14, a first view image 1401 and a second view image 1402 are aligned using an epipolar line 1415 so as to generate a first view modified image 1403 and a second view modified image 1405, and a third view image 1404 is finally generated therefrom.
<116> If the first view image 1401 and the second view image 1402 are photographed in alignment with the epipolar line 1415, the third view image 1404 may be generated directly, without generating the first view modified image 1403 and the second view modified image 1405.
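A minimal one-dimensional sketch of such warping is given below; the linear depth-to-disparity model is a simplifying assumption used only to show how holes arise:

    # Toy 1-D depth-image-based rendering: shift each pixel by a disparity
    # derived from its depth; unmapped target positions remain holes.
    HOLE = None

    def warp_view(colors, depths, scale=4, max_depth=255):
        target = [HOLE] * len(colors)
        for x, (color, depth) in enumerate(zip(colors, depths)):
            disparity = round(scale * depth / max_depth)  # assumed linear model
            nx = x + disparity
            if 0 <= nx < len(target):
                target[nx] = color  # a real renderer keeps the nearest pixel
        return target

    modified = warp_view([10, 20, 30, 40, 50], [0, 0, 255, 255, 0])
    # Positions still equal to HOLE correspond to the holes of FIG. 13;
    # merging two such modified images still leaves holes in the third view.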
<117> FIGS. 15 and 16 are views showing an example of a method for correcting an FTV image according to the present invention.
<118> As described above, in the third view image generated based on the first view image and the second view image, a reference block is detected which includes a first block in which holes are generated and a second block which is adjacent to the first block and in which holes are not generated. The detected reference block is compared with a predetermined block of at least one of the first view image and the second view image, and the first block in the reference block is corrected using a block adjacent to the predetermined block.
<119> FIG. 15 shows a reference block 1502 including a first block 1503 and a second block 1504, which are adjacent to each other in a vertical direction, in a third view image 1501. Although the second block 1504 is disposed near the lower side of the first block 1503 in the drawing, the second block 1504 may be disposed near the upper side of the first block 1503. For hole processing efficiency, each of the first block 1503 and the second block 1504 may be a 4x4 block, but the present invention is not limited thereto.
<120> FIG. 16 shows a reference block 1602 including a first block 1603 and a second block 1604, which are adjacent to each other in a horizontal direction, in a third view image 1601. Although the second block 1604 is disposed near the right side of the first block 1603 in the drawing, the second block 1604 may be disposed near the left side of the first block 1603. For hole processing efficiency, each of the first block 1603 and the second block 1604 may be a 4x4 block, but the present invention is not limited thereto.
<121> Each of the reference blocks 1502 and 1602 is compared with at least one predetermined block of the first view image and the second view image, and the first block is replaced with a block adjacent to the matched predetermined block so as to fill the holes.
<122> In this comparison process, the second block, in which the holes are not formed, is compared with at least one predetermined block of the first view image and the second view image in terms of at least one of the average of the depth image, the average of the color image and the variance of the color image, and it is determined whether a difference therebetween is a predetermined value or less.
<123> If the difference is the predetermined value or less, the first block is replaced with a block adjacent to the predetermined block so as to fill the holes.
<124> For example, the blocks in the first view image and the second view image are compared with the average of the depth image of the second block so as to detect a block whose difference in average value is a predetermined value or less. The average of the color image and the variance of the color image of the detected block are then compared with those of the second block. If the differences therebetween are a predetermined value or less, this block is selected as a matched block. In practice, it is preferable that the block used to replace the first block be a block located at a position corresponding to the second block.
<125> Unlike for the depth image, the variance of the color image is also used as a comparison reference because the change in value of the depth image is not large, so a desired result can be obtained simply by comparing the averages, whereas the change in value of the color image is large, so both the average and the variance should be compared in order to find an accurate matched block.
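The comparison just described may be sketched as follows; the threshold values are illustrative assumptions, not values given in this disclosure:

    # Sketch of the matching test: depth is compared by average only, color by
    # both average and variance. Threshold values are assumed for illustration.
    def mean(values):
        return sum(values) / len(values)

    def variance(values):
        m = mean(values)
        return sum((v - m) ** 2 for v in values) / len(values)

    def is_match(second_depth, second_color, cand_depth, cand_color,
                 depth_thr=4.0, color_avg_thr=8.0, color_var_thr=16.0):
        if abs(mean(second_depth) - mean(cand_depth)) > depth_thr:
            return False   # depth varies little: the average alone suffices
        if abs(mean(second_color) - mean(cand_color)) > color_avg_thr:
            return False   # color varies widely: compare the average ...
        # ... and also the variance
        return abs(variance(second_color) - variance(cand_color)) <= color_var_thr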
<126> FIGS. 17 and 18 are views illustrating a hole filling process.
<127> As shown, hole filling by the detection of the reference block, the comparison with the predetermined block and the replacement with a block adjacent to the predetermined block is preferably performed from the center of the third view image 1701 toward the edge thereof.
<128> If the vertical reference block 1502 of FIG. 15 is used, hole filling is preferably performed in the order of (1), (2) and (3), as shown. If the horizontal reference block 1602 of FIG. 16 is used, hole filling is preferably performed in the order of (1), (2), (3) and (4) based on a horizontal line 1702 and a vertical line 1703 of FIG. 17.
<129> Hole filling is performed from the center of the third view image 1701 toward the edge thereof because a large number of holes is present in the vicinity of the edge, and hole filling is performed more accurately by proceeding from a portion in which the number of holes is small to a portion in which the number of holes is large.
<130> Hole filling may be performed earlier in the vertical direction than in the horizontal direction. Due to the change in value of the depth image in the vertical direction, the number of holes is generally large in the vertical direction. Accordingly, it is preferable that hole filling be first performed in the vertical direction. Once hole filling has been performed in the vertical direction, hole filling is mostly finished. If, exceptionally, hole filling is not finished at that point, hole filling may additionally be performed in the horizontal direction.
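The center-to-edge visiting order may be generated, for example, as follows (an illustrative helper, not part of this disclosure):

    # Runnable sketch of a center-to-edge visiting order for hole filling.
    def center_out(n):
        # Return indices 0..n-1 starting at the center and moving outward,
        # so sparsely-holed central blocks are filled before edge blocks.
        mid = n // 2
        order = [mid]
        for step in range(1, n):
            if mid - step >= 0:
                order.append(mid - step)
            if mid + step < n:
                order.append(mid + step)
        return order

    print(center_out(5))  # [2, 1, 3, 0, 4]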
<131> As a result, as shown in FIG. 18, hole filling in the reference block 1805 of the third view image 1802 may be completed using the blocks 1804 and 1805 derived from the first view image 1801 and the second view image 1803.
<132> The method for encoding and decoding the FTV image and the method for correcting the FTV image according to the present invention can also be embodied as processor readable codes on a processor readable recording medium included in the apparatus for encoding and decoding the image. The processor readable recording medium is any data storage device that can store data which can thereafter be read by a processor. Examples of the processor readable recording medium include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, and carrier waves such as data transmission through the Internet. The processor readable recording medium can also be distributed over network coupled computer systems so that the processor readable code is stored and executed in a distributed fashion. <133>
[Industrial Applicability]
<134> As described above, a method for encoding and decoding an FTV image according to the present invention may be used for dividing an image into a plurality of images and encoding the images.
<135> Although the preferred embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible, without departing from the scope and spirit of the invention as disclosed in the accompanying claims.
<136> <137>

Claims

[CLAIMS]
[Claim 1]
<139> A method for encoding a Free-viewpoint Television (FTV) image, the method comprising:
<140> encoding information indicating whether or not an image is the FTV image; and
<141> encoding at least one of view information, base view information and anchor picture information of a depth image if the image is the FTV image.
<142>
[Claim 2]
<143> The method according to claim 1, wherein the information is included in a bitstream header.
<144>
[Claim 3]
<145> The method according to claim 1, wherein the view information, the base view information and the anchor picture information of the depth image are included when an identifier for identifying the type of a bitstream unit indicates the FTV image.
<146>
[Claim 4]
<147> The method according to claim 1, wherein the information indicating whether or not the image is the FTV image includes information indicating whether or not the image is the depth image.
<148>
[Claim 5]
<149> The method according to claim 1, further comprising encoding at least one of a Picture Parameter Set (PPS) and Supplemental Enhancement Information (SEI) of the FTV image.
<150>
[Claim 6]
<151> The method according to claim 1, further comprising encoding a color image and the depth image of the FTV image.
<152>
[Claim 7]
<153> The method according to claim 6, further comprising encoding information indicating whether or not predictive information of the color image is available for the encoding of the depth image.
<154>
[Claim 8]
<155> The method according to claim 7, wherein:
<156> the predictive information includes an intra prediction mode, in the case of an intra block, and
<157> the predictive information includes information about at least one of a referred image management method, an index, motion information or disparity information of a referred image, a sub-block structure, and a prediction mode, in the case of an inter block.
<158>
[Claim 9]
<159> A method for decoding a Free-viewpoint Television (FTV) image, the method comprising:
<160> decoding information indicating whether or not an image is the FTV image; and
<161> decoding at least one of view information, base view information and anchor picture information of a depth image if the image is the FTV image.
<162>
[Claim 10]
<163> The method according to claim 9, wherein the information is included in a bitstream header.
<164>
[Claim 11]
<165> The method according to claim 9, wherein the view information, the base view information and the anchor picture information of the depth image are included when an identifier for identifying the type of a bitstream unit indicates the FTV image.
<166>
[Claim 12]
<167> The method according to claim 9, wherein the information indicating whether or not the image is the FTV image includes information indicating whether or not the image is the depth image.
<168>
[Claim 13]
<169> The method according to claim 9, further comprising decoding at least one of a Picture Parameter Set (PPS) and Supplemental Enhancement Information (SEI) of the FTV image.
<170>
[Claim 14]
<171> The method according to claim 9, further comprising decoding a color image and the depth image of the FTV image.
<172>
[Claim 15]
<173> The method according to claim 14, further comprising decoding information indicating whether or not predictive information of the color image is available for the decoding of the depth image.
<174>
[Claim 16]
<175> The method according to claim 15, wherein:
<176> the predictive information includes an intra prediction mode, in the case of an intra block, and
<177> the predictive information includes information about at least one of a referred image management method, an index, motion information or disparity information of a referred image, a sub-block structure and a prediction mode, in the case of an inter block.
<178>
[Claim 17]
<179> A method for correcting a Free-viewpoint Television (FTV) image, the method comprising:
<180> generating a third view image based on a first view image and a second view image;
<181> detecting a reference block including a first block in which holes are generated and a second block which is adjacent to the first block and in which holes are not generated, in the third view image; and
<182> comparing the second block in the reference block with a predetermined block of at least one of the first view image and the second view image and replacing the first block with a block adjacent to the predetermined block if a difference therebetween is a predetermined value or less.
<183>
[Claim 18]
<184> The method according to claim 17, wherein the first block and the second block are adjacent to each other in a horizontal direction or a vertical direction.
<185>
[Claim 19]
<186> The method according to claim 17, wherein the replacing of the first block is performed from the center of the third view image to the edge thereof.
<187>
[Claim 20]
<188> The method according to claim 17, wherein the replacing of the first block is performed earlier in the vertical direction of the third view image than in the horizontal direction thereof.
<189>
[Claim 21]
<190> The method according to claim 17, wherein the difference uses at least one of a difference in an average of a depth image, a difference in an average of a color image, and a difference in a variance of the color image, between the second block and the predetermined block.
<191>
[Claim 22]
<192> The method according to claim 17, wherein each of the first block and the second block is a 4x4 block.
<193>
PCT/KR2008/006830 2008-04-23 2008-11-20 Method for encoding and decoding image of ftv WO2009131287A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR20080037772 2008-04-23
KR10-2008-0037772 2008-04-23

Publications (1)

Publication Number Publication Date
WO2009131287A1 (en) 2009-10-29

Family

ID=41216996

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2008/006830 WO2009131287A1 (en) 2008-04-23 2008-11-20 Method for encoding and decoding image of ftv

Country Status (1)

Country Link
WO (1) WO2009131287A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001008232A (en) * 1999-06-25 2001-01-12 Matsushita Electric Ind Co Ltd Omnidirectional video output method and apparatus
KR20040013540A (en) * 2002-08-07 2004-02-14 한국전자통신연구원 The multiplexing method and its device according to user's request for multi-view 3D video
KR20050122717A (en) * 2004-06-25 2005-12-29 학교법인연세대학교 Method for coding/decoding for multiview sequence where view selection is possible

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100208994A1 (en) * 2009-02-11 2010-08-19 Ning Yao Filling holes in depth maps
US8774512B2 (en) * 2009-02-11 2014-07-08 Thomson Licensing Filling holes in depth maps
US20110298895A1 (en) * 2009-02-19 2011-12-08 Dong Tian 3d video formats
EP2375746A1 (en) 2010-03-31 2011-10-12 Deutsche Telekom AG Method for encoding texture data of free viewpoint television signals, corresponding method for decoding and texture encoder and decoder
US20140044347A1 (en) * 2011-04-25 2014-02-13 Sharp Kabushiki Kaisha Image coding apparatus, image coding method, image coding program, image decoding apparatus, image decoding method, and image decoding program
JP2014132721A (en) * 2013-01-07 2014-07-17 National Institute Of Information & Communication Technology Stereoscopic video encoding device, stereoscopic video decoding device, stereoscopic video encoding method, stereoscopic video decoding method, stereoscopic video encoding program, and stereoscopic video decoding program
CN103873867A (en) * 2014-03-31 2014-06-18 清华大学深圳研究生院 Free viewpoint video depth map distortion prediction method and free viewpoint video depth map coding method
CN103873867B (en) * 2014-03-31 2017-01-25 清华大学深圳研究生院 Free viewpoint video depth map distortion prediction method and free viewpoint video depth map coding method

Similar Documents

Publication Publication Date Title
JP6884598B2 (en) Valid predictions using partition coding
KR101619450B1 (en) Video signal processing method and apparatus using depth information
US8115804B2 (en) Processing multiview video
JP2021022947A (en) Effective partition encoding with high degree of freedom of partition
US8139150B2 (en) Method and apparatus for encoding and decoding multi-view video signal, and related computer programs
KR101619451B1 (en) Method and apparatus for processing a multiview video signal
CN109068143B (en) Video data decoding method and video data decoding apparatus
US20070147502A1 (en) Method and apparatus for encoding and decoding picture signal, and related computer programs
EP2538674A1 (en) Apparatus for universal coding for multi-view video
EP1793611A2 (en) Method and system for synthesizing multiview videos
EP2348733A2 (en) Virtual view image synthesis method and apparatus
KR101737595B1 (en) Image encoding method, image decoding method, image encoding device, image decoding device, image encoding program, and image decoding program
WO2008084996A1 (en) Method and apparatus for deblocking-filtering video data
WO2007081104A1 (en) Adaptive motion estimation/compensation device for mb-based illumination change and method thereof
US20160065983A1 (en) Method and apparatus for encoding multi layer video and method and apparatus for decoding multilayer video
WO2009131287A1 (en) Method for encoding and decoding image of ftv
Merkle et al. Coding of depth signals for 3D video using wedgelet block segmentation with residual adaptation
KR20210003809A (en) Multi-view video decoding method and apparatus and image processing method and apparatus
AU2014205860A1 (en) Method and apparatus for processing video signal
Tao et al. Joint texture and depth map video coding based on the scalable extension of H. 264/AVC
US20170078698A1 (en) Method and device for deriving inter-view motion merging candidate
Mora et al. Modification of the disparity vector derivation process in 3D-HEVC
Samelak et al. Adaptation of the 3D-HEVC coding tools to arbitrary locations of cameras
KR101841914B1 (en) Method of efficient CODEC for multi-view color and depth videos, and apparatus thereof
Diaz-Honrubia et al. Using bayesian classifiers for low complexity multiview h. 264/avc and hevc hybrid architecture

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08874063

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08874063

Country of ref document: EP

Kind code of ref document: A1