WO2007024072A1 - Method and apparatus for encoding multiview video - Google Patents

Method and apparatus for encoding multiview video

Info

Publication number
WO2007024072A1
WO2007024072A1 PCT/KR2006/003268
Authority
WO
WIPO (PCT)
Prior art keywords
frames
group
frame
encoding
multiview video
Prior art date
Application number
PCT/KR2006/003268
Other languages
French (fr)
Inventor
Tae-Hyeun Ha
Pil-Ho Yu
Original Assignee
Samsung Electronics Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020050105728A external-priority patent/KR100728009B1/en
Application filed by Samsung Electronics Co., Ltd. filed Critical Samsung Electronics Co., Ltd.
Priority to JP2008527838A priority Critical patent/JP2009505604A/en
Priority to EP06783669A priority patent/EP1917814A4/en
Priority to CN200680030315.4A priority patent/CN101243692B/en
Priority to MX2008002391A priority patent/MX2008002391A/en
Publication of WO2007024072A1 publication Critical patent/WO2007024072A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/577Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/105Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/159Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/187Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a scalable video layer
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding

Definitions

  • MPEG-2 encoding is a very effective compression method because bi-directional motion estimation is performed for bi-directionally motion-compensated pictures (B pictures). Since the MPEG-2 encoding provides highly effective temporal scalability, B pictures can be used to encode a right image sequence. Consequently, a highly compressed right sequence can be generated.
  • FIG. 3 illustrates disparity-based predictive encoding in which disparity estimation is used twice for bi-directional motion estimation.
  • a left image is encoded using a non-scalable MPEG-2 encoder, and a right image is encoded using an MPEG-2 temporal auxiliary view encoder based on the decoded left image.
  • a right image is predicted using two reference images, e.g., two left images, and encoded into a B picture.
  • one of the two reference images is an isochronal left image to be simultaneously displayed with the right image, and the other is a left image that follows the isochronal left image.
  • the two predictions have three prediction modes: a forward mode, a backward mode and an interpolated mode.
  • the forward mode denotes disparity estimation based on the isochronal left image
  • the backward mode denotes disparity estimation based on the left image that immediately follows the isochronal left image.
  • a right image is predicted using disparity vectors of the two left images.
  • Such an estimation method is called disparity-only predictive encoding because only disparity vectors are considered. Accordingly, an encoder estimates two disparity vectors for each frame of a right image, and a decoder decodes the right image from the left image using the two disparity vectors.
  • FIG. 4 illustrates predictive encoding using a disparity vector and a motion vector for the bi-directional estimation.
  • B pictures obtained through the bi-directional estimation of FIG. 3 are used.
  • disparity estimation and motion estimation are each used once in the bi-directional estimation. That is, the disparity estimation using an isochronal left image and the motion estimation using a previous right image are used.
  • the bi-directional estimation also includes three estimation modes, i.e., a forward mode, a backward mode and an interpolated mode, as in the disparity-based predictive encoding of FIG. 3.
  • the forward mode denotes motion estimation based on a decoded right image
  • the backward mode denotes disparity estimation based on a decoded left image.
  • since the MPEG-2 MVP does not consider a multiview video encoder, it is not suitable for encoding a multiview video. Therefore, a multiview video encoder for simultaneously providing a multiview video, which is stereoscopic and realistic, to many people is required.
  • the present invention provides a method and apparatus for efficiently encoding a multiview video which is realistic and simultaneously providing the encoded multiview video to many people.
  • the present invention also provides a method and apparatus for encoding a multiview video using a prediction structure that uses a minimum amount of information regarding the multiview video.
  • the present invention provides a method and apparatus for efficiently encoding a multiview video to simultaneously provide the multiview video which is realistic to many people.
  • the present invention also provides a method and apparatus for encoding a multiview video using a B-frame prediction structure that uses a minimum amount of information regarding the multiview video.
  • FIG. 1 is a block diagram of a related art encoder and decoder of a motion picture experts group 2 (MPEG-2) multiview profile (MVP);
  • FIG. 2 is a block diagram of a related art stereo-video encoder and decoder using the MPEG-2 MVP;
  • FIG. 3 illustrates a related art disparity-based predictive encoding in which disparity estimation is used twice for bi-directional motion estimation;
  • FIG. 4 illustrates a related art predictive encoding using a disparity vector and a motion vector for the bi-directional estimation;
  • FIG. 5 is a block diagram of an apparatus for encoding a multiview video according to an exemplary embodiment of the present invention;
  • FIG. 6 illustrates a unit encoding structure of a multiview video according to an exemplary embodiment of the present invention;
  • FIGS. 7A through 7F illustrate three types of B pictures used in multiview video encoding according to an exemplary embodiment of the present invention;
  • FIG. 8 illustrates a horizontally extended unit encoding structure of a multiview video according to an exemplary embodiment of the present invention;
  • FIG. 9 illustrates a prediction sequence of the multiview image of FIG. 8;
  • FIG. 10 illustrates a video encoding structure having an odd number of views for motion estimation and disparity estimation according to an exemplary embodiment of the present invention;
  • FIG. 11 illustrates a video encoding structure having an even number of views for motion estimation and disparity estimation according to an exemplary embodiment of the present invention; and
  • FIG. 12 is a flowchart illustrating a method of encoding a multiview video according to an exemplary embodiment of the present invention.
  • a method of encoding a multiview video including: categorizing a plurality of B frames into at least two groups according to a predetermined standard; and sequentially encoding the categorized B frames.
  • the predetermined standard may be the number of frames to which each B frame refers.
  • the predetermined standard may be the number of reference frames to which each B frame refers and positions of the reference frames.
  • the B frames may be categorized into a first group of B frames which are predicted with reference to two horizontally adjacent frames, two vertically adjacent frames or one horizontally adjacent frame and one vertically adjacent frame, a second group of B frames which are predicted with reference to two horizontally adjacent frames and one vertically adjacent frame or one horizontally adjacent frame and two vertically adjacent frames, and a third group of B frames which are predicted with reference to two horizontally adjacent frames and two vertically adjacent frames, wherein the one or two horizontally adjacent frames are a frame or frames obtained from the multiview video at a same temporal level as a referring B frame, and the one or two vertically adjacent frames are a frame or frames obtained from the multiview video at a same view position as a referring B frame.
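The grouping rule above can be sketched as a small classification function. This is a hedged illustration, not the claimed method itself: the function name and the (h_refs, v_refs) encoding are assumptions, where h_refs counts horizontally adjacent reference frames (same temporal level) and v_refs counts vertically adjacent reference frames (same view position).

```python
# Illustrative sketch of the three-group categorization described above.
def categorize_b_frame(h_refs, v_refs):
    """Return 1, 2 or 3 for the first (B), second (B1) or third (B2) group.

    h_refs: number of horizontally adjacent reference frames (0-2)
    v_refs: number of vertically adjacent reference frames (0-2)
    """
    total = h_refs + v_refs
    if total == 2:   # (2,0), (0,2) or (1,1) -> first group
        return 1
    if total == 3:   # (2,1) or (1,2)        -> second group
        return 2
    if total == 4:   # (2,2)                 -> third group
        return 3
    raise ValueError("a B frame refers to two, three or four adjacent frames")
```

Note that under this reading the grouping depends only on the total number of reference frames, which matches the alternative standard ("the number of frames to which each B frame refers") stated above.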
  • the sequential encoding of the categorized B frames may include sequentially encoding the first group of B frames, the second group of B frames, and the third group of B frames.
  • the sequential encoding may be performed based on a video encoding structure which includes the B frames, and may further include performing disparity estimation between frames disposed horizontally according to a plurality of views and performing motion estimation between frames disposed vertically according to the passage of time, and the video encoding structure can be horizontally and vertically extended.
  • a video encoding structure having n views can be configured into a video encoding structure having n-1 views by disabling an (n-1)th column of frames, wherein n is an odd natural number.
  • an apparatus for encoding a multiview video including: a prediction unit which predicts a disparity vector and a motion vector of an input multiview video; a disparity and motion compensation unit which compensates an image using the predicted disparity vector and motion vector; a residual image encoding unit which receives an original image and the compensated image generated by the disparity and motion compensation unit, subtracts the compensated image from the original image, and encodes a residual image obtained from the subtraction; and an entropy-encoding unit which generates a bit stream for the multiview video using the disparity vector, the motion vector, and the encoded residual image, wherein the prediction unit categorizes a plurality of B frames into at least two groups according to a predetermined standard and sequentially predicts the categorized B frames.
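The claimed apparatus can be outlined as a minimal structural sketch. The class and method names below are illustrative assumptions, and every coding step is stubbed with trivial arithmetic in place of real disparity/motion estimation; only the data flow between the four units follows the description above.

```python
# Structural sketch of the claimed encoder: prediction -> compensation ->
# residual encoding -> entropy encoding. All internals are stubs.
class MultiviewEncoder:
    def encode_frame(self, original):
        vectors = self.predict(original)                   # prediction unit
        compensated = self.compensate(original, vectors)   # disparity/motion compensation unit
        # residual image encoding unit: subtract the compensated image
        # from the original image and encode the difference
        residual = [o - c for o, c in zip(original, compensated)]
        return self.entropy_encode(vectors, residual)      # entropy-encoding unit

    def predict(self, original):
        # stub: the real unit estimates a disparity vector and a motion vector
        return {"disparity": 0, "motion": 0}

    def compensate(self, original, vectors):
        # stub: the real unit reconstructs an image from the predicted vectors
        return [0] * len(original)

    def entropy_encode(self, vectors, residual):
        # stub: the real unit generates a bit stream from vectors and residual
        return (vectors, residual)
```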
  • a computer-readable recording medium on which a program for implementing the method is recorded.
  • FIG. 5 is a block diagram of an apparatus for encoding a multiview video according to an exemplary embodiment of the present invention.
  • the apparatus includes a multiview image buffer 510, a prediction unit 520, a disparity/motion compensation unit 530, a residual image encoding unit 540, and an entropy-encoding unit 550.
  • the apparatus can receive a multiview video source from a plurality of camera systems or through another method.
  • the received multiview video is stored in the multiview image buffer 510.
  • the multiview image buffer 510 provides the multiview video to the prediction unit 520 and the residual image encoding unit 540.
  • the prediction unit 520 includes a disparity estimation unit 522 and a motion estimation unit 524.
  • the prediction unit 520 performs motion estimation and disparity estimation on the multiview video.
  • the prediction unit 520 estimates a disparity vector and a motion vector in directions indicated by arrows illustrated in FIGS. 6 through 11, and provides the estimated disparity vector and the motion vector to the disparity/motion compensation unit 530.
  • the prediction unit 520 may set directions for performing motion estimation and disparity estimation by efficiently using a multiview disparity vector and a motion vector which are generated when the multiview video source is extended based on a time axis.
  • an MPEG-2 encoding structure can be extended based on a view axis to use spatial/temporal correlation of the multiview video.
  • the disparity/motion compensation unit 530 performs disparity compensation and motion compensation using the motion vector and the disparity vector estimated by the disparity estimation unit 522 and the motion estimation unit 524.
  • the disparity/motion compensation unit 530 reconstructs an image using the estimated motion vector and disparity vector and provides the reconstructed image to the residual image encoding unit 540.
  • To provide better visual quality and stereoscopy, the residual image encoding unit 540 encodes a residual image obtained by subtracting the image compensated and reconstructed by the disparity/motion compensation unit 530 from the original image provided by the multiview image buffer 510, and provides the encoded residual image to the entropy-encoding unit 550.
  • the entropy-encoding unit 550 receives the estimated disparity vector and motion vector from the prediction unit 520 and the encoded residual image from the residual image encoding unit 540 and generates a bit stream for the multiview video source.
  • FIG. 6 illustrates a unit encoding structure of a multiview video according to an exemplary embodiment of the present invention.
  • a core-prediction structure or a unit-prediction structure illustrated in FIG. 6 is based on the assumption that there are three views.
  • a square block indicates an image frame in a multiview video.
  • a horizontal arrow indicates a sequence of frames according to views or positions of cameras, and a vertical arrow indicates a sequence of the frames according to time.
  • An I picture indicates an 'intra picture', identical to an I frame in MPEG-2/4 or H.264.
  • P and B pictures respectively indicate a 'predictive picture' and a 'bi-directional prediction picture', similar to P and B frames in MPEG-2/4 or H.264.
  • the P and B pictures are estimated by the motion estimation and the disparity estimation together in the multiview video coding.
  • arrows between picture-frames indicate prediction directions.
  • Horizontal arrows indicate disparity estimation, and vertical arrows indicate motion estimation.
  • FIGS. 7A through 7F illustrate three types of B pictures used in multiview video encoding according to an exemplary embodiment of the present invention.
  • there are three types of B pictures: B, B1, and B2 pictures.
  • the B, B1, and B2 pictures denote picture-frames predicted using two or more horizontally or vertically adjacent frames.
  • B pictures are predicted using two horizontally adjacent frames as illustrated in FIG. 7A, two vertically adjacent frames as illustrated in FIG. 7B, or a horizontally adjacent frame and a vertically adjacent frame as illustrated in FIG. 7C.
  • B1 pictures are predicted using two horizontally adjacent frames and one vertically adjacent frame as illustrated in FIG. 7D or a horizontally adjacent frame and two vertically adjacent frames as illustrated in FIG. 7E.
  • B2 pictures are predicted using two horizontally adjacent frames and two vertically adjacent frames as illustrated in FIG. 7F.
  • an I frame 601 is intra-predicted.
  • a P frame 603 and a P frame 610 are each predicted by referring to the I frame 601.
  • a B frame 602 is predicted using the I frame 601 and the P frame 603 horizontally adjacent to the B frame 602.
  • a B frame 604 and a B frame 607 are predicted using the I frame 601 and the P frame 610 vertically adjacent to the B frame 604 and the B frame 607.
  • a B frame 612 is predicted using the P frame 610 horizontally adjacent to the B frame 612 and the P frame 603 vertically adjacent to the B frame 612.
  • B1 frames are then predicted. Specifically, a B1 frame 606 is predicted using frames horizontally and vertically adjacent to it.
  • a B1 frame 609 is predicted using the B frame 607 horizontally adjacent to the B1 frame 609 and the P frame 603 and the B frame 612 vertically adjacent to the B1 frame 609.
  • a B1 frame 611 is predicted using the P frame 610 and the B frame 612 horizontally adjacent to the B1 frame 611 and the B frame 602 vertically adjacent to the B1 frame 611.
  • B2 frames are predicted. Specifically, a B2 frame 605 is predicted using the B frame 604 and the B1 frame 606 horizontally adjacent to the B2 frame 605 and the B frame 602 and the B1 frame 611 vertically adjacent to the B2 frame 605. In addition, a B2 frame 608 is predicted using the B frame 607 and the B1 frame 609 horizontally adjacent to the B2 frame 608 and the B frame 602 and the B1 frame 611 vertically adjacent to the B2 frame 608.
  • bi-directional prediction is performed with reference not only to B frames, but also to B1 and B2 frames. Since the number of B type frames can be increased, the amount of information required for encoding a multiview image can be minimized. Therefore, according to an exemplary embodiment of the present invention, to efficiently encode a multiview image, B frames are grouped according to the types of frame illustrated in FIGS. 7A through 7F and encoded in the prediction sequence B frame -> B1 frame -> B2 frame as described above.
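The group-by-group prediction sequence can be sketched as an ordering over the unit structure. The frame numbers and their types are taken from the FIG. 6 description above; the rank table and function name are illustrative assumptions.

```python
# Frame types of the three-view unit structure (FIG. 6) as stated in the text.
FRAME_TYPES = {
    601: "I", 602: "B", 603: "P", 604: "B", 605: "B2", 606: "B1",
    607: "B", 608: "B2", 609: "B1", 610: "P", 611: "B1", 612: "B",
}

# Each group is encoded before the groups that may refer to it.
RANK = {"I": 0, "P": 1, "B": 2, "B1": 3, "B2": 4}

def encoding_order(frame_types):
    """Sort frame numbers by group rank (stable within a group)."""
    return sorted(frame_types, key=lambda f: RANK[frame_types[f]])
```

Applied to FRAME_TYPES, the order begins with the I frame 601, then the P frames 603 and 610, then the B frames, the B1 frames, and finally the B2 frames, matching the B -> B1 -> B2 sequence described above.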
  • FIG. 8 illustrates a horizontally extended unit encoding structure of a multiview video according to an exemplary embodiment of the present invention.
  • FIG. 8 illustrates a prediction block structure for a five-view input image source.
  • FIG. 9 illustrates a prediction sequence of the multiview image of FIG. 8. In FIG. 9, frames in the same column are predicted at the same time.
  • an I frame 801 is intra-predicted.
  • a P frame 803 and a P frame 816 in a second column are predicted, and B frames 802, 806, 811 and 818 and a P frame 805 in a third column are predicted.
  • B1 frames 817, 808 and 813, and B frames 804 and 820 are predicted.
  • B2 frames 807 and 821 and B1 frames 810, 819 and 815 in a fifth column are then predicted.
  • B2 frames 809 and 814 are predicted. Therefore, the prediction sequence according to the present exemplary embodiment is I, P, B, B1, B2, P, B, B1 and B2 pictures in order.
  • FIG. 10 illustrates a video encoding structure having an odd number of views for motion estimation and disparity estimation according to an exemplary embodiment of the present invention.
  • FIG. 11 illustrates a video encoding structure having an even number of views for motion estimation and disparity estimation according to an exemplary embodiment of the present invention.
  • the video encoding structure of FIG. 11 can be obtained by disabling a fourth column of prediction frames in the five-view video encoding structure of FIG. 10.
  • the video encoding structure according to the present exemplary embodiment can be horizontally and vertically extended.
  • an n-view (where n is an odd number) video encoding structure can be reconfigured into an (n-1)-view video encoding structure by disabling an (n-1)th column of prediction frames.
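The reconfiguration rule can be sketched as follows, with columns standing in for views and the frames themselves omitted; the function name is an illustrative assumption.

```python
# Reconfigure an n-view structure (n odd) into an (n-1)-view structure by
# disabling the (n-1)th column of prediction frames, as described above.
def drop_to_even_views(n):
    if n % 2 != 1:
        raise ValueError("the starting structure has an odd number of views")
    columns = list(range(1, n + 1))
    columns.remove(n - 1)   # disable the (n-1)th column
    return columns
```

For the five-view structure of FIG. 10, drop_to_even_views(5) keeps columns 1, 2, 3 and 5, corresponding to disabling the fourth column as in FIG. 11.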
  • FIG. 12 is a flowchart illustrating a method of encoding a multiview video according to an exemplary embodiment of the present invention. The method has been described with reference to FIGS. 6 through 11. In particular, B frames are encoded in the method as follows.
  • a plurality of B frames are divided into at least two groups according to a predetermined standard (S1210).
  • the predetermined standard may be the number of frames to which each B frame refers, or that number together with the positions of the reference frames.
  • the B frames may be categorized into a first group of B frames which are predicted with reference to two horizontally adjacent frames, two vertically adjacent frames or one horizontally adjacent frame and one vertically adjacent frame, a second group of B frames which are predicted with reference to two horizontally adjacent frames and one vertically adjacent frame or one horizontally adjacent frame and two vertically adjacent frames, and a third group of B frames which are predicted with reference to two horizontally adjacent frames and two vertically adjacent frames.
  • the B frames grouped as described above are sequentially encoded (S1220).
  • the B frames may be encoded in the order of the first group, the second group, and the third group.
  • the present invention provides a method and apparatus for efficiently encoding a multiview video to simultaneously provide the multiview video which is realistic to many people.
  • the present invention also provides a method and apparatus for encoding a multiview video using a B-frame prediction structure that uses a minimum amount of information regarding the multiview video.
  • the present invention can also be implemented as computer-readable code on a computer-readable recording medium.
  • the computer-readable recording medium is any data storage device that can store data which can be thereafter read by a computer system. Examples of the computer-readable recording medium include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, and carrier waves (such as data transmission through the Internet).
  • the computer-readable recording medium can also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion.

Abstract

A method and an apparatus for encoding a multiview video using a minimum amount of information regarding the multiview video are provided. The method includes: categorizing a plurality of B frames into at least two groups according to a predetermined standard; and sequentially encoding the categorized B frames. Therefore, a multiview video which is realistic can be simultaneously provided to many people using a minimum amount of information regarding the multiview video.

Description

METHOD AND APPARATUS FOR ENCODING MULTIVIEW VIDEO
Technical Field
[1] Methods and apparatuses consistent with the present invention relate to encoding a multiview video sequence, and more particularly, to encoding a multiview video photographed by a multiview camera using a minimum amount of information regarding the multiview video.
Background Art
[2] Realism is an important factor in realizing high-quality information and telecommunication services. This realism can be achieved with video communication based on three-dimensional (3D) images. 3D imaging systems have many potential applications in education, entertainment, medical surgery, videoconferencing, and the like. To provide many viewers with more vivid and accurate information of a remote scene, three or more cameras are placed at slightly different viewpoints to produce a multiview sequence.
[3] Reflecting the current interest in 3D images, a number of research groups have developed 3D-image processing and display systems. In Europe, research on 3DTV has been initiated through several projects such as DISTIMA, the objective of which is to develop a system for capturing, coding, transmitting and displaying digital stereoscopic image sequences. These projects have led to another project, PANORAMA, with the goal of enhancing visual information in 3D telepresence communication. The projects have also led to another project, ATTEST, in which various technologies for 3D-content acquisition, 3D-compression & transmission, and 3D-display systems were researched. In the ATTEST project, motion picture experts group 2 (MPEG-2) and digital video broadcasting (DVB) standards were applied to transmit 3D contents using temporal scalability. To achieve temporal scalability, a base layer is used for the transmission of 2D contents and an advanced layer is used for the transmission of 3D contents.
[4] The MPEG-2 standard was amended in 1996 to define a multiview profile (MVP).
The MVP defines the usage of a temporal scalability mode for multi-camera sequences and acquisition camera parameters in an MPEG-2 syntax.
[5] A base-layer stream which represents a multiview video signal can be encoded at a reduced frame rate, and an enhancement-layer stream, which can be used to insert additional frames in between, can be defined to allow reproduction at a full frame rate when both streams are available. A very efficient way to encode the enhancement layer is to determine the optimal method of performing motion-compensated estimation on each macroblock in an enhancement layer frame based on either a base layer frame or a recently reconstructed enhancement layer frame.
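The base-layer/enhancement-layer split described above can be sketched roughly as follows. The even/odd frame assignment and the function names are illustrative assumptions, not something the MVP fixes.

```python
# Sketch of temporal scalability: the base layer carries every other frame
# (a reduced frame rate), and the enhancement layer carries the frames
# inserted in between.
def split_layers(frames):
    base = frames[::2]           # base layer at a reduced (half) frame rate
    enhancement = frames[1::2]   # additional frames inserted in between
    return base, enhancement

def merge_layers(base, enhancement):
    """Full-frame-rate reproduction when both streams are available."""
    merged = []
    for i, frame in enumerate(base):
        merged.append(frame)
        if i < len(enhancement):
            merged.append(enhancement[i])
    return merged
```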
[6] The process of stereo and multiview channel encoding such a multiview video signal using temporal scalability syntax is straightforward. For this purpose, a frame from a particular camera view (usually a left-eye frame) is defined as the base layer, and a frame from the other camera view is defined as the enhancement layer. The base layer represents a simultaneous monoscopic sequence. For the enhancement layer, although disparity-compensated estimation may fail in occluded regions, it is still possible to maintain the quality of a reconstructed image using motion-compensated estimation within the same channel. Since the MPEG-2 MVP was mainly defined for stereo sequences, it does not support multiview sequences and is inherently difficult to extend to multiview sequences.
[7] FIG. 1 is a block diagram of a conventional encoder and decoder of the MPEG-2
MVP. The scalability provided by the MPEG-2 is used to simultaneously decode images having different resolutions or formats with an image-processing device. Among scalabilities supported by MPEG-2, temporal scalability is used to improve visual quality by increasing a frame rate. The MVP is applied to stereo sequences in consideration of temporal scalability.
[8] The encoder and decoder illustrated in FIG. 1 are a stereo video encoder and decoder with temporal scalability. Left images in a stereo video are input to a base view encoder, and right images are input to a temporal auxiliary view encoder.
[9] The temporal auxiliary view encoder provides temporal scalability, and is an interlayer encoder interleaving images between images of the base layer.
[10] When the left image is separately encoded and decoded, a two-dimensional (2D) video can be obtained. When the left image and the right image are simultaneously encoded and decoded, a stereoscopic video can be obtained. To transmit or store a video, a system multiplexer and a system demultiplexer are needed to combine or separate sequences of the two images.
[11] FIG. 2 is a block diagram of a conventional stereo-video encoder and decoder using the MPEG-2 MVP.
[12] An image of the base layer is encoded through motion compensation and discrete cosine transform (DCT). The encoded image is decoded in a reverse process. A temporal auxiliary view encoder functions as a temporal interlayer encoder which performs prediction based on the decoded image of the base layer.
[13] In other words, disparity-compensated estimation may be performed twice, or disparity estimation and motion-compensated estimation may each be performed once. Like an encoder and decoder of a base layer, the temporal auxiliary view encoder includes a disparity- and motion-compensated DCT encoder and decoder.
[14] Further, a disparity compensated encoding process requires a disparity estimator and a compensator as a motion estimation/compensation encoding process requires a motion estimator and compensator. In addition to block-based motion/disparity estimation and compensation, the encoding process includes performing DCT on a difference between an estimated image and an original image, quantization of DCT coefficients, and variable length encoding. On the other hand, a decoding process includes variable length decoding, inverse quantization and inverse DCT.
[15] MPEG-2 encoding is a very effective compression method because bi-directional motion estimation is performed for bi-directionally motion-compensated pictures (B pictures). Since the MPEG-2 encoding provides highly effective temporal scalability, B pictures can be used to encode a right image sequence. Consequently, a highly compressed right sequence can be generated.
[16] FIG. 3 illustrates disparity-based predictive encoding in which disparity estimation is used twice for bi-directional motion estimation.
[17] A left image is encoded using a non-scalable MPEG-2 encoder, and a right image is encoded using an MPEG-2 temporal auxiliary view encoder based on the decoded left image.
[18] In other words, a right image is predicted using two reference images, e.g., two left images, and encoded into a B picture. In this case, one of the two reference images is an isochronal left image to be simultaneously displayed with the right image, and the other is a left image that follows the isochronal left image.
[19] Like the motion estimation/compensation, the two predictions have three prediction modes: a forward mode, a backward mode and an interpolated mode. The forward mode denotes disparity estimation based on the isochronal left image, and the backward mode denotes disparity estimation based on the left image that immediately follows the isochronal left image. In this case, a right image is predicted using disparity vectors of the two left images. Such an estimation method is called predictive encoding, considering only disparity vectors. Therefore, an encoder estimates two disparity vectors for each frame of a right image, and a decoder decodes the right image from the left image using the two disparity vectors.
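The three prediction modes above can be sketched as a small mode selector; the function name and the flat-list block representation are illustrative assumptions, with the interpolated mode averaging the two references in MPEG-2-style B-picture fashion.

```python
def predict_block(fwd_ref, bwd_ref, mode):
    """Form a predicted block from the two reference blocks addressed
    by the forward and backward vectors. 'mode' is one of the three
    prediction modes described above; blocks are flat lists of
    pixel values (an illustrative representation)."""
    if mode == "forward":
        return list(fwd_ref)          # use only the forward reference
    if mode == "backward":
        return list(bwd_ref)          # use only the backward reference
    if mode == "interpolated":
        # Rounded average of the two references, pixel by pixel.
        return [(f + b + 1) // 2 for f, b in zip(fwd_ref, bwd_ref)]
    raise ValueError("unknown prediction mode: " + mode)
```

The encoder would pick, per macroblock, whichever mode yields the smallest residual.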
[20] FIG. 4 illustrates predictive encoding using a disparity vector and a motion vector for the bi-directional estimation. In the predictive encoding illustrated in FIG. 4, B pictures obtained through the bi-directional estimation of FIG. 3 are used. However, disparity estimation and motion estimation are each used once in the bi-directional estimation. That is, the disparity estimation using an isochronal left image and the motion estimation using a previous right image are used.
[21] Further, the bi-directional estimation also includes three estimation modes, i.e., a forward mode, a backward mode and an interpolated mode, as in the disparity-based predictive encoding of FIG. 3. The forward mode denotes motion estimation based on a decoded right image, and the backward mode denotes disparity estimation based on a decoded left image. [22] As described above, since the MPEG-2 MVP does not consider a multiview video encoder, it is not suitable for encoding a multiview video. Therefore, a multiview video encoder for simultaneously providing a multiview video, which is stereoscopic and realistic, to many people is required.
Disclosure of Invention
Technical Solution [23] The present invention provides a method and apparatus for efficiently encoding a multiview video which is realistic and simultaneously providing the encoded multiview video to many people. [24] The present invention also provides a method and apparatus for encoding a multiview video using a prediction structure that uses a minimum amount of information regarding the multiview video.
Advantageous Effects [25] The present invention provides a method and apparatus for efficiently encoding a multiview video to simultaneously provide the multiview video which is realistic to many people. [26] The present invention also provides a method and apparatus for encoding a multiview video using a B-frame prediction structure that uses a minimum amount of information regarding the multiview video.
Description of Drawings [27] The above and other features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings, in which: [28] FIG. 1 is a block diagram of a related art encoder and decoder of a Moving Picture Experts Group 2 (MPEG-2) multiview profile (MVP); [29] FIG. 2 is a block diagram of a related art stereo-video encoder and decoder using the MPEG-2 MVP; [30] FIG. 3 illustrates a related art disparity-based predictive encoding in which disparity estimation is used twice for bi-directional motion estimation; [31] FIG. 4 illustrates a related art predictive encoding using a disparity vector and a motion vector for the bi-directional estimation; [32] FIG. 5 is a block diagram of an apparatus for encoding a multiview video according to an exemplary embodiment of the present invention; [33] FIG. 6 illustrates a unit encoding structure of a multiview video according to an exemplary embodiment of the present invention;
[34] FIGS. 7A through 7F illustrate three types of B pictures used in multiview video encoding according to an exemplary embodiment of the present invention;
[35] FIG. 8 illustrates a horizontally extended unit encoding structure of a multiview video according to an exemplary embodiment of the present invention;
[36] FIG. 9 illustrates a prediction sequence of the multiview image of FIG. 8;
[37] FIG. 10 illustrates a video encoding structure having an odd number of views for motion estimation and disparity estimation according to an exemplary embodiment of the present invention;
[38] FIG. 11 illustrates a video encoding structure having an even number of views for motion estimation and disparity estimation according to an exemplary embodiment of the present invention; and
[39] FIG. 12 is a flowchart illustrating a method of encoding a multiview video according to an exemplary embodiment of the present invention.
Best Mode
[40] According to an aspect of the present invention, there is provided a method of encoding a multiview video, the method including: categorizing a plurality of B frames into at least two groups according to a predetermined standard; and sequentially encoding the categorized B frames.
[41] The predetermined standard may be the number of frames to which each B frame refers. Alternatively, the predetermined standard may be the number of reference frames to which each B frame refers and positions of the reference frames.
[42] The B frames may be categorized into a first group of B frames which are predicted with reference to two horizontally adjacent frames, two vertically adjacent frames or one horizontally adjacent frame and one vertically adjacent frame, a second group of B frames which are predicted with reference to two horizontally adjacent frames and one vertically adjacent frame or one horizontally adjacent frame and two vertically adjacent frames, and a third group of B frames which are predicted with reference to two horizontally adjacent frames and two vertically adjacent frames, wherein the one or two horizontally adjacent frames are a frame or frames obtained from the multiview video at a same temporal level as a referring B frame, and the one or two vertically adjacent frames are a frame or frames obtained from the multiview video at a same view position as a referring B frame.
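The grouping rule above can be sketched as a small classifier; the function name and the count-pair representation are hypothetical, and the mapping simply counts the horizontal (same temporal level) and vertical (same view position) references.

```python
def categorize_b_frame(n_horizontal_refs, n_vertical_refs):
    """Return 1, 2 or 3 for the first, second or third group of B
    frames, based on how many horizontally and vertically adjacent
    reference frames the B frame is predicted from."""
    refs = (n_horizontal_refs, n_vertical_refs)
    first_group = {(2, 0), (0, 2), (1, 1)}   # two references total
    second_group = {(2, 1), (1, 2)}          # three references total
    third_group = {(2, 2)}                   # four references total
    if refs in first_group:
        return 1
    if refs in second_group:
        return 2
    if refs in third_group:
        return 3
    raise ValueError("not a B-frame reference pattern: %r" % (refs,))
```

Note that the group is determined both by the number of references and by their positions, matching the predetermined standard described above.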
[43] The sequential encoding of the categorized B frames may include sequentially encoding the first group of B frames, the second group of B frames, and the third group of B frames.
[44] The sequential encoding may be performed based on a video encoding structure which includes the B frames, and may further include performing disparity estimation between frames disposed horizontally according to a plurality of views and performing motion estimation between frames disposed vertically according to the passage of time, and the video encoding structure can be horizontally and vertically extended.
[45] In the video-encoding structure which includes the B frames, a video encoding structure having n views can be configured into a video encoding structure having n-1 views by disabling an (n-1)th column of frames, wherein n is an odd natural number.
[46] According to another aspect of the present invention, there is provided an apparatus for encoding a multiview video, the apparatus including: a prediction unit which predicts a disparity vector and a motion vector of an input multiview video; a disparity and motion compensation unit which compensates an image using the predicted disparity vector and motion vector; a residual image encoding unit which receives an original image and the compensated image generated by the disparity and motion compensation unit, subtracts the compensated image from the original image, and encodes a residual image obtained from the subtraction; and an entropy-encoding unit which generates a bit stream for the multiview video using the disparity vector, the motion vector, and the encoded residual image, wherein the prediction unit categorizes a plurality of B frames into at least two groups according to a predetermined standard and sequentially predicts the categorized B frames.
[47] According to another aspect of the present invention, there is provided a computer-readable recording medium on which a program for executing the method is recorded.
Mode for Invention
[48] The present invention will now be described more fully with reference to the accompanying drawings, in which exemplary embodiments of the invention are shown. The invention may, however, be embodied in many different forms and should not be construed as being limited to the embodiments set forth therein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the invention to those skilled in the art.
[49] FIG. 5 is a block diagram of an apparatus for encoding a multiview video according to an exemplary embodiment of the present invention.
[50] Referring to FlG. 5, the apparatus includes a multiview image buffer 510, a prediction unit 520, a disparity/motion compensation unit 530, a residual image encoding unit 540, and an entropy-encoding unit 550.
[51] The apparatus can receive a multiview video source from a plurality of camera systems or through another method. The received multiview video is stored in the multiview image buffer 510. The multiview image buffer 510 provides the multiview video to the prediction unit 520 and the residual image encoding unit 540. [52] The prediction unit 520 includes a disparity estimation unit 522 and a motion estimation unit 524. The prediction unit 520 performs motion estimation and disparity estimation on the multiview video. The prediction unit 520 estimates a disparity vector and a motion vector in directions indicated by arrows illustrated in FIGS. 6 through 11, and provides the estimated disparity vector and the motion vector to the disparity/motion compensation unit 530.
[53] As illustrated in the multiview video encoding structures of FIGS. 6 through 11, the prediction unit 520 may set directions for performing motion estimation and disparity estimation by efficiently using a multiview disparity vector and a motion vector which are generated when the multiview video source is extended based on a time axis. In other words, an MPEG-2 encoding structure can be extended based on a view axis to use the spatial/temporal correlation of the multiview video.
[54] The disparity/motion compensation unit 530 performs disparity compensation and motion compensation using the motion vector and the disparity vector estimated by the disparity estimation unit 522 and the motion estimation unit 524. The disparity/motion compensation unit 530 reconstructs an image using the estimated motion vector and disparity vector and provides the reconstructed image to the residual image encoding unit 540.
[55] To provide better visual quality and stereoscopy, the residual image encoding unit
540 encodes a residual image obtained by subtracting the image compensated and reconstructed by the disparity/motion compensation unit 530 from the original image provided by the multiview image buffer 510 and provides the encoded residual image to the entropy-encoding unit 550.
[56] The entropy-encoding unit 550 receives the estimated disparity vector and motion vector from the prediction unit 520 and the encoded residual image from the residual image encoding unit 540 and generates a bit stream for the multiview video source.
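The data flow among the units of FIG. 5 can be sketched as follows; the four callables are hypothetical stand-ins for the prediction unit (520), the disparity/motion compensation unit (530) and the entropy-encoding unit (550), and frames are flat lists of pixels for illustration.

```python
def encode_frame(original, reference, estimate_vectors, compensate, entropy_encode):
    """Sketch of one frame passing through the encoder of FIG. 5:
    prediction, compensation, residual computation, entropy encoding."""
    vectors = estimate_vectors(original, reference)        # prediction unit 520
    compensated = compensate(reference, vectors)           # compensation unit 530
    residual = [o - c for o, c in zip(original, compensated)]  # input to unit 540
    return entropy_encode(vectors, residual)               # entropy-encoding unit 550
```

With trivial stub callables, the residual is simply the per-pixel difference between the original and the compensated image, exactly as the residual image encoding unit receives it.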
[57] FIG. 6 illustrates a unit encoding structure of a multiview video according to an exemplary embodiment of the present invention. A core-prediction structure or a unit-prediction structure illustrated in FIG. 6 is based on the assumption that there are three views. A square block indicates an image frame in a multiview video. A horizontal arrow indicates a sequence of frames according to views or positions of cameras, and a vertical arrow indicates a sequence of the frames according to time. An I picture indicates an 'intra picture', identical to an I frame in MPEG-2/4 or H.264. P and B pictures respectively indicate a 'predictive picture' and a 'bi-directional prediction picture', similar to P and B frames in MPEG-2/4 or H.264.
[58] The P and B pictures are estimated by the motion estimation and the disparity estimation together in the multiview video coding. In FIG. 6, arrows between picture-frames indicate prediction directions. Horizontal arrows indicate disparity estimation, and vertical arrows indicate motion estimation. According to an exemplary embodiment of the present invention, there are three types of B pictures, which will now be described with reference to FIGS. 7A through 7F.
[59] FIGS. 7A through 7F illustrate three types of B pictures used in multiview video encoding according to an exemplary embodiment of the present invention.
[60] According to exemplary embodiments of the present invention, there are three types of B pictures: B, B1, and B2 pictures. In FIGS. 7A through 7F, the B, B1, and B2 pictures denote picture-frames predicted using two or more horizontally or vertically adjacent frames.
[61] B pictures are predicted using two horizontally adjacent frames as illustrated in FIG. 7A, two vertically adjacent frames as illustrated in FIG. 7B, or a horizontally adjacent frame and a vertically adjacent frame as illustrated in FIG. 7C.
[62] B1 pictures are predicted using two horizontally adjacent frames and one vertically adjacent frame as illustrated in FIG. 7D or a horizontally adjacent frame and two vertically adjacent frames as illustrated in FIG. 7E. B2 pictures are predicted using four horizontally or vertically adjacent frames as illustrated in FIG. 7F.
[63] The unit encoding structure indicating a prediction sequence of a multiview video according to an exemplary embodiment of the present invention will now be described with reference to FIG. 6. Referring to FIG. 6, a basic prediction sequence is I, P, B, B1 and B2 pictures in order.
[64] First, an I frame 601 is intra-predicted. A P frame 603 is predicted by referring to the I frame 601, and a P frame 610 is predicted by referring to the I frame 601.
[65] A B frame 602 is predicted using the I frame 601 and the P frame 603 horizontally adjacent to the B frame 602. A B frame 604 and a B frame 607 are predicted using the I frame 601 and the P frame 610 vertically adjacent to the B frame 604 and the B frame 607. A B frame 612 is predicted using the P frame 610 horizontally adjacent to the B frame 612 and the P frame 603 vertically adjacent to the B frame 612.
[66] Then, B1 frames are predicted. Specifically, a B1 frame 606 is predicted using the B frame 604 horizontally adjacent to the B1 frame 606 and the P frame 603 and the B frame 612 vertically adjacent to the B1 frame 606. A B1 frame 609 is predicted using the B frame 607 horizontally adjacent to the B1 frame 609 and the P frame 603 and the B frame 612 vertically adjacent to the B1 frame 609. A B1 frame 611 is predicted using the P frame 610 and the B frame 612 horizontally adjacent to the B1 frame 611 and the B frame 602 vertically adjacent to the B1 frame 611.
[67] Finally, B2 frames are predicted. Specifically, a B2 frame 605 is predicted using the B frame 604 and the B1 frame 606 horizontally adjacent to the B2 frame 605 and the B frame 602 and the B1 frame 611 vertically adjacent to the B2 frame 605. In addition, a B2 frame 608 is predicted using the B frame 607 and the B1 frame 609 horizontally adjacent to the B2 frame 608 and the B frame 602 and the B1 frame 611 vertically adjacent to the B2 frame 608.
[68] As described above with reference to FIGS. 6 and 7A through 7F, according to exemplary embodiments of the present invention, bi-directional prediction is performed with reference not only to B frames, but also to B1 and B2 frames. Since the number of B-type frames can be increased, the amount of information required for encoding a multiview image can be minimized. Therefore, according to an exemplary embodiment of the present invention, to efficiently encode a multiview image, B frames are grouped according to the types of frame illustrated in FIGS. 7A through 7F and encoded in the prediction sequence B frame -> B1 frame -> B2 frame as described above.
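The basic prediction sequence of the unit structure in FIG. 6 amounts to ordering frames by picture type; the sketch below assumes an illustrative (frame number, kind) pair representation, not the patent's data model.

```python
def prediction_order(frames):
    """Order frames for prediction as I, P, B, B1, B2 - the basic
    prediction sequence of the unit encoding structure of FIG. 6."""
    rank = {"I": 0, "P": 1, "B": 2, "B1": 3, "B2": 4}
    # Stable sort: frames of the same type keep their given order.
    return sorted(frames, key=lambda frame: rank[frame[1]])
```

Applied to a subset of the frames of FIG. 6, the I frame comes first, then P, then the three B-frame groups in turn.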
[69] FIG. 8 illustrates a horizontally extended unit encoding structure of a multiview video according to an exemplary embodiment of the present invention. FIG. 8 illustrates a prediction block structure for a five-view input image source.
[70] FIG. 9 illustrates a prediction sequence of the multiview image of FIG. 8. In FIG. 9, frames in the same column are predicted at the same time. Referring to FIG. 9, first, an I frame 801 is intra-predicted. Then, a P frame 803 and a P frame 816 in a second column are predicted, and B frames 802, 806, 811 and 818 and a P frame 805 in a third column are predicted. Next, B1 frames 817, 808 and 813, and B frames 804 and 820 are predicted. B2 frames 807 and 821 and B1 frames 810, 819 and 815 in a fifth column are then predicted. Finally, B2 frames 809 and 814 are predicted. Therefore, the prediction sequence according to the present exemplary embodiment is I, P, B, B1, B2, P, B, B1 and B2 pictures in order.
[71] FIG. 10 illustrates a video encoding structure having an odd number of views for motion estimation and disparity estimation according to an exemplary embodiment of the present invention.
[72] FIG. 11 illustrates a video encoding structure having an even number of views for motion estimation and disparity estimation according to an exemplary embodiment of the present invention.
[73] The video encoding structure of FIG. 11 can be obtained by disabling a fourth column of prediction frames in the five-view video encoding structure of FIG. 10. The video encoding structure according to the present exemplary embodiment can be horizontally and vertically extended.
[74] Therefore, according to an exemplary embodiment of the present invention, an n-view (n is an odd number) video encoding structure can be reconfigured into an (n-1)-view video encoding structure by disabling an (n-1)th column of prediction frames.
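The column-disabling step can be sketched as follows; the list-of-columns representation of the structure and the function name are illustrative assumptions.

```python
def reduce_views(columns):
    """Given an n-view structure (n odd) as a list of per-view columns
    of prediction frames, disable the (n-1)th column to obtain an
    (n-1)-view structure, as in deriving FIG. 11 from FIG. 10."""
    n = len(columns)
    assert n % 2 == 1, "the structure is defined for an odd number of views"
    # Columns are numbered 1..n; drop column n-1 (0-based index n-2).
    return columns[:n - 2] + columns[n - 1:]
```

For a five-view structure this drops the fourth column, matching the FIG. 10 to FIG. 11 reconfiguration.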
[75] FIG. 12 is a flowchart illustrating a method of encoding a multiview video according to an exemplary embodiment of the present invention. The method has been described with reference to FIGS. 6 through 11. In particular, B frames are encoded in the method as follows.
[76] A plurality of B frames are divided into at least two groups according to a predetermined standard (S1210). The predetermined standard may be the number of frames that each B frame refers to, or may be the number of frames that each B frame refers to and the positions of the reference frames.
[77] The B frames may be categorized into a first group of B frames which are predicted with reference to two horizontally adjacent frames, two vertically adjacent frames or one horizontally adjacent frame and one vertically adjacent frame, a second group of B frames which are predicted with reference to two horizontally adjacent frames and one vertically adjacent frame or one horizontally adjacent frame and two vertically adjacent frames, and a third group of B frames which are predicted with reference to two horizontally adjacent frames and two vertically adjacent frames.
[78] The B frames grouped as described above are sequentially encoded (S1220). In this case, the B frames may be encoded in the order of the first group, the second group, and the third group.
[79] As described above, the present invention provides a method and apparatus for efficiently encoding a multiview video to simultaneously provide the multiview video which is realistic to many people.
[80] The present invention also provides a method and apparatus for encoding a multiview video using a B-frame prediction structure that uses a minimum amount of information regarding the multiview video.
[81] The present invention can also be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium is any data storage device that can store data which can be thereafter read by a computer system. Examples of the computer-readable recording medium include read-only memory (ROM), random-access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, and carrier waves (such as data transmission through the Internet).
[82] The computer-readable recording medium can also be distributed over network- coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion.
[83] While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present invention as defined by the following claims.

Claims

[1] 1. A method of encoding a multiview video, the method comprising: categorizing a plurality of B frames into at least two groups according to a predetermined standard; and sequentially encoding the categorized plurality of B frames.
[2] 2. The method of claim 1, wherein the predetermined standard comprises a number of frames to which each of the plurality of B frames refers.
[3] 3. The method of claim 1, wherein the predetermined standard comprises a number of frames to which each of the plurality of B frames refers and view positions of the frames.
[4] 4. The method of claim 1, wherein the plurality of B frames are categorized into a first group of B frames which are predicted with reference to two horizontally adjacent frames, two vertically adjacent frames or one horizontally adjacent frame and one vertically adjacent frame, a second group of B frames which are predicted with reference to two horizontally adjacent frames and one vertically adjacent frame or one horizontally adjacent frame and two vertically adjacent frames, and a third group of B frames which are predicted with reference to two horizontally adjacent frames and two vertically adjacent frames, and wherein the one or two horizontally adjacent frames are a frame or frames obtained from the multiview video at a same temporal level as a B frame of the first group, the second group or the third group, and the one or two vertically adjacent frames are a frame or frames obtained from the multiview video at a same view position as the B frame of the first group, the second group or the third group.
[5] 5. The method of claim 4, wherein the sequential encoding of the categorized plurality of B frames comprises sequentially encoding the first group of B frames, the second group of B frames, and the third group of B frames.
[6] 6. The method of claim 1, wherein the sequential encoding is performed based on a video encoding structure comprising the plurality of B frames, wherein the sequential encoding comprises performing disparity estimation between frames disposed horizontally according to a plurality of views and performing motion estimation between frames disposed vertically according to passage of time, wherein the video encoding structure is extendible in at least one of horizontal and vertical directions, and wherein the frames disposed horizontally are frames obtained from the multiview video at a same temporal level, and the frames disposed vertically are frames obtained from the multiview video at a same view position.
[7] 7. The method of claim 6, wherein the plurality of views comprises n views, where n is an odd natural number.
[8] 8. The method of claim 7, wherein frames obtained at an (n-1)th view are not used for the disparity estimation and the motion estimation.
[9] 9. The method of claim 7, wherein, in the video encoding structure, frames obtained at views except a first view, among the plurality of views, do not include an I frame, and frames obtained at a k-th view comprise only B frames, where k is an even natural number which is smaller than n.
[10] 10. An apparatus for encoding a multiview video, the apparatus comprising: a prediction unit which predicts a disparity vector and a motion vector of an input multiview video; a disparity and motion compensation unit which compensates an image using the disparity vector and the motion vector; a residual image encoding unit which receives the input multiview video and the compensated image generated by the disparity and motion compensation unit, subtracts the compensated image from the original image, and encodes a residual image obtained from the subtraction; and an entropy-encoding unit which generates a bit stream for the multiview video using the disparity vector, the motion vector, and the encoded residual image, wherein the prediction unit categorizes a plurality of B frames into at least two groups according to a predetermined standard and sequentially predicts the categorized plurality of B frames.
[11] 11. The apparatus of claim 10, wherein the predetermined standard comprises a number of frames to which each of the plurality of B frames refers.
[12] 12. The apparatus of claim 10, wherein the predetermined standard comprises a number of frames to which each of the plurality of B frames refers and view positions of the frames.
[13] 13. The apparatus of claim 10, wherein the plurality of B frames are categorized into a first group of B frames which are predicted with reference to two horizontally adjacent frames, two vertically adjacent frames or one horizontally adjacent frame and one vertically adjacent frame, a second group of B frames which are predicted with reference to two horizontally adjacent frames and one vertically adjacent frame or one horizontally adjacent frame and two vertically adjacent frames, and a third group of B frames which are predicted with reference to two horizontally adjacent frames and two vertically adjacent frames, wherein the one or two horizontally adjacent frames are a frame or frames obtained from the multiview video at a same temporal level as a B frame of the first group, the second group or the third group, and the one or two vertically adjacent frames are a frame or frames obtained from the multiview video at a same view position as the B frame of the first group, the second group or the third group.
[14] 14. The apparatus of claim 13, wherein the prediction unit sequentially predicts the first group of B frames, the second group of B frames, and the third group of B frames.
[15] 15. The apparatus of claim 10, wherein the prediction unit predicts the disparity vector and the motion vector of an input multiview video based on a video encoding structure comprising the plurality of B frames, wherein the prediction unit further performs disparity estimation between frames disposed horizontally according to a plurality of views and performs motion estimation between frames disposed vertically according to passage of time, wherein the video encoding structure is extendible in at least one of horizontal and vertical directions, and wherein the frames disposed horizontally are frames obtained from the multiview video at a same temporal level, and the frames disposed vertically are frames obtained from the multiview video at a same view position.
[16] 16. The apparatus of claim 15, wherein the plurality of views comprises n views, where n is an odd natural number.
[17] 17. The apparatus of claim 16, wherein frames obtained at an (n-1)th view are not used for the disparity estimation and the motion estimation.
[18] 18. The apparatus of claim 16, wherein, in the video encoding structure, frames obtained at views except a first view, among the plurality of views, do not include an I frame, and frames obtained at a k-th view comprise only B frames, where k is an even natural number which is smaller than n.
[19] 19. A computer-readable recording medium storing a method of encoding a multiview video, the method comprising: categorizing a plurality of B frames into at least two groups according to a predetermined standard; and sequentially encoding the categorized plurality of B frames.
[20] 20. The computer-readable recording medium of claim 19, wherein the predetermined standard comprises a number of frames to which each of the plurality of B frames refers.
[21] 21. The computer-readable recording medium of claim 19, wherein the predetermined standard comprises a number of frames to which each of the plurality of B frames refers and view positions of the frames.
[22] 22. The computer-readable recording medium of claim 19, wherein the plurality of B frames are categorized into a first group of B frames which are predicted with reference to two horizontally adjacent frames, two vertically adjacent frames or one horizontally adjacent frame and one vertically adjacent frame, a second group of B frames which are predicted with reference to two horizontally adjacent frames and one vertically adjacent frame or one horizontally adjacent frame and two vertically adjacent frames, and a third group of B frames which are predicted with reference to two horizontally adjacent frames and two vertically adjacent frames, and wherein the one or two horizontally adjacent frames are a frame or frames obtained from the multiview video at a same temporal level as a B frame of the first group, the second group or the third group, and the one or two vertically adjacent frames are a frame or frames obtained from the multiview video at a same view position as the B frame of the first group, the second group or the third group.
[23] 23. The computer-readable recording medium of claim 22, wherein the sequential encoding of the categorized plurality of B frames comprises sequentially encoding the first group of B frames, the second group of B frames, and the third group of B frames.
[24] 24. The computer-readable recording medium of claim 19, wherein the sequential encoding is performed based on a video encoding structure comprising the plurality of B frames, wherein the sequential encoding comprises performing disparity estimation between frames disposed horizontally according to a plurality of views and performing motion estimation between frames disposed vertically according to passage of time, wherein the video encoding structure is extendible at least one of horizontal and vertical directions, and wherein the frames disposed horizontally are frames obtained from the multiview video at a same temporal level, and the frames disposed vertically are frames obtained from the multiview video at a same view position.
[25] 25. The computer-readable recording medium of claim 24, wherein the plurality of views comprises n views, where n is an odd natural number.
[26] 26. The computer-readable recording medium of claim 25, wherein frames obtained at an (n-1)th view are not used for the disparity estimation and the motion estimation.
[27] 27. The computer-readable recording medium of claim 25, wherein, in the video encoding structure, frames obtained at views except a first view, among the plurality of views, do not include an I frame, and frames obtained at a k-th view comprise only B frames, where k is an even natural number which is smaller than n.
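The grouping and ordering rule recited in claims 22 and 23 can be sketched in a few lines: a B frame referring to two adjacent frames in total falls in the first group, three references put it in the second group, and four in the third, and groups are encoded in that order. The sketch below is an illustrative reconstruction under those assumptions, not the patented implementation; the function names `b_frame_group` and `encoding_order` and the `(frame_id, n_horiz_refs, n_vert_refs)` tuple layout are hypothetical.

```python
def b_frame_group(n_horiz_refs: int, n_vert_refs: int) -> int:
    """Classify a B frame by its total number of adjacent reference
    frames, per claims 22-23: two references -> group 1, three ->
    group 2, four -> group 3."""
    total = n_horiz_refs + n_vert_refs
    if total in (2, 3, 4):
        return total - 1
    raise ValueError("a B frame in this structure has 2 to 4 references")


def encoding_order(frames):
    """frames: iterable of (frame_id, n_horiz_refs, n_vert_refs).

    Returns the frame ids ordered so the first group of B frames is
    encoded first, then the second, then the third (claim 23). The
    sort is stable, so frames within a group keep their input order.
    """
    return [fid for fid, h, v in
            sorted(frames, key=lambda f: b_frame_group(f[1], f[2]))]
```

For example, a B frame predicted from one horizontally and one vertically adjacent frame (group 1) is encoded before one predicted from two horizontal and two vertical neighbors (group 3), regardless of their positions in the view/time grid.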
PCT/KR2006/003268 2005-08-22 2006-08-19 Method and apparatus for encoding multiview video WO2007024072A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2008527838A JP2009505604A (en) 2005-08-22 2006-08-19 Method and apparatus for encoding multi-view video
EP06783669A EP1917814A4 (en) 2005-08-22 2006-08-19 Method and apparatus for encoding multiview video
CN200680030315.4A CN101243692B (en) 2005-08-22 2006-08-19 Method and apparatus for encoding multiview video
MX2008002391A MX2008002391A (en) 2005-08-22 2006-08-19 Method and apparatus for encoding multiview video.

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US70981105P 2005-08-22 2005-08-22
US60/709,811 2005-08-22
KR10-2005-0105728 2005-11-05
KR1020050105728A KR100728009B1 (en) 2005-08-22 2005-11-05 Method and apparatus for encoding multiview video

Publications (1)

Publication Number Publication Date
WO2007024072A1 true WO2007024072A1 (en) 2007-03-01

Family

ID=37771787

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2006/003268 WO2007024072A1 (en) 2005-08-22 2006-08-19 Method and apparatus for encoding multiview video

Country Status (5)

Country Link
EP (1) EP1917814A4 (en)
JP (1) JP2009505604A (en)
CN (1) CN101243692B (en)
MX (1) MX2008002391A (en)
WO (1) WO2007024072A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009045032A1 (en) * 2007-10-05 2009-04-09 Electronics And Telecommunications Research Institute Encoding and decoding method for single-view video or multi-view video and apparatus thereof
CN102006480A (en) * 2010-11-29 2011-04-06 清华大学 Method for coding and decoding binocular stereoscopic video based on inter-view prediction
KR101396948B1 (en) 2007-03-05 2014-05-20 경희대학교 산학협력단 Method and Equipment for hybrid multiview and scalable video coding
KR101433168B1 (en) 2014-04-10 2014-08-27 경희대학교 산학협력단 Method and Equipment for hybrid multiview and scalable video coding
JP2015084559A (en) * 2009-12-21 2015-04-30 アルカテル−ルーセント Method and structure for encoding moving image
US10158885B2 (en) 2013-07-24 2018-12-18 Qualcomm Incorporated Simplified advanced motion prediction for 3D-HEVC
US10567799B2 (en) 2014-03-07 2020-02-18 Qualcomm Incorporated Simplified sub-prediction unit (sub-PU) motion parameter inheritance (MPI)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102404577A (en) * 2011-12-01 2012-04-04 无锡太行电子技术有限公司 Memory method for 3D (three-dimensional) video code
KR101966920B1 (en) * 2012-07-10 2019-04-08 삼성전자주식회사 Method and apparatus for estimating motion of image using disparity information of multi view image
WO2014075236A1 (en) 2012-11-14 2014-05-22 Mediatek Singapore Pte. Ltd. Methods for residual prediction with pseudo residues in 3d video coding
CN104782128B (en) * 2012-11-14 2017-10-24 寰发股份有限公司 Method and its device for three-dimensional or multidimensional view Video coding
CN105359529B (en) * 2013-07-16 2018-12-07 寰发股份有限公司 For three-dimensional or multi-view video coding method and device
WO2015006922A1 (en) * 2013-07-16 2015-01-22 Mediatek Singapore Pte. Ltd. Methods for residual prediction
CN105393539B (en) * 2013-07-24 2019-03-29 高通股份有限公司 The sub- PU motion prediction decoded for texture and depth

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5619256A (en) * 1995-05-26 1997-04-08 Lucent Technologies Inc. Digital 3D/stereoscopic video compression technique utilizing disparity and motion compensated predictions
WO2003056843A1 (en) * 2001-12-28 2003-07-10 Electronics And Telecommunications Research Institute Stereoscopic video encoding/decoding apparatuses supporting multi-display modes and methods thereof
US20030202592A1 (en) 2002-04-20 2003-10-30 Sohn Kwang Hoon Apparatus for encoding a multi-view moving picture
WO2004021711A1 (en) * 2002-08-30 2004-03-11 Electronics And Telecommunications Research Institute Multi-display supporting multi-view video object-based encoding apparatus and method, and object-based transmission/reception system and method using the same

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0639031A3 (en) * 1993-07-09 1995-04-05 Rca Thomson Licensing Corp Method and apparatus for encoding stereo video signals.
JPH09261653A (en) * 1996-03-18 1997-10-03 Sharp Corp Multi-view-point picture encoder

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Survey of Algorithms used for Multi-view Video Coding (MVC)", ISO/IEC JTC1/SC29/WG11 (MPEG) & ITU-T SG16 Q6 (VCEG), 21 January 2005 (2005-01-21)
See also references of EP1917814A4 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101396948B1 (en) 2007-03-05 2014-05-20 경희대학교 산학협력단 Method and Equipment for hybrid multiview and scalable video coding
WO2009045032A1 (en) * 2007-10-05 2009-04-09 Electronics And Telecommunications Research Institute Encoding and decoding method for single-view video or multi-view video and apparatus thereof
JP2015084559A (en) * 2009-12-21 2015-04-30 アルカテル−ルーセント Method and structure for encoding moving image
CN102006480A (en) * 2010-11-29 2011-04-06 清华大学 Method for coding and decoding binocular stereoscopic video based on inter-view prediction
CN102006480B (en) * 2010-11-29 2013-01-30 清华大学 Method for coding and decoding binocular stereoscopic video based on inter-view prediction
US10158885B2 (en) 2013-07-24 2018-12-18 Qualcomm Incorporated Simplified advanced motion prediction for 3D-HEVC
US10567799B2 (en) 2014-03-07 2020-02-18 Qualcomm Incorporated Simplified sub-prediction unit (sub-PU) motion parameter inheritance (MPI)
KR101433168B1 (en) 2014-04-10 2014-08-27 경희대학교 산학협력단 Method and Equipment for hybrid multiview and scalable video coding

Also Published As

Publication number Publication date
CN101243692B (en) 2010-05-26
JP2009505604A (en) 2009-02-05
MX2008002391A (en) 2008-03-18
EP1917814A1 (en) 2008-05-07
EP1917814A4 (en) 2011-04-13
CN101243692A (en) 2008-08-13

Similar Documents

Publication Publication Date Title
US20070041443A1 (en) Method and apparatus for encoding multiview video
US20070104276A1 (en) Method and apparatus for encoding multiview video
US8644386B2 (en) Method of estimating disparity vector, and method and apparatus for encoding and decoding multi-view moving picture using the disparity vector estimation method
WO2007024072A1 (en) Method and apparatus for encoding multiview video
Ho et al. Overview of multi-view video coding
US6055012A (en) Digital multi-view video compression with complexity and compatibility constraints
US5619256A (en) Digital 3D/stereoscopic video compression technique utilizing disparity and motion compensated predictions
US9131247B2 (en) Multi-view video coding using scalable video coding
KR100760258B1 (en) Apparatus for Universal Coding for Multi-View Video
US5612735A (en) Digital 3D/stereoscopic video compression technique utilizing two disparity estimates
KR101227601B1 (en) Method for interpolating disparity vector and method and apparatus for encoding and decoding multi-view video
EP1864498B1 (en) Scalable multi-view image encoding and decoding apparatuses and methods
KR100481732B1 (en) Apparatus for encoding of multi view moving picture
JP5059766B2 (en) Disparity vector prediction method, and method and apparatus for encoding and decoding a multi-view video using the method
Kim et al. Fast disparity and motion estimation for multi-view video coding
EP2859724B1 (en) Method and apparatus of adaptive intra prediction for inter-layer coding
Lim et al. A multiview sequence CODEC with view scalability
EP2156668A1 (en) Method and apparatus for generating block-based stereoscopic image format and method and apparatus for reconstructing stereoscopic images from block-based stereoscopic image format
US9686567B2 (en) Method and apparatus for integrated encoding/decoding of different multilayer video codec
JP2012028960A (en) Image decoding device, image decoding method and image decoding program
Lim et al. Motion/disparity compensated multiview sequence coding
Ekmekcioglu Advanced three-dimensional multi-view video coding and evaluation techniques

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase
Ref document number: 200680030315.4
Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

WWE Wipo information: entry into national phase
Ref document number: 2006783669
Country of ref document: EP

WWE Wipo information: entry into national phase
Ref document number: MX/a/2008/002391
Country of ref document: MX

WWE Wipo information: entry into national phase
Ref document number: 2008527838
Country of ref document: JP

NENP Non-entry into the national phase
Ref country code: DE