WO2008051041A1 - Multi-view video scalable coding and decoding - Google Patents

Multi-view video scalable coding and decoding

Info

Publication number
WO2008051041A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
temporal
scalable
spatial
adjacent
Prior art date
Application number
PCT/KR2007/005294
Other languages
French (fr)
Inventor
Sea-Nae Park
Dong-Gyu Sim
Jung-Hak Nam
Suk-Hee Cho
Hyoung-Jin Kwon
Nam-Ho Hur
Jin-Woong Kim
Soo-In Lee
Original Assignee
Electronics And Telecommunications Research Institute
Priority date
Filing date
Publication date
Application filed by Electronics and Telecommunications Research Institute
Priority to JP2009534496A (patent JP5170786B2)
Publication of WO2008051041A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60 Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N21/63 Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N21/647 Control signaling between network components and server or clients; Network processes for video distribution between server and clients, e.g. controlling the quality of the video stream, by dropping packets, protecting content from unauthorised alteration within the network, monitoring of network load, bridging between two different networks, e.g. between IP and wireless
    • H04N21/64784 Data processing by the network
    • H04N21/64792 Controlling the complexity of the content stream, e.g. by dropping packets
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/187 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a scalable video layer
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/31 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the temporal domain
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/33 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the spatial domain
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/46 Embedding additional information in the video signal during the compression process
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
    • H04N19/615 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding using motion compensated temporal filtering [MCTF]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/63 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/87 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving scene cut or scene change detection in combination with video compression
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]

Definitions

  • the present invention relates to a multiview video scalable coding and decoding technology; and, more particularly, to a multiview video scalable coding and decoding apparatus and method for compressing and transmitting multiview video using a multilayer spatial and temporal scalable coding technology and for providing two-dimensional or three-dimensional video services to various types of video terminals.
  • data is compressed by removing the temporal and spatial redundancy of the data.
  • spatial redundancy refers to identical colors or the same objects repeated within a video.
  • temporal redundancy refers to adjacent pictures with almost no changes in moving pictures, and to repeated sounds in audio.
  • temporal redundancy is removed by temporal filtering based on motion compensation, and spatial redundancy is removed by spatial transformation.
  • transmission performance varies with the type of transmission medium.
  • a scalable video coding technology was introduced to support transmission mediums of various speeds and to transmit multimedia data at a transfer rate suited to the transmission environment.
  • the scalable video coding technology is one of coding technologies for controlling a resolution, a frame rate, and a signal-to-noise ratio (SNR) of video by cutting down a predetermined part of a compressed bit stream according to conditions, such as a transport bit rate, a transport error rate, and a system resource.
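The bit-stream cut-down described above can be sketched as follows; the layer names and sizes below are hypothetical, and the simple loop stands in for real scalable sub-stream extraction:

```python
# Sketch: extracting a sub-stream from a layered bit stream by dropping
# higher layers until the result fits a target bit budget.
# Layer names and sizes are hypothetical, for illustration only.

def truncate_scalable_stream(layers, budget_bits):
    """layers: list of (layer_id, size_bits), base layer first.
    Keep layers in order while the running total fits the budget."""
    kept, total = [], 0
    for layer_id, size in layers:
        if total + size > budget_bits:
            break  # drop this layer and all higher ones
        kept.append(layer_id)
        total += size
    return kept, total

stream = [("base", 400), ("temporal+1", 300), ("spatial+1", 500), ("quality+1", 250)]
```

Cutting the stream down to a tighter budget lowers the frame rate, resolution, or SNR, exactly as the bullet above describes.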
  • Fig. 1 is a diagram describing a scalable coding technology according to the related art.
  • the scalable video coding technology performs temporal transform for realizing temporal scalability and two-dimensional spatial transform for realizing spatial scalability. Also, the scalable video coding technology realizes quality scalability using texture coding.
  • the motion coding scalably encodes motion information when spatial scalability is realized. As described above, one bit stream is generated through such coding algorithms.
  • for temporal scalability, motion compensated temporal filtering (MCTF) or hierarchical B-pictures were used.
  • the MCTF performs wavelet transform using motion information along the time axis of a video sequence.
  • the wavelet transform is performed using a lifting scheme.
  • the lifting scheme includes three processes: polyphase decomposition, prediction, and update.
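The three lifting steps can be illustrated with the simplest (Haar) wavelet; this is a generic textbook sketch, not the patent's own filter, and MCTF applies the same idea along the time axis with motion compensation:

```python
# Sketch of the lifting scheme's three steps for the Haar wavelet:
# polyphase decomposition, prediction, and update.

def haar_lift(samples):
    even = samples[0::2]                      # polyphase decomposition
    odd = samples[1::2]
    detail = [o - e for o, e in zip(odd, even)]         # prediction step
    approx = [e + d / 2 for e, d in zip(even, detail)]  # update step
    return approx, detail

def haar_unlift(approx, detail):
    # Inverse lifting: undo update, undo prediction, re-interleave.
    even = [a - d / 2 for a, d in zip(approx, detail)]
    odd = [e + d for e, d in zip(even, detail)]
    out = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out
```

Because each step is trivially invertible, the transform is lossless before quantization, which is what makes lifting attractive for scalable coding.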
  • the hierarchical B-pictures may be realized in various ways using a memory management control operation that manages a decoded picture buffer (DPB) storing 16 pictures and the reference picture list reordering (RPLR) syntax.
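The dyadic layering behind hierarchical B-pictures can be sketched as follows; this is a generic illustration assuming a GOP of size 8, and `temporal_layer` is a hypothetical helper, not syntax from the patent:

```python
# Sketch: dyadic hierarchical-B temporal layers for a GOP.
# Key pictures sit at layer 0; the middle B-picture at layer 1;
# each further bisection adds one layer. Dropping the highest
# layer halves the frame rate, giving temporal scalability.

def temporal_layer(poc, gop_size):
    """Temporal layer of the picture at position poc in a dyadic GOP."""
    if poc % gop_size == 0:
        return 0
    layer, step = 1, gop_size // 2
    while poc % step != 0:
        layer += 1
        step //= 2
    return layer
```

For a GOP of 8 the layers come out as 0,3,2,3,1,3,2,3 over pictures 0..7; discarding layer 3 keeps pictures 0,2,4,6, i.e. half the frame rate.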
  • the multiview video compression technology is a technology for simultaneously coding videos from a plurality of cameras that provide multiview video and compressing, storing, and transmitting the coded video. If the multiview video is stored and transmitted without being compressed, a large transmission bandwidth is required to transmit the multiview video to a user through a broadcasting network or a wired/wireless Internet in real-time.
  • each of the video sequences is independently coded and transmitted, and the transmitted coded video sequences are decoded. This is easily realized based on MPEG-1/2/4 or H.261/263/264. However, it cannot remove the redundancy between views that arises because the same object is photographed by a plurality of cameras.
  • a scalable video coding technology was introduced.
  • a single-viewpoint video is divided into video frames with multilayer resolutions along the spatial axis using a spatial filter, and temporal and spatial scalability is provided for the divided video frames along the temporal axis through hierarchical bi-directional motion estimation.
  • quality scalability may be provided through entropy coding by hierarchical expression in transform coding.
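The multilayer spatial separation can be sketched with a simple 2x2 averaging filter; this is a stand-in for the unspecified spatial filter (real codecs use longer low-pass filters):

```python
# Sketch: separating one frame into multilayer spatial resolutions.
# A frame is a list of rows of pixel values; each level halves
# the resolution by averaging 2x2 blocks.

def downsample2(frame):
    """Halve a 2-D frame by averaging non-overlapping 2x2 blocks."""
    h, w = len(frame), len(frame[0])
    return [[(frame[y][x] + frame[y][x + 1]
              + frame[y + 1][x] + frame[y + 1][x + 1]) / 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

def spatial_pyramid(frame, levels):
    """Return [full-res, half-res, quarter-res, ...] layers."""
    layers = [frame]
    for _ in range(levels - 1):
        layers.append(downsample2(layers[-1]))
    return layers
```

The lowest-resolution layer becomes the base layer; each higher-resolution layer is coded as an enhancement layer that may predict from the layer below it.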
  • An embodiment of the present invention is directed to providing a multiview video scalable coding method and apparatus for effectively compressing videos and providing various video services to terminals in diverse environments, through motion estimation with reference to adjacent images on the temporal and spatial axes for compressing multiview video, and through motion, differential-image, and intra prediction in different resolutions of adjacent videos for providing scalability on the temporal and spatial axes in a multiview video.
  • Another embodiment of the present invention is directed to providing a scalable video decoding method and apparatus for receiving a scalable coded signal and decoding the received signal for multiview video.
  • a scalable video coding apparatus for a multiview video including: a basic scalability video encoder for separating one basic video into video frames with multilayer resolutions and performing scalable video coding through temporal and spatial prediction on the separated low resolution image frame and at least one of the separated high resolution video frames; and a plurality of extended scalability video encoders for receiving an own video and at least one adjacent video as reference videos, which are captured at the same time, separating the received videos into video frames with multilayer resolutions through spatial filtering, performing scalable video coding for the separated low resolution image frame through temporal and spatial prediction with reference to the adjacent video frames at the same temporal axis as well as an own adjacent frame, and performing scalable video coding for the separated high resolution video frame through temporal and spatial prediction with reference to lower layers of the adjacent video frames at the same temporal axis as well as an own lower layer.
  • a scalable video coding method for multiview video including the steps of: (a) separating one basic video into video frames with multilayer resolutions and performing scalable video coding through temporal and spatial prediction; and (b) receiving an own video and at least one adjacent video, which are captured at the same time, and performing scalable video coding through temporal and spatial prediction by separating the received videos into video frames with multilayer resolutions, wherein the step (b) of receiving an own video and at least one adjacent video includes the steps of: (c) performing scalable video coding for low resolution video frames through temporal and spatial prediction with reference to the adjacent video frames at the same temporal axis as well as own adjacent frames; and (d) performing scalable video coding for at least one of the high resolution video frames through temporal and spatial prediction with reference to lower layers of the adjacent video frames as well as an own lower layer.
  • a scalable video decoding apparatus for multiview video including: a basic scalability video decoder for receiving a bitstream generated by scalably coding one basic video and restoring a basic video through inverse temporal and inverse spatial transform; and a plurality of extended scalability video decoders for receiving a bitstream scalable-coded through temporal and spatial prediction for an own video and reference videos, which are captured at the same time, restoring at least one of high resolution image frames through inverse temporal and spatial prediction according to whether lower layers of adjacent video frames, which are the reference videos, are referred to as well as an own lower layer, restoring a low resolution image frame through inverse temporal and spatial prediction according to whether the adjacent video frames at the same temporal axis are referred to as well as an own adjacent frame, and restoring an image through inverse spatial filtering for the restored high resolution image frames and the restored low resolution image frame.
  • the extended scalability video decoder may include: a demultiplexing unit for demultiplexing a received bitstream; at least one enhancement decoding unit for performing scalable decoding for a high resolution image signal outputted from the demultiplexing unit through inverse temporal and spatial motion estimation according to whether lower layers of adjacent videos, which are reference videos, are referred to as well as a lower layer of the own video; a basic layer decoding unit for performing scalable decoding for a low resolution image signal outputted from the demultiplexing unit through inverse motion estimation for reference video frames at a temporal axis as well as inverse temporal and spatial motion estimation for the own video frame; and an inverse spatial video filtering unit for restoring an image through inverse spatial filtering for the restored high resolution images from the enhancement decoding unit and the restored low resolution image from the basic layer decoding unit.
  • the enhancement decoding unit may perform scalable decoding with reference to a flag indicating whether motion vector information of a lower layer is used or not, a flag indicating whether a reference index of a lower layer for adjacent video is used as prediction information or not, a flag indicating whether a type of an intra block of a lower layer is used as prediction information or not, a flag indicating whether a differential image value of a lower layer is used or not, and an index of a reference view used for prediction.
  • a scalable video decoding method for multiview video including the steps of: (a) performing scalable video decoding for one basic video through inverse temporal and spatial prediction; and (b) receiving a bitstream scalable-coded with reference to an own video and at least one adjacent video as reference videos, which are captured at the same time, and performing scalable video decoding through inverse temporal prediction and inverse spatial prediction, wherein the step (b) of receiving a scalable-coded bitstream includes the steps of: (c) performing scalable video decoding for a demultiplexed high resolution image signal through inverse temporal and spatial prediction according to whether lower layers of the adjacent video frames are referred to as well as an own lower layer; and (d) performing scalable video decoding for a demultiplexed low resolution image signal through inverse temporal and spatial prediction according to whether the adjacent video frames at the same temporal axis are referred to as well as an own adjacent frame.
  • scalable decoding may be performed with reference to a flag indicating whether motion vector information of a lower layer is used or not, a flag indicating whether a reference index of a lower layer for adjacent video is used as prediction information or not, a flag indicating whether a type of an intra block of a lower layer is used as prediction information or not, a flag indicating whether a differential image value of a lower layer is used or not, and an index of a reference view used for prediction.
  • a multiview video can be effectively compressed by expanding the temporal and spatial hierarchical structure of a typical scalable coding technology to multiview videos.
  • a video service can be scalably provided to various 2-D or 3-D terminals by forming a hierarchical structure in a temporal and spatial axis for multiview video.
  • Fig. 1 is a diagram describing a scalable coding technology according to the related art.
  • Fig. 2 illustrates a scalable coding apparatus for multiview video in accordance with an embodiment of the present invention.
  • Fig. 3 is a block diagram illustrating an extended scalability video encoder in accordance with an embodiment of the present invention.
  • Fig. 4 describes a reference structure for predicting and referencing adjacent frames in scalable video coding according to an embodiment of the present invention.
  • Fig. 5 illustrates the reference structure of Fig. 4 with circle symbols and cross symbols in a spatial (view) layer axis with a time fixed.
  • Fig. 6 illustrates a reference structure for a B- frame structure.
  • Fig. 7 illustrates a reference structure of Fig. 6 with circle symbols and cross symbols in a spatial (view) layer axis with a time fixed.
  • Fig. 8 is a block diagram illustrating an apparatus for scalable coding multiview video in accordance with another embodiment of the present invention.
  • Fig. 9 describes a reference structure for a basic video 0 (91) and adjacent videos 1 (92) and 2 (93) in Fig. 8.
  • Fig. 2 illustrates a scalable coding apparatus for multiview video in accordance with an embodiment of the present invention.
  • In Fig. 2, five videos 0 to 4 are input from five cameras and each of the input videos 0 to 4 is compressed by a scalable video encoder.
  • the scalable coding apparatus includes a basic scalability video encoder 21 and extended scalability video encoders 22 to 25.
  • the video encoder 21 performs 2-D spatial transformation and temporal transformation on the video 0 which is a basic video.
  • the video encoder 21 also performs scalable coding through motion coding and texture coding.
  • Each of the extended scalability video encoders 22 to 25 receives not only the own video assigned to itself but also at least one adjacent video as a reference video, and separates the received videos into frames with multilayer resolutions through spatial filtering and temporal filtering.
  • each of the extended scalability video encoders 22 to 25 performs scalable coding on the separated frames with reference to temporal and spatial hierarchical image information and compression parameters of the adjacent videos as well as the own video.
  • the video 0 is defined as a basic video
  • the basic scalability video encoder 21 performs scalable coding on the video 0.
  • the basic scalability video encoder 21 has the same structure as a single-viewpoint video scalable coding apparatus according to the related art. That is, the scalable coding apparatus according to the present embodiment has a structure compatible with typical scalable coding apparatuses for the basic video.
  • each of the extended scalability video encoders 22 to 25 receives not only own video which is assigned to itself but also adjacent videos as a reference video, separates the received videos into frames with multilayer resolutions through spatial filtering, and performs scalable coding on the separated frames with reference to lower layers of the other videos assigned to neighbor encoders as well as that of the video assigned to itself. Also, the extended scalability encoder 23 that compresses the video 2 performs compression through bidirectional prediction using multilayer temporal and spatial resolution video information of the basic video 0 and the video 4.
  • the scalable coding apparatus can provide a typical 2-D video service using only the basic scalability video encoder 21. Also, the scalable coding apparatus according to the present embodiment can provide a stereo video service using the basic scalability video encoder 21 for the basic video 0 and the extended scalability video encoder 25 for the video 4. Furthermore, the scalable coding apparatus according to the present embodiment can provide a three-view video service or a five-view video service by selectively combining the basic scalability video encoder 21 with the extended scalability video encoders 22 to 25.
  • Fig. 3 is a block diagram illustrating an extended scalability video encoder in accordance with an embodiment of the present invention.
  • the extended scalability video encoder includes a spatial video filtering unit 31, temporal video filtering units 330 and 340, a basic layer encoder 33, at least one enhancement layer encoder 34, and a multiplexer 35.
  • the spatial video filtering unit 31 separates an own video and reference videos into frames with multilayer resolutions through spatial filtering.
  • the temporal video filtering units 330 and 340 separate the output videos from the spatial video filtering unit 31 through temporal filtering.
  • the basic layer encoder 33 performs scalable coding not only through temporal and spatial motion estimation for the own video frames of the temporal low frequency images outputted from the temporal video filtering unit 330 but also through motion estimation for the reference video frames on a temporal axis.
  • Each of the enhancement layer encoders 34 performs scalable coding with reference to the lower layers of the reference videos as well as the lower layer of the own video for the temporal high frequency images outputted from the temporal video filtering unit 340.
  • the multiplexer 35 outputs one bitstream by multiplexing outputs from the basic layer encoder 33 and the enhancement layer encoders 34.
  • the spatial video filtering unit 31 receives an own video assigned to itself, which is captured by an own camera, and the other videos captured by the other cameras as reference videos at a predetermined time interval and separates the received videos into frames with multilayer resolutions through spatial filtering based on MCTF or hierarchical B structure.
  • the basic layer encoder 33 and the enhancement layer encoder 34 may include temporal video filtering units 330 and 340, motion encoders 331 and 341, subtractors 332 and 342, spatial transformers 333 and 343, quantizers 334 and 344, and entropy encoders 335 and 345. As described above, the basic layer encoder 33 and the enhancement layer encoder 34 have a structure similar to a typical scalable video encoder.
  • the temporal video filtering unit 330 of the basic layer encoder 33 separates the low frequency images, which are separated through spatial filtering, along a temporal axis through filtering based on MCTF or a hierarchical B-structure.
  • Also, the temporal video filtering unit 340 of the enhancement layer encoder 34 separates the high frequency images, which are separated through the spatial filtering, along a temporal axis through filtering based on MCTF or a hierarchical B-structure.
  • the motion encoders 331 and 341 include a motion estimation block or a motion compensation block.
  • the motion estimation block performs motion estimation of a current frame using a reference frame as a basis and calculates a motion vector for forward motion estimation or bi-directional estimation.
  • the motion encoders 331 and 341 may use not only own frames but also peripheral frames as reference frames for motion estimation.
  • the motion encoders 331 and 341 use a block matching algorithm that is generally used for motion estimation. That is, the motion encoders 331 and 341 calculate the displacement at which the error becomes minimum while moving a given motion block within a predetermined search area of a reference frame, and estimate the calculated displacement as the motion vector.
  • the motion encoders 331 and 341 provide motion data, such as motion vectors obtained as the result of motion estimation, a size of a motion block, and a reference frame number, to the entropy encoders 335 and 345. Also, the motion compensation block generates a temporal estimated frame for a current frame by performing the motion compensation for a forward reference frame, a backward reference frame, or a bi-directional reference frame using the calculated motion vector.
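The block matching search described above can be sketched as a minimal full-search over a small window, minimizing the sum of absolute differences (SAD); the function names are hypothetical illustrations, not identifiers from the patent:

```python
# Sketch: full-search block matching. Frames are lists of pixel rows.

def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of cur at (bx, by) and the block of
    ref displaced by the candidate motion vector (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def block_match(cur, ref, bx, by, n, search):
    """Return the motion vector (dx, dy) with minimum SAD in the window."""
    best, best_mv = None, (0, 0)
    h, w = len(ref), len(ref[0])
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # the displaced block must stay inside the reference frame
            if not (0 <= by + dy and by + dy + n <= h
                    and 0 <= bx + dx and bx + dx + n <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best:
                best, best_mv = cost, (dx, dy)
    return best_mv, best
```

Real encoders replace the exhaustive search with fast search patterns and rate-distortion costs, but the principle of minimizing a block-difference error is the same.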
  • the subtractors 332 and 342 remove the temporal redundancy of a video by subtracting a temporally estimated frame from the current frame.
  • the spatial transformers 333 and 343 remove spatial redundancy from the temporal redundancy removed frame using a predetermined spatial transformation method that supports spatial scalability.
  • Discrete Cosine Transform (DCT) and wavelet transform are widely used.
  • the quantizers 334 and 344 quantize transform coefficients from the spatial transformers 333 and 343.
  • quantization is a process of transforming a transform coefficient, which is expressed as a real number, into a discrete value by dividing the coefficient by a predetermined step size and matching the result to a predetermined index.
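The quantization step described above, as a minimal uniform-quantizer sketch (real codecs add dead zones and rate-dependent step sizes, which are omitted here):

```python
# Sketch: uniform scalar quantization of a transform coefficient.
# The real-valued coefficient is divided by a step size and mapped to
# an integer index; dequantization maps the index back to a value.

def quantize(coeff, step):
    return round(coeff / step)   # discrete index sent to entropy coding

def dequantize(index, step):
    return index * step          # reconstructed coefficient value
```

The reconstruction error is bounded by half the step size, which is how the step size trades bit rate against SNR.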
  • the entropy encoders 335 and 345 losslessly encode the quantized transform coefficients from the quantizers 334 and 344 and the motion data provided from the motion estimation block, and generate an output bitstream.
  • arithmetic coding or variable length coding may be used.
  • intra prediction may be performed for an intra block before spatial transform.
  • the enhancement layer encoder may include a 2-D spatial interpolation block for receiving a restored reference frame from the lower layer encoder and performing two-dimensional (2-D) spatial interpolation and an intra prediction block for performing the intra prediction.
  • inter prediction searches a block most similar to a predetermined block of a current frame, obtains a predicted block that can express the current block best, and quantizes differences between the current block and the predicted block.
  • the inter prediction includes bi-directional prediction using two reference frames, forward prediction using a past reference frame, and backward prediction using a future reference frame.
  • the intra prediction predicts a current block using frames adjacent to the current block.
  • intra prediction differs from the other prediction methods in that it uses only information within the current frame; it does not use other frames in the same layer or frames of another layer.
  • Intra base prediction may be used when a current frame includes frames of a lower layer having the same temporal location.
  • a macro block of a current frame can be effectively predicted from the macro blocks of a corresponding basic frame. That is, the difference between a macro block of the current frame and a macro block of the corresponding basic frame is quantized.
  • the macro block of the basic frame is up-sampled to the resolution of the current layer before calculating the difference.
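Intra base prediction as described in the two items above can be sketched as follows, assuming a 2x resolution ratio between layers and nearest-neighbour up-sampling (the standard uses specific interpolation filters; the names here are illustrative only):

```python
import numpy as np

def upsample2x(block):
    # Nearest-neighbour 2x up-sampling of a base-layer macro block to the
    # resolution of the current (enhancement) layer.
    return np.repeat(np.repeat(block, 2, axis=0), 2, axis=1)

def intra_base_residual(cur_mb, base_mb):
    # Intra base prediction: up-sample the co-located base-layer block,
    # then code only the difference from the current macro block.
    return cur_mb.astype(int) - upsample2x(base_mb).astype(int)
```

Because the up-sampled base-layer block is usually a good predictor, the residual that remains to be quantized is small.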
  • Residual prediction is the extension of the inter prediction from a single layer to multilayer.
  • the residual prediction calculates the difference between the residual obtained from the inter prediction of a current layer and the residual obtained from the inter prediction of a lower layer, and quantizes the calculated difference.
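A minimal sketch of the inter-layer residual prediction described above, again assuming (for illustration only) a 2x resolution ratio between layers and nearest-neighbour up-sampling of the lower-layer residual:

```python
import numpy as np

def residual_prediction(cur_residual, base_residual):
    """Quantize only the difference between the enhancement-layer
    inter-prediction residual and the up-sampled lower-layer residual."""
    up = np.repeat(np.repeat(base_residual, 2, axis=0), 2, axis=1)
    return cur_residual.astype(int) - up.astype(int)
```

When the two layers' residuals are correlated, this second difference is smaller than the enhancement-layer residual alone, which is the source of the coding gain.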
  • when encoding high-resolution image frames, the enhancement encoder performs motion estimation using the motion vector of the basic layer image (a low resolution image of its own video) multiplied by two, and uses the basic layer images (low resolution images) of the other videos as reference videos.
  • when encoding high resolution image frames, the enhancement encoder performs differential image estimation by interpolating the remaining images after prediction, using the basic layer image (low resolution image) of its own video and the basic layer images (low resolution images) of the other videos as reference videos.
  • the enhancement encoder performs intra prediction using the basic layer image that is a low resolution image of its own video and the basic layer images that are low resolution images of the other videos as reference videos in an intra prediction mode when encoding high resolution image frames.
  • Fig. 4 is a diagram illustrating a reference structure for predicting and referencing adjacent frames in scalable video coding according to an embodiment of the present invention.
  • a P macro block denotes single- directional prediction and a B macro block denotes a bidirectional prediction.
  • the reference structure according to the present embodiment allows single directional prediction and bi-directional prediction to be performed in a plurality of resolution layers as well as in a temporal axis and a spatial axis.
  • the reference structure according to the present embodiment includes a two-layer structure formed of one basic layer and an enhancement layer.
  • the reference structure may further include more enhancement layers.
  • a reference numeral 41 denotes a predicting and referencing operation for predicting and referencing adjacent frames, which is performed in a basic layer encoder and an enhancement layer encoder in a basic scalability video encoder 21 of Fig. 2.
  • a reference numeral 42 denotes a predicting and referencing operation which is performed in a basic layer encoder and an enhancement layer encoder in an extended scalability video encoder 22 for a video 1 of Fig. 2.
  • a reference numeral 43 denotes a predicting and referencing operation, which is performed in a basic layer encoder and an enhancement layer encoder in an extended scalability video encoder 23 for a video 2 of Fig. 2.
  • the basic layer 0 denotes a reference structure performed in each of the basic layer encoder of the basic scalability video encoder 21, the basic layer encoder of the extended scalability video encoder 22 for the video 1, and the basic layer encoder of the extended scalability video encoder 23 for the video 2.
  • Like the basic layer 0 (L0), the enhancement layer 1 (L1) denotes a reference structure performed in each of the enhancement layer encoder of the basic scalability video encoder 21, the enhancement layer encoder of the extended scalability video encoder 22 for the video 1, and the enhancement layer encoder of the extended scalability video encoder 23 for the video 2.
  • the basic layer encoder of the basic scalability video encoder 21 performs a scalable video coding operation by predicting and referencing adjacent frames for own low resolution image frames in a temporal axis like the video encoder according to the related art.
  • the basic layer encoder of the extended scalability video encoder 22 for the video 1 performs bi-directional prediction for its own frame using the frames of a video 0 and the frames of a video 2, which are reference video frames located at the same temporal axis.
  • the basic layer encoder of the extended scalability video encoder 23 for the video 2 performs single-directional prediction with reference to the basic video 0 and performs bi-directional prediction using its own frame at the same time.
  • each of macro blocks includes three circles or cross symbols for indicating whether a lower layer is referred or not.
  • a circle or a cross symbol in the middle row among the three symbols indicates whether a lower layer of an own video frame is referred to or not, and circle or cross symbols in the top row or in the bottom row indicate whether lower layers of adjacent video frames are referred to or not.
  • the enhancement layer encoder of the basic scalability video encoder 21 performs scalable video coding with reference to own frames of a lower layer like the encoder according to the related art.
  • the enhancement layer encoder of the extended scalability video encoder 22 for the video 1 performs bi-direction prediction with reference to lower layer frames of a video 0 and lower layer frames of a video 2, which are adjacent frames, as well as own lower layer frames.
  • the enhancement layer encoder of the extended scalability video encoder 23 for the video 2 performs prediction with reference to the lower layer frames of the basic video 0 as well as the own lower layer frame.
  • Fig. 5 illustrates the reference structure of Fig. 4 with circle symbols and cross symbols in a spatial (view) layer axis with a time fixed.
  • the reference structure includes three layers.
  • a reference numeral 51 denotes a reference structure for a video 0
  • a reference numeral 52 denotes a reference structure for a video 1
  • a reference numeral 53 denotes a reference structure for a video 2.
  • a coding operation is performed based on motion, differential images, and intra prediction used in the scalable video coding (SVC) according to the related art with reference to a lower layer of an own video only.
  • the macro block of an enhancement layer 2 for a video 1 includes all circle symbols
  • a coding operation is performed with reference to lower layers of adjacent videos as well as a lower layer of an own video.
  • since the macro block of an enhancement layer 2 for a video 2 (53) includes circle symbols at the middle and left columns, a scalable video coding operation is performed with reference to the lower layer of its own video and the lower layer of a video 0.
  • a scalable video includes predetermined syntax elements that describe information about videos in the reference layer.
  • a syntax element ref_view_Idx denotes the view number of the reference video in a lower layer.
  • a flag base_mode_flag indicates whether the motion vector information of a lower layer is used for estimating a motion in a current block or not. If the flag base_mode_flag is 1, a variable ref_view_Idx must have a view number of a lower layer for indicating which motion vector information is used.
  • a flag base_mode_refinement_flag indicates whether or not the motion vector information of a lower layer is used for predicting a motion vector of a current block. Unlike the flag base_mode_flag, the reference index of a lower layer is also used as prediction information.
  • a variable ref_view_Idx must have a view number of a lower layer for indicating which motion vector and reference index information are used.
  • a flag intra_base_flag indicates a type of an intra block in a lower layer, which is used as prediction information of a current block. If the flag intra_base_ flag is 1, information about an intra prediction mode of a lower layer is used for a current block. Therefore, a variable ref_view_Idx must have a view number of a lower layer for indicating which intra block type information is used.
  • a flag residual_prediction_flag indicates whether a differential image value of a lower layer is used for predicting a differential image of a current block or not. If the flag residual_prediction_flag is 1, the differential image information of a lower layer is up-sampled. Also, the variable ref_view_Idx must have a view number of a lower layer for indicating which differential image information is used. Table 1 shows the above described syntax elements in the scalable video.
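The syntax elements listed above can be gathered into a simple structure to show the consistency rule they share: whenever any inter-layer prediction flag is set, ref_view_Idx must identify the referenced view. This is an illustrative sketch of that rule, not the bitstream syntax itself; the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class InterLayerSyntax:
    """Per-macro-block inter-layer syntax elements (names follow the text;
    field order and encoding are illustrative only)."""
    base_mode_flag: bool = False             # reuse lower-layer motion vectors
    base_mode_refinement_flag: bool = False  # also reuse the reference index
    intra_base_flag: bool = False            # reuse lower-layer intra mode
    residual_prediction_flag: bool = False   # predict from lower-layer residual
    ref_view_idx: Optional[int] = None       # view number of the referenced lower layer

    def validate(self):
        # If any inter-layer flag is set, ref_view_idx must say which
        # view's lower layer supplies the prediction information.
        needs_view = (self.base_mode_flag or self.base_mode_refinement_flag
                      or self.intra_base_flag or self.residual_prediction_flag)
        return (not needs_view) or self.ref_view_idx is not None
```

A macro block with base_mode_flag set but no ref_view_idx would be undecodable, since the decoder could not tell which view's motion vectors to reuse.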
  • Fig. 6 illustrates a reference structure for a B- frame structure.
  • a reference numeral 61 denotes a reference structure for a video 0 that is a basic video
  • a reference numeral 62 denotes a reference structure for a video 1
  • a reference numeral 63 denotes a reference structure for a video 2.
  • Fig. 7 is a diagram illustrating the reference structure of Fig. 6 with circle symbols and cross symbols in a spatial (view) layer axis with a time fixed.
  • the reference structure includes three layers.
  • a reference numeral 71 denotes a reference structure for a video 0 that is a basic video
  • a reference numeral 72 denotes a reference structure for a video 1
  • a reference numeral 73 denotes a reference structure for a video 2.
  • the video 0, which is the basic video (61 and 71), performs scalable video coding with reference to its own lower layer frames only.
  • the video 1 and video 2 perform the scalable video coding with reference to the lower layer frames of adjacent videos as well as own lower layer frames.
  • Fig. 8 is a block diagram illustrating an apparatus for scalable coding multiview video in accordance with another embodiment of the present invention.
  • Fig. 9 is a diagram illustrating a reference structure for a basic video 0 (91) and adjacent videos 1 (92) and 2 (93) in Fig. 8.
  • a basic scalability video encoder 81 performs scalable coding for the basic video 0.
  • the basic scalability video encoder 81 refers to its own lower layer frames like a single viewpoint video scalable coding apparatus. Therefore, the scalable coding apparatus according to another embodiment can be compatible with an existing scalable coding apparatus.
  • Extended scalability video encoders 82 to 85 perform scalable video coding for the videos 1 to 4. Each of the extended scalability video encoders 82 to 85 separates video into frames with multilayer resolution, performs temporal and spatial prediction for the separated video frames, and performs compression with reference to temporal and spatial layer image information of adjacent videos and compression parameters.
  • the enhancement layer 1 (92) of the video 1 performs scalable video coding with reference to a lower layer of the video 0, and the enhancement layer 2 (93) of the video 2 performs scalable video coding with reference to a lower layer of the video 1.
  • in the scalable coding apparatus according to another embodiment, as shown in Fig. 8, the extended scalability video encoders perform scalable video coding with reference to only one adjacent video.
  • the scalable coding apparatus can provide a 2-D video service using the basic scalability video encoder 81. Also, the scalable coding apparatus according to another embodiment can provide a stereo video service using the basic scalability video encoder 81 for the basic video 0 and the extended scalability video encoder 82 for the video 1. Furthermore, the scalable coding apparatus according to another embodiment can provide a three-view service using the basic scalability video encoder 81 for the basic video 0 and the extended scalability video encoders 82 and 84 for the videos 1 and 3. The scalable coding apparatus according to another embodiment can provide a five-view service using the basic scalability video encoder 81 for the basic video 0 and the extended scalability video encoders 82 to 85 for the videos 1 to 4.
  • the scalable coding technology for multiview video according to the present invention will be briefly described again.
  • one basic video is separated into frames with multilayer resolutions using a spatial filter in a spatial axis.
  • a spatial and temporal scalable video coding operation is performed on the separated low resolution image frames through motion estimation in a temporal axis.
  • the spatial and temporal scalable video coding operation is performed on the separated high resolution video frames through hierarchical motion estimation in a temporal axis with reference to a lower layer.
  • a bitstream is generated by multiplexing the coded low resolution image frame and at least one of the coded high resolution video frames.
  • the scalable video coding for basic video according to the present embodiment is identical to that according to the related art.
  • an own video is received with at least one of the adjacent videos as reference videos.
  • the received own video and adjacent videos are separated into video frames with multilayer resolutions using a spatial filter in a spatial axis.
  • the temporal and spatial scalable video coding is performed on the separated low resolution image frame through hierarchical motion estimation with reference to adjacent frames as reference frames as well as own frame in a temporal axis.
  • the temporal and spatial scalable video coding is performed on the separated high resolution image frame through hierarchical motion estimation with reference to lower layers of adjacent video frames as well as a lower layer of the own video frame in a temporal axis.
  • a bitstream is generated by multiplexing the coded low resolution image frame and at least one of the coded high resolution video frames.
  • unlike the scalable video coding for single viewpoint video, which uses temporally adjacent frames and its own lower layer frames, the extended scalable video coding uses not only the own lower layer frames but also the lower layer frames of adjacent videos as reference frames.
  • a scalable video decoding apparatus for multiview video performs the operations of the scalable video encoding apparatus according to the present embodiment in a reverse order.
  • the scalable video decoding apparatus includes a basic scalability video decoder and a plurality of extended scalability video decoders.
  • the basic scalability video decoder receives a bitstream generated by scalable-coding one basic video and restores the basic video through inverse temporal transformation and inverse spatial transformation.
  • Each of the extended scalability video decoders receives a bitstream generated by scalable-coding own video and reference videos, which are captured at the same time through the temporal and spatial prediction.
  • one of the extended scalability video decoders restores at least one high resolution video frame through inverse temporal and spatial prediction, according to whether the lower layer of an adjacent video frame is referred to as well as its own lower layer, and restores one low resolution image frame through inverse temporal and spatial prediction, according to whether the adjacent video frames on the same temporal axis are referred to as well as its own adjacent frame. Then, the decoder restores the video by performing inverse spatial filtering on the restored high resolution video frames and the restored low resolution image frame.
  • the basic scalability video decoder has a structure identical to that of a typical scalable video decoder. Therefore, the detailed description thereof is omitted.
  • the extended scalability video decoder includes a demultiplexer, at least one enhancement layer decoder, a basic layer decoder, and an inverse spatial video filtering unit.
  • the demultiplexer demultiplexes the received bitstream.
  • Each of the enhancement layer decoders performs scalable decoding on the high resolution video signal output from the demultiplexer through inverse temporal and spatial motion estimation, according to whether adjacent videos are referred to as well as the lower layer of its own video.
  • the basic layer decoder performs scalable decoding on the low resolution image signal output from the demultiplexer, not only through inverse temporal and spatial motion estimation for its own video frame but also through inverse motion estimation for reference video frames on a temporal axis.
  • the inverse spatial video filtering unit restores a video by performing inverse spatial filtering on the restored high resolution video frame from the enhancement decoder and on the restored low resolution image frame from the basic layer decoder.
  • the basic layer decoder and the enhancement layer decoder perform the operations of the basic layer encoder and the enhancement layer encoder in an inverse order. Therefore, the detailed descriptions of the basic layer decoder and the enhancement layer decoder are omitted.
  • When the enhancement decoder performs a decoding operation for a high resolution video signal, it refers to a flag indicating whether the motion vector information of a lower layer is used, a flag indicating whether a reference index of a lower layer for an adjacent video is used as prediction information, a flag indicating whether the type of an intra block of a lower layer is used as prediction information, a flag indicating whether a differential video value of a lower layer is used, and an index of the reference view used for prediction.
  • the technology of the present invention can be realized as a program and stored in a computer-readable recording medium, such as CD-ROM, RAM, ROM, floppy disk, hard disk and magneto-optical disk. Since the process can be easily implemented by those skilled in the art of the present invention, further description will not be provided herein.
  • multiview video can be effectively compressed by expanding a temporal and spatial hierarchical structure of a typical scalable coding technology to the multiview video.
  • a video service can be scalably provided to various types of 2-D or 3-D terminals by forming a hierarchical structure on a temporal and spatial axis for the multiview video according to the present invention.

Abstract

Provided are a scalable video coding and decoding apparatus and method for multiview video. The apparatus includes a basic scalability video encoder separating one basic video into video frames and performing scalable video coding through temporal and spatial prediction, and multiple extended scalability video encoders for receiving an own video and one or more adjacent videos as reference videos captured simultaneously, separating the received videos into video frames, performing scalable video coding for the separated low resolution image frame through temporal and spatial prediction with reference to the adjacent video frames at the same temporal axis as well as the own adjacent frame, and performing scalable video coding for the separated high resolution video frame through temporal and spatial prediction referring to lower layers of the adjacent video frames at the same temporal axis as well as an own lower layer.

Description

DESCRIPTION
MULTI-VIEW VIDEO SCALABLE CODING AND DECODING
TECHNICAL FIELD The present invention relates to a multiview video scalable coding and decoding technology; and, more particularly, to a multiview video scalable coding and decoding apparatus and method for compressing and transmitting multiview video using a multilayer spatial and temporal scalable coding technology and for providing two-dimensional or three-dimensional video services to various types of video terminals.
This work was supported by the IT R&D program of MIC/IITA [2005-S-403-02, "Development of Super- intelligent Multimedia Anytime-anywhere Realistic TV (SmarTV) Technology"] .
BACKGROUND ART
In general, data is compressed by removing its temporal and spatial redundancy. Spatial redundancy refers to identical colors or repeated objects within a video frame. Temporal redundancy refers to adjacent pictures with almost no changes in moving pictures, and to repeated sound in audio. In a typical video coding method, the temporal redundancy is removed by temporal filtering based on motion compensation and the spatial redundancy is removed by spatial transformation.
In order to transmit multimedia data generated after removing data redundancy, various transmission mediums were introduced. The transmitting performance varies according to the types of the transmission mediums. Also, a scalable video coding technology was introduced to support the various speeds of transmission mediums and to transmit multimedia data at a transfer rate proper to a transmission environment. The scalable video coding technology is one of coding technologies for controlling a resolution, a frame rate, and a signal-to-noise ratio (SNR) of video by cutting down a predetermined part of a compressed bit stream according to conditions, such as a transport bit rate, a transport error rate, and a system resource.
Fig. 1 is a diagram describing a scalable coding technology according to a related art.
Referring to Fig. 1, the scalable video coding technology according to the related art performs temporal transform to realize temporal scalability and two-dimensional spatial transform to realize spatial scalability. Also, the scalable video coding technology realizes quality scalability using texture coding. The motion coding scalably encodes motion information when spatial scalability is realized. As described above, one bit stream is generated through such coding algorithms.
In order to provide the temporal scalability and improve a compression rate in the scalable video coding, motion compensated temporal filtering (MCTF) and hierarchical B-pictures were used.
The MCTF performs wavelet transform using motion information in a clockwise direction in a video sequence. The wavelet transform is performed using a lifting scheme. The lifting scheme includes three processes, polyphase decomposition, prediction, and update.
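The three lifting steps named above (polyphase decomposition, prediction, update) can be illustrated with the Haar wavelet. Note that MCTF performs the predict/update steps along motion trajectories between frames, whereas this sketch operates on a plain 1-D signal; the function names are chosen for the example.

```python
def haar_lift(signal):
    """One lifting level: polyphase decomposition into even/odd samples,
    a prediction step, and an update step (Haar filters)."""
    even, odd = signal[0::2], signal[1::2]              # polyphase decomposition
    detail = [o - e for e, o in zip(even, odd)]         # predict: high-pass band
    approx = [e + d / 2 for e, d in zip(even, detail)]  # update: low-pass band
    return approx, detail

def haar_unlift(approx, detail):
    # The inverse transform reverses the lifting steps in opposite order,
    # which is why lifting guarantees perfect reconstruction.
    even = [a - d / 2 for a, d in zip(approx, detail)]
    odd = [e + d for e, d in zip(even, detail)]
    out = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out
```

Applying `haar_lift` recursively to the low-pass band yields the temporal decomposition levels that provide temporal scalability.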
The hierarchical B-pictures may be realized in various ways using a memory management control operation that manages a decoded picture buffer (DPB) for storing 16 pictures and the syntaxes of reference picture list reordering (RPLP).
Recently, due to advances in technologies and the demands of users, researchers have been working to develop a service that provides video information for scenes at diverse viewpoints, and a service allowing viewers to edit video information transmitted from a broadcasting station and watch desired video among the video information. In order to provide these services, a technology for compressing multiview video is required. The multiview video compression technology simultaneously codes videos from a plurality of cameras that provide multiview video and compresses, stores, and transmits the coded video. If the multiview video is stored and transmitted without being compressed, a large transmission bandwidth is required to transmit the multiview video to a user through a broadcasting network or wired/wireless Internet in real time.
In the multiview video coding and decoding technology, each of the video sequences is independently coded and transmitted, and the transmitted coded video sequences are decoded. It is easily realized based on MPEG-1/2/4 or H.261/263/264. However, it is impossible to remove redundancy between videos, which is generated as the same object is photographed by a plurality of cameras.
In order to remove the redundancy between videos, a scalable video coding technology was introduced. In the scalable video coding technology for a single viewpoint video, the video is divided into video frames with multilayer resolutions in a spatial axis using a spatial filter, and temporal and spatial scalable coding is performed on the divided video frames in a temporal axis through hierarchical bi-directional motion estimation. Also, quality scalability may be provided through entropy coding by hierarchical expression in transform coding.
However, since the scalable video coding technology was designed for a single viewpoint video, a large overhead may be generated in a video decoder because of a high transport rate when a terminal reproduces three- dimensional videos with selective two-dimensional videos.
DISCLOSURE TECHNICAL PROBLEM
An embodiment of the present invention is directed to providing a multiview video scalable coding method and apparatus for effectively compressing videos and providing various video services to terminals in diverse environments through motion estimation with reference to adjacent images at a temporal and spatial axis for compressing multiview video and through motions, differential images, and intra prediction in different resolutions of adjacent videos for providing scalability on a temporal and spatial axis in a multiview video.
Another embodiment of the present invention is directed to providing a scalable video decoding method and apparatus for receiving a scalable coded signal and decoding the received signal for multiview video. Other objects and advantages of the present invention can be understood by the following description, and become apparent with reference to the embodiments of the present invention. Also, it is obvious to those skilled in the art of the present invention that the objects and advantages of the present invention can be realized by the means as claimed and combinations thereof.
TECHNICAL SOLUTION In accordance with an aspect of the present invention, there is provided a scalable video coding apparatus for a multiview video including: a basic scalability video encoder for separating one basic video into video frames with multilayer resolutions and performing scalable video coding through performing temporal and spatial prediction on the separated low resolution image frame and at least one of the separated high resolution video frames; and a plurality of extended scalability video encoders for receiving an own video and at least one of adjacent videos as reference videos which are captured at the same time, separating the received videos into video frames with multilayer resolutions through spatial filtering, performing scalable video coding for the separated low resolution image frame through temporal and spatial prediction with reference to the adjacent video frames at the same temporal axis as well as own adjacent frame, and performing scalable video coding for the separated high resolution video frame through temporal and spatial prediction with reference to lower layers of the adjacent video frames at the same temporal axis as well as an own lower layer.
In accordance with another aspect of the present invention, there is provided a scalable video coding method for multiview video, including the steps of: (a) separating one basic video to video frames with multilayer resolutions and performing scalable video coding through temporal and spatial prediction; and (b) receiving an own video and at least one of adjacent videos, which are captured at the same time, and performing scalable video coding through temporal and spatial prediction by separating the received videos into video frames with multilayer resolutions, wherein the step (b) of receiving an own video and at least one of adjacent videos includes the steps of: (c) performing scalable video coding for low resolution video frames through temporal and spatial prediction with reference to the adjacent video frames at the same temporal axis as well as own adjacent frames; and (d) performing scalable video coding for at least one of high resolution video frames through temporal and spatial prediction with reference to lower layers of the adjacent video frames as well as own lower layer.
In accordance with another aspect of the present invention, there is provided a scalable video decoding apparatus for multiview video including: a basic scalability video decoder for receiving a bitstream generated by scalably coding one basic video and restoring the basic video through inverse temporal and inverse spatial transform; and a plurality of extended scalability video decoders for receiving a bitstream scalable-coded through temporal and spatial prediction for an own video and reference videos, which are captured at the same time, restoring at least one of high resolution image frames through inverse temporal and spatial prediction according to whether lower layers of adjacent video frames that are reference videos are referred to as well as an own lower layer, restoring a low resolution image frame through inverse temporal and spatial prediction according to whether the adjacent video frames are referred to or not at the same temporal axis as well as an own adjacent frame, and restoring an image through inverse spatial filtering for the restored high resolution image frames and the restored low resolution image frame.
The extended scalability video decoder may include: a demultiplexing unit for demultiplexing a received bitstream; at least one enhancement decoding unit for performing scalable decoding for a high resolution image signal outputted from the demultiplexing unit through inverse temporal and spatial motion estimation according to whether lower layers of adjacent videos that are reference videos are referred to as well as a lower layer of own video or not; a basic layer decoding unit for performing scalable decoding for a low resolution image signal outputted from the demultiplexing unit through inverse motion estimation for reference video frames at a temporal axis as well as inverse temporal and spatial motion estimation for own video frame; and an inverse spatial video filtering unit for restoring an image through inverse spatial filtering for the restored high resolution images from the enhancement decoding unit and the restored low resolution image from the basic decoding unit.
The enhancement decoding unit may perform scalable decoding with reference to a flag indicating whether motion vector information of a lower layer is used or not, a flag indicating whether a reference index of a lower layer for adjacent video is used as prediction information or not, a flag indicating whether a type of an intra block of a lower layer is used as prediction information or not, a flag indicating whether a differential image value of a lower layer is used or not, and an index of a reference view used for prediction.
In accordance with another aspect of the present invention, there is provided a scalable video decoding method for multiview video, including the steps of: (a) performing scalable video decoding for one basic video through inverse temporal and spatial prediction; and (b) receiving a bitstream scalable-coded with reference to an own video and at least one of adjacent videos as reference videos, which are captured at the same time, and performing scalable video decoding through inverse temporal prediction and inverse spatial prediction, wherein the step (b) of receiving a bitstream scalable-coded with reference to an own video and at least one of adjacent videos as reference videos includes the steps of: (c) performing scalable video decoding for a demultiplexed high resolution image signal through inverse temporal and spatial prediction according to whether lower layers of the adjacent video frames are referred to as well as an own lower layer; and (d) performing scalable video decoding for a demultiplexed low resolution image signal through inverse temporal and spatial prediction according to whether the adjacent video frames at the same temporal axis are referred to as well as an own adjacent frame.
In the performing of scalable video decoding for the demultiplexed high resolution image signal, scalable decoding may be performed with reference to a flag indicating whether motion vector information of a lower layer is used or not, a flag indicating whether a reference index of a lower layer for adjacent video is used as prediction information or not, a flag indicating whether a type of an intra block of a lower layer is used as prediction information or not, a flag indicating whether a differential image value of a lower layer is used or not, and an index of a reference view used for prediction.
ADVANTAGEOUS EFFECTS
According to the present invention, a multiview video can be effectively compressed by expanding the temporal and spatial hierarchical structure of a typical scalable coding technology to multiview videos. Also, a video service can be scalably provided to various 2-D or 3-D terminals by forming a hierarchical structure in a temporal and spatial axis for multiview video.
BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a diagram describing a scalable coding technology according to the related art.
Fig. 2 illustrates a scalable coding apparatus for multiview video in accordance with an embodiment of the present invention.

Fig. 3 is a block diagram illustrating an extended scalability video encoder in accordance with an embodiment of the present invention.
Fig. 4 describes a reference structure for predicting and referencing adjacent frames in scalable video coding according to an embodiment of the present invention.
Fig. 5 illustrates the reference structure of Fig. 4 with circle symbols and cross symbols in a spatial (view) layer axis with a time fixed.

Fig. 6 illustrates a reference structure for a B-frame structure.
Fig. 7 illustrates the reference structure of Fig. 6 with circle symbols and cross symbols in a spatial (view) layer axis with a time fixed.

Fig. 8 is a block diagram illustrating an apparatus for scalable coding multiview video in accordance with another embodiment of the present invention.
Fig. 9 describes a reference structure for a basic video 0 (91) and adjacent videos 1 (92) and 2 (93) in Fig. 8.
BEST MODE FOR THE INVENTION
The advantages, features and aspects of the invention will become apparent from the following description of the embodiments with reference to the accompanying drawings, which is set forth hereinafter, so that those skilled in the art can easily embody the technological concept and scope of the invention. In addition, if it is considered that detailed description of a related art may obscure the points of the present invention, the detailed description will not be provided herein. The specific embodiments of the present invention will be described in detail hereinafter with reference to the attached drawings.

Fig. 2 illustrates a scalable coding apparatus for multiview video in accordance with an embodiment of the present invention.
In Fig. 2, five videos 0 to 4 are input from five cameras, and each of the input videos 0 to 4 is compressed by a scalable video encoder.
Referring to Fig. 2, the scalable coding apparatus according to the present embodiment includes a basic scalability video encoder 21 and extended scalability video encoders 22 to 25. The basic scalability video encoder 21 performs 2-D spatial transformation and temporal transformation on the video 0, which is a basic video, and also performs scalable coding through motion coding and texture coding. Each of the extended scalability video encoders 22 to 25 receives not only its own video, which is assigned to itself, but also at least one of the adjacent videos as reference videos and separates the received videos into frames with multilayer resolutions through spatial filtering and temporal filtering. Also, each of the extended scalability video encoders 22 to 25 performs scalable coding on the separated frames with reference to temporal and spatial hierarchical image information and compression parameters of the adjacent videos as well as of its own video. In Fig. 2, the video 0 is defined as a basic video, and the basic scalability video encoder 21 performs scalable coding on the video 0. The basic scalability video encoder 21 has the same structure as a single viewpoint video scalable coding apparatus according to the related art. That is, the scalable coding apparatus according to the present embodiment has a structure compatible with typical scalable coding apparatuses for basic video.
In order to scalably compress input video with reference to the adjacent videos as well as the own video, the videos 1 to 4 are compressed through the extended scalability video encoders 22 to 25. Like a single viewpoint video scalable coding apparatus, each of the extended scalability video encoders 22 to 25 receives not only its own video, which is assigned to itself, but also adjacent videos as reference videos, separates the received videos into frames with multilayer resolutions through spatial filtering, and performs scalable coding on the separated frames with reference to lower layers of the other videos assigned to neighboring encoders as well as that of the video assigned to itself. Also, the extended scalability video encoder 23 that compresses the video 2 performs compression through bi-directional prediction using multilayer temporal and spatial resolution video information of the basic video 0 and the video 4.
Accordingly, the scalable coding apparatus according to the present embodiment can provide a typical 2-D video service using only the basic scalability video encoder 21. Also, the scalable coding apparatus according to the present embodiment can provide a stereo video service using the basic scalability video encoder 21 for the basic video 0 and the extended scalability video encoder 25 for the video 4. Furthermore, the scalable coding apparatus according to the present embodiment can provide a three-view video service or a five-view video service by selectively combining the basic scalability video encoder 21 with the extended scalability video encoders 22 to 25.
Hereinafter, the structure and the function of the extended scalability video encoder will be described with reference to Fig. 3.
Fig. 3 is a block diagram illustrating an extended scalability video encoder in accordance with an embodiment of the present invention. Referring to Fig. 3, the extended scalability video encoder according to the present embodiment includes a spatial video filtering unit 31, temporal video filtering units 330 and 340, a basic layer encoder 33, at least one enhancement layer encoder 34, and a multiplexer 35. The spatial video filtering unit 31 separates an own video and reference videos into frames with multilayer resolutions through spatial filtering. The temporal video filtering units 330 and 340 separate the output videos from the spatial video filtering unit 31 through temporal filtering. The basic layer encoder 33 performs scalable coding through not only temporal and spatial motion estimation for the own video frames of temporal low frequency images outputted from the temporal video filtering unit 330 but also through motion estimation for the reference video frames on a temporal axis. Each of the enhancement layer encoders 34 performs scalable coding with reference to the lower layers of the reference videos as well as the lower layer of the own video for temporal high frequency images outputted from the temporal video filtering unit 340. The multiplexer 35 outputs one bitstream by multiplexing outputs from the basic layer encoder 33 and the enhancement layer encoders 34.
The spatial video filtering unit 31 receives an own video assigned to itself, which is captured by an own camera, and the other videos captured by the other cameras as reference videos at a predetermined time interval and separates the received videos into frames with multilayer resolutions through spatial filtering based on MCTF or a hierarchical B structure. The basic layer encoder 33 and the enhancement layer encoder 34 may include temporal video filtering units 330 and 340, motion encoders 331 and 341, subtractors 332 and 342, spatial transformers 333 and 343, quantizers 334 and 344, and entropy encoders 335 and 345. As described above, the basic layer encoder 33 and the enhancement layer encoder 34 have a structure similar to a typical scalable video encoder. Hereinafter, the functions of the constituent elements in the encoders 33 and 34 will be described. The temporal video filtering unit 330 of the basic layer encoder 33 separates the low frequency images, which are separated through spatial filtering, along a temporal axis through filtering based on MCTF or a hierarchical B-structure. Also, the temporal video filtering unit 340 of the enhancement layer encoder 34 separates the high frequency images, which are separated through the spatial filtering, along a temporal axis through filtering based on MCTF or a hierarchical B-structure.
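The multilayer spatial separation can be pictured with a small sketch. This is an illustrative dyadic pyramid, not the codec's actual MCTF or hierarchical-B analysis filters; the 2x2 box filter and the function names are assumptions made for the example.

```python
# Hypothetical sketch of separating a frame into multilayer spatial
# resolutions by dyadic (factor-of-two) down-sampling. A simple 2x2
# box filter stands in for the real spatial analysis filters.

def downsample_2x(frame):
    """Halve width and height by averaging each 2x2 pixel block."""
    h, w = len(frame), len(frame[0])
    return [[(frame[y][x] + frame[y][x + 1] +
              frame[y + 1][x] + frame[y + 1][x + 1]) // 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

def spatial_pyramid(frame, num_layers):
    """Return [full-res, half-res, quarter-res, ...] layers."""
    layers = [frame]
    for _ in range(num_layers - 1):
        frame = downsample_2x(frame)
        layers.append(frame)
    return layers

frame = [[16 * (x + y) % 256 for x in range(8)] for y in range(8)]
layers = spatial_pyramid(frame, 3)
# layers[0] is 8x8, layers[1] is 4x4, layers[2] is 2x2
```

The lowest-resolution layer would feed the basic layer encoder, and each higher-resolution layer an enhancement layer encoder.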
The motion encoders 331 and 341 include a motion estimation block or a motion compensation block. The motion estimation block performs motion estimation of a current frame using a reference frame as a basis and calculates a motion vector for forward motion estimation or bi-directional estimation. Here, the motion encoders 331 and 341 may use not only own frames but also peripheral frames as reference frames for motion estimation. The motion encoders 331 and 341 use a block matching algorithm that is generally used for motion estimation. That is, the motion encoders 331 and 341 calculate the displacement at which an error becomes minimum while moving a given motion block in a predetermined search area of a reference frame and take the calculated displacement as the motion vector. The motion encoders 331 and 341 provide motion data, such as motion vectors obtained as the result of motion estimation, a size of a motion block, and a reference frame number, to the entropy encoders 335 and 345. Also, the motion compensation block generates a temporal estimated frame for a current frame by performing motion compensation for a forward reference frame, a backward reference frame, or a bi-directional reference frame using the calculated motion vector.
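The block matching search described above can be sketched as follows. The block size, search range, and sum-of-absolute-differences (SAD) cost are illustrative choices, not values fixed by this document.

```python
# Minimal sketch of block matching motion estimation: slide a block
# over a search window of the reference frame and keep the
# displacement with the smallest sum of absolute differences (SAD).

def sad(cur, ref, cx, cy, rx, ry, n):
    """SAD between an n x n block of the current frame at (cx, cy)
    and a block of the reference frame at (rx, ry)."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))

def block_match(cur, ref, cx, cy, n=4, search=2):
    """Estimate the motion vector for the block at (cx, cy)."""
    h, w = len(ref), len(ref[0])
    best, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= w - n and 0 <= ry <= h - n:
                err = sad(cur, ref, cx, cy, rx, ry, n)
                if best is None or err < best:
                    best, best_mv = err, (dx, dy)
    return best_mv

ref = [[8 * y + x for x in range(8)] for y in range(8)]
# current frame: the reference shifted one pixel to the right
cur = [[ref[y][x - 1] if x > 0 else ref[y][0] for x in range(8)]
       for y in range(8)]
mv = block_match(cur, ref, 2, 2)
# mv == (-1, 0): the block moved right, so its reference lies one
# pixel to the left
```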
The subtractors 332 and 342 remove the temporal redundancy of a video by subtracting a temporal estimated frame from a current frame. The spatial transformers 333 and 343 remove spatial redundancy from the temporal-redundancy-removed frame using a predetermined spatial transformation method that supports spatial scalability. As the spatial transformation method, the Discrete Cosine Transform (DCT) and the wavelet transform are widely used.
The quantizers 334 and 344 quantize transform coefficients from the spatial transformers 333 and 343. The quantization is a process of transforming the transform coefficient, which is expressed as a predetermined real number, to a discrete value by dividing the transform coefficient by predetermined periods and matching the discrete value to a predetermined index.
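A minimal sketch of the quantization step just described, assuming a uniform quantization step size; the function names are illustrative.

```python
# Scalar quantization as described above: divide each transform
# coefficient by a quantization step and keep the integer index;
# dequantization multiplies back, losing the rounding remainder.

def quantize(coeffs, step):
    return [round(c / step) for c in coeffs]

def dequantize(indices, step):
    return [q * step for q in indices]

coeffs = [103.7, -12.4, 0.9, 45.0]
q = quantize(coeffs, step=8)    # [13, -2, 0, 6]
rec = dequantize(q, step=8)     # [104, -16, 0, 48] -- lossy
```

A larger step gives coarser indices and higher compression at the cost of reconstruction error.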
The entropy encoders 335 and 345 lossless-encode the quantized transform coefficients from the quantizers 334 and 344 and the motion data provided from the motion estimation block and generate an output bitstream. As the lossless encoding method, arithmetic coding or variable length coding may be used. Meanwhile, intra prediction may be performed for an intra block before the spatial transform. In order to perform the intra prediction, the enhancement layer encoder may include a 2-D spatial interpolation block for receiving a restored reference frame from the lower layer encoder and performing two-dimensional (2-D) spatial interpolation, and an intra prediction block for performing the intra prediction.
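As one concrete instance of the variable length coding mentioned above, the following sketches an unsigned Exp-Golomb code. The document does not mandate this particular code, so treat it as an assumed example of how short codewords are given to frequent small values.

```python
# Unsigned Exp-Golomb: a simple variable length code. A value v is
# written as (leading zeros) followed by the binary form of v + 1.

def ue_encode(v):
    """Codeword for a non-negative integer, as a bit string."""
    bits = bin(v + 1)[2:]                 # binary of v + 1
    return "0" * (len(bits) - 1) + bits   # prefix with len-1 zeros

def ue_decode(stream):
    """Decode one codeword from the front of a bit string."""
    zeros = 0
    while stream[zeros] == "0":
        zeros += 1
    value = int(stream[zeros:2 * zeros + 1], 2) - 1
    return value, stream[2 * zeros + 1:]

assert ue_encode(0) == "1"
assert ue_encode(3) == "00100"
assert ue_decode("00100") == (3, "")
```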
In general, inter prediction searches a block most similar to a predetermined block of a current frame, obtains a predicted block that can express the current block best, and quantizes differences between the current block and the predicted block. The inter prediction includes bi-directional prediction using two reference frames, forward prediction using a past reference frame, and backward prediction using a future reference frame.
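The three inter-prediction directions can be sketched as follows; the plain average used for the bi-directional case stands in for a real codec's weighted prediction, and all names here are illustrative.

```python
# Forward prediction uses a past reference, backward prediction a
# future reference, and bi-directional prediction combines both.

def predict(past, future, mode):
    if mode == "forward":
        return list(past)
    if mode == "backward":
        return list(future)
    if mode == "bidirectional":
        # average the two references sample by sample
        return [(p + f) // 2 for p, f in zip(past, future)]
    raise ValueError(mode)

pred = predict([10, 20], [14, 28], "bidirectional")
# pred == [12, 24]; only current - pred is quantized and coded
```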
Meanwhile, the intra prediction predicts a current block using samples adjacent to the current block. The intra prediction differs from the other methods because it uses information in a current frame only and does not use the other frames in the same layer or frames of the other layers.
Intra base prediction may be used when a lower layer includes a frame having the same temporal location as a current frame. A macro block of the current frame can be effectively predicted from the macro blocks of the corresponding basic frame. That is, the difference between a macro block of the current frame and a macro block of the corresponding basic frame is quantized. When the resolution of the lower layer is different from that of the current layer, the macro block of the basic frame is up-sampled to the resolution of the current layer before calculating the difference.
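A sketch of intra base prediction as described, assuming nearest-neighbour up-sampling in place of the codec's real interpolation filter; block contents and function names are illustrative.

```python
# Intra base prediction: up-sample the co-located lower layer (base)
# block to the current layer's resolution, then code the difference.

def upsample_2x(block):
    """Repeat each pixel 2x2 (nearest neighbour)."""
    out = []
    for row in block:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def intra_base_residual(cur_block, base_block):
    pred = upsample_2x(base_block)
    return [[c - p for c, p in zip(cr, pr)]
            for cr, pr in zip(cur_block, pred)]

base = [[10, 20], [30, 40]]      # 2x2 lower layer macro block
cur = [[10, 11, 20, 21],         # 4x4 current layer macro block
       [10, 11, 20, 21],
       [30, 31, 40, 41],
       [30, 31, 40, 41]]
res = intra_base_residual(cur, base)
# every residual row is [0, 1, 0, 1] -- small and cheap to quantize
```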
Residual prediction is the extension of inter prediction from a single layer to multiple layers. The residual prediction calculates a difference between the difference obtained from the inter prediction of a current layer and the other difference obtained from the inter prediction of a lower layer and quantizes the calculated difference. In the present embodiment, when encoding high resolution image frames, the enhancement encoder performs motion estimation using values obtained by multiplying by two the motion vectors of a basic layer image, which is a low resolution image of its own video, and of basic layer images, which are low resolution images of the other videos used as reference videos.
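The two enhancement-layer operations just described can be sketched as follows; both functions are illustrative simplifications (the residual arrays are shown already up-sampled to the current resolution).

```python
# (1) Scale the base layer motion vector by two to seed motion
#     estimation at double resolution.
# (2) Residual prediction: code only the difference between the
#     current layer's inter-prediction residual and the base layer's.

def scale_mv(base_mv):
    """A base layer MV addresses half-resolution pixels, so double it."""
    dx, dy = base_mv
    return (2 * dx, 2 * dy)

def residual_prediction(cur_residual, base_residual_upsampled):
    """Difference of differences: the values actually quantized."""
    return [c - b for c, b in zip(cur_residual, base_residual_upsampled)]

assert scale_mv((-1, 2)) == (-2, 4)
delta = residual_prediction([5, -3, 2], [4, -3, 1])
# delta == [1, 0, 1]
```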
In the present embodiment, the enhancement encoder performs differential image estimation by interpolating remaining images after predicting a basic layer image (low resolution image) of an own video and basic layer images (low resolution images) of the other videos as reference videos when encoding high resolution image frames .
Also, the enhancement encoder performs intra prediction using a basic layer image, which is a low resolution image of its own video, and basic layer images, which are low resolution images of the other videos used as reference videos, in an intra prediction mode when encoding high resolution image frames.

Fig. 4 is a diagram illustrating a reference structure for predicting and referencing adjacent frames in scalable video coding according to an embodiment of the present invention.
In Fig. 4, a P macro block denotes single-directional prediction and a B macro block denotes bi-directional prediction. The reference structure according to the present embodiment allows single-directional prediction and bi-directional prediction to be performed in a plurality of resolution layers as well as in a temporal axis and a spatial axis.
As shown in Fig. 4, the reference structure according to the present embodiment includes a two-layer structure formed of one basic layer and an enhancement layer. However, the reference structure may further include more enhancement layers.
In Fig. 4, a reference numeral 41 denotes a predicting and referencing operation for predicting and referencing adjacent frames, which is performed in a basic layer encoder and an enhancement layer encoder in a basic scalability video encoder 21 of Fig. 2. A reference numeral 42 denotes a predicting and referencing operation which is performed in a basic layer encoder and an enhancement layer encoder in an extended scalability video encoder 22 for a video 1 of Fig. 2. A reference numeral 43 denotes a predicting and referencing operation, which is performed in a basic layer encoder and an enhancement layer encoder in an extended scalability video encoder 23 for a video 2 of Fig. 2.
In other words, the basic layer 0 (L0) denotes a reference structure performed in each of the basic layer encoder of the basic scalability video encoder 21, the basic layer encoder of the extended scalability video encoder 22 for the video 1, and the basic layer encoder of the extended scalability video encoder 23 for the video 2.
Like the basic layer 0 (L0), the enhancement layer 1 (L1) denotes a reference structure performed in each of the enhancement layer encoder of the basic scalability video encoder 21, the enhancement layer encoder of the extended scalability video encoder 22 for the video 1, and the enhancement layer encoder of the extended scalability video encoder 23 for the video 2.
Referring to Fig. 4, the basic layer encoder of the basic scalability video encoder 21 performs a scalable video coding operation by predicting and referencing adjacent frames for its own low resolution image frames in a temporal axis like the video encoder according to the related art. The basic layer encoder of the extended scalability video encoder 22 for the video 1 performs bi-directional prediction for its own frame using the frames of a video 0 and the frames of a video 2, which are reference video frames located at the same temporal axis. Also, the basic layer encoder of the extended scalability video encoder 23 for the video 2 performs single-directional prediction with reference to the basic video 0 and performs bi-directional prediction using its own frames at the same time.
Meanwhile, the enhancement layer 1 (L1), the upper layer of the basic layer, performs spatial and temporal prediction for its own video frame and performs prediction with reference to its own frames of the basic layer and adjacent frames of the basic layer. In Fig. 4, each of the macro blocks includes three circle or cross symbols for indicating whether a lower layer is referred to or not. Here, the circle or cross symbol in the middle row among the three symbols indicates whether the lower layer of the own video frame is referred to or not, and the circle or cross symbols in the top row or in the bottom row indicate whether the lower layers of the adjacent video frames are referred to or not.
Referring to Fig. 4, the enhancement layer encoder of the basic scalability video encoder 21 performs scalable video coding with reference to its own frames of a lower layer like the encoder according to the related art. The enhancement layer encoder of the extended scalability video encoder 22 for the video 1 performs bi-directional prediction with reference to lower layer frames of a video 0 and lower layer frames of a video 2, which are adjacent frames, as well as its own lower layer frames. The enhancement layer encoder of the extended scalability video encoder 23 for the video 2 performs prediction with reference to the lower layer frames of the basic video 0 as well as its own lower layer frames.
Fig. 5 illustrates the reference structure of Fig. 4 with circle symbols and cross symbols in a spatial (view) layer axis with a time fixed.
In Fig. 5, the reference structure includes three layers. A reference numeral 51 denotes a reference structure for a video 0, a reference numeral 52 denotes a reference structure for a video 1, and a reference numeral 53 denotes a reference structure for a video 2.
In Fig. 5, since the macro block of an enhancement layer 2 (50) for the video 0 includes two cross symbols at the right and left columns and a circle symbol at the middle column, a coding operation is performed based on motion, differential images, and intra prediction used in the scalable video coding (SVC) according to the related art with reference to a lower layer of an own video only. On the contrary, since the macro block of an enhancement layer 2 for a video 1 (52) includes all circle symbols, a coding operation is performed with reference to lower layers of adjacent videos as well as a lower layer of an own video. Also, since the macro block of an enhancement layer 2 for a video 2 (53) includes circle symbols at the middle and left columns, a scalable video coding operation is performed with reference to the lower layer of an own video and the lower layer of a video 0.
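As a rough illustration (not part of the original disclosure), the circle and cross symbols just described can be modelled as a per-view mask over the lower layers that the top enhancement layer may reference; the (left, middle, right) column ordering is taken from the text, with the middle column being the view's own lower layer.

```python
# Assumed reading of Fig. 5's symbols: True = circle (lower layer
# referred to), False = cross, in the figure's (left, middle, right)
# columns. The middle column is the view's own lower layer.
ENHANCEMENT2_SYMBOLS = {
    0: (False, True, False),  # video 0 (51): own lower layer only
    1: (True, True, True),    # video 1 (52): adjacent lower layers too
    2: (True, True, False),   # video 2 (53): own and video 0's layer
}

def may_reference(view, column):
    """column: 0 = left, 1 = middle (own), 2 = right."""
    return ENHANCEMENT2_SYMBOLS[view][column]
```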
In order to indicate whether a lower layer is referenced or not as described above, a scalable video bitstream includes predetermined syntax elements that describe information about the videos in the reference layer.
A syntax element ref_view_Idx denotes a view number of the reference video in a lower layer. Here, a flag base_mode_flag indicates whether the motion vector information of a lower layer is used for estimating a motion in a current block or not. If the flag base_mode_flag is 1, a variable ref_view_Idx must have a view number of a lower layer for indicating which motion vector information is used. A flag base_mode_refinement_flag indicates whether or not the motion vector information of a lower layer is used for predicting a motion vector of a current block. Unlike the flag base_mode_flag, the reference index of a lower layer is also used as prediction information. Therefore, if the flag base_mode_refinement_flag is 1, a variable ref_view_Idx must have a view number of a lower layer for indicating which motion vector and reference index information are used. A flag intra_base_flag indicates whether a type of an intra block in a lower layer is used as prediction information of a current block. If the flag intra_base_flag is 1, information about an intra prediction mode of a lower layer is used for the current block. Therefore, a variable ref_view_Idx must have a view number of a lower layer for indicating which intra block type information is used.
A flag residual_prediction_flag indicates whether a differential image value of a lower layer is used for predicting a differential image of a current block or not. If the flag residual_prediction_flag is 1, the differential image information of the lower layer is up-sampled. Also, the variable ref_view_Idx must have a view number of a lower layer for indicating which differential image information is used. Table 1 shows the above described syntax elements in the scalable video.
Table 1
[Table 1, listing the syntax elements described above, appears as an image in the original publication.]
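A hedged sketch of how a decoder might act on these syntax elements. Only the flag names come from the text; the dict-based representation, the function name, and the returned item labels are assumptions for illustration.

```python
# For one block's parsed syntax, report what is inherited from the
# lower layer and from which view. The flags are not mutually
# exclusive, so everything set is collected.

def inherited_from_lower_layer(s):
    """s: dict of the Table 1 flags plus ref_view_Idx."""
    items = []
    if s.get("base_mode_flag"):
        items.append("motion_vector")
    if s.get("base_mode_refinement_flag"):
        items.append("motion_vector_and_reference_index")
    if s.get("intra_base_flag"):
        items.append("intra_prediction_mode")
    if s.get("residual_prediction_flag"):
        items.append("up_sampled_residual")
    return items, s.get("ref_view_Idx")

items, view = inherited_from_lower_layer(
    {"base_mode_flag": 1, "residual_prediction_flag": 1,
     "ref_view_Idx": 2})
# items == ["motion_vector", "up_sampled_residual"], view == 2
```

When any flag is 1, ref_view_Idx must be present so the decoder knows which view's lower layer supplies the inherited information.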
Fig. 6 illustrates a reference structure for a B-frame structure.
In Fig. 6, a reference numeral 61 denotes a reference structure for a video 0 that is a basic video, a reference numeral 62 denotes a reference structure for a video 1, and a reference numeral 63 denotes a reference structure for a video 2.
Fig. 7 is a diagram illustrating the reference structure of Fig. 6 with circle symbols and cross symbols in a spatial (view) layer axis with a time fixed.
In Fig. 7, the reference structure includes three layers. A reference numeral 71 denotes a reference structure for a video 0 that is a basic video, a reference numeral 72 denotes a reference structure for a video 1, and a reference numeral 73 denotes a reference structure for a video 2.
Referring to Figs. 6 and 7, the video 0, which is the basic video denoted by the reference structures 61 and 71, performs scalable video coding with reference to its own lower layer frames only. However, the video 1 and the video 2 perform scalable video coding with reference to the lower layer frames of adjacent videos as well as their own lower layer frames.
The above described reference structures can be identically applied to a P-frame structure.
Fig. 8 is a block diagram illustrating an apparatus for scalable coding multiview video in accordance with another embodiment of the present invention. Fig. 9 is a diagram illustrating a reference structure for a basic video 0 (91) and adjacent videos 1 (92) and 2 (93) in Fig. 8.
Referring to Figs. 8 and 9, a basic scalability video encoder 81 performs scalable coding for the basic video 0. The basic scalability video encoder 81 refers to its own lower layer frames like a single viewpoint video scalable coding apparatus. Therefore, the scalable coding apparatus according to another embodiment is compatible with existing scalable coding apparatuses.
Extended scalability video encoders 82 to 85 perform scalable video coding for the videos 1 to 4. Each of the extended scalability video encoders 82 to 85 separates video into frames with multilayer resolution, performs temporal and spatial prediction for the separated video frames, and performs compression with reference to temporal and spatial layer image information of adjacent videos and compression parameters.
As shown in Fig. 9, the enhancement layer 1 (92) of the video 1 performs scalable video coding with reference to a lower layer of the video 0, and the enhancement layer 2 (93) of the video 2 performs scalable video coding with reference to a lower layer of the video 1. In other words, in the scalable coding apparatus according to another embodiment shown in Fig. 8, the extended scalability video encoders perform scalable video coding with reference to only one neighboring video.
Accordingly, the scalable coding apparatus according to another embodiment can provide a 2-D video service using the basic scalability video encoder 81. Also, the scalable coding apparatus according to another embodiment can provide a stereo video service using the basic scalability video encoder 81 for the basic video 0 and the extended scalability video encoder 82 for the video 1. Furthermore, the scalable coding apparatus according to another embodiment can provide a three-view service using the basic scalability video encoder 81 for the basic video 0 and the extended scalability video encoders 82 and 84 for the videos 1 and 3. The scalable coding apparatus according to another embodiment can provide a five-view service using the basic scalability video encoder 81 for the basic video 0 and the extended scalability video encoders 82 to 85 for the videos 1 to 4.
The scalable coding technology for multiview video according to the present invention will be briefly described again. At first, one basic video is separated into frames with multilayer resolutions using a spatial filter in a spatial axis. Then, a spatial and temporal scalable video coding operation is performed on the separated low resolution image frames through motion estimation in a temporal axis. Also, the spatial and temporal scalable video coding operation is performed on the separated high resolution video frames through hierarchical motion estimation in a temporal axis with reference to a lower layer. Then, a bitstream is generated by multiplexing the coded low resolution image frame and at least one of the coded high resolution video frames. As described above, the scalable video coding for basic video according to the present embodiment is identical to that according to the related art.
Hereinafter, the scalable video coding for multiview video according to the present embodiment will be described.
At first, an own video is received with at least one of the adjacent videos as reference videos. The received own video and adjacent videos are separated into video frames with multilayer resolutions using a spatial filter in a spatial axis. Then, the temporal and spatial scalable video coding is performed on the separated low resolution image frame through hierarchical motion estimation with reference to adjacent frames as reference frames as well as the own frame in a temporal axis. Also, the temporal and spatial scalable video coding is performed on the separated high resolution image frame through hierarchical motion estimation with reference to lower layers of adjacent video frames as well as a lower layer of the own video frame in a temporal axis. Then, a bitstream is generated by multiplexing the coded low resolution image frame and at least one of the coded high resolution video frames. As described above, the extended scalable video coding uses not only the own lower layer frames but also the adjacent lower layer frames as reference frames, unlike the scalable video coding for single viewpoint video, which uses only an adjacent frame and a lower layer frame thereof.
Meanwhile, a scalable video decoding apparatus for multiview video according to an embodiment of the present invention performs the operations of the scalable video encoding apparatus according to the present embodiment in a reverse order.
The scalable video decoding apparatus according to the present embodiment includes a basic scalability video decoder and a plurality of extended scalability video decoders. The basic scalability video decoder receives a bitstream generated by scalable-coding one basic video and restores the basic video through inverse temporal transformation and inverse spatial transformation. Each of the extended scalability video decoders receives a bitstream generated by scalable-coding an own video and reference videos, which are captured at the same time, through the temporal and spatial prediction. Then, each of the extended scalability video decoders restores at least one high resolution video frame through inverse temporal and spatial prediction according to whether a lower layer of an adjacent video frame is referred to as well as an own lower layer, and restores one low resolution image frame through inverse temporal and spatial prediction according to whether the adjacent video frames at the same temporal axis are referenced as well as the own adjacent frame. Then, each of the extended scalability video decoders restores a video by performing inverse spatial filtering on the restored high resolution video frames and the restored low resolution image frame. In the scalable decoding apparatus according to the present embodiment, the basic scalability video decoder has a structure identical to that of a typical scalable video decoder. Therefore, the detailed description thereof is omitted.
In the present embodiment, the extended scalability video decoder includes a demultiplexer, at least one enhancement layer decoder, a basic layer decoder, and an inverse spatial video filtering unit. The demultiplexer demultiplexes the received bitstream. Each of the enhancement layer decoders performs scalable decoding on the high resolution video signal outputted from the demultiplexer through inverse temporal and spatial motion estimation according to whether adjacent videos are referred to as well as a lower layer of the own video. The basic layer decoder performs scalable decoding on the low resolution image signal outputted from the demultiplexer not only through inverse temporal and spatial motion estimation for the own video frame but also through inverse motion estimation for reference video frames on a temporal axis. The inverse spatial video filtering unit restores a video by performing inverse spatial filtering on the restored high resolution video frame from the enhancement layer decoder and on the restored low resolution image frame from the basic layer decoder.
Here, the basic layer decoder and the enhancement layer decoder perform the operations of the basic layer encoder and the enhancement layer encoder in an inverse order. Therefore, the detailed descriptions of the basic layer decoder and the enhancement layer decoder are omitted.
When the enhancement layer decoder decodes a high resolution video signal, it refers to a flag indicating whether the motion vector information of a lower layer is used, a flag indicating whether a reference index of a lower layer for an adjacent video is used as prediction information, a flag indicating whether the type of an intra block of a lower layer is used as prediction information, a flag indicating whether a differential video value of a lower layer is used, and an index of the reference view used for prediction.
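The five signalling items listed above can be modelled as four one-bit flags plus a reference-view index. The sketch below is illustrative only — the field names and bit positions are assumptions, not the patent's actual syntax elements:

```python
def parse_prediction_flags(flag_bits, ref_view_index):
    # Unpack four one-bit lower-layer prediction flags from an integer and
    # attach the index of the reference view used for prediction.
    return {
        "use_lower_layer_motion_vectors":  bool(flag_bits & 0b0001),
        "use_lower_layer_reference_index": bool(flag_bits & 0b0010),
        "use_lower_layer_intra_block_type": bool(flag_bits & 0b0100),
        "use_lower_layer_residual":        bool(flag_bits & 0b1000),
        "reference_view_index": ref_view_index,
    }

flags = parse_prediction_flags(0b1010, ref_view_index=2)
print(flags["use_lower_layer_motion_vectors"])   # False
print(flags["use_lower_layer_residual"])         # True
```

A decoder consulting such a record knows, per block, which pieces of lower-layer information (motion vectors, reference index, intra block type, residual) to reuse and which adjacent view served as the prediction reference.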
As described above, the technology of the present invention can be realized as a program and stored in a computer-readable recording medium, such as a CD-ROM, RAM, ROM, floppy disk, hard disk or magneto-optical disk. Since the process can be easily implemented by those skilled in the art to which the present invention pertains, further description will not be provided herein.
While the present invention has been described with respect to certain preferred embodiments, it will be apparent to those skilled in the art that various changes and modifications may be made without departing from the scope of the invention as defined in the following claims.
INDUSTRIAL APPLICABILITY
According to the present invention, multiview video can be effectively compressed by extending the temporal and spatial hierarchical structure of a typical scalable coding technology to multiview video. Also, a video service can be scalably provided to various types of 2-D or 3-D terminals by forming a hierarchical structure on the temporal and spatial axes for the multiview video according to the present invention.

Claims

WHAT IS CLAIMED IS:
1. A scalable video coding apparatus for a multiview video, comprising: a basic scalability video encoder for separating one basic video into video frames with multilayer resolutions and performing scalable video coding through performing temporal and spatial prediction on the separated low resolution image frame and at least one of the separated high resolution video frames; and a plurality of extended scalability video encoders for receiving an own video and at least one of adjacent videos as reference videos which are captured at the same time, separating the received videos into video frames with multilayer resolutions through spatial filtering, performing scalable video coding for the separated low resolution image frame through temporal and spatial prediction with reference to the adjacent video frames at the same temporal axis as well as own adjacent frame, and performing scalable video coding for the separated high resolution video frame through temporal and spatial prediction with reference to lower layers of the adjacent video frames at the same temporal axis as well as an own lower layer.
2. The scalable video coding apparatus of claim 1, wherein the extended scalability video encoder includes: a spatial video filtering means for separating own video and adjacent videos as reference videos into video frames with multilayer resolutions through spatial filtering; a basic layer encoding means for separating the own video and adjacent videos into low resolution image frames through temporal filtering and performing scalable coding through motion estimation for reference video frame in a temporal axis as well as temporal and spatial motion estimation for own video frame; at least one of enhancement layer encoding means for separating the own video and adjacent videos into high resolution video frames through temporal filtering and performing scalable coding through spatial and temporal motion estimation with reference to lower layers for the adjacent videos as well as a lower layer of own video; and a multiplexing means for outputting one bit stream by multiplexing the output of the basic layer encoding means and the output of the enhancement layer encoding means.
3. The scalable video coding apparatus of claim 2, wherein the enhancement layer encoding means sets a flag indicating whether motion vector information of a lower layer is used or not, a flag indicating whether a reference index of a lower layer for an adjacent video is used as prediction information or not, a flag indicating whether a type of an intra block of a lower layer is used as prediction information or not, and a flag indicating whether a differential video value of a lower layer is used or not, and marks an index of a used reference view as a coding result.
4. The scalable video coding apparatus of claim 2, wherein the enhancement layer encoding means further includes a two-dimensional (2-D) spatial interpolation means for performing 2-D spatial interpolation on a video frame restored for intra prediction for an intra block.
5. The scalable video coding apparatus of claim 2, wherein the enhancement layer encoding means performs coding through motions between frames, differential video and intra prediction on a temporal and spatial axis.
6. The scalable video coding apparatus of claim 5, wherein the enhancement layer encoding means performs motion estimation using a value obtained by multiplying by two a motion vector of a basic layer image that is a low resolution image for own video and a basic layer image that is a low resolution image for an adjacent video.
7. The scalable video coding apparatus of claim 5, wherein the enhancement layer encoding means performs differential video prediction by interpolating remaining images after predicting a basic layer image that is a low resolution image for own video and a basic layer image that is a low resolution image for an adjacent video.
8. A scalable video coding method for multiview video, comprising the steps of:
(a) separating one basic video to video frames with multilayer resolutions and performing scalable video coding through temporal and spatial prediction; and
(b) receiving an own video and at least one of adjacent videos, which are captured at the same time, and performing scalable video coding through temporal and spatial prediction by separating the received videos into video frames with multilayer resolutions, wherein the step of (b) receiving an own video and at least one of adjacent videos includes the steps of:
(c) performing scalable video coding for low resolution video frames through temporal and spatial prediction with reference to the adjacent video frames at the same temporal axis as well as own adjacent frames; and
(d) performing scalable video coding for at least one of high resolution video frames through temporal and spatial prediction with reference to lower layers of the adjacent video frames as well as own lower layer.
9. The scalable video coding method of claim 8, wherein as a result of performing the step of (d) performing scalable video coding for at least one of high resolution video frames, a flag indicating whether motion vector information of a lower layer is used or not, a flag indicating whether a reference index of a lower layer for an adjacent video is used as prediction information or not, a flag indicating whether a type of an intra block of a lower layer is used as prediction information or not, and a flag indicating whether a differential video value of a lower layer is used or not are set, and an index of a used reference view is marked.
10. The scalable video coding method of claim 8, wherein in the step of (d) performing scalable video coding for at least one of high resolution video frames, a video is coded through motions between frames, differential images, and intra prediction on a temporal and spatial axis.
11. The scalable video coding method of claim 8, wherein in the step of (d) performing scalable video coding for at least one of high resolution video frames, two-dimensional spatial interpolation is performed for a video frame restored for intra prediction for an intra block.
12. The scalable video coding method of claim 8, wherein in the step of (d) performing scalable video coding for at least one of high resolution video frames, motion estimation is performed using a value obtained by multiplying by two a motion vector of a basic layer image that is a low resolution image for own video and a basic layer image that is a low resolution image for an adjacent video.
13. The scalable video coding method of claim 8, wherein in the step of (d) performing scalable video coding for at least one of high resolution video frames, differential video prediction is performed by interpolating remaining images after prediction of a basic layer image that is a low resolution image for own video and a basic layer image that is a low resolution image for an adjacent video.
14. A scalable video decoding apparatus for multiview video, comprising: a basic scalability video decoder for receiving a bitstream generated by scalably coding one basic video and restoring a basic video through inverse temporal and inverse spatial transform; and a plurality of extended scalability video decoders for receiving a bitstream generated by scalable coding through temporal and spatial prediction for own video and reference videos, which are captured at the same time, restoring at least one of high resolution image frames through inverse temporal and spatial prediction according to whether lower layers of adjacent video frames that are reference video are referred as well as own lower layer or not, restoring a low resolution image frame through inverse temporal and spatial prediction according to whether the adjacent video frames are referred or not at the same temporal axis as well as own adjacent frame, and restoring an image through inverse spatial filtering for the restored high resolution image frames and the restored low resolution image frame.
15. The scalable video decoding apparatus of claim 14, wherein the extended scalability video decoder includes: a demultiplexing means for demultiplexing a received bitstream; at least one of enhancement decoding means for performing scalable decoding for a high resolution image signal outputted from the demultiplexing means through inverse temporal and spatial motion estimation according to whether lower layers of adjacent videos that are reference videos are referred as well as a lower layer of own video or not; a basic layer decoding means for performing scalable decoding for a low resolution image signal outputted from the demultiplexing means through inverse motion estimation for reference video frames at a temporal axis as well as inverse temporal and spatial motion estimation for own video frame; and an inverse spatial video filtering means for restoring an image through inverse spatial filtering for the restored high resolution images from the enhancement decoding means and the restored low resolution image from the basic layer decoding means.
16. The scalable video decoding apparatus of claim 15, wherein the enhancement decoding means performs scalable decoding with reference to a flag indicating whether motion vector information of a lower layer is used or not, a flag indicating whether a reference index of a lower layer for adjacent video is used as prediction information or not, a flag indicating whether a type of an intra block of a lower layer is used as prediction information or not, a flag indicating whether a differential image value of a lower layer is used or not, and an index of a reference view used for prediction.
17. A scalable video decoding method for multiview video, comprising the steps of:
(a) performing scalable video decoding for one basic video through inverse temporal and spatial prediction; and
(b) receiving a bitstream scalable-coded with reference to an own video and at least one of adjacent videos as reference videos, which are captured at the same time, and performing scalable video decoding through inverse temporal prediction and inverse spatial prediction, wherein the step of (b) receiving a bitstream scalable-coded with reference to an own video and at least one of adjacent videos as reference videos includes the steps of:
(c) performing scalable video decoding for demultiplexed high resolution image signal through inverse temporal and spatial prediction according to whether lower layers of the adjacent video frames are referred as well as an own lower layer; and
(d) performing scalable video decoding for demultiplexed low resolution image signal through inverse temporal and spatial prediction according to whether the adjacent video frames are referred at the same temporal axis as well as an own adjacent frame .
18. The scalable video decoding method of claim 17, wherein in the step of (c) performing scalable video decoding for demultiplexed high resolution image signal, scalable decoding is performed with reference to a flag indicating whether motion vector information of a lower layer is used or not, a flag indicating whether a reference index of a lower layer for adjacent video is used as prediction information or not, a flag indicating whether a type of an intra block of a lower layer is used as prediction information or not, a flag indicating whether a differential image value of a lower layer is used or not, and an index of a reference view used for prediction.
PCT/KR2007/005294 2006-10-25 2007-10-25 Multi-view video scalable coding and decoding WO2008051041A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2009534496A JP5170786B2 (en) 2006-10-25 2007-10-25 Multi-view video scalable coding and decoding method, and coding and decoding apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2006-0103923 2006-10-25
KR20060103923 2006-10-25

Publications (1)

Publication Number Publication Date
WO2008051041A1 true WO2008051041A1 (en) 2008-05-02

Family

ID=39324782

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2007/005294 WO2008051041A1 (en) 2006-10-25 2007-10-25 Multi-view video scalable coding and decoding

Country Status (3)

Country Link
JP (1) JP5170786B2 (en)
KR (1) KR100919885B1 (en)
WO (1) WO2008051041A1 (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101012760B1 (en) * 2008-09-05 2011-02-08 에스케이 텔레콤주식회사 System and Method for transmitting and receiving of Multi-view video
KR101146138B1 (en) * 2008-12-10 2012-05-16 한국전자통신연구원 Temporal scalabel video encoder
KR101144752B1 (en) * 2009-08-05 2012-05-09 경희대학교 산학협력단 video encoding/decoding method and apparatus thereof
WO2011016701A2 (en) * 2009-08-07 2011-02-10 한국전자통신연구원 Motion picture encoding apparatus and method thereof
KR20110015356A (en) 2009-08-07 2011-02-15 한국전자통신연구원 Video encoding and decoding apparatus and method using adaptive transform and quantization domain that based on a differential image signal characteristic
EP2591602A1 (en) * 2010-07-06 2013-05-15 Koninklijke Philips Electronics N.V. Generation of high dynamic range images from low dynamic range images
JP5663093B2 (en) * 2010-10-01 2015-02-04 ドルビー ラボラトリーズ ライセンシング コーポレイション Optimized filter selection for reference picture processing
WO2013051896A1 (en) * 2011-10-05 2013-04-11 한국전자통신연구원 Video encoding/decoding method and apparatus for same
WO2013076991A1 (en) * 2011-11-25 2013-05-30 パナソニック株式会社 Image coding method, image coding device, image decoding method and image decoding device
KR101346349B1 (en) * 2012-01-30 2013-12-31 광운대학교 산학협력단 Apparatus and Method for scalable multi-view video decoding
WO2013115609A1 (en) * 2012-02-02 2013-08-08 한국전자통신연구원 Interlayer prediction method and device for image signal
JP6050488B2 (en) * 2012-07-06 2016-12-21 サムスン エレクトロニクス カンパニー リミテッド Multi-layer video encoding method and apparatus for random access, and multi-layer video decoding method and apparatus for random access
WO2014088316A2 (en) * 2012-12-04 2014-06-12 인텔렉추얼 디스커버리 주식회사 Video encoding and decoding method, and apparatus using same
EP2961166B1 (en) * 2013-02-25 2020-04-01 LG Electronics Inc. Method for encoding video of multi-layer structure supporting scalability and method for decoding same and apparatus therefor
US10616607B2 (en) 2013-02-25 2020-04-07 Lg Electronics Inc. Method for encoding video of multi-layer structure supporting scalability and method for decoding same and apparatus therefor
KR101595397B1 (en) * 2013-07-26 2016-02-29 경희대학교 산학협력단 Method and apparatus for integrated encoding/decoding of different multilayer video codec
WO2015016535A1 (en) * 2013-07-30 2015-02-05 주식회사 케이티 Image encoding and decoding method supporting plurality of layers and apparatus using same
US9894369B2 (en) 2013-07-30 2018-02-13 Kt Corporation Image encoding and decoding method supporting plurality of layers and apparatus using same
US9762909B2 (en) 2013-07-30 2017-09-12 Kt Corporation Image encoding and decoding method supporting plurality of layers and apparatus using same

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030202592A1 (en) * 2002-04-20 2003-10-30 Sohn Kwang Hoon Apparatus for encoding a multi-view moving picture
WO2006062377A1 (en) * 2004-12-10 2006-06-15 Electronics And Telecommunications Research Institute Apparatus for universal coding for multi-view video
WO2006104326A1 (en) * 2005-04-01 2006-10-05 Industry Academic Cooperation Foundation Kyunghee University Scalable multi-view image encoding and decoding apparatuses and methods

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7468745B2 (en) * 2004-12-17 2008-12-23 Mitsubishi Electric Research Laboratories, Inc. Multiview video decomposition and encoding
KR20060101847A (en) * 2005-03-21 2006-09-26 엘지전자 주식회사 Method for scalably encoding and decoding video signal

Cited By (57)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8855199B2 (en) * 2008-04-21 2014-10-07 Nokia Corporation Method and device for video coding and decoding
KR101224097B1 (en) 2008-05-23 2013-01-21 후아웨이 테크놀러지 컴퍼니 리미티드 Controlling method and device of multi-point meeting
WO2009140913A1 (en) * 2008-05-23 2009-11-26 华为技术有限公司 Controlling method and device of multi-point meeting
US8339440B2 (en) 2008-05-23 2012-12-25 Huawei Technologies Co., Ltd. Method and apparatus for controlling multipoint conference
CN105025312A (en) * 2008-12-30 2015-11-04 Lg电子株式会社 Digital broadcast receiving method providing two-dimensional image and 3d image integration service, and digital broadcast receiving device using the same
EP2420068A1 (en) * 2009-04-13 2012-02-22 RealD Inc. Encoding, decoding, and distributing enhanced resolution stereoscopic video
WO2010120804A1 (en) 2009-04-13 2010-10-21 Reald Inc. Encoding, decoding, and distributing enhanced resolution stereoscopic video
CN102804785A (en) * 2009-04-13 2012-11-28 瑞尔D股份有限公司 Encoding, decoding, and distributing enhanced resolution stereoscopic video
EP2420068A4 (en) * 2009-04-13 2012-08-08 Reald Inc Encoding, decoding, and distributing enhanced resolution stereoscopic video
US20120092453A1 (en) * 2009-06-16 2012-04-19 Jong Yeul Suh Broadcast transmitter, broadcast receiver and 3d video processing method thereof
CN105025309A (en) * 2009-06-16 2015-11-04 Lg电子株式会社 Broadcast transmitter and 3D video data processing method thereof
CN102461183A (en) * 2009-06-16 2012-05-16 Lg电子株式会社 Broadcast transmitter, broadcast receiver and 3d video processing method thereof
US9578302B2 (en) 2009-06-16 2017-02-21 Lg Electronics Inc. Broadcast transmitter, broadcast receiver and 3D video data processing method thereof
US20150350625A1 (en) * 2009-06-16 2015-12-03 Lg Electronics Inc. Broadcast transmitter, broadcast receiver and 3d video data processing method thereof
WO2010147289A1 (en) * 2009-06-16 2010-12-23 Lg Electronics Inc. Broadcast transmitter, broadcast receiver and 3d video processing method thereof
US9088817B2 (en) 2009-06-16 2015-07-21 Lg Electronics Inc. Broadcast transmitter, broadcast receiver and 3D video processing method thereof
US9648346B2 (en) 2009-06-25 2017-05-09 Microsoft Technology Licensing, Llc Multi-view video compression and streaming based on viewpoints of remote viewer
CN102577376A (en) * 2009-07-17 2012-07-11 三星电子株式会社 Method and apparatus for multi-view video coding and decoding
CN102577376B (en) * 2009-07-17 2015-05-27 三星电子株式会社 Method, apparatus and system for multi-view video coding and decoding
JP2012533925A (en) * 2009-07-17 2012-12-27 サムスン エレクトロニクス カンパニー リミテッド Method and apparatus for multi-view video encoding and decoding
US20110012994A1 (en) * 2009-07-17 2011-01-20 Samsung Electronics Co., Ltd. Method and apparatus for multi-view video coding and decoding
EP2306730A3 (en) * 2009-10-05 2011-07-06 Broadcom Corporation Method and system for 3D video decoding using a tier system framework
CN102036065A (en) * 2009-10-05 2011-04-27 美国博通公司 Method and system for video coding
FR2951346A1 (en) * 2009-10-08 2011-04-15 Thomson Licensing MULTIVATED CODING METHOD AND CORRESPONDING DECODING METHOD
WO2011042440A1 (en) * 2009-10-08 2011-04-14 Thomson Licensing Method for multi-view coding and corresponding decoding method
CN103155568A (en) * 2010-07-08 2013-06-12 杜比实验室特许公司 Systems and methods for multi-layered image and video delivery using reference processing signals
US10531120B2 (en) 2010-07-08 2020-01-07 Dolby Laboratories Licensing Corporation Systems and methods for multi-layered image and video delivery using reference processing signals
WO2012006299A1 (en) * 2010-07-08 2012-01-12 Dolby Laboratories Licensing Corporation Systems and methods for multi-layered image and video delivery using reference processing signals
US9467689B2 (en) 2010-07-08 2016-10-11 Dolby Laboratories Licensing Corporation Systems and methods for multi-layered image and video delivery using reference processing signals
US11044454B2 (en) 2010-07-21 2021-06-22 Dolby Laboratories Licensing Corporation Systems and methods for multi-layered frame compatible video delivery
CN105847780A (en) * 2010-07-21 2016-08-10 杜比实验室特许公司 Decoding method for multi-layered frame-compatible video delivery
US10142611B2 (en) 2010-07-21 2018-11-27 Dolby Laboratories Licensing Corporation Systems and methods for multi-layered frame-compatible video delivery
JP2013538487A (en) * 2010-07-21 2013-10-10 ドルビー ラボラトリーズ ライセンシング コーポレイション System and method for multi-layer frame compliant video delivery
CN103026706A (en) * 2010-07-21 2013-04-03 杜比实验室特许公司 Systems and methods for multi-layered frame-compatible video delivery
CN105847781A (en) * 2010-07-21 2016-08-10 杜比实验室特许公司 Decoding method for multi-layered frame-compatible video delivery
CN105812828A (en) * 2010-07-21 2016-07-27 杜比实验室特许公司 Decoding method for multilayer frame compatible video transmission
EP2700233A4 (en) * 2011-04-19 2014-09-17 Samsung Electronics Co Ltd Method and apparatus for unified scalable video encoding for multi-view video and method and apparatus for unified scalable video decoding for multi-view video
EP2700233A2 (en) * 2011-04-19 2014-02-26 Samsung Electronics Co., Ltd. Method and apparatus for unified scalable video encoding for multi-view video and method and apparatus for unified scalable video decoding for multi-view video
US11496760B2 (en) 2011-07-22 2022-11-08 Qualcomm Incorporated Slice header prediction for depth maps in three-dimensional video codecs
US9521418B2 (en) 2011-07-22 2016-12-13 Qualcomm Incorporated Slice header three-dimensional video extension for slice header prediction
CN102957910A (en) * 2011-08-09 2013-03-06 索尼公司 Image encoding apparatus, image encoding method and program
TWI552575B (en) * 2011-08-09 2016-10-01 三星電子股份有限公司 Multi-view video prediction method and apparatus therefore and multi-view video prediction restoring method and apparatus therefore
US9973778B2 (en) 2011-08-09 2018-05-15 Samsung Electronics Co., Ltd. Method for multiview video prediction encoding and device for same, and method for multiview video prediction decoding and device for same
CN103733620A (en) * 2011-08-11 2014-04-16 高通股份有限公司 Three-dimensional video with asymmetric spatial resolution
US9288505B2 (en) 2011-08-11 2016-03-15 Qualcomm Incorporated Three-dimensional video with asymmetric spatial resolution
US10764604B2 (en) 2011-09-22 2020-09-01 Sun Patent Trust Moving picture encoding method, moving picture encoding apparatus, moving picture decoding method, and moving picture decoding apparatus
CN103828371B (en) * 2011-09-22 2017-08-22 太阳专利托管公司 Dynamic image encoding method, dynamic image encoding device and dynamic image decoding method and moving image decoding apparatus
CN103828371A (en) * 2011-09-22 2014-05-28 松下电器产业株式会社 Moving-image encoding method, moving-image encoding device, moving image decoding method, and moving image decoding device
US20140219338A1 (en) * 2011-09-22 2014-08-07 Panasonic Corporation Moving picture encoding method, moving picture encoding apparatus, moving picture decoding method, and moving picture decoding apparatus
EP2587804A1 (en) * 2011-10-28 2013-05-01 Samsung Electronics Co., Ltd Method and apparatus for hierarchically encoding and decoding of a two-dimensional image, of a stereo image, and of a three-dimensional image
US9191677B2 (en) 2011-10-28 2015-11-17 Samsung Electronics Co., Ltd. Method and apparatus for encoding image and method and appartus for decoding image
US9485503B2 (en) 2011-11-18 2016-11-01 Qualcomm Incorporated Inside view motion prediction among texture and depth view components
US10027943B2 (en) 2012-04-03 2018-07-17 Sun Patent Trust Image encoding method, image decoding method, image encoding device, and image decoding device
US10582183B2 (en) 2012-04-03 2020-03-03 Sun Patent Trust Image encoding method, image decoding method, image encoding device, and image decoding device
US20140133567A1 (en) * 2012-04-16 2014-05-15 Nokia Corporation Apparatus, a method and a computer program for video coding and decoding
EP2839660B1 (en) * 2012-04-16 2020-10-07 Nokia Technologies Oy An apparatus, a method and a computer program for video coding and decoding
US10863170B2 (en) 2012-04-16 2020-12-08 Nokia Technologies Oy Apparatus, a method and a computer program for video coding and decoding on the basis of a motion vector

Also Published As

Publication number Publication date
KR20080037593A (en) 2008-04-30
JP5170786B2 (en) 2013-03-27
KR100919885B1 (en) 2009-09-30
JP2010507961A (en) 2010-03-11

Similar Documents

Publication Publication Date Title
WO2008051041A1 (en) Multi-view video scalable coding and decoding
KR100760258B1 (en) Apparatus for Universal Coding for Multi-View Video
US7817181B2 (en) Method, medium, and apparatus for 3-dimensional encoding and/or decoding of video
KR100763179B1 (en) Method for compressing/Reconstructing motion vector of unsynchronized picture and apparatus thereof
US8644386B2 (en) Method of estimating disparity vector, and method and apparatus for encoding and decoding multi-view moving picture using the disparity vector estimation method
KR100789753B1 (en) Apparatus of predictive coding/decoding using view-temporal reference picture buffers and method using the same
EP1927250A1 (en) Method of estimating disparity vector, and method and apparatus for encoding and decoding multi-view moving picture using the disparity vector estimation method
JP2007180981A (en) Device, method, and program for encoding image
WO2007052969A1 (en) Method and apparatus for encoding multiview video
WO2004059980A1 (en) Method and apparatus for encoding and decoding stereoscopic video
KR100703746B1 (en) Video coding method and apparatus for predicting effectively unsynchronized frame
MX2008002391A (en) Method and apparatus for encoding multiview video.
EP1642463A1 (en) Video coding in an overcomplete wavelet domain
JP2007180982A (en) Device, method, and program for decoding image
KR20040065014A (en) Apparatus and method for compressing/decompressing multi-viewpoint image
WO2006118384A1 (en) Method and apparatus for encoding/decoding multi-layer video using weighted prediction
WO2006110007A1 (en) Method for coding in multiview video coding/decoding system
WO2013039348A1 (en) Method for signaling image information and video decoding method using same
KR100791453B1 (en) Multi-view Video Encoding and Decoding Method and apparatus Using Motion Compensated Temporal Filtering
KR20110118744A (en) 3d tv video encoding method, decoding method
JP2011091498A (en) Moving image coder, moving image decoder, moving image coding method, and moving image decoding method
WO2006104357A1 (en) Method for compressing/decompressing motion vectors of unsynchronized picture and apparatus using the same
Liu et al. Fully scalable multiview wavelet video coding
Lim et al. Motion/disparity compensated multiview sequence coding

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07833603

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2009534496

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07833603

Country of ref document: EP

Kind code of ref document: A1