WO2008051041A1 - Multi-view video scalable coding and decoding - Google Patents
- Publication number
- WO2008051041A1 (PCT/KR2007/005294)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- temporal
- scalable
- spatial
- adjacent
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/60—Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
- H04N21/63—Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
- H04N21/647—Control signaling between network components and server or clients; Network processes for video distribution between server and clients, e.g. controlling the quality of the video stream, by dropping packets, protecting content from unauthorised alteration within the network, monitoring of network load, bridging between two different networks, e.g. between IP and wireless
- H04N21/64784—Data processing by the network
- H04N21/64792—Controlling the complexity of the content stream, e.g. by dropping packets
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/105—Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/136—Incoming video signal characteristics or properties
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/187—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a scalable video layer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
- H04N19/31—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the temporal domain
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
- H04N19/33—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the spatial domain
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/46—Embedding additional information in the video signal during the compression process
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
- H04N19/615—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding using motion compensated temporal filtering [MCTF]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/63—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/85—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
- H04N19/87—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving scene cut or scene change detection in combination with video compression
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/13—Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
Definitions
- the present invention relates to a multiview video scalable coding and decoding technology; and, more particularly, to a multiview video scalable coding and decoding apparatus and method for compressing and transmitting multiview video using a multilayer spatial and temporal scalable coding technology and for providing two-dimensional or three-dimensional video services to various types of video terminals.
- data is compressed by removing temporal and spatial redundancy of data.
- the spatial redundancy denotes the identical color or the same objects in video.
- the temporal redundancy denotes adjacent pictures with almost no change in moving pictures, and repeated sounds in audio.
- the temporal redundancy is removed by temporal filtering based on motion compensation and the spatial redundancy is removed by spatial transformation.
- the transmission performance varies according to the type of the transmission medium.
- a scalable video coding technology was introduced to support the various speeds of transmission mediums and to transmit multimedia data at a transfer rate proper to a transmission environment.
- the scalable video coding technology is one of coding technologies for controlling a resolution, a frame rate, and a signal-to-noise ratio (SNR) of video by cutting down a predetermined part of a compressed bit stream according to conditions, such as a transport bit rate, a transport error rate, and a system resource.
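The truncation described above can be sketched as follows. This is a hypothetical illustration, not the syntax of any actual standard: coded units are assumed to carry temporal, spatial, and quality layer tags (`t`, `s`, `q`), and an extractor simply drops the units above the requested layers.

```python
def extract_substream(packets, max_temporal, max_spatial, max_quality):
    """Keep only the coded units needed for the requested scalability point."""
    return [p for p in packets
            if p["t"] <= max_temporal
            and p["s"] <= max_spatial
            and p["q"] <= max_quality]

# Illustrative stream: one base unit plus one refinement per scalability axis.
stream = [
    {"t": 0, "s": 0, "q": 0, "data": b"base"},
    {"t": 1, "s": 0, "q": 0, "data": b"frame-rate refinement"},
    {"t": 0, "s": 1, "q": 0, "data": b"resolution refinement"},
    {"t": 0, "s": 0, "q": 1, "data": b"SNR refinement"},
]

# A low-resolution, low-frame-rate client receives only the base unit.
low_end = extract_substream(stream, max_temporal=0, max_spatial=0, max_quality=0)
```

A server or network node can thus adapt one compressed bitstream to many terminals without re-encoding.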
- Fig. 1 is a diagram describing a scalable coding technology according to the related art.
- the scalable video coding technology performs temporal transform for realizing temporal scalability and performs two-dimensional spatial transform for realizing spatial scalability. Also, the scalable video coding technology realizes quality scalability using texture coding.
- the motion coding scalably encodes motion information when spatial scalability is realized. As described above, one bit stream is generated through such coding algorithms.
- for the temporal transform, motion compensated temporal filtering (MCTF) or hierarchical B-pictures were used.
- the MCTF performs wavelet transform using motion information in the temporal-axis direction in a video sequence.
- the wavelet transform is performed using a lifting scheme.
- the lifting scheme includes three processes, polyphase decomposition, prediction, and update.
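The three lifting steps can be illustrated with the simplest case, a one-level Haar lifting of a 1-D signal. This is an illustrative sketch only: real MCTF applies the predict and update steps along the temporal axis after motion-compensated alignment, which is omitted here.

```python
def haar_lift(signal):
    """One level of Haar wavelet lifting: split, predict, update."""
    # 1. Polyphase decomposition: split into even and odd samples.
    even = signal[0::2]
    odd = signal[1::2]
    # 2. Prediction: high-pass band = odd sample minus its even neighbour.
    high = [o - e for o, e in zip(odd, even)]
    # 3. Update: low-pass band = even sample plus half the prediction residual.
    low = [e + h / 2 for e, h in zip(even, high)]
    return low, high

# low carries the (temporally) filtered averages, high the residual detail.
low, high = haar_lift([2, 4, 6, 8])
```

Because each step is trivially invertible (update, then predict, then merge, with signs flipped), the decoder can reconstruct the original samples exactly.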
- the hierarchical B-pictures may be realized in various ways using a memory management control operation that manages a decoded picture buffer (DPB) for storing 16 pictures and the syntax of reference picture list reordering (RPLR).
- the multiview video compression technology is a technology for simultaneously coding videos from a plurality of cameras that provide multiview video and compressing, storing, and transmitting the coded video. If the multiview video is stored and transmitted without being compressed, a large transmission bandwidth is required to transmit the multiview video to a user through a broadcasting network or a wired/wireless Internet in real-time.
- each of the video sequences is independently coded and transmitted, and the transmitted coded video sequences are decoded. It is easily realized based on MPEG-1/2/4 or H.261/263/264. However, it is impossible to remove the redundancy between videos, which is generated as the same object is photographed by a plurality of cameras.
- a scalable video coding technology was introduced.
- a single-viewpoint video is divided into video frames with multilayer resolutions on a spatial axis using a spatial filter, and temporal and spatial scalability is realized for the divided video frames on a temporal axis through hierarchical bi-directional motion estimation.
- quality scalability may be provided through entropy coding by hierarchical expression in transform coding.
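The division of one video into frames with multilayer resolutions can be sketched as a dyadic pyramid. Simple 2x2 averaging stands in here for whatever spatial filter (e.g. a wavelet analysis filter) an actual codec would specify; the function names are illustrative.

```python
def downsample2(frame):
    """Halve resolution by 2x2 averaging (a stand-in for the spatial filter)."""
    h, w = len(frame), len(frame[0])
    return [[(frame[y][x] + frame[y][x + 1]
              + frame[y + 1][x] + frame[y + 1][x + 1]) / 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

def spatial_layers(frame, levels):
    """Return [full-res, half-res, ...] down to the base layer."""
    layers = [frame]
    for _ in range(levels - 1):
        layers.append(downsample2(layers[-1]))
    return layers

frame = [[10, 10, 20, 20],
         [10, 10, 20, 20],
         [30, 30, 40, 40],
         [30, 30, 40, 40]]
layers = spatial_layers(frame, 3)
```

The lowest-resolution layer is coded as the base layer; each higher layer is coded as an enhancement over it.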
- An embodiment of the present invention is directed to providing a multiview video scalable coding method and apparatus for effectively compressing videos and providing various video services to terminals in diverse environments through motion estimation with reference to adjacent images on a temporal and spatial axis for compressing multiview video, and through motions, differential images, and intra prediction in different resolutions of adjacent videos for providing scalability on a temporal and spatial axis in a multiview video.
- Another embodiment of the present invention is directed to providing a scalable video decoding method and apparatus for receiving a scalable coded signal and decoding the received signal for multiview video.
- a scalable video coding apparatus for a multiview video including: a basic scalability video encoder for separating one basic video into video frames with multilayer resolutions and performing scalable video coding through temporal and spatial prediction on the separated low resolution video frame and at least one of the separated high resolution video frames; and a plurality of extended scalability video encoders for receiving an own video and at least one of adjacent videos as reference videos, which are captured at the same time, separating the received videos into video frames with multilayer resolutions through spatial filtering, performing scalable video coding for the separated low resolution video frame through temporal and spatial prediction with reference to the adjacent video frames on the same temporal axis as well as an own adjacent frame, and performing scalable video coding for the separated high resolution video frame through temporal and spatial prediction with reference to lower layers of the adjacent video frames on the same temporal axis as well as an own lower layer.
- a scalable video coding method for multiview video including the steps of: (a) separating one basic video into video frames with multilayer resolutions and performing scalable video coding through temporal and spatial prediction; and (b) receiving an own video and at least one of adjacent videos, which are captured at the same time, and performing scalable video coding through temporal and spatial prediction by separating the received videos into video frames with multilayer resolutions, wherein the step (b) includes the steps of: (c) performing scalable video coding for low resolution video frames through temporal and spatial prediction with reference to the adjacent video frames on the same temporal axis as well as own adjacent frames; and (d) performing scalable video coding for at least one of high resolution video frames through temporal and spatial prediction with reference to lower layers of the adjacent video frames as well as an own lower layer.
- a scalable video decoding apparatus for multiview video including: a basic scalability video decoder for receiving a bitstream generated by scalably coding one basic video and restoring the basic video through inverse temporal and inverse spatial transform; and a plurality of extended scalability video decoders for receiving a bitstream scalable-coded through temporal and spatial prediction for an own video and reference videos, which are captured at the same time, restoring at least one of high resolution image frames through inverse temporal and spatial prediction according to whether lower layers of adjacent video frames, which are reference videos, are referred to as well as an own lower layer, restoring a low resolution image frame through inverse temporal and spatial prediction according to whether the adjacent video frames on the same temporal axis are referred to as well as an own adjacent frame, and restoring an image through inverse spatial filtering for the restored high resolution image frames and the restored low resolution image frame.
- the extended scalability video decoder may include: a demultiplexing unit for demultiplexing a received bitstream; at least one enhancement decoding unit for performing scalable decoding for a high resolution image signal outputted from the demultiplexing unit through inverse temporal and spatial motion estimation according to whether lower layers of adjacent videos, which are reference videos, are referred to as well as a lower layer of an own video; a basic layer decoding unit for performing scalable decoding for a low resolution image signal outputted from the demultiplexing unit through inverse motion estimation for reference video frames on a temporal axis as well as inverse temporal and spatial motion estimation for an own video frame; and an inverse spatial video filtering unit for restoring an image through inverse spatial filtering for the restored high resolution images from the enhancement decoding unit and the restored low resolution image from the basic layer decoding unit.
- the enhancement decoding unit may perform scalable decoding with reference to a flag indicating whether motion vector information of a lower layer is used or not, a flag indicating whether a reference index of a lower layer for adjacent video is used as prediction information or not, a flag indicating whether a type of an intra block of a lower layer is used as prediction information or not, a flag indicating whether a differential image value of a lower layer is used or not, and an index of a reference view used for prediction.
- a scalable video decoding method for multiview video including the steps of: (a) performing scalable video decoding for one basic video through inverse temporal and spatial prediction; and (b) receiving a bitstream scalable-coded with reference to an own video and at least one of adjacent videos as reference videos, which are captured at the same time, and performing scalable video decoding through inverse temporal prediction and inverse spatial prediction, wherein the step (b) includes the steps of: (c) performing scalable video decoding for a demultiplexed high resolution image signal through inverse temporal and spatial prediction according to whether lower layers of the adjacent video frames are referred to as well as an own lower layer; and (d) performing scalable video decoding for a demultiplexed low resolution image signal through inverse temporal and spatial prediction according to whether the adjacent video frames on the same temporal axis are referred to as well as an own adjacent frame.
- scalable decoding may be performed with reference to a flag indicating whether motion vector information of a lower layer is used or not, a flag indicating whether a reference index of a lower layer for adjacent video is used as prediction information or not, a flag indicating whether a type of an intra block of a lower layer is used as prediction information or not, a flag indicating whether a differential image value of a lower layer is used or not, and an index of a reference view used for prediction.
- a multiview video can be effectively compressed by expanding the temporal and spatial hierarchical structure of a typical scalable coding technology to multiview videos.
- a video service can be scalably provided to various 2-D or 3-D terminals by forming a hierarchical structure in a temporal and spatial axis for multiview video.
- Fig. 1 is a diagram describing a scalable coding technology according to the related art.
- Fig. 2 illustrates a scalable coding apparatus for multiview video in accordance with an embodiment of the present invention.
- Fig. 3 is a block diagram illustrating an extended scalability video encoder in accordance with an embodiment of the present invention.
- Fig. 4 describes a reference structure for predicting and referencing adjacent frames in scalable video coding according to an embodiment of the present invention.
- Fig. 5 illustrates the reference structure of Fig. 4 with circle symbols and cross symbols in a spatial (view) layer axis with a time fixed.
- Fig. 6 illustrates a reference structure for a B- frame structure.
- Fig. 7 illustrates a reference structure of Fig. 6 with circle symbols and cross symbols in a spatial (view) layer axis with a time fixed.
- Fig. 8 is a block diagram illustrating an apparatus for scalable coding multiview video in accordance with another embodiment of the present invention.
- Fig. 9 describes a reference structure for a basic video 0 (91) and adjacent videos 1 (92) and 2 (93) in Fig. 8.
- Fig. 2 illustrates a scalable coding apparatus for multiview video in accordance with an embodiment of the present invention.
- In Fig. 2, five videos 0 to 4 are input from five cameras, and each of the input videos 0 to 4 is compressed by a scalable video encoder.
- the scalable coding apparatus includes a basic scalability video encoder 21 and extended scalability video encoders 22 to 25.
- the basic scalability video encoder 21 performs 2-D spatial transformation and temporal transformation on the video 0, which is the basic video.
- the video encoder 21 also performs scalable coding through motion coding and texture coding.
- Each of the extended scalability video encoders 22 to 25 receives not only the own video which is assigned to itself but also at least one of adjacent videos as reference videos, and separates the received videos into frames with multilayer resolutions through spatial filtering and temporal filtering.
- each of the extended scalability video encoders 22 to 25 performs scalable coding on the separated frames with reference to temporal and spatial hierarchical image information and compression parameters of the adjacent videos as well as the own video.
- the video 0 is defined as a basic video
- the basic scalability video encoder 21 performs scalable coding on the video 0.
- the basic scalability video encoder 21 has the same structure as a single-viewpoint video scalable coding apparatus according to the related art. That is, the scalable coding apparatus according to the present embodiment has a structure compatible with typical scalable coding apparatuses for the basic video.
- each of the extended scalability video encoders 22 to 25 receives not only the own video which is assigned to itself but also adjacent videos as reference videos, separates the received videos into frames with multilayer resolutions through spatial filtering, and performs scalable coding on the separated frames with reference to lower layers of the other videos assigned to neighbor encoders as well as that of the video assigned to itself. Also, the extended scalability video encoder 23 that compresses the video 2 performs compression through bi-directional prediction using multilayer temporal and spatial resolution video information of the basic video 0 and the video 4.
- the scalable coding apparatus can provide a typical 2-D video service using only the basic scalability video encoder 21. Also, the scalable coding apparatus according to the present embodiment can provide a stereo video service using the basic scalability video encoder 21 for the basic video 0 and the extended scalability video encoder 25 for the video 4. Furthermore, the scalable coding apparatus according to the present embodiment can provide a three-view video service or a five-view video service by selectively combining the basic scalability video encoder 21 with the extended scalability video encoders 22 to 25.
- Fig. 3 is a block diagram illustrating an extended scalability video encoder in accordance with an embodiment of the present invention.
- the extended scalability video encoder includes a spatial video filtering unit 31, temporal video filtering units 330 and 340, a basic layer encoder 33, at least one enhancement layer encoder 34, and a multiplexer 35.
- the spatial video filtering unit 31 separates an own video and reference videos into frames with multilayer resolutions through spatial filtering.
- the temporal video filtering units 330 and 340 separate the output videos from the spatial video filtering unit 31 through temporal filtering.
- the basic layer encoder 33 performs scalable coding not only through temporal and spatial motion estimation for the own video frames of temporal low frequency images outputted from the temporal video filtering unit 330 but also through motion estimation for the reference video frames on a temporal axis.
- Each of the enhancement layer encoders 34 performs scalable coding with reference to the lower layers of the reference videos as well as the lower layer of the own video for temporal high frequency images outputted from the temporal video filtering unit 340.
- the multiplexer 35 outputs one bitstream by multiplexing outputs from the basic layer encoder 33 and the enhancement layer encoders 34.
- the spatial video filtering unit 31 receives an own video assigned to itself, which is captured by an own camera, and the other videos captured by the other cameras as reference videos at a predetermined time interval and separates the received videos into frames with multilayer resolutions through spatial filtering based on MCTF or hierarchical B structure.
- the basic layer encoder 33 and the enhancement layer encoder 34 may include temporal video filtering units 330 and 340, motion encoders 331 and 341, subtractors 332 and 342, spatial transformers 333 and 343, quantizers 334 and 344, and entropy encoders 335 and 345. As described above, the basic layer encoder 33 and the enhancement layer encoder 34 have a structure similar to a typical scalable video encoder.
- the temporal video filtering unit 330 of the basic layer encoder 33 separates the low frequency images, which are separated through spatial filtering, in a temporal axis through filtering based on MCTF or hierarchical B-structure.
- Also, the temporal video filtering unit 340 of the enhancement layer encoder 34 separates the high frequency images, which are separated through the spatial filtering, in a temporal axis through filtering based on MCTF or hierarchical B-structure.
- the motion encoders 331 and 341 include a motion estimation block or a motion compensation block.
- the motion estimation block performs motion estimation of a current frame using a reference frame as a basis and calculates a motion vector for forward motion estimation or bi-directional estimation.
- the motion encoders 331 and 341 may use not only their own frames but also peripheral frames as reference frames for motion estimation.
- the motion encoders 331 and 341 use a block matching algorithm that is generally used for motion estimation. That is, the motion encoders 331 and 341 calculate the displacement at which an error becomes minimum while moving a given motion block within a predetermined search area of a reference frame and estimate the calculated displacement as the motion vector.
- the motion encoders 331 and 341 provide motion data, such as motion vectors obtained as the result of motion estimation, a size of a motion block, and a reference frame number, to the entropy encoders 335 and 345. Also, the motion compensation block generates a temporal estimated frame for a current frame by performing the motion compensation for a forward reference frame, a backward reference frame, or a bi-directional reference frame using the calculated motion vector.
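The block matching search described above can be sketched as a full search over a square window, minimizing the sum of absolute differences (SAD). This is a minimal illustration; practical encoders use fast search patterns and sub-pixel refinement rather than an exhaustive search.

```python
def block_match(cur, ref, bx, by, bsize, search):
    """Full-search block matching: find the motion vector (dx, dy) within
    +/-search pixels that minimizes the SAD between the current block at
    (bx, by) and the displaced reference block."""
    h, w = len(ref), len(ref[0])
    best_mv, best_sad = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + bsize > w or ry + bsize > h:
                continue  # displaced block falls outside the reference frame
            sad = sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
                      for j in range(bsize) for i in range(bsize))
            if sad < best_sad:
                best_mv, best_sad = (dx, dy), sad
    return best_mv, best_sad

# A reference frame with a horizontal gradient, and a current frame that is
# the same content shifted one pixel to the left: the true motion is (1, 0).
ref = [[x + 8 * y for x in range(8)] for y in range(8)]
cur = [[x + 1 + 8 * y for x in range(8)] for y in range(8)]
mv, sad = block_match(cur, ref, bx=2, by=2, bsize=2, search=2)
```

The same search applies whether the reference frame comes from the own view (temporal prediction) or from an adjacent view (inter-view prediction).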
- the subtractors 332 and 342 remove the temporal redundancy of a video by subtracting the temporal estimated frame from the current frame.
- the spatial transformers 333 and 343 remove spatial redundancy from the temporal redundancy removed frame using a predetermined spatial transformation method that supports spatial scalability.
- Discrete Cosine Transform (DCT) and wavelet transform are widely used.
- the quantizers 334 and 344 quantize transform coefficients from the spatial transformers 333 and 343.
- the quantization is a process of transforming a transform coefficient, which is expressed as a predetermined real number, to a discrete value by dividing the transform coefficient by a predetermined step size and matching the result to a predetermined index.
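That mapping can be sketched in a few lines. A uniform step size is assumed here for illustration, whereas a real codec derives the step from a quantization parameter:

```python
def quantize(coeffs, step):
    """Map each real-valued transform coefficient to a discrete index."""
    return [round(c / step) for c in coeffs]

def dequantize(indices, step):
    """Approximate reconstruction: index times the quantization step."""
    return [i * step for i in indices]

coeffs = [12.6, -3.1, 0.4]
idx = quantize(coeffs, step=4)    # [3, -1, 0]
rec = dequantize(idx, step=4)     # [12, -4, 0]
```

The reconstruction error (here 0.6, -0.9, 0.4) is the lossy part of the codec; a larger step gives a smaller bitstream and lower quality.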
- the entropy encoders 335 and 345 lossless-encode the quantized transform coefficients from the quantizers 334 and 344 and the motion data provided from the motion estimation block and generate an output bitstream.
- arithmetic coding or variable length coding may be used.
- intra prediction may be performed for an intra block before spatial transform.
- the enhancement layer encoder may include a 2-D spatial interpolation block for receiving a restored reference frame from the lower layer encoder and performing two-dimensional (2-D) spatial interpolation and an intra prediction block for performing the intra prediction.
- inter prediction searches for the block most similar to a predetermined block of a current frame, obtains a predicted block that can express the current block best, and quantizes the difference between the current block and the predicted block.
- the inter prediction includes bi-directional prediction using two reference frames, forward prediction using a past reference frame, and backward prediction using a future reference frame.
- the intra prediction predicts a current block using frames adjacent to the current block.
- the intra prediction is different from the others because the intra prediction uses information in a current frame only and does not use the other frames in the same layer or frames of other layers.
- Intra base prediction may be used when a lower layer includes a frame at the same temporal location as the current frame.
- a macro block of a current frame can be effectively predicted from the macro blocks of the corresponding basic frame. That is, the difference between a macro block of the current frame and a macro block of the corresponding basic frame is quantized.
- the macro block of the basic frame is up-sampled to the resolution of the current layer before calculating the difference.
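The up-sample-then-subtract structure of intra base prediction can be sketched as follows; nearest-neighbour up-sampling is used only for illustration, since the specification does not name the interpolation filter:

```python
import numpy as np

def upsample2x(block):
    """Nearest-neighbour 2x up-sampling of a base-layer macro block
    (a real codec would use a proper interpolation filter)."""
    return block.repeat(2, axis=0).repeat(2, axis=1)

def intra_base_residual(cur_mb, base_mb):
    """Intra base prediction: up-sample the co-located base-layer
    macro block to current-layer resolution, then keep only the
    difference, which is what gets quantized."""
    return cur_mb - upsample2x(base_mb)

base = np.array([[10, 20], [30, 40]])     # 2x2 base-layer macro block
cur = upsample2x(base) + 1                # enhancement block close to the prediction
res = intra_base_residual(cur, base)      # small residual, cheap to code
```

Because the enhancement block closely matches the up-sampled base block, the residual collapses to a constant of 1, far cheaper to quantize than the block itself.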
- Residual prediction is the extension of inter prediction from a single layer to multiple layers.
- the residual prediction calculates the difference between the residual obtained from the inter prediction of a current layer and the residual obtained from the inter prediction of a lower layer, and quantizes the calculated difference.
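Residual prediction, i.e. coding the difference between the two layers' inter-prediction residuals, can be sketched as below (again with illustrative nearest-neighbour up-sampling, an assumption on my part):

```python
import numpy as np

def residual_prediction(cur_residual, base_residual):
    """Inter-layer residual prediction: code only the difference between
    the current layer's inter-prediction residual and the up-sampled
    base layer's residual."""
    up = base_residual.repeat(2, axis=0).repeat(2, axis=1)  # 2x up-sampling
    return cur_residual - up

base_res = np.array([[2, 2], [2, 2]])          # base-layer residual
cur_res = np.full((4, 4), 3)                   # correlated enhancement residual
second_order = residual_prediction(cur_res, base_res)
```

Since the two layers' residuals are correlated, the second-order difference is near zero and costs fewer bits than coding `cur_res` directly.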
- when encoding high-resolution image frames, the enhancement encoder performs motion estimation using motion vectors multiplied by two, taken from the basic layer image (the low resolution image) of its own video and from the basic layer images (the low resolution images) of the other videos used as reference videos.
- when encoding high resolution image frames, the enhancement encoder performs differential image prediction by interpolating the basic layer image (low resolution image) of its own video and the basic layer images (low resolution images) of the other videos used as reference videos.
- when encoding high resolution image frames in an intra prediction mode, the enhancement encoder performs intra prediction using the basic layer image that is the low resolution image of its own video and the basic layer images that are the low resolution images of the other videos used as reference videos.
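The motion-vector handling described above, where a base-layer vector is multiplied by two before being reused at the doubled enhancement-layer resolution, reduces to a simple scaling; `scale_base_mv` is a hypothetical helper name:

```python
def scale_base_mv(mv, ratio=2):
    """Scale a base-layer motion vector to enhancement-layer resolution.
    With a dyadic (2:1) resolution ratio, each component is multiplied
    by two before being used as a predictor at the higher resolution."""
    return (mv[0] * ratio, mv[1] * ratio)

# a base-layer vector of (-1, 3) pixels predicts (-2, 6) at double resolution
predictor = scale_base_mv((-1, 3))
```

The enhancement layer then only codes the (usually small) difference between this predictor and its actually estimated high-resolution vector.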
- Fig. 4 is a diagram illustrating a reference structure for predicting and referencing adjacent frames in scalable video coding according to an embodiment of the present invention.
- a P macro block denotes single-directional prediction and a B macro block denotes bi-directional prediction.
- the reference structure according to the present embodiment allows single directional prediction and bi-directional prediction to be performed in a plurality of resolution layers as well as in a temporal axis and a spatial axis.
- the reference structure according to the present embodiment includes a two-layer structure formed of one basic layer and an enhancement layer.
- the reference structure may further include more enhancement layers.
- a reference numeral 41 denotes a predicting and referencing operation for predicting and referencing adjacent frames, which is performed in a basic layer encoder and an enhancement layer encoder in a basic scalability video encoder 21 of Fig. 2.
- a reference numeral 42 denotes a predicting and referencing operation which is performed in a basic layer encoder and an enhancement layer encoder in an extended scalability video encoder 22 for a video 1 of Fig. 2.
- a reference numeral 43 denotes a predicting and referencing operation, which is performed in a basic layer encoder and an enhancement layer encoder in an extended scalability video encoder 23 for a video 2 of Fig. 2.
- the basic layer 0 denotes a reference structure performed in each of the basic layer encoder of the basic scalability video encoder 21, the basic layer encoder of the extended scalability video encoder 22 for the video 1, and the basic layer encoder of the extended scalability video encoder 23 for the video 2.
- Like the basic layer 0 (L0), the enhancement layer 1 (L1) denotes a reference structure performed in each of the enhancement layer encoder of the basic scalability video encoder 21, the enhancement layer encoder of the extended scalability video encoder 22 for the video 1, and the enhancement layer encoder of the extended scalability video encoder 23 for the video 2.
- the basic layer encoder of the basic scalability video encoder 21 performs a scalable video coding operation by predicting and referencing adjacent frames for own low resolution image frames in a temporal axis like the video encoder according to the related art.
- the basic layer encoder of the extended scalability video encoder 22 for the video 1 performs bi-directional prediction for its own frame using the frames of a video 0 and the frames of a video 2, which are reference video frames located at the same temporal axis.
- the basic layer encoder of the extended scalability video encoder 23 for the video 2 performs single-directional prediction with reference to the basic video 0 and, at the same time, performs bi-directional prediction using its own frames.
- each of the macro blocks includes three circle or cross symbols indicating whether a lower layer is referenced or not.
- the circle or cross symbol in the middle row among the three symbols indicates whether the lower layer of the own video frame is referenced or not.
- the circle or cross symbols in the top row and in the bottom row indicate whether the lower layers of adjacent video frames are referenced or not.
- the enhancement layer encoder of the basic scalability video encoder 21 performs scalable video coding with reference to own frames of a lower layer like the encoder according to the related art.
- the enhancement layer encoder of the extended scalability video encoder 22 for the video 1 performs bi-directional prediction with reference to the lower layer frames of a video 0 and the lower layer frames of a video 2, which are adjacent frames, as well as its own lower layer frames.
- the enhancement layer encoder of the extended scalability video encoder 23 for the video 2 performs prediction with reference to the lower layer frames of the basic video 0 as well as its own lower layer frames.
- Fig. 5 illustrates the reference structure of Fig. 4 with circle symbols and cross symbols in a spatial (view) layer axis with a time fixed.
- the reference structure includes three layers.
- a reference numeral 51 denotes a reference structure for a video 0
- a reference numeral 52 denotes a reference structure for a video 1
- a reference numeral 53 denotes a reference structure for a video 2.
- a coding operation is performed based on the motion, differential images, and intra prediction used in scalable video coding (SVC) according to the related art, with reference to a lower layer of the own video only.
- since the macro block of an enhancement layer 2 for a video 1 (52) includes all circle symbols, a coding operation is performed with reference to the lower layers of adjacent videos as well as the lower layer of the own video.
- since the macro block of an enhancement layer 2 for a video 2 (53) includes circle symbols in the middle and left columns, a scalable video coding operation is performed with reference to the lower layer of the own video and the lower layer of a video 0.
- a scalable video includes predetermined syntax elements that describe information about videos in the reference layer.
- a syntax element ref_view_Idx denotes the view number of the reference video in a lower layer.
- a flag base_mode_flag indicates whether or not the motion vector information of a lower layer is used for estimating a motion in a current block. If the flag base_mode_flag is 1, the variable ref_view_Idx must hold the view number of the lower layer indicating which motion vector information is used.
- a flag base_mode_refinement_flag indicates whether or not the motion vector information of a lower layer is used for predicting a motion vector of a current block. Unlike the flag base_mode_flag, the reference index of a lower layer is also used as prediction information.
- a variable ref_view_Idx must have a view number of a lower layer for indicating which motion vector and reference index information are used.
- a flag intra_base_flag indicates whether the intra block type of a lower layer is used as prediction information of a current block. If the flag intra_base_flag is 1, information about an intra prediction mode of the lower layer is used for the current block. Therefore, the variable ref_view_Idx must hold the view number of the lower layer indicating which intra block type information is used.
- a flag residual_prediction_flag indicates whether or not a differential image value of a lower layer is used for predicting a differential image of a current block. If the flag residual_prediction_flag is 1, the differential image information of the lower layer is up-sampled. Also, the variable ref_view_Idx must hold the view number of the lower layer indicating which differential image information is used. Table 1 shows the above described syntax elements in the scalable video.
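The syntax elements above, and the constraint that ref_view_Idx must accompany any set inter-layer flag, can be sketched as a small container with a validity check; the class and method names are hypothetical illustrations, not part of the bitstream syntax:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class InterLayerSyntax:
    """Hypothetical container for the inter-layer prediction syntax
    described above; field names follow the description, not a
    normative bitstream specification."""
    base_mode_flag: bool = False
    base_mode_refinement_flag: bool = False
    intra_base_flag: bool = False
    residual_prediction_flag: bool = False
    ref_view_idx: Optional[int] = None  # view number of the lower-layer reference

    def validate(self):
        # Whenever any inter-layer flag is 1, ref_view_idx must identify
        # which lower-layer view supplies the prediction information.
        any_flag = (self.base_mode_flag or self.base_mode_refinement_flag
                    or self.intra_base_flag or self.residual_prediction_flag)
        if any_flag and self.ref_view_idx is None:
            raise ValueError("ref_view_idx is required when an inter-layer flag is set")
        return True
```

A decoder-side check of this kind would reject a block that requests, say, residual prediction without naming the reference view.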
- Fig. 6 illustrates a reference structure for a B-frame structure.
- a reference numeral 61 denotes a reference structure for a video 0 that is a basic video
- a reference numeral 62 denotes a reference structure for a video 1
- a reference numeral 63 denotes a reference structure for a video 2.
- Fig. 7 is a diagram illustrating the reference structure of Fig. 6 with circle symbols and cross symbols in a spatial (view) layer axis with a time fixed.
- the reference structure includes three layers.
- a reference numeral 71 denotes a reference structure for a video 0 that is a basic video
- a reference numeral 72 denotes a reference structure for a video 1
- a reference numeral 73 denotes a reference structure for a video 2.
- the video 0, which is the basic video (61 and 71), performs scalable video coding with reference to its own lower layer frames only.
- the video 1 and the video 2 perform scalable video coding with reference to the lower layer frames of adjacent videos as well as their own lower layer frames.
- Fig. 8 is a block diagram illustrating an apparatus for scalable coding multiview video in accordance with another embodiment of the present invention.
- Fig. 9 is a diagram illustrating a reference structure for a basic video 0 (91) and adjacent videos 1 (92) and 2 (93) in Fig. 8.
- a basic scalability video encoder 81 performs scalable coding for the basic video 0.
- the basic scalability video encoder 81 references its own lower layer frames like a single viewpoint video scalable coding apparatus. Therefore, the scalable coding apparatus according to another embodiment can be compatible with an existing scalable coding apparatus.
- Extended scalability video encoders 82 to 85 perform scalable video coding for the videos 1 to 4. Each of the extended scalability video encoders 82 to 85 separates video into frames with multilayer resolution, performs temporal and spatial prediction for the separated video frames, and performs compression with reference to temporal and spatial layer image information of adjacent videos and compression parameters.
- the enhancement layer 1 (92) of the video 1 performs scalable video coding with reference to a lower layer of the video 0, and the enhancement layer 2 (93) of the video 2 performs scalable video coding with reference to a lower layer of the video 1.
- in the scalable coding apparatus according to another embodiment shown in Fig. 8, each extended scalability video encoder performs scalable video coding with reference to one neighboring video only.
- the scalable coding apparatus can provide a 2-D video service using the basic scalability video encoder 81. Also, the scalable coding apparatus according to another embodiment can provide a stereo video service using the basic scalability video encoder 81 for the basic video 0 and the extended scalability video encoder 82 for the video 1. Furthermore, the scalable coding apparatus according to another embodiment can provide a three-view service using the basic scalability video encoder 81 for the basic video 0 and the extended scalability video encoders 82 and 84 for the videos 1 and 3. The scalable coding apparatus according to another embodiment can provide a five-view service using the basic scalability video encoder 81 for the basic video 0 and the extended scalability video encoders 82 to 85 for the videos 1 to 4.
- the scalable coding technology for multiview video according to the present invention will be briefly described again.
- one basic video is separated into frames with multilayer resolutions using a spatial filter in a spatial axis.
- a spatial and temporal scalable video coding operation is performed on the separated low resolution image frames through motion estimation in a temporal axis.
- the spatial and temporal scalable video coding operation is performed on the separated high resolution video frames through hierarchical motion estimation in a temporal axis with reference to a lower layer.
- a bitstream is generated by multiplexing the coded low resolution image frame and at least one of the coded high resolution video frames.
- the scalable video coding for the basic video according to the present embodiment is identical to that according to the related art.
- own video is received with at least one of adjacent videos as reference videos.
- the received own video and adjacent videos are separated into video frames with multilayer resolutions using a spatial filter in a spatial axis.
- the temporal and spatial scalable video coding is performed on the separated low resolution image frame through hierarchical motion estimation with reference to adjacent frames as reference frames as well as own frame in a temporal axis.
- the temporal and spatial scalable video coding is performed on the separated high resolution image frame through hierarchical motion estimation with reference to lower layers of adjacent video frames as well as a lower layer of the own video frame in a temporal axis.
- a bitstream is generated by multiplexing the coded low resolution image frame and at least one of the coded high resolution video frames.
- the extended scalable video coding uses not only the own lower layer frames but also adjacent lower layer frames as reference frames, unlike the scalable video coding for a single viewpoint video, which uses only its own adjacent frames and the lower layer frames thereof.
- a scalable video decoding apparatus for multiview video performs the operations of the scalable video encoding apparatus according to the present embodiment in a reverse order.
- the scalable video decoding apparatus includes a basic scalability video decoder and a plurality of extended scalability video decoders.
- the basic scalability video decoder receives a bitstream generated by scalable-coding one basic video and restores the basic video through inverse temporal transformation and inverse spatial transformation.
- Each of the extended scalability video decoders receives a bitstream generated by scalable-coding the own video and reference videos, which are captured at the same time, through the temporal and spatial prediction.
- each of the extended scalability video decoders restores at least one high resolution video frame through inverse temporal and spatial prediction according to whether a lower layer of an adjacent video frame is referenced as well as its own lower layer, and restores one low resolution image frame through inverse temporal and spatial prediction according to whether the adjacent video frames at the same temporal axis are referenced as well as its own adjacent frame. Then, the extended scalability video decoder restores a video by performing inverse spatial filtering on the restored high resolution video frames and the restored low resolution image frame.
- the basic scalability video decoder has a structure identical to that of a typical scalable video decoder. Therefore, the detailed description thereof is omitted.
- the extended scalability video decoder includes a demultiplexer, at least one enhancement layer decoder, a basic layer decoder, and an inverse spatial video filtering unit.
- the demultiplexer demultiplexes the received bitstream.
- Each of the enhancement layer decoders performs scalable decoding on the high resolution video signal outputted from the demultiplexer through inverse temporal and spatial motion estimation according to whether the lower layers of adjacent videos are referenced as well as the lower layer of the own video.
- the basic layer decoder performs scalable decoding on the low resolution image signal outputted from the demultiplexer not only through inverse temporal and spatial motion estimation for the own video frame but also through inverse motion estimation for reference video frames on a temporal axis.
- the inverse spatial video filtering unit restores a video by performing inverse spatial filtering on the restored high resolution video frame from the enhancement layer decoder and on the restored low resolution image frame from the basic layer decoder.
- the basic layer decoder and the enhancement layer decoder perform the operations of the basic layer encoder and the enhancement layer encoder in an inverse order. Therefore, the detailed descriptions of the basic layer decoder and the enhancement layer decoder are omitted.
- When the enhancement layer decoder performs a decoding operation for a high resolution video signal, it references a flag indicating whether the motion vector information of a lower layer is used or not, a flag indicating whether a reference index of a lower layer for adjacent video is used as prediction information or not, a flag indicating whether a type of an intra block of a lower layer is used as prediction information or not, a flag indicating whether a differential video value of a lower layer is used or not, and an index of a reference view used for prediction.
- the technology of the present invention can be realized as a program and stored in a computer-readable recording medium, such as CD-ROM, RAM, ROM, floppy disk, hard disk and magneto-optical disk. Since the process can be easily implemented by those skilled in the art of the present invention, further description will not be provided herein.
- multiview video can be effectively compressed by expanding a temporal and spatial hierarchical structure of a typical scalable coding technology to the multiview video.
- a video service can be scalably provided to various types of 2-D or 3-D terminals by forming a hierarchical structure on a temporal and spatial axis for the multiview video according to the present invention.
Abstract
Provided are a scalable video coding and decoding apparatus and method for multiview video. The apparatus includes a basic scalability video encoder for separating one basic video into video frames and performing scalable video coding through temporal and spatial prediction, and multiple extended scalability video encoders for receiving an own video and one or more adjacent videos as reference videos captured simultaneously, separating the received videos into video frames, performing scalable video coding for the separated low resolution image frame through temporal and spatial prediction with reference to the adjacent video frames at the same temporal axis as well as the own adjacent frame, and performing scalable video coding for the separated high resolution video frame through temporal and spatial prediction referring to lower layers of the adjacent video frames at the same temporal axis as well as an own lower layer.
Description
DESCRIPTION
MULTI-VIEW VIDEO SCALABLE CODING AND DECODING
TECHNICAL FIELD The present invention relates to a multiview video scalable coding and decoding technology; and, more particularly, to a multiview video scalable coding and decoding apparatus and method for compressing and transmitting multiview video using a multilayer spatial and temporal scalable coding technology and for providing two-dimensional or three-dimensional video services to various types of video terminals.
This work was supported by the IT R&D program of MIC/IITA [2005-S-403-02, "Development of Super- intelligent Multimedia Anytime-anywhere Realistic TV (SmarTV) Technology"] .
BACKGROUND ART
In general, data is compressed by removing temporal and spatial redundancy. The spatial redundancy denotes identical colors or the same objects within a picture. The temporal redundancy means adjacent pictures with almost no changes in moving pictures and repeated sounds in audio. In a typical video coding method, the temporal redundancy is removed by temporal filtering based on motion compensation, and the spatial redundancy is removed by spatial transformation.
In order to transmit multimedia data generated after removing data redundancy, various transmission media were introduced. The transmission performance varies according to the type of the transmission medium. Also, a scalable video coding technology was introduced to support the various speeds of transmission media and to transmit multimedia data at a transfer rate suited to a transmission environment.
The scalable video coding technology is one of coding technologies for controlling a resolution, a frame rate, and a signal-to-noise ratio (SNR) of video by cutting down a predetermined part of a compressed bit stream according to conditions, such as a transport bit rate, a transport error rate, and a system resource.
Fig. 1 is a diagram describing a scalable coding technology according to a related art.
Referring to Fig. 1, the scalable video coding technology according to the related art performs temporal transform for realizing temporal scalability and performs two-dimensional spatial transform for realizing spatial scalability. Also, the scalable video coding technology realizes quality scalability using texture coding. The motion coding scalably encodes motion information when spatial scalability is realized. As described above, one bit stream is generated through such coding algorithms.
In order to provide the temporal scalability and improve a compression rate in the scalable video coding, motion compensated temporal filtering (MCTF) and hierarchical B-pictures were used.
The MCTF performs wavelet transform using motion information along the temporal direction of a video sequence. The wavelet transform is performed using a lifting scheme. The lifting scheme includes three processes: polyphase decomposition, prediction, and update.
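The predict/update structure of the lifting scheme can be sketched without motion compensation as a Haar-style temporal lift; this illustrates only the lifting steps (polyphase split, prediction, update), not the motion-compensated filter the MCTF actually applies:

```python
import numpy as np

def mctf_lift(frames):
    """One level of Haar-style temporal lifting:
    1) polyphase decomposition: split frames into even/odd phases,
    2) prediction: predict odd frames from even ones (high-pass),
    3) update: adjust even frames with the high-pass result (low-pass)."""
    even, odd = frames[0::2], frames[1::2]
    high = [o - e for o, e in zip(odd, even)]        # prediction step
    low = [e + h / 2 for e, h in zip(even, high)]    # update step
    return low, high

# four toy frames whose pixel values drift slowly over time
f = [np.full((2, 2), v, dtype=float) for v in (10, 12, 20, 22)]
low, high = mctf_lift(f)
```

The high-pass frames are small (the temporal detail), the low-pass frames form a half-rate approximation of the sequence, and the step is perfectly invertible (`even = low - high/2`, `odd = high + even`), which is what makes lifting attractive for temporal scalability.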
The hierarchical B-pictures may be realized in various ways using a memory management control operation that manages a decoded picture buffer (DPB) storing 16 pictures and the syntaxes of reference picture list reordering (RPLR).
Recently, due to advances in technologies and demands of users, researchers are studying to develop a service for providing video information for scenes at diverse viewpoints, and a service allowing viewers to edit video information transmitted from a broadcasting station and watch desired video among the video information. In order to provide the services, a technology for compressing multiview video is required. The multiview video compression technology is a technology for simultaneously coding videos from a plurality of cameras that provide multiview video and compressing, storing, and transmitting the coded video. If the multiview video is stored and transmitted without being compressed, a large transmission bandwidth is required to transmit the multiview video to a user through a broadcasting network or a wired/wireless Internet in real time.
In the multiview video coding and decoding technology, each of the video sequences is independently coded and transmitted, and the transmitted coded video sequences are decoded. This is easily realized based on MPEG-1/2/4 or H.261/263/264. However, it is impossible to remove the redundancy between videos, which is generated as the same object is photographed by a plurality of cameras.
In order to remove the redundancy between videos, a scalable video coding technology was introduced. In the scalable video coding technology for a single viewpoint video, the single viewpoint video is divided into video frames with multilayer resolutions in a spatial axis using a spatial filter, and temporal and spatial scalable coding is performed on the divided video frames in a temporal axis through hierarchical bi-directional motion estimation. Also, quality scalability may be provided through entropy coding by hierarchical expression in transform coding.
However, since the scalable video coding technology was designed for a single viewpoint video, a large overhead may be generated in a video decoder because of a high transport rate when a terminal reproduces three-dimensional videos with selective two-dimensional videos.
DISCLOSURE TECHNICAL PROBLEM
An embodiment of the present invention is directed to providing a multiview video scalable coding method and apparatus for effectively compressing videos and providing various video services to terminals in diverse environments through motion estimation with reference to adjacent images at a temporal and spatial axis for compressing multiview video and through motions, differential images, and intra prediction in different resolutions of adjacent videos for providing scalability on a temporal and spatial axis in a multiview video.
Another embodiment of the present invention is directed to providing a scalable video decoding method and apparatus for receiving a scalable coded signal and decoding the received signal for multiview video. Other objects and advantages of the present invention can be understood by the following description, and become apparent with reference to the embodiments of the present invention. Also, it is obvious to those skilled in the art of the present invention that the objects and advantages of the present invention can be realized by the means as claimed and combinations thereof.
TECHNICAL SOLUTION In accordance with an aspect of the present invention, there is provided a scalable video coding apparatus for a multiview video including: a basic scalability video encoder for separating one basic video into video frames with multilayer resolutions and performing scalable video coding through performing
temporal and spatial prediction on the separated low resolution image frame and at least one of the separated high resolution video frames; and a plurality of extended scalability video encoders for receiving an own video and at least one of adjacent videos as reference videos which are captured at the same time, separating the received videos into video frames with multilayer resolutions through spatial filtering, performing scalable video coding for the separated low resolution image frame through temporal and spatial prediction with reference to the adjacent video frames at the same temporal axis as well as own adjacent frame, and performing scalable video coding for the separated high resolution video frame through temporal and spatial prediction with reference to lower layers of the adjacent video frames at the same temporal axis as well as an own lower layer.
In accordance with another aspect of the present invention, there is provided a scalable video coding method for multiview video, including the steps of: (a) separating one basic video into video frames with multilayer resolutions and performing scalable video coding through temporal and spatial prediction; and (b) receiving an own video and at least one of adjacent videos, which are captured at the same time, and performing scalable video coding through temporal and spatial prediction by separating the received videos into video frames with multilayer resolutions, wherein the step (b) of receiving an own video and at least one of adjacent videos includes the steps of: (c) performing scalable video coding for low resolution video frames through temporal and spatial prediction with reference to the adjacent video frames at the same temporal axis as well as own adjacent frames; and (d) performing scalable video coding for at least one of high resolution video frames through temporal and spatial prediction with
reference to lower layers of the adjacent video frames as well as own lower layer.
In accordance with another aspect of the present invention, there is provided a scalable video decoding apparatus for multiview video including: a basic scalability video decoder for receiving a bitstream generated by scalably coding one basic video and restoring a basic video through inverse temporal and inverse spatial transform; and a plurality of extended scalability video decoders for receiving a bitstream scalable-coded through temporal and spatial prediction for the own video and reference videos, which are captured at the same time, restoring at least one of high resolution image frames through inverse temporal and spatial prediction according to whether lower layers of adjacent video frames that are reference videos are referenced as well as an own lower layer or not, restoring a low resolution image frame through inverse temporal and spatial prediction according to whether the adjacent video frames are referenced or not at the same temporal axis as well as an own adjacent frame, and restoring an image through inverse spatial filtering for the restored high resolution image frames and the restored low resolution image frame. The extended scalability video decoder may include: a demultiplexing unit for demultiplexing a received bitstream; at least one enhancement decoding unit for performing scalable decoding for a high resolution image signal outputted from the demultiplexing unit through inverse temporal and spatial motion estimation according to whether lower layers of adjacent videos that are reference videos are referenced as well as a lower layer of the own video or not; a basic layer decoding unit for performing scalable decoding for a low resolution image signal outputted from the demultiplexing unit through
inverse motion estimation for reference video frames at a temporal axis as well as inverse temporal and spatial motion estimation for the own video frame; and an inverse spatial video filtering unit for restoring an image through inverse spatial filtering for the restored high resolution images from the enhancement decoding unit and the restored low resolution image from the basic layer decoding unit.
The enhancement decoding unit may perform scalable decoding with reference to a flag indicating whether motion vector information of a lower layer is used or not, a flag indicating whether a reference index of a lower layer for adjacent video is used as prediction information or not, a flag indicating whether a type of an intra block of a lower layer is used as prediction information or not, a flag indicating whether a differential image value of a lower layer is used or not, and an index of a reference view used for prediction.
In accordance with another aspect of the present invention, there is provided a scalable video decoding method for multiview video, including the steps of: (a) performing scalable video decoding for one basic video through inverse temporal and spatial prediction; and (b) receiving a bitstream scalable-coded with reference to an own video and at least one of adjacent videos as reference videos, which are captured at the same time, and performing scalable video decoding through inverse temporal prediction and inverse spatial prediction, wherein the step (b) of receiving a bitstream scalable-coded with reference to an own video and at least one of adjacent videos as reference videos includes the steps of: (c) performing scalable video decoding for demultiplexed high resolution image signal through inverse temporal and spatial prediction according to whether lower layers of the adjacent video frames are
referred as well as an own lower layer; and (d) performing scalable video decoding for demultiplexed low resolution image signal through inverse temporal and spatial prediction according to whether the adjacent video frames are referred at the same temporal axis as well as an own adjacent frame.
In the performing scalable video decoding for the demultiplexed high resolution image signal, scalable decoding may be performed with reference to a flag indicating whether motion vector information of a lower layer is used or not, a flag indicating whether a reference index of a lower layer for adjacent video is used as prediction information or not, a flag indicating whether a type of an intra block of a lower layer is used as prediction information or not, a flag indicating whether a differential image value of a lower layer is used or not, and an index of a reference view used for prediction.
ADVANTAGEOUS EFFECTS
According to the present invention, a multiview video can be effectively compressed by expanding the temporal and spatial hierarchical structure of a typical scalable coding technology to multiview videos. Also, a video service can be scalably provided to various 2-D or 3-D terminals by forming a hierarchical structure in a temporal and spatial axis for multiview video.
BRIEF DESCRIPTION OF THE DRAWINGS Fig. 1 is a diagram describing a scalable coding technology according to the related art.
Fig. 2 illustrates a scalable coding apparatus for multiview video in accordance with an embodiment of the present invention. Fig. 3 is a block diagram illustrating an extended
scalability video encoder in accordance with an embodiment of the present invention.
Fig. 4 describes a reference structure for predicting and referencing adjacent frames in scalable video coding according to an embodiment of the present invention.
Fig. 5 illustrates the reference structure of Fig. 4 with circle symbols and cross symbols in a spatial (view) layer axis with a time fixed. Fig. 6 illustrates a reference structure for a B- frame structure.
Fig. 7 illustrates a reference structure of Fig. 6 with circle symbols and cross symbols in a spatial (view) layer axis with a time fixed. Fig. 8 is a block diagram illustrating an apparatus for scalable coding multiview video in accordance with another embodiment of the present invention.
Fig. 9 describes a reference structure for a basic video 0 (91) and adjacent videos 1 (92) and 2 (93) in Fig. 8.
BEST MODE FOR THE INVENTION
The advantages, features and aspects of the invention will become apparent from the following description of the embodiments with reference to the accompanying drawings, which is set forth hereinafter. Therefore, those skilled in the art to which the present invention pertains can easily embody the technical concept and scope of the invention. In addition, if it is considered that a detailed description of a related art may obscure the points of the present invention, the detailed description will not be provided herein. The specific embodiments of the present invention will be described in detail hereinafter with reference to the attached drawings.
Fig. 2 illustrates a scalable coding apparatus for multiview video in accordance with an embodiment of the present invention.
In Fig. 2, five videos 0 to 4 are input from five cameras, and each of the input videos 0 to 4 is compressed by a scalable video encoder.
Referring to Fig. 2, the scalable coding apparatus according to the present embodiment includes a basic scalability video encoder 21 and extended scalability video encoders 22 to 25. The video encoder 21 performs 2-D spatial transformation and temporal transformation on the video 0, which is a basic video. The video encoder 21 also performs scalable coding through motion coding and texture coding. Each of the extended scalability video encoders 22 to 25 receives not only the own video assigned to itself but also at least one of adjacent videos as reference video and separates the received videos into frames with multilayer resolutions through spatial filtering and temporal filtering. Also, each of the extended scalability video encoders 22 to 25 performs scalable coding on the separated frames with reference to temporal and spatial hierarchical image information and compression parameters of the adjacent videos as well as the own video. In Fig. 2, the video 0 is defined as a basic video, and the basic scalability video encoder 21 performs scalable coding on the video 0. The basic scalability video encoder 21 has the same structure as a single viewpoint video scalable coding apparatus according to the related art. That is, the scalable coding apparatus according to the present embodiment has a structure compatible with typical scalable coding apparatuses for basic video.
In order to scalably compress input video with reference to the adjacent videos as well as the own video,
the videos 1 to 4 are compressed through the extended scalability video encoders 22 to 25. Like a single viewpoint video scalable coding apparatus, each of the extended scalability video encoders 22 to 25 receives not only own video which is assigned to itself but also adjacent videos as a reference video, separates the received videos into frames with multilayer resolutions through spatial filtering, and performs scalable coding on the separated frames with reference to lower layers of the other videos assigned to neighbor encoders as well as that of the video assigned to itself. Also, the extended scalability encoder 23 that compresses the video 2 performs compression through bidirectional prediction using multilayer temporal and spatial resolution video information of the basic video 0 and the video 4.
Accordingly, the scalable coding apparatus according to the present embodiment can provide a typical 2-D video service using only the basic scalability video encoder 21. Also, the scalable coding apparatus according to the present embodiment can provide a stereo video service using the basic scalability video encoder 21 for the basic video 0 and the extended scalability video encoder 25 for the video 4. Furthermore, the scalable coding apparatus according to the present embodiment can provide a three-view video service or a five-view video service by selectively combining the basic scalability video encoder 21 with the extended scalability video encoders 22 to 25.
Hereinafter, the structure and the function of the extended scalability video encoder will be described with reference to Fig. 3.
Fig. 3 is a block diagram illustrating an extended scalability video encoder in accordance with an embodiment of the present invention. Referring to Fig. 3, the extended scalability video
encoder according to the present embodiment includes a spatial video filtering unit 31, temporal video filtering units 330 and 340, a basic layer encoder 33, at least one enhancement layer encoder 34, and a multiplexer 35. The spatial video filtering unit 31 separates an own video and reference videos into frames with multilayer resolutions through spatial filtering. The temporal video filtering units 330 and 340 separate the output videos from the spatial video filtering unit 31 through temporal filtering. The basic layer encoder 33 performs scalable coding not only through temporal and spatial motion estimation for the own video frames of the temporal low frequency images outputted from the temporal video filtering unit 330 but also through motion estimation for the reference video frames on a temporal axis. Each of the enhancement layer encoders 34 performs scalable coding with reference to the lower layers of the reference videos as well as the lower layer of the own video for the temporal high frequency images outputted from the temporal video filtering unit 340. The multiplexer 35 outputs one bitstream by multiplexing outputs from the basic layer encoder 33 and the enhancement layer encoders 34.
The spatial video filtering unit 31 receives an own video assigned to itself, which is captured by an own camera, and the other videos captured by the other cameras as reference videos at a predetermined time interval and separates the received videos into frames with multilayer resolutions through spatial filtering based on MCTF or a hierarchical B-structure. The basic layer encoder 33 and the enhancement layer encoder 34 may include temporal video filtering units 330 and 340, motion encoders 331 and 341, subtractors 332 and 342, spatial transformers 333 and 343, quantizers 334 and 344, and entropy encoders 335 and 345. As described above, the basic layer encoder 33 and the enhancement layer encoder 34 have a structure similar to a typical scalable video encoder. Hereinafter, the functions of the constituent elements in the encoders 33 and 34 will be described. The temporal video filtering unit 330 of the basic layer encoder 33 separates the low frequency images, which are separated through spatial filtering, in a temporal axis through filtering based on MCTF or a hierarchical B-structure. Also, the temporal video filtering unit 340 of the enhancement layer encoder 34 separates the high frequency images, which are separated through the spatial filtering, in a temporal axis through filtering based on MCTF or a hierarchical B-structure.
The motion encoders 331 and 341 include a motion estimation block and a motion compensation block. The motion estimation block performs motion estimation of a current frame using a reference frame as a basis and calculates a motion vector for forward motion estimation or bi-directional estimation. Here, the motion encoders 331 and 341 may use not only own frames but also peripheral frames as reference frames for motion estimation. The motion encoders 331 and 341 use a block matching algorithm that is generally used for motion estimation. That is, the motion encoders 331 and 341 calculate the displacement at which an error becomes minimum while moving a given motion block in a predetermined search area of a reference frame and estimate the calculated displacement as the motion vector. The motion encoders 331 and 341 provide motion data, such as motion vectors obtained as the result of motion estimation, a size of a motion block, and a reference frame number, to the entropy encoders 335 and 345. Also, the motion compensation block generates a temporal estimated frame for a current frame by performing motion compensation for a forward reference frame, a backward reference frame, or a bi-directional reference frame using the calculated motion vector.
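The block matching search described in the paragraph above can be sketched as follows. This is a minimal full-search with a sum-of-absolute-differences (SAD) error measure; the frame size, block size, and search range are illustrative assumptions, not values taken from the embodiment:

```python
def sad(cur, ref, bx, by, dx, dy, bsize):
    """Sum of absolute differences between a block of the current
    frame at (bx, by) and the block displaced by (dx, dy) in the
    reference frame."""
    total = 0
    for y in range(bsize):
        for x in range(bsize):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total

def full_search(cur, ref, bx, by, bsize=4, search=2):
    """Exhaustive block matching: move the block over a +/-search
    window of the reference frame and keep the displacement whose
    error is minimum; that displacement is the motion vector."""
    h, w = len(cur), len(cur[0])
    best, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, bsize)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # skip displacements that fall outside the reference frame
            if not (0 <= by + dy and by + dy + bsize <= h
                    and 0 <= bx + dx and bx + dx + bsize <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, bsize)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost
```

For a current frame that is a one-pixel right shift of the reference frame, the search finds the displacement (-1, 0) with zero error.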
The subtractors 332 and 342 remove the temporal redundancy of a video by subtracting a temporal estimated frame from a current frame. The spatial transformers 333 and 343 remove spatial redundancy from the temporal-redundancy-removed frame using a predetermined spatial transformation method that supports spatial scalability. As the spatial transformation method, Discrete Cosine Transform (DCT) and wavelet transform are widely used.
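As a sketch of the spatial transformation step, a 1-D orthonormal DCT-II is shown below. A real encoder applies a 2-D DCT or wavelet transform to residual blocks, so this only illustrates the principle that the transform concentrates the energy of a smooth signal into a few low-frequency coefficients:

```python
import math

def dct2(x):
    """Orthonormal 1-D DCT-II. For a smooth input most of the output
    energy lands in the first few coefficients, which is what makes
    the subsequent quantization effective."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                for i, v in enumerate(x))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out
```

A constant signal, for example, transforms to a single DC coefficient with all higher-frequency coefficients at zero.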
The quantizers 334 and 344 quantize the transform coefficients from the spatial transformers 333 and 343. The quantization is a process of transforming a transform coefficient, which is expressed as a real number, into a discrete value by dividing the transform coefficient into predetermined intervals and matching the discrete value to a predetermined index.
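The quantization process described above can be illustrated with a uniform scalar quantizer. The fixed step size and round-to-nearest index mapping are simplifying assumptions; practical codecs use dead-zone quantizers with per-band step sizes:

```python
def quantize(coeff, step):
    """Map a real-valued transform coefficient to a discrete index
    by dividing the real line into intervals of width `step` and
    rounding to the nearest interval centre."""
    return round(coeff / step)

def dequantize(index, step):
    """Reconstruct an approximate coefficient from its index; the
    reconstruction error is bounded by half the step size."""
    return index * step
```

Larger step sizes give coarser indices (and smaller bitstreams) at the cost of larger reconstruction error.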
The entropy encoders 335 and 345 lossless-encode the quantized transform coefficients from the quantizers 334 and 344 and the motion data provided from the motion estimation block and generate an output bitstream. As the lossless encoding method, arithmetic coding or variable length coding may be used. Meanwhile, intra prediction may be performed for an intra block before the spatial transform. In order to perform the intra prediction, the enhancement layer encoder may include a 2-D spatial interpolation block for receiving a restored reference frame from the lower layer encoder and performing two-dimensional (2-D) spatial interpolation and an intra prediction block for performing the intra prediction.
In general, inter prediction searches a block most similar to a predetermined block of a current frame, obtains a predicted block that can express the current
block best, and quantizes differences between the current block and the predicted block. The inter prediction includes bi-directional prediction using two reference frames, forward prediction using a past reference frame, and backward prediction using a future reference frame.
Meanwhile, the intra prediction predicts a current block using blocks adjacent to the current block. The intra prediction is different from the other prediction methods because it uses information in a current frame only and does not use the other frames in the same layer or frames of the other layers.
Intra base prediction may be used when a lower layer includes a frame at the same temporal location as a current frame. A macro block of the current frame can be effectively predicted from the macro blocks of the corresponding basic frame. That is, the difference between a macro block of the current frame and a macro block of the corresponding basic frame is quantized. When the resolution of the lower layer is different from that of the current layer, the macro block of the basic frame is up-sampled to the resolution of the current layer before calculating the difference.
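A rough sketch of intra base prediction across a resolution gap follows, assuming nearest-neighbour 2x up-sampling and integer samples; the actual up-sampling filter is not specified in the text:

```python
def upsample_2x(block):
    """Nearest-neighbour 2x up-sampling of a lower-layer block to
    the resolution of the current layer."""
    out = []
    for row in block:
        wide = [v for v in row for _ in (0, 1)]  # duplicate columns
        out.append(wide)
        out.append(list(wide))                   # duplicate rows
    return out

def intra_base_residual(cur_mb, base_mb):
    """Difference between a current-layer macro block and the
    up-sampled co-located base-layer macro block; this residual is
    what gets transformed and quantized."""
    pred = upsample_2x(base_mb)
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(cur_mb, pred)]
```

When the current block closely follows the up-sampled base block, the residual is mostly zeros and compresses well.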
Residual prediction is the extension of the inter prediction from a single layer to multiple layers. The residual prediction calculates a difference between the difference obtained from the inter prediction of a current layer and the other difference obtained from the inter prediction of a lower layer and quantizes the calculated difference. In the present embodiment, when encoding high resolution image frames, the enhancement encoder performs motion estimation using a value obtained by multiplying by two a motion vector of a basic layer image, that is, a low resolution image, for an own video and of basic layer images, that is, low resolution images, for the other videos as reference videos.
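The multiply-by-two rule above — reusing a base-layer motion vector at twice the spatial resolution — can be written directly; representing a motion vector as a (dx, dy) tuple is just an illustrative convention:

```python
def scale_base_mv(mv):
    """When the enhancement layer has twice the spatial resolution
    of the base layer, a base-layer motion vector is scaled by two
    before being used as the predictor (or starting point) for the
    enhancement-layer motion search."""
    dx, dy = mv
    return (2 * dx, 2 * dy)
```

A base-layer displacement of (3, -1), for instance, corresponds to (6, -2) in the high resolution frame.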
In the present embodiment, when encoding high resolution image frames, the enhancement encoder performs differential image estimation by interpolating the remaining images after predicting a basic layer image (low resolution image) of an own video and basic layer images (low resolution images) of the other videos as reference videos.
Also, in an intra prediction mode when encoding high resolution image frames, the enhancement encoder performs intra prediction using a basic layer image that is a low resolution image of an own video and basic layer images that are low resolution images of the other videos as reference videos. Fig. 4 is a diagram illustrating a reference structure for predicting and referencing adjacent frames in scalable video coding according to an embodiment of the present invention.
In Fig. 4, a P macro block denotes single-directional prediction and a B macro block denotes bi-directional prediction. The reference structure according to the present embodiment allows single-directional prediction and bi-directional prediction to be performed in a plurality of resolution layers as well as in a temporal axis and a spatial axis.
As shown in Fig. 4, the reference structure according to the present embodiment includes a two-layer structure formed of one basic layer and an enhancement layer. However, the reference structure may further include more enhancement layers.
In Fig. 4, a reference numeral 41 denotes a predicting and referencing operation for predicting and referencing adjacent frames, which is performed in a basic layer encoder and an enhancement layer encoder in a basic scalability video encoder 21 of Fig. 2. A
reference numeral 42 denotes a predicting and referencing operation which is performed in a basic layer encoder and an enhancement layer encoder in an extended scalability video encoder 22 for a video 1 of Fig. 2. A reference numeral 43 denotes a predicting and referencing operation, which is performed in a basic layer encoder and an enhancement layer encoder in an extended scalability video encoder 23 for a video 2 of Fig. 2.
In other words, the basic layer 0 (LO) denotes a reference structure performed in each of the basic layer encoder of the basic scalability video encoder 21, the basic layer encoder of the extended scalability video encoder 22 for the video 1, and the basic layer encoder of the extended scalability video encoder 23 for the video 2.
Like the basic layer 0 (L0), the enhancement layer 1 (L1) denotes a reference structure performed in each of the enhancement layer encoder of the basic scalability video encoder 21, the enhancement layer encoder of the extended scalability video encoder 22 for the video 1, and the enhancement layer encoder of the extended scalability video encoder 23 for the video 2.
Referring to Fig. 4, the basic layer encoder of the basic scalability video encoder 21 performs a scalable video coding operation by predicting and referencing adjacent frames for own low resolution image frames in a temporal axis like the video encoder according to the related art. The basic layer encoder of the extended scalability video encoder 22 for the video 1 performs bi-directional prediction for an own frame using the frames of a video 0 and the frames of a video 2, which are reference video frames located at the same temporal axis. Also, the basic layer encoder of the extended scalability video encoder 23 for the video 2 performs single-directional prediction with reference to the basic video 0 and performs bi-directional prediction using the own frame at the same time.
Meanwhile, the enhancement layer 1 (L1), the upper layer of the basic layer, performs spatial and temporal prediction for an own video frame and performs prediction with reference to own frames of a basic layer and adjacent frames of the basic layer. In Fig. 4, each of the macro blocks includes three circle or cross symbols for indicating whether a lower layer is referred to or not. Here, a circle or a cross symbol in the middle row among the three symbols indicates whether a lower layer of an own video frame is referred to or not, and circle symbols or cross symbols in the top row or in the bottom row indicate whether lower layers of adjacent video frames are referred to or not.
Referring to Fig. 4, the enhancement layer encoder of the basic scalability video encoder 21 performs scalable video coding with reference to own frames of a lower layer like the encoder according to the related art. The enhancement layer encoder of the extended scalability video encoder 22 for the video 1 performs bi-directional prediction with reference to lower layer frames of a video 0 and lower layer frames of a video 2, which are adjacent frames, as well as own lower layer frames. The enhancement layer encoder of the extended scalability video encoder 23 for the video 2 performs prediction with reference to the lower layer frames of the basic video 0 as well as the own lower layer frame.
Fig. 5 illustrates the reference structure of Fig. 4 with circle symbols and cross symbols in a spatial (view) layer axis with a time fixed.
In Fig. 5, the reference structure includes three layers. A reference numeral 51 denotes a reference structure for a video 0, a reference numeral 52 denotes a reference structure for a video 1, and a reference
numeral 53 denotes a reference structure for a video 2.
In Fig. 5, since the macro block of an enhancement layer 2 (50) for the video 0 includes two cross symbols at the right and left columns and a circle symbol at the middle column, a coding operation is performed based on motion, differential images, and intra prediction used in the scalable video coding (SVC) according to the related art with reference to a lower layer of an own video only. On the contrary, since the macro block of an enhancement layer 2 for a video 1 (52) includes all circle symbols, a coding operation is performed with reference to lower layers of adjacent videos as well as a lower layer of an own video. Also, since the macro block of an enhancement layer 2 for a video 2 (53) includes circle symbols at the middle and left columns, a scalable video coding operation is performed with reference to the lower layer of an own video and the lower layer of a video 0.
In order to indicate whether a lower layer is referenced or not as described above, a scalable video bitstream includes predetermined syntax elements that describe information about videos in the reference layer.
A syntax element ref_view_Idx denotes a view number of a reference video in a lower layer. Here, a flag base_mode_flag indicates whether the motion vector information of a lower layer is used for estimating a motion in a current block or not. If the flag base_mode_flag is 1, a variable ref_view_Idx must have a view number of a lower layer for indicating which motion vector information is used. A flag base_mode_refinement_flag indicates whether or not the motion vector information of a lower layer is used for predicting a motion vector of a current block. Unlike the flag base_mode_flag, the reference index of a lower layer is also used as prediction information. Therefore, if the flag base_mode_refinement_flag is 1, a variable ref_view_Idx must have a view number of a lower layer for indicating which motion vector and reference index information are used. A flag intra_base_flag indicates whether a type of an intra block in a lower layer is used as prediction information of a current block. If the flag intra_base_flag is 1, information about an intra prediction mode of a lower layer is used for a current block. Therefore, a variable ref_view_Idx must have a view number of a lower layer for indicating which intra block type information is used.
A flag residual_prediction_flag indicates whether a differential image value of a lower layer is used for predicting a differential image of a current block or not. If the flag residual_prediction_flag is 1, the differential image information of a lower layer is up-sampled. Also, the variable ref_view_Idx must have a view number of a lower layer for indicating which differential image information is used. Table 1 shows the above described syntax elements in the scalable video.
Table 1
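The flag semantics listed above can be summarized in a small dispatch sketch; the dict keys mirror the syntax-element names in the text, while the function name and the returned tuples are illustrative assumptions:

```python
def select_inter_layer_tools(flags):
    """Interpret the inter-layer prediction flags and return which
    pieces of lower-layer information the current block reuses,
    paired with the view number (ref_view_Idx) they come from."""
    tools = []
    view = flags.get("ref_view_Idx")
    if flags.get("base_mode_flag"):
        # lower-layer motion vectors drive the current-block motion
        tools.append(("motion_vector", view))
    if flags.get("base_mode_refinement_flag"):
        # the lower-layer reference index is reused as well
        tools.append(("motion_vector_and_ref_index", view))
    if flags.get("intra_base_flag"):
        # lower-layer intra prediction mode predicts the current block
        tools.append(("intra_mode", view))
    if flags.get("residual_prediction_flag"):
        # lower-layer differential image is up-sampled before use
        tools.append(("residual", view))
    return tools
```

A block with base_mode_flag and residual_prediction_flag set, for example, reuses the motion vectors and the up-sampled residual of the indicated lower-layer view.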
Fig. 6 illustrates a reference structure for a B- frame structure.
In Fig. 6, a reference numeral 61 denotes a reference structure for a video 0 that is a basic video,
a reference numeral 62 denotes a reference structure for a video 1, and a reference numeral 63 denotes a reference structure for a video 2.
Fig. 7 is a diagram illustrating a reference structure of Fig. 6 with circle symbols and cross symbols in a spatial (view) layer axis with a time fixed.
In Fig. 7, the reference structure includes three layers. A reference numeral 71 denotes a reference structure for a video 0 that is a basic video, a reference numeral 72 denotes a reference structure for a video 1, and a reference numeral 73 denotes a reference structure for a video 2.
Referring to Figs. 6 and 7, the video 0, which is the basic video (61 and 71), performs scalable video coding with reference to own lower layer frames only. However, the video 1 and the video 2 perform the scalable video coding with reference to the lower layer frames of adjacent videos as well as own lower layer frames.
The above described reference structures can be identically applied to a P-frame structure.
Fig. 8 is a block diagram illustrating an apparatus for scalable coding multiview video in accordance with another embodiment of the present invention. Fig. 9 is a diagram illustrating a reference structure for a basic video 0 (91) and adjacent videos 1 (92) and 2 (93) in Fig. 8.
Referring to Figs. 8 and 9, a basic scalability video encoder 81 performs scalable coding for the basic video 0. The basic scalability video encoder 81 refers to own lower layer frames like a single viewpoint video scalable coding apparatus. Therefore, the scalable coding apparatus according to another embodiment can be compatible with existing scalable coding apparatuses.
Extended scalability video encoders 82 to 85 perform scalable video coding for the videos 1 to 4. Each of the
extended scalability video encoders 82 to 85 separates video into frames with multilayer resolution, performs temporal and spatial prediction for the separated video frames, and performs compression with reference to temporal and spatial layer image information of adjacent videos and compression parameters.
As shown in Fig. 9, the enhancement layer 1 (92) of the video 1 performs scalable video coding with reference to a lower layer of the video 0, and the enhancement layer 2 (93) of the video 2 performs scalable video coding with reference to a lower layer of the video 1. In other words, the extended scalability video encoders perform scalable video coding with reference to only one adjacent video in the scalable coding apparatus according to another embodiment as shown in Fig. 8.
Accordingly, the scalable coding apparatus according to another embodiment can provide a 2-D video service using the basic scalability video encoder 81. Also, the scalable coding apparatus according to another embodiment can provide a stereo video service using the basic scalability video encoder 81 for the basic video 0 and the extended scalability video encoder 82 for the video 1. Furthermore, the scalable coding apparatus according to another embodiment can provide a three-view service using the basic scalability video encoder 81 for the basic video 0 and the extended scalability video encoders 82 and 84 for the videos 1 and 3. The scalable coding apparatus according to another embodiment can provide a five-view service using the basic scalability video encoder 81 for the basic video 0 and the extended scalability video encoders 82 and 85 for the videos 1 and 4.
The scalable coding technology for multiview video according to the present invention will be briefly described again.
At first, one basic video is separated into frames with multilayer resolutions using a spatial filter in a spatial axis. Then, a spatial and temporal scalable video coding operation is performed on the separated low resolution image frames through motion estimation in a temporal axis. Also, the spatial and temporal scalable video coding operation is performed on the separated high resolution video frames through hierarchical motion estimation in a temporal axis with reference to a lower layer. Then, a bitstream is generated by multiplexing the coded low resolution image frame and at least one of the coded high resolution video frames. As described above, the scalable video coding for basic video according to the present embodiment is identical to that according to the related art.
Hereinafter, the scalable video coding for multiview video according to the present embodiment will be described.
At first, an own video is received with at least one of adjacent videos as reference videos. The received own video and adjacent videos are separated into video frames with multilayer resolutions using a spatial filter in a spatial axis. Then, the temporal and spatial scalable video coding is performed on the separated low resolution image frame through hierarchical motion estimation with reference to adjacent frames as reference frames as well as an own frame in a temporal axis. Also, the temporal and spatial scalable video coding is performed on the separated high resolution image frame through hierarchical motion estimation with reference to lower layers of adjacent video frames as well as a lower layer of the own video frame in a temporal axis. Then, a bitstream is generated by multiplexing the coded low resolution image frame and at least one of the coded high resolution video frames.
As described above, the extended scalable video coding uses not only the own lower layer frames but also adjacent lower layer frames as reference frames, unlike the scalable video coding for single viewpoint video, which uses an adjacent frame and a lower layer frame thereof.
Meanwhile, a scalable video decoding apparatus for multiview video according to an embodiment of the present invention performs the operations of the scalable video encoding apparatus according to the present embodiment in a reverse order.
The scalable video decoding apparatus according to the present embodiment includes a basic scalability video decoder and a plurality of extended scalability video decoders. The basic scalability video decoder receives a bitstream generated by scalable-coding one basic video and restores the basic video through inverse temporal transformation and inverse spatial transformation. Each of the extended scalability video decoders receives a bitstream generated by scalable-coding an own video and reference videos, which are captured at the same time, through the temporal and spatial prediction. Then, each of the extended scalability video decoders restores at least one high resolution video frame through inverse temporal and spatial prediction according to whether a lower layer of an adjacent video frame is referred to as well as an own lower layer, and restores one low resolution image frame through inverse temporal and spatial prediction according to whether the adjacent video frames at the same temporal axis are referenced as well as the own adjacent frame. Then, each of the extended scalability video decoders restores video by performing inverse spatial filtering on the restored high resolution video frames and the restored low resolution image frame.
In the scalable decoding apparatus according to the present embodiment, the basic scalability video decoder has a structure identical to that of a typical scalable video decoder. Therefore, the detailed description thereof is omitted.
In the present embodiment, the extended scalability video decoder includes a demultiplexer, at least one enhancement layer decoder, a basic layer decoder, and an inverse spatial video filtering unit. The demultiplexer demultiplexes the received bitstream. Each of the enhancement layer decoders performs scalable decoding on a high resolution video signal outputted from the demultiplexer through inverse temporal and spatial motion estimation according to whether adjacent videos are referred to as well as a lower layer of an own video. The basic layer decoder performs scalable decoding on a low resolution image signal outputted from the demultiplexer not only through inverse temporal and spatial motion estimation for an own video frame but also through inverse motion estimation for reference video frames on a temporal axis. The inverse spatial video filtering unit restores a video by performing inverse spatial filtering on the restored high resolution video frame from the enhancement layer decoder and on the restored low resolution image frame from the basic layer decoder.
Here, the basic layer decoder and the enhancement layer decoder perform the operations of the basic layer encoder and the enhancement layer encoder in an inverse order. Therefore, the detailed descriptions of the basic layer decoder and the enhancement layer decoder are omitted.
When the enhancement layer decoder performs a decoding operation for a high resolution video signal, the enhancement layer decoder refers to a flag indicating whether the motion vector information of a lower layer is used or not, a flag indicating whether a reference index of a lower layer for adjacent video is used as prediction information or not, a flag indicating whether a type of an intra block of a lower layer is used as prediction information or not, a flag indicating whether a differential video value of a lower layer is used or not, and an index of a reference view used for prediction.
As described above, the technology of the present invention can be realized as a program and stored in a computer-readable recording medium, such as CD-ROM, RAM, ROM, floppy disk, hard disk and magneto-optical disk. Since the process can be easily implemented by those skilled in the art of the present invention, further description will not be provided herein.
While the present invention has been described with respect to certain preferred embodiments, it will be apparent to those skilled in the art that various changes and modifications may be made without departing from the scope of the invention as defined in the following claims.
INDUSTRIAL APPLICABILITY
According to the present invention, multiview video can be effectively compressed by extending the temporal and spatial hierarchical structure of a typical scalable coding technology to multiview video. Also, a video service can be scalably provided to various types of 2-D or 3-D terminals by forming a hierarchical structure on the temporal and spatial axes for the multiview video according to the present invention.
Claims
1. A scalable video coding apparatus for a multiview video, comprising: a basic scalability video encoder for separating one basic video into video frames with multilayer resolutions and performing scalable video coding through temporal and spatial prediction on the separated low resolution image frame and at least one of the separated high resolution video frames; and a plurality of extended scalability video encoders for receiving an own video and at least one of adjacent videos as reference videos, which are captured at the same time, separating the received videos into video frames with multilayer resolutions through spatial filtering, performing scalable video coding for the separated low resolution image frame through temporal and spatial prediction with reference to the adjacent video frames at the same temporal axis as well as an own adjacent frame, and performing scalable video coding for the separated high resolution video frame through temporal and spatial prediction with reference to lower layers of the adjacent video frames at the same temporal axis as well as an own lower layer.
2. The scalable video coding apparatus of claim 1, wherein the extended scalability video encoder includes: a spatial video filtering means for separating an own video and adjacent videos as reference videos into video frames with multilayer resolutions through spatial filtering; a basic layer encoding means for separating the own video and adjacent videos into low resolution image frames through temporal filtering and performing scalable coding through motion estimation for a reference video frame in a temporal axis as well as temporal and spatial motion estimation for an own video frame; at least one of enhancement layer encoding means for separating the own video and adjacent videos into high resolution video frames through temporal filtering and performing scalable coding through spatial and temporal motion estimation with reference to lower layers for the adjacent videos as well as a lower layer of an own video; and a multiplexing means for outputting one bitstream by multiplexing the output of the basic layer encoding means and the output of the enhancement layer encoding means.
3. The scalable video coding apparatus of claim 2, wherein the enhancement layer encoding means sets a flag indicating whether motion vector information of a lower layer is used or not, a flag indicating whether a reference index of a lower layer for an adjacent video is used as prediction information or not, a flag indicating whether a type of an intra block of a lower layer is used as prediction information or not, and a flag indicating whether a differential video value of a lower layer is used or not, and marks an index of a used reference view as a coding result.
4. The scalable video coding apparatus of claim 2, wherein the enhancement layer encoding means further includes a two-dimensional (2-D) spatial interpolation means for performing 2-D spatial interpolation on a video frame restored for intra prediction for an intra block.
5. The scalable video coding apparatus of claim 2, wherein the enhancement layer encoding means performs coding through motions between frames, differential video and intra prediction on a temporal and spatial axis.
6. The scalable video coding apparatus of claim 5, wherein the enhancement layer encoding means performs motion estimation using a value obtained by multiplying by two a motion vector of a basic layer image that is a low resolution image for an own video and of a basic layer image that is a low resolution image for an adjacent video.
7. The scalable video coding apparatus of claim 5, wherein the enhancement layer encoding means performs differential video prediction by interpolating remaining images after predicting a basic layer image that is a low resolution image for own video and a basic layer image that is a low resolution image for an adjacent video.
8. A scalable video coding method for multiview video, comprising the steps of:
(a) separating one basic video into video frames with multilayer resolutions and performing scalable video coding through temporal and spatial prediction; and
(b) receiving an own video and at least one of adjacent videos, which are captured at the same time, and performing scalable video coding through temporal and spatial prediction by separating the received videos into video frames with multilayer resolutions, wherein the step of (b) receiving an own video and at least one of adjacent videos includes the steps of:
(c) performing scalable video coding for low resolution video frames through temporal and spatial prediction with reference to the adjacent video frames at the same temporal axis as well as own adjacent frames; and
(d) performing scalable video coding for at least one of high resolution video frames through temporal and spatial prediction with reference to lower layers of the adjacent video frames as well as own lower layer.
9. The scalable video coding method of claim 8, wherein as a result of performing the step of (d) performing scalable video coding for at least one of high resolution video frames, a flag indicating whether motion vector information of a lower layer is used or not, a flag indicating whether a reference index of a lower layer for an adjacent video is used as prediction information or not, a flag indicating whether a type of an intra block of a lower layer is used as prediction information or not, and a flag indicating whether a differential video value of a lower layer is used or not are set, and an index of a used reference view is marked.
10. The scalable video coding method of claim 8, wherein in the step of (d) performing scalable video coding for at least one of high resolution video frames, a video is coded through motions between frames, differential images, and intra prediction on a temporal and spatial axis.
11. The scalable video coding method of claim 8, wherein in the step of (d) performing scalable video coding for at least one of high resolution video frames, two-dimensional spatial interpolation is performed for a video frame restored for intra prediction for an intra block.
12. The scalable video coding method of claim 8, wherein in the step of (d) performing scalable video coding for at least one of high resolution video frames, motion estimation is performed using a value obtained by multiplying by two a motion vector of a basic layer image that is a low resolution image for an own video and of a basic layer image that is a low resolution image for an adjacent video.
13. The scalable video coding method of claim 8, wherein in the step of (d) performing scalable video coding for at least one of high resolution video frames, differential video prediction is performed by interpolating remaining images after prediction of a basic layer image that is a low resolution image for own video and a basic layer image that is a low resolution image for an adjacent video.
14. A scalable video decoding apparatus for multiview video, comprising: a basic scalability video decoder for receiving a bitstream generated by scalably coding one basic video and restoring a basic video through inverse temporal and inverse spatial transform; and a plurality of extended scalability video decoders for receiving a bitstream scalable-coded through temporal and spatial prediction for an own video and reference videos, which are captured at the same time, restoring at least one of high resolution image frames through inverse temporal and spatial prediction according to whether lower layers of adjacent video frames, which are reference videos, are referred to as well as an own lower layer, restoring a low resolution image frame through inverse temporal and spatial prediction according to whether the adjacent video frames are referred to at the same temporal axis as well as an own adjacent frame, and restoring an image through inverse spatial filtering for the restored high resolution image frames and the restored low resolution image frame.
15. The scalable video decoding apparatus of claim 14, wherein the extended scalability video decoder includes: a demultiplexing means for demultiplexing a received bitstream; at least one of enhancement decoding means for performing scalable decoding for a high resolution image signal outputted from the demultiplexing means through inverse temporal and spatial motion estimation according to whether lower layers of adjacent videos, which are reference videos, are referred to as well as a lower layer of an own video; a basic layer decoding means for performing scalable decoding for a low resolution image signal outputted from the demultiplexing means through inverse motion estimation for reference video frames at a temporal axis as well as inverse temporal and spatial motion estimation for an own video frame; and an inverse spatial video filtering means for restoring an image through inverse spatial filtering for the restored high resolution images from the enhancement decoding means and the restored low resolution image from the basic layer decoding means.
16. The scalable video decoding apparatus of claim 15, wherein the enhancement decoding means performs scalable decoding with reference to a flag indicating whether motion vector information of a lower layer is used or not, a flag indicating whether a reference index of a lower layer for adjacent video is used as prediction information or not, a flag indicating whether a type of an intra block of a lower layer is used as prediction information or not, a flag indicating whether a differential image value of a lower layer is used or not, and an index of a reference view used for prediction.
17. A scalable video decoding method for multiview video, comprising the steps of:
(a) performing scalable video decoding for one basic video through inverse temporal and spatial prediction; and
(b) receiving a bitstream scalable-coded with reference to an own video and at least one of adjacent videos as reference videos, which are captured at the same time, and performing scalable video decoding through inverse temporal prediction and inverse spatial prediction, wherein the step of (b) receiving a bitstream scalable-coded with reference to an own video and at least one of adjacent videos as reference videos includes the steps of:
(c) performing scalable video decoding for a demultiplexed high resolution image signal through inverse temporal and spatial prediction according to whether lower layers of the adjacent video frames are referred to as well as an own lower layer; and
(d) performing scalable video decoding for a demultiplexed low resolution image signal through inverse temporal and spatial prediction according to whether the adjacent video frames are referred to at the same temporal axis as well as an own adjacent frame.
18. The scalable video decoding method of claim 17, wherein in the step of (c) performing scalable video decoding for demultiplexed high resolution image signal, scalable decoding is performed with reference to a flag indicating whether motion vector information of a lower layer is used or not, a flag indicating whether a reference index of a lower layer for adjacent video is used as prediction information or not, a flag indicating whether a type of an intra block of a lower layer is used as prediction information or not, a flag indicating whether a differential image value of a lower layer is used or not, and an index of a reference view used for prediction .
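Claims 6 and 12 above describe reusing a base-layer motion vector, multiplied by two, as a predictor when the enhancement layer doubles the spatial resolution. A minimal sketch of that scaling follows; the function names are hypothetical, and a dyadic (exactly 2x) resolution ratio between the basic and enhancement layers is assumed.

```python
def upscale_motion_vector(mv, factor=2):
    """Scale a base-layer motion vector (dx, dy) for use as a predictor
    in an enhancement layer whose resolution is `factor` times larger."""
    dx, dy = mv
    return (dx * factor, dy * factor)

def enhancement_mv_predictors(own_base_mv, adjacent_base_mv):
    """Candidate predictors derived from the own-view and adjacent-view
    basic layer motion vectors, as in claims 6 and 12 (sketch only)."""
    return [upscale_motion_vector(own_base_mv),
            upscale_motion_vector(adjacent_base_mv)]
```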
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2009534496A JP5170786B2 (en) | 2006-10-25 | 2007-10-25 | Multi-view video scalable coding and decoding method, and coding and decoding apparatus |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2006-0103923 | 2006-10-25 | ||
KR20060103923 | 2006-10-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2008051041A1 true WO2008051041A1 (en) | 2008-05-02 |
Family
ID=39324782
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2007/005294 WO2008051041A1 (en) | 2006-10-25 | 2007-10-25 | Multi-view video scalable coding and decoding |
Country Status (3)
Country | Link |
---|---|
JP (1) | JP5170786B2 (en) |
KR (1) | KR100919885B1 (en) |
WO (1) | WO2008051041A1 (en) |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009140913A1 (en) * | 2008-05-23 | 2009-11-26 | 华为技术有限公司 | Controlling method and device of multi-point meeting |
WO2010120804A1 (en) | 2009-04-13 | 2010-10-21 | Reald Inc. | Encoding, decoding, and distributing enhanced resolution stereoscopic video |
WO2010147289A1 (en) * | 2009-06-16 | 2010-12-23 | Lg Electronics Inc. | Broadcast transmitter, broadcast receiver and 3d video processing method thereof |
US20110012994A1 (en) * | 2009-07-17 | 2011-01-20 | Samsung Electronics Co., Ltd. | Method and apparatus for multi-view video coding and decoding |
WO2011042440A1 (en) * | 2009-10-08 | 2011-04-14 | Thomson Licensing | Method for multi-view coding and corresponding decoding method |
CN102036065A (en) * | 2009-10-05 | 2011-04-27 | 美国博通公司 | Method and system for video coding |
WO2012006299A1 (en) * | 2010-07-08 | 2012-01-12 | Dolby Laboratories Licensing Corporation | Systems and methods for multi-layered image and video delivery using reference processing signals |
CN102957910A (en) * | 2011-08-09 | 2013-03-06 | 索尼公司 | Image encoding apparatus, image encoding method and program |
CN103026706A (en) * | 2010-07-21 | 2013-04-03 | 杜比实验室特许公司 | Systems and methods for multi-layered frame-compatible video delivery |
EP2587804A1 (en) * | 2011-10-28 | 2013-05-01 | Samsung Electronics Co., Ltd | Method and apparatus for hierarchically encoding and decoding of a two-dimensional image, of a stereo image, and of a three-dimensional image |
EP2700233A2 (en) * | 2011-04-19 | 2014-02-26 | Samsung Electronics Co., Ltd. | Method and apparatus for unified scalable video encoding for multi-view video and method and apparatus for unified scalable video decoding for multi-view video |
CN103733620A (en) * | 2011-08-11 | 2014-04-16 | 高通股份有限公司 | Three-dimensional video with asymmetric spatial resolution |
US20140133567A1 (en) * | 2012-04-16 | 2014-05-15 | Nokia Corporation | Apparatus, a method and a computer program for video coding and decoding |
CN103828371A (en) * | 2011-09-22 | 2014-05-28 | 松下电器产业株式会社 | Moving-image encoding method, moving-image encoding device, moving image decoding method, and moving image decoding device |
US8855199B2 (en) * | 2008-04-21 | 2014-10-07 | Nokia Corporation | Method and device for video coding and decoding |
CN105025312A (en) * | 2008-12-30 | 2015-11-04 | Lg电子株式会社 | Digital broadcast receiving method providing two-dimensional image and 3d image integration service, and digital broadcast receiving device using the same |
TWI552575B (en) * | 2011-08-09 | 2016-10-01 | 三星電子股份有限公司 | Multi-view video prediction method and apparatus therefore and multi-view video prediction restoring method and apparatus therefore |
US9485503B2 (en) | 2011-11-18 | 2016-11-01 | Qualcomm Incorporated | Inside view motion prediction among texture and depth view components |
US9521418B2 (en) | 2011-07-22 | 2016-12-13 | Qualcomm Incorporated | Slice header three-dimensional video extension for slice header prediction |
US9648346B2 (en) | 2009-06-25 | 2017-05-09 | Microsoft Technology Licensing, Llc | Multi-view video compression and streaming based on viewpoints of remote viewer |
US10027943B2 (en) | 2012-04-03 | 2018-07-17 | Sun Patent Trust | Image encoding method, image decoding method, image encoding device, and image decoding device |
US11496760B2 (en) | 2011-07-22 | 2022-11-08 | Qualcomm Incorporated | Slice header prediction for depth maps in three-dimensional video codecs |
Families Citing this family (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101012760B1 (en) * | 2008-09-05 | 2011-02-08 | 에스케이 텔레콤주식회사 | System and Method for transmitting and receiving of Multi-view video |
KR101146138B1 (en) * | 2008-12-10 | 2012-05-16 | 한국전자통신연구원 | Temporal scalabel video encoder |
KR101144752B1 (en) * | 2009-08-05 | 2012-05-09 | 경희대학교 산학협력단 | video encoding/decoding method and apparatus thereof |
WO2011016701A2 (en) * | 2009-08-07 | 2011-02-10 | 한국전자통신연구원 | Motion picture encoding apparatus and method thereof |
KR20110015356A (en) | 2009-08-07 | 2011-02-15 | 한국전자통신연구원 | Video encoding and decoding apparatus and method using adaptive transform and quantization domain that based on a differential image signal characteristic |
EP2591602A1 (en) * | 2010-07-06 | 2013-05-15 | Koninklijke Philips Electronics N.V. | Generation of high dynamic range images from low dynamic range images |
JP5663093B2 (en) * | 2010-10-01 | 2015-02-04 | ドルビー ラボラトリーズ ライセンシング コーポレイション | Optimized filter selection for reference picture processing |
WO2013051896A1 (en) * | 2011-10-05 | 2013-04-11 | 한국전자통신연구원 | Video encoding/decoding method and apparatus for same |
WO2013076991A1 (en) * | 2011-11-25 | 2013-05-30 | パナソニック株式会社 | Image coding method, image coding device, image decoding method and image decoding device |
KR101346349B1 (en) * | 2012-01-30 | 2013-12-31 | 광운대학교 산학협력단 | Apparatus and Method for scalable multi-view video decoding |
WO2013115609A1 (en) * | 2012-02-02 | 2013-08-08 | 한국전자통신연구원 | Interlayer prediction method and device for image signal |
JP6050488B2 (en) * | 2012-07-06 | 2016-12-21 | サムスン エレクトロニクス カンパニー リミテッド | Multi-layer video encoding method and apparatus for random access, and multi-layer video decoding method and apparatus for random access |
WO2014088316A2 (en) * | 2012-12-04 | 2014-06-12 | 인텔렉추얼 디스커버리 주식회사 | Video encoding and decoding method, and apparatus using same |
EP2961166B1 (en) * | 2013-02-25 | 2020-04-01 | LG Electronics Inc. | Method for encoding video of multi-layer structure supporting scalability and method for decoding same and apparatus therefor |
US10616607B2 (en) | 2013-02-25 | 2020-04-07 | Lg Electronics Inc. | Method for encoding video of multi-layer structure supporting scalability and method for decoding same and apparatus therefor |
KR101595397B1 (en) * | 2013-07-26 | 2016-02-29 | 경희대학교 산학협력단 | Method and apparatus for integrated encoding/decoding of different multilayer video codec |
WO2015016535A1 (en) * | 2013-07-30 | 2015-02-05 | 주식회사 케이티 | Image encoding and decoding method supporting plurality of layers and apparatus using same |
US9894369B2 (en) | 2013-07-30 | 2018-02-13 | Kt Corporation | Image encoding and decoding method supporting plurality of layers and apparatus using same |
US9762909B2 (en) | 2013-07-30 | 2017-09-12 | Kt Corporation | Image encoding and decoding method supporting plurality of layers and apparatus using same |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030202592A1 (en) * | 2002-04-20 | 2003-10-30 | Sohn Kwang Hoon | Apparatus for encoding a multi-view moving picture |
WO2006062377A1 (en) * | 2004-12-10 | 2006-06-15 | Electronics And Telecommunications Research Institute | Apparatus for universal coding for multi-view video |
WO2006104326A1 (en) * | 2005-04-01 | 2006-10-05 | Industry Academic Cooperation Foundation Kyunghee University | Scalable multi-view image encoding and decoding apparatuses and methods |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7468745B2 (en) * | 2004-12-17 | 2008-12-23 | Mitsubishi Electric Research Laboratories, Inc. | Multiview video decomposition and encoding |
KR20060101847A (en) * | 2005-03-21 | 2006-09-26 | 엘지전자 주식회사 | Method for scalably encoding and decoding video signal |
2007
- 2007-10-25 JP JP2009534496A patent/JP5170786B2/en not_active Expired - Fee Related
- 2007-10-25 WO PCT/KR2007/005294 patent/WO2008051041A1/en active Application Filing
- 2007-10-25 KR KR1020070108021A patent/KR100919885B1/en not_active IP Right Cessation
Cited By (57)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8855199B2 (en) * | 2008-04-21 | 2014-10-07 | Nokia Corporation | Method and device for video coding and decoding |
KR101224097B1 (en) | 2008-05-23 | 2013-01-21 | 후아웨이 테크놀러지 컴퍼니 리미티드 | Controlling method and device of multi-point meeting |
WO2009140913A1 (en) * | 2008-05-23 | 2009-11-26 | 华为技术有限公司 | Controlling method and device of multi-point meeting |
US8339440B2 (en) | 2008-05-23 | 2012-12-25 | Huawei Technologies Co., Ltd. | Method and apparatus for controlling multipoint conference |
CN105025312A (en) * | 2008-12-30 | 2015-11-04 | Lg电子株式会社 | Digital broadcast receiving method providing two-dimensional image and 3d image integration service, and digital broadcast receiving device using the same |
EP2420068A1 (en) * | 2009-04-13 | 2012-02-22 | RealD Inc. | Encoding, decoding, and distributing enhanced resolution stereoscopic video |
WO2010120804A1 (en) | 2009-04-13 | 2010-10-21 | Reald Inc. | Encoding, decoding, and distributing enhanced resolution stereoscopic video |
CN102804785A (en) * | 2009-04-13 | 2012-11-28 | 瑞尔D股份有限公司 | Encoding, decoding, and distributing enhanced resolution stereoscopic video |
EP2420068A4 (en) * | 2009-04-13 | 2012-08-08 | Reald Inc | Encoding, decoding, and distributing enhanced resolution stereoscopic video |
US20120092453A1 (en) * | 2009-06-16 | 2012-04-19 | Jong Yeul Suh | Broadcast transmitter, broadcast receiver and 3d video processing method thereof |
CN105025309A (en) * | 2009-06-16 | 2015-11-04 | Lg电子株式会社 | Broadcast transmitter and 3D video data processing method thereof |
CN102461183A (en) * | 2009-06-16 | 2012-05-16 | Lg电子株式会社 | Broadcast transmitter, broadcast receiver and 3d video processing method thereof |
US9578302B2 (en) | 2009-06-16 | 2017-02-21 | Lg Electronics Inc. | Broadcast transmitter, broadcast receiver and 3D video data processing method thereof |
US20150350625A1 (en) * | 2009-06-16 | 2015-12-03 | Lg Electronics Inc. | Broadcast transmitter, broadcast receiver and 3d video data processing method thereof |
WO2010147289A1 (en) * | 2009-06-16 | 2010-12-23 | Lg Electronics Inc. | Broadcast transmitter, broadcast receiver and 3d video processing method thereof |
US9088817B2 (en) | 2009-06-16 | 2015-07-21 | Lg Electronics Inc. | Broadcast transmitter, broadcast receiver and 3D video processing method thereof |
US9648346B2 (en) | 2009-06-25 | 2017-05-09 | Microsoft Technology Licensing, Llc | Multi-view video compression and streaming based on viewpoints of remote viewer |
CN102577376A (en) * | 2009-07-17 | 2012-07-11 | 三星电子株式会社 | Method and apparatus for multi-view video coding and decoding |
CN102577376B (en) * | 2009-07-17 | 2015-05-27 | 三星电子株式会社 | Method, apparatus and system for multi-view video coding and decoding |
JP2012533925A (en) * | 2009-07-17 | 2012-12-27 | サムスン エレクトロニクス カンパニー リミテッド | Method and apparatus for multi-view video encoding and decoding |
US20110012994A1 (en) * | 2009-07-17 | 2011-01-20 | Samsung Electronics Co., Ltd. | Method and apparatus for multi-view video coding and decoding |
EP2306730A3 (en) * | 2009-10-05 | 2011-07-06 | Broadcom Corporation | Method and system for 3D video decoding using a tier system framework |
CN102036065A (en) * | 2009-10-05 | 2011-04-27 | 美国博通公司 | Method and system for video coding |
FR2951346A1 (en) * | 2009-10-08 | 2011-04-15 | Thomson Licensing | MULTIVATED CODING METHOD AND CORRESPONDING DECODING METHOD |
WO2011042440A1 (en) * | 2009-10-08 | 2011-04-14 | Thomson Licensing | Method for multi-view coding and corresponding decoding method |
CN103155568A (en) * | 2010-07-08 | 2013-06-12 | 杜比实验室特许公司 | Systems and methods for multi-layered image and video delivery using reference processing signals |
US10531120B2 (en) | 2010-07-08 | 2020-01-07 | Dolby Laboratories Licensing Corporation | Systems and methods for multi-layered image and video delivery using reference processing signals |
WO2012006299A1 (en) * | 2010-07-08 | 2012-01-12 | Dolby Laboratories Licensing Corporation | Systems and methods for multi-layered image and video delivery using reference processing signals |
US9467689B2 (en) | 2010-07-08 | 2016-10-11 | Dolby Laboratories Licensing Corporation | Systems and methods for multi-layered image and video delivery using reference processing signals |
US11044454B2 (en) | 2010-07-21 | 2021-06-22 | Dolby Laboratories Licensing Corporation | Systems and methods for multi-layered frame compatible video delivery |
CN105847780A (en) * | 2010-07-21 | 2016-08-10 | 杜比实验室特许公司 | Decoding method for multi-layered frame-compatible video delivery |
US10142611B2 (en) | 2010-07-21 | 2018-11-27 | Dolby Laboratories Licensing Corporation | Systems and methods for multi-layered frame-compatible video delivery |
JP2013538487A (en) * | 2010-07-21 | 2013-10-10 | ドルビー ラボラトリーズ ライセンシング コーポレイション | System and method for multi-layer frame compliant video delivery |
CN103026706A (en) * | 2010-07-21 | 2013-04-03 | 杜比实验室特许公司 | Systems and methods for multi-layered frame-compatible video delivery |
CN105847781A (en) * | 2010-07-21 | 2016-08-10 | 杜比实验室特许公司 | Decoding method for multi-layered frame-compatible video delivery |
CN105812828A (en) * | 2010-07-21 | 2016-07-27 | 杜比实验室特许公司 | Decoding method for multilayer frame compatible video transmission |
EP2700233A4 (en) * | 2011-04-19 | 2014-09-17 | Samsung Electronics Co Ltd | Method and apparatus for unified scalable video encoding for multi-view video and method and apparatus for unified scalable video decoding for multi-view video |
EP2700233A2 (en) * | 2011-04-19 | 2014-02-26 | Samsung Electronics Co., Ltd. | Method and apparatus for unified scalable video encoding for multi-view video and method and apparatus for unified scalable video decoding for multi-view video |
US11496760B2 (en) | 2011-07-22 | 2022-11-08 | Qualcomm Incorporated | Slice header prediction for depth maps in three-dimensional video codecs |
US9521418B2 (en) | 2011-07-22 | 2016-12-13 | Qualcomm Incorporated | Slice header three-dimensional video extension for slice header prediction |
CN102957910A (en) * | 2011-08-09 | 2013-03-06 | 索尼公司 | Image encoding apparatus, image encoding method and program |
TWI552575B (en) * | 2011-08-09 | 2016-10-01 | 三星電子股份有限公司 | Multi-view video prediction method and apparatus therefore and multi-view video prediction restoring method and apparatus therefore |
US9973778B2 (en) | 2011-08-09 | 2018-05-15 | Samsung Electronics Co., Ltd. | Method for multiview video prediction encoding and device for same, and method for multiview video prediction decoding and device for same |
CN103733620A (en) * | 2011-08-11 | 2014-04-16 | 高通股份有限公司 | Three-dimensional video with asymmetric spatial resolution |
US9288505B2 (en) | 2011-08-11 | 2016-03-15 | Qualcomm Incorporated | Three-dimensional video with asymmetric spatial resolution |
US10764604B2 (en) | 2011-09-22 | 2020-09-01 | Sun Patent Trust | Moving picture encoding method, moving picture encoding apparatus, moving picture decoding method, and moving picture decoding apparatus |
CN103828371B (en) * | 2011-09-22 | 2017-08-22 | 太阳专利托管公司 | Dynamic image encoding method, dynamic image encoding device and dynamic image decoding method and moving image decoding apparatus |
CN103828371A (en) * | 2011-09-22 | 2014-05-28 | 松下电器产业株式会社 | Moving-image encoding method, moving-image encoding device, moving image decoding method, and moving image decoding device |
US20140219338A1 (en) * | 2011-09-22 | 2014-08-07 | Panasonic Corporation | Moving picture encoding method, moving picture encoding apparatus, moving picture decoding method, and moving picture decoding apparatus |
EP2587804A1 (en) * | 2011-10-28 | 2013-05-01 | Samsung Electronics Co., Ltd | Method and apparatus for hierarchically encoding and decoding of a two-dimensional image, of a stereo image, and of a three-dimensional image |
US9191677B2 (en) | 2011-10-28 | 2015-11-17 | Samsung Electronics Co., Ltd. | Method and apparatus for encoding image and method and appartus for decoding image |
US9485503B2 (en) | 2011-11-18 | 2016-11-01 | Qualcomm Incorporated | Inside view motion prediction among texture and depth view components |
US10027943B2 (en) | 2012-04-03 | 2018-07-17 | Sun Patent Trust | Image encoding method, image decoding method, image encoding device, and image decoding device |
US10582183B2 (en) | 2012-04-03 | 2020-03-03 | Sun Patent Trust | Image encoding method, image decoding method, image encoding device, and image decoding device |
US20140133567A1 (en) * | 2012-04-16 | 2014-05-15 | Nokia Corporation | Apparatus, a method and a computer program for video coding and decoding |
EP2839660B1 (en) * | 2012-04-16 | 2020-10-07 | Nokia Technologies Oy | An apparatus, a method and a computer program for video coding and decoding |
US10863170B2 (en) | 2012-04-16 | 2020-12-08 | Nokia Technologies Oy | Apparatus, a method and a computer program for video coding and decoding on the basis of a motion vector |
Also Published As
Publication number | Publication date |
---|---|
KR20080037593A (en) | 2008-04-30 |
JP5170786B2 (en) | 2013-03-27 |
KR100919885B1 (en) | 2009-09-30 |
JP2010507961A (en) | 2010-03-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2008051041A1 (en) | Multi-view video scalable coding and decoding | |
KR100760258B1 (en) | Apparatus for Universal Coding for Multi-View Video | |
US7817181B2 (en) | Method, medium, and apparatus for 3-dimensional encoding and/or decoding of video | |
KR100763179B1 (en) | Method for compressing/Reconstructing motion vector of unsynchronized picture and apparatus thereof | |
US8644386B2 (en) | Method of estimating disparity vector, and method and apparatus for encoding and decoding multi-view moving picture using the disparity vector estimation method | |
KR100789753B1 (en) | Apparatus of predictive coding/decoding using view-temporal reference picture buffers and method using the same | |
EP1927250A1 (en) | Method of estimating disparity vector, and method and apparatus for encoding and decoding multi-view moving picture using the disparity vector estimation method | |
JP2007180981A (en) | Device, method, and program for encoding image | |
WO2007052969A1 (en) | Method and apparatus for encoding multiview video | |
WO2004059980A1 (en) | Method and apparatus for encoding and decoding stereoscopic video | |
KR100703746B1 (en) | Video coding method and apparatus for predicting effectively unsynchronized frame | |
MX2008002391A (en) | Method and apparatus for encoding multiview video. | |
EP1642463A1 (en) | Video coding in an overcomplete wavelet domain | |
JP2007180982A (en) | Device, method, and program for decoding image | |
KR20040065014A (en) | Apparatus and method for compressing/decompressing multi-viewpoint image | |
WO2006118384A1 (en) | Method and apparatus for encoding/decoding multi-layer video using weighted prediction | |
WO2006110007A1 (en) | Method for coding in multiview video coding/decoding system | |
WO2013039348A1 (en) | Method for signaling image information and video decoding method using same | |
KR100791453B1 (en) | Multi-view Video Encoding and Decoding Method and apparatus Using Motion Compensated Temporal Filtering | |
KR20110118744A (en) | 3d tv video encoding method, decoding method | |
JP2011091498A (en) | Moving image coder, moving image decoder, moving image coding method, and moving image decoding method | |
WO2006104357A1 (en) | Method for compressing/decompressing motion vectors of unsynchronized picture and apparatus using the same | |
Liu et al. | Fully scalable multiview wavelet video coding | |
Lim et al. | Motion/disparity compensated multiview sequence coding |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 07833603; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2009534496; Country of ref document: JP; Kind code of ref document: A |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 07833603; Country of ref document: EP; Kind code of ref document: A1 |