WO2011105297A1 - Motion vector estimation method, multi-view video encoding method, multi-view video decoding method, motion vector estimation device, multi-view video encoding device, multi-view video decoding device, motion vector estimation program, multi-view video encoding program, and multi-view video decoding program - Google Patents
- Publication number
- WO2011105297A1 (application PCT/JP2011/053516)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- viewpoint
- image
- motion vector
- frame
- encoding
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/513—Processing of motion vectors
- H04N19/521—Processing of motion vectors for estimating the reliability of the determined motion vectors or motion vector field, e.g. for smoothing the motion vector field or for correcting motion vectors
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/105—Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/109—Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/136—Incoming video signal characteristics or properties
- H04N19/137—Motion inside a coding unit, e.g. average field, frame or block difference
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/44—Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/513—Processing of motion vectors
- H04N19/517—Processing of motion vectors by encoding
- H04N19/52—Processing of motion vectors by encoding by predictive encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/567—Motion estimation based on rate distortion criteria
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/573—Motion compensation with multiple frame prediction using two or more reference frames in a given prediction direction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/80—Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation
Definitions
- The present invention relates to a motion vector estimation method, a multi-view video encoding method, a multi-view video decoding method, a motion vector estimation device, a multi-view video encoding device, a multi-view video decoding device, a motion vector estimation program, a multi-view video encoding program, and a multi-view video decoding program.
- A multi-view video is a group of videos obtained by shooting the same subject and background with a plurality of cameras.
- Efficient encoding is realized by motion-compensated prediction, which exploits the high correlation that exists between frames of a video at different times.
- Motion-compensated prediction is a technique adopted in recent international video coding standards, as represented by H.264. That is, motion-compensated prediction generates an image by compensating for the motion of the subject between the encoding target frame and an already encoded reference frame, takes the difference between the generated image and the encoding target frame, and encodes only that difference signal.
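As an illustrative sketch of the motion-compensated prediction described above (not part of the patent disclosure; the function names and toy data are assumptions), a minimal exhaustive block search finds the displacement that minimizes the sum of absolute differences (SAD) against the reference frame, after which only the residual would need to be encoded:

```python
import numpy as np

def motion_search(block, ref, top, left, search_range=2):
    """Exhaustive search: find the displacement (dy, dx) whose region in the
    reference frame minimizes the sum of absolute differences (SAD)."""
    h, w = block.shape
    best = (0, 0)
    best_sad = float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
                continue  # candidate region falls outside the reference frame
            sad = int(np.abs(block.astype(int) - ref[y:y+h, x:x+w].astype(int)).sum())
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad

# Toy example: a 2x2 patch of the current frame is the reference content
# shifted by (1, 1), so the search recovers that displacement exactly and
# the residual to encode is all zeros.
ref = np.arange(16).reshape(4, 4)
cur_block = ref[1:3, 1:3]
mv, sad = motion_search(cur_block, ref, top=0, left=0)
```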
- In multi-view video coding, there is a high correlation not only between frames at different times but also between frames at different viewpoints. For this reason, a technique called disparity-compensated prediction (parallax compensation prediction) is used, in which the inter-frame difference is taken between the encoding target frame and an image (frame) generated by compensating for the disparity between viewpoints instead of the motion, and only the difference signal is encoded.
- Disparity-compensated prediction is adopted as an international standard in H.264 Annex H (for details of H.264, see, for example, Non-Patent Document 1).
- The disparity used here is the difference between the positions at which the subject is projected onto the image planes of cameras placed at different positions. In disparity-compensated prediction, this is expressed as a two-dimensional vector and encoded. As shown in FIG. 20, since disparity is information that depends on the camera positions and on the distance of the subject from the camera (the depth), there is a method called view synthesis prediction (view interpolation prediction) that uses this principle.
- View synthesis prediction uses the part of the multi-view video that has already been processed and for which a decoding result has been obtained: in accordance with the three-dimensional positional relationship between the cameras and the subject, frames of other viewpoints are synthesized (interpolated), and the resulting image is used as the predicted image (see, for example, Non-Patent Document 2).
- Depth maps (sometimes called distance images, parallax images, or disparity maps), which give the distance (depth) from the camera to the subject for each pixel, are often used to represent the three-dimensional position of the subject. Besides a depth map, polygon information of the subject or voxel information of the subject space can also be used.
- Methods for obtaining a depth map can be broadly divided into those that generate it by direct measurement, for example using infrared pulses, and those that estimate the depth by finding the points at which the same subject appears in a multi-viewpoint image and applying the principle of triangulation. Which of these methods the depth map is obtained by is not a significant issue in view synthesis prediction; nor does it greatly matter where the estimation is performed, as long as a depth map can be obtained.
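To make the depth-based synthesis concrete, the following sketch (an editorial illustration, not the patent's method; it assumes rectified, parallel cameras where disparity equals focal length times baseline divided by depth) forward-warps a reference image into a neighboring view:

```python
import numpy as np

def synthesize_view(ref_img, depth, focal, baseline):
    """Forward-warp a reference image into a neighboring rectified view.
    For parallel cameras, disparity d = focal * baseline / depth, so each
    reference pixel (r, c) lands at column c - d in the target view.
    Pixels never written to remain marked as holes (-1)."""
    h, w = ref_img.shape
    out = -np.ones((h, w), dtype=ref_img.dtype)
    for r in range(h):
        for c in range(w):
            d = int(round(focal * baseline / depth[r, c]))
            tc = c - d
            if 0 <= tc < w:
                out[r, tc] = ref_img[r, c]
    return out

# A fronto-parallel plane at constant depth shifts the whole image by a
# uniform disparity; here focal * baseline / depth = 1 pixel, leaving a
# one-column hole at the right edge.
ref = np.arange(12, dtype=np.int64).reshape(3, 4)
depth = np.full((3, 4), 10.0)
syn = synthesize_view(ref, depth, focal=10.0, baseline=1.0)
```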
- The redundancy of video signals between cameras can be removed by using disparity-compensated prediction or view synthesis prediction. For this reason, a multi-view video can be compressed and encoded with higher efficiency than when the video shot by each camera is encoded independently.
- Non-Patent Document 1 attempts to use both the inter-camera correlation and the temporal correlation by introducing adaptive selection, for each block, between motion-compensated prediction and disparity-compensated prediction. By using this method, more efficient encoding can be realized than when only one of the correlations is used.
- However, selecting one or the other for each block only reduces redundancy by using whichever correlation is stronger in each block; it cannot reduce the redundancy that exists simultaneously between frames taken by different cameras and frames taken at different times.
- As a countermeasure, one can easily conceive of a method that takes a weighted average of a predicted image generated by a method using temporal correlation, such as motion-compensated prediction, and a predicted image generated by a method using inter-camera correlation, such as disparity-compensated prediction or view synthesis prediction. Using this method, a certain degree of improvement in coding efficiency can be obtained.
- However, generating the predicted image by a weighted average merely distributes the ratio between using the temporal correlation and using the inter-camera correlation. That is, it does not use the two correlations at the same time but only chooses more flexibly which correlation to use, so the redundancy that exists simultaneously is still not reduced.
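The weighted-average scheme discussed above can be sketched in a few lines (an editorial illustration, not the patent's method; the arrays and weight are toy assumptions). It makes visible why the blend only distributes a ratio between the two predictors rather than exploiting both correlations at once:

```python
import numpy as np

def blend_predictions(temporal_pred, interview_pred, w):
    """Per-pixel weighted average of two predicted images.
    w = 1 uses only the temporal (motion-compensated) prediction,
    w = 0 only the inter-camera prediction; intermediate values
    merely split the ratio between the two."""
    return w * temporal_pred + (1.0 - w) * interview_pred

temporal = np.array([[100.0, 100.0]])   # motion-compensated prediction
interview = np.array([[80.0, 120.0]])   # disparity-compensated prediction
blended = blend_predictions(temporal, interview, w=0.5)  # [[90., 110.]]
```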
- The present invention has been made in view of such circumstances, and its object is to provide a motion vector estimation method, a multi-view video encoding method, a multi-view video decoding method, a motion vector estimation device, a multi-view video encoding device, a multi-view video decoding device, a motion vector estimation program, a multi-view video encoding program, and a multi-view video decoding program capable of accurately estimating a motion vector even in a situation where the processed image cannot be obtained, and of realizing efficient multi-view video encoding that uses the temporal correlation and the inter-camera correlation simultaneously in video signal prediction.
- A first aspect of the present invention is a motion vector estimation method that generates, from a reference camera video captured by a camera different from the processing camera that captured a processed image included in a multi-view video, a view synthesized image for the time at which the processed image was captured, and that estimates a motion vector by searching, without using the processed image, for the region in a reference image captured by the processing camera that corresponds to the processing region on the processed image, using the image signal on the view synthesized image.
- The method may further include a reliability setting step of setting, for each pixel of the view synthesized image, a reliability indicating the certainty of the view synthesized image, and the corresponding region estimation step may weight the matching cost used in searching for the corresponding region based on the reliability.
- A second aspect of the present invention is a multi-view video encoding method that performs predictive encoding of a multi-view video, comprising: a view synthesized image generation step of generating a view synthesized image at the encoding target viewpoint from a reference viewpoint frame that has already been encoded at a reference viewpoint different from the encoding target viewpoint and was taken at the same time as the encoding target frame; a motion vector estimation step of estimating a motion vector, for each encoding unit block of the view synthesized image, by searching for a corresponding region on a reference frame that has already been encoded at the encoding target viewpoint; a motion compensated prediction image generation step of generating a motion compensated prediction image for the encoding target frame using the estimated motion vector and the reference frame; and a residual encoding step of encoding the difference signal between the encoding target frame and the motion compensated prediction image.
- The method may further include a reliability setting step of setting, for each pixel of the view synthesized image, a reliability indicating the certainty of the view synthesized image, and the motion vector estimation step may weight the matching cost of each pixel when searching for the corresponding region based on the reliability.
- The method may further include a motion search step of generating an optimal motion vector by searching for a corresponding region between the encoding target frame and the reference frame, and a difference vector encoding step of encoding the difference vector between the motion vector and the optimal motion vector; in that case, the motion compensated prediction image generation step may generate the motion compensated prediction image using the optimal motion vector and the reference frame.
- The method may further include a prediction vector generation step of generating a prediction vector using the motion vector and the group of optimal motion vectors used in regions adjacent to the encoding target region, and the difference vector between the prediction vector and the optimal motion vector may be encoded.
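The prediction vector and difference vector described above can be illustrated with a small sketch (an editorial illustration, not the patent's claimed procedure; component-wise median prediction, as used in H.264, is one plausible way to combine the candidates, and all values are toy assumptions):

```python
def predict_vector(candidates):
    """Component-wise median of candidate motion vectors: one common way
    (H.264-style median prediction) to form a prediction vector from the
    synthesis-derived vector and the neighbors' optimal vectors."""
    xs = sorted(v[0] for v in candidates)
    ys = sorted(v[1] for v in candidates)
    mid = len(candidates) // 2
    return (xs[mid], ys[mid])

# Vector estimated from the view-synthesis search plus two neighbors' vectors.
estimated = (2, 0)
neighbours = [(1, 1), (3, -1)]
pred = predict_vector([estimated] + neighbours)      # (2, 0)
optimal = (3, 0)                                     # from a normal motion search
diff = (optimal[0] - pred[0], optimal[1] - pred[1])  # only (1, 0) is encoded
```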
- A third aspect of the present invention is a multi-view video decoding method for decoding encoded data of a multi-view video for a certain viewpoint, comprising: a view synthesized image generation step of generating a view synthesized image at the decoding target viewpoint from a reference viewpoint frame at a reference viewpoint different from the decoding target viewpoint; a motion vector estimation step of estimating a motion vector, for each decoding unit block of the view synthesized image, by searching for a corresponding region on a reference frame that has already been decoded at the decoding target viewpoint; a motion compensated prediction image generation step of generating a motion compensated prediction image for the decoding target frame using the estimated motion vector and the reference frame; and an image decoding step of decoding, from the encoded data, the decoding target frame that has been predictively encoded.
- The method may further include a reliability setting step of setting, for each pixel of the view synthesized image, a reliability indicating the certainty of the view synthesized image, and the motion vector estimation step may weight the matching cost of each pixel when searching for the corresponding region based on the reliability.
- The method may further include a vector decoding step of decoding, from the encoded data, an optimal motion vector that has been predictively encoded using the motion vector as a prediction vector, and the motion compensated prediction image generation step may generate the motion compensated prediction image using the optimal motion vector and the reference frame.
- The method may further include a prediction vector generation step of generating an estimated prediction vector using the motion vector and the group of optimal motion vectors used in regions adjacent to the decoding target region, and the vector decoding step may decode the optimal motion vector using the estimated prediction vector as the prediction vector.
- A fourth aspect of the present invention is a motion vector estimation device comprising: view synthesized image generation means for generating, from a reference camera video captured by a camera different from the processing camera that captured a processed image included in a multi-view video, a view synthesized image for the time at which the processed image was captured; and corresponding region estimation means for estimating a motion vector by searching, without using the processed image, for a corresponding region in a reference image captured by the processing camera, using the image signal on the view synthesized image that corresponds to the processing region on the processed image.
- The device may further include reliability setting means for setting, for each pixel of the view synthesized image, a reliability indicating the certainty of the view synthesized image, and the corresponding region estimation means may weight the matching cost used in searching for the corresponding region based on the reliability.
- A fifth aspect of the present invention is a multi-view video encoding apparatus that performs predictive encoding of a multi-view video, comprising: view synthesized image generation means for generating a view synthesized image at the encoding target viewpoint from a reference viewpoint frame that has already been encoded at a reference viewpoint different from the encoding target viewpoint and was taken at the same time as the encoding target frame; motion vector estimation means for estimating a motion vector, for each encoding unit block of the view synthesized image, by searching for a corresponding region on a reference frame that has already been encoded at the encoding target viewpoint; motion compensated prediction image generation means for generating a motion compensated prediction image for the encoding target frame using the estimated motion vector and the reference frame; and residual encoding means for encoding the difference signal between the encoding target frame and the motion compensated prediction image.
- The apparatus may further include reliability setting means for setting, for each pixel of the view synthesized image, a reliability indicating the certainty of the view synthesized image, and the motion vector estimation means may weight the matching cost of each pixel when searching for the corresponding region based on the reliability.
- A sixth aspect of the present invention is a multi-view video decoding apparatus that decodes encoded data of a multi-view video for a certain viewpoint, comprising: view synthesized image generation means for generating a view synthesized image at the decoding target viewpoint from a reference viewpoint frame at a reference viewpoint different from the decoding target viewpoint; motion vector estimation means for estimating a motion vector, for each decoding unit block of the view synthesized image, by searching for a corresponding region on a reference frame that has already been decoded at the decoding target viewpoint; motion compensated prediction image generation means for generating a motion compensated prediction image for the decoding target frame using the estimated motion vector and the reference frame; and image decoding means for decoding the predictively encoded decoding target frame from the encoded data.
- The apparatus may further include reliability setting means for setting, for each pixel of the view synthesized image, a reliability indicating the certainty of the view synthesized image, and the motion vector estimation means may weight the matching cost of each pixel when searching for the corresponding region based on the reliability.
- A seventh aspect of the present invention is a motion vector estimation program that causes the computer of a motion vector estimation device to execute: a view synthesized image generation function of generating, from a reference camera video captured by a camera different from the processing camera that captured a processed image included in a multi-view video, a view synthesized image for the time at which the processed image was captured, in accordance with the same settings as the processing camera; and a corresponding region estimation function of estimating a motion vector by searching, without using the processed image, for a corresponding region in a reference image captured by the processing camera, using the image signal on the view synthesized image that corresponds to the processing region on the processed image.
- An eighth aspect of the present invention is a multi-view video encoding program that causes the computer of a multi-view video encoding apparatus performing predictive encoding of a multi-view video to execute: a view synthesized image generation function of generating a view synthesized image at the encoding target viewpoint; a motion vector estimation function of estimating a motion vector by searching for a corresponding region on a reference frame that has already been encoded at the encoding target viewpoint; a motion compensated prediction image generation function of generating a motion compensated prediction image for the encoding target frame using the estimated motion vector and the reference frame; and a residual encoding function of encoding the difference signal between the encoding target frame and the motion compensated prediction image.
- A ninth aspect of the present invention is a multi-view video decoding program that causes the computer of a multi-view video decoding apparatus, which decodes encoded data of a multi-view video for a certain viewpoint, to execute: a view synthesized image generation function of generating a view synthesized image at the decoding target viewpoint from a reference viewpoint frame at a reference viewpoint different from the decoding target viewpoint; a motion vector estimation function of estimating a motion vector, for each decoding unit block of the view synthesized image, by searching for a corresponding region on a reference frame that has already been decoded at the decoding target viewpoint; a motion compensated prediction image generation function of generating a motion compensated prediction image for the decoding target frame using the estimated motion vector and the reference frame; and an image decoding function of decoding the decoding target frame from the encoded data using the motion compensated prediction image as a prediction signal.
- According to the present invention, a motion vector can be accurately estimated even in a situation where the processed image cannot be obtained, and efficient multi-view video encoding can be realized by using the two correlations (that is, the inter-camera correlation and the temporal correlation) simultaneously in video signal prediction.
- Conventionally, motion-compensated prediction is realized by finding a corresponding region on a reference image using the image signal of the input image to be encoded.
- In the present embodiment, by contrast, a synthesized image corresponding to the encoding target image is generated using video photographed by another camera (step Sa2 described later), and the corresponding region on the reference image is determined using the image signal of that synthesized image (step Sa5 described later). Since the same synthesized image can be generated on the decoding side, the same motion vector can be obtained by performing a similar search on the decoding side.
- In the following, by appending to a video (frame) information that can specify a position (a coordinate value, an index that can be associated with a coordinate value, a region, or an index that can be associated with a region) enclosed in the symbols [], the video signal for the pixel or region at that position is denoted.
- FIG. 1 is a block diagram showing the configuration of the multi-view video encoding apparatus according to the first embodiment.
- The multi-view video encoding apparatus 100 includes an encoding target frame input unit 101, an encoding target image memory 102, a reference viewpoint frame input unit 103, a reference viewpoint image memory 104, a view synthesis unit 105, a view synthesized image memory 106, a reliability setting unit 107, a corresponding region search unit 108, a motion compensation prediction unit 109, a prediction residual encoding unit 110, a prediction residual decoding unit 111, a decoded image memory 112, a prediction residual calculation unit 113, and a decoded image calculation unit 114.
- the encoding target frame input unit 101 inputs a video frame (encoding target frame) to be encoded.
- the encoding target image memory 102 stores the input encoding target frame.
- The reference viewpoint frame input unit 103 inputs a video frame (reference viewpoint frame) of a viewpoint (reference viewpoint) different from that of the encoding target frame.
- the reference viewpoint image memory 104 stores the input reference viewpoint frame.
- the view synthesis unit 105 generates a view synthesized image for the encoding target frame using the reference view frame.
- The view synthesized image memory 106 stores the generated view synthesized image.
- The reliability setting unit 107 sets a reliability for each pixel of the generated view synthesized image.
- The corresponding region search unit 108 searches, for each encoding unit block of the view synthesized image, for a motion vector indicating the corresponding block in an already encoded frame that serves as the reference frame for motion-compensated prediction and was shot from the same viewpoint as the encoding target frame, using the reliability. In other words, by weighting the matching cost used in the corresponding-region search based on the reliability, the search emphasizes pixels that could be synthesized accurately, without being misled by errors introduced during view synthesis.
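The reliability-weighted search described above can be sketched as follows (an editorial illustration, not the patent's exact procedure; a per-pixel reliability-weighted SAD is assumed as the matching cost, and all data are toy values). A corrupted synthesized pixel with low reliability barely affects the cost, so the true offset is still found:

```python
import numpy as np

def weighted_search(syn_block, rel_block, ref, top, left, search_range=1):
    """Search the reference frame for the region matching a block of the
    view-synthesized image, weighting each pixel's absolute difference by
    its synthesis reliability so poorly synthesized pixels count less."""
    h, w = syn_block.shape
    best, best_cost = (0, 0), float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
                continue
            diff = np.abs(syn_block - ref[y:y+h, x:x+w])
            cost = float((rel_block * diff).sum())  # reliability-weighted SAD
            if cost < best_cost:
                best_cost, best = cost, (dy, dx)
    return best, best_cost

# The top-left synthesized pixel is corrupted (99 instead of 2) but carries
# low reliability, so the search still finds the true offset (0, 1).
ref = np.array([[1., 2., 3.],
                [4., 5., 6.],
                [7., 8., 9.]])
syn = np.array([[99., 3.],
                [5., 6.]])          # true match is ref[0:2, 1:3]
rel = np.array([[0.01, 1.0],
                [1.0, 1.0]])
mv, cost = weighted_search(syn, rel, ref, top=0, left=0)
```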
- the motion compensation prediction unit 109 generates a motion compensated prediction image using the reference frame according to the determined corresponding block.
- the prediction residual calculation unit 113 calculates a difference (prediction residual signal) between the encoding target frame and the motion compensated prediction image.
- the prediction residual encoding unit 110 encodes the prediction residual signal.
- the prediction residual decoding unit 111 decodes encoded data of the prediction residual signal.
- the decoded image calculation unit 114 calculates the decoded image of the encoding target frame by adding the decoded prediction residual signal and the motion compensated prediction image.
- the decoded image memory 112 stores the decoded image.
- FIG. 2 is a flowchart for explaining the operation of the multi-view video encoding apparatus 100 according to the first embodiment. The process executed by the multi-view video encoding apparatus 100 according to the first embodiment will be described in detail according to this flowchart.
- The view synthesis unit 105 synthesizes an image shot at the same time from the same viewpoint as the encoding target frame from the information of the reference viewpoint frames, and accumulates the generated view synthesized image Syn in the view synthesized image memory 106 (step Sa2).
- Any method may be used for generating the view synthesized image.
- For example, the methods of Non-Patent Document 2 and Non-Patent Document 3 (Y. Mori, N. Fukushima, T. Fujii, and M. Tanimoto, "View Generation with 3D Warping Using Depth Information for FTV," Proceedings of 3DTV-CON2008, pp. 229-232, May 2008) can be used.
- The method of Non-Patent Document 4 (S. Yea and A. Vetro, "View Synthesis Prediction for Rate-Overhead Reduction in FTV," Proceedings of 3DTV-CON2008, pp. 145-148, May 2008) and the like can also be used.
- If no depth information is given, depth information for the reference viewpoint frames or the encoding target frame can first be created using a technique called a stereo method or a depth estimation method (Non-Patent Document 5: J. Sun, N. Zheng, and H. Shum, "Stereo Matching Using Belief Propagation," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 25, No. 7, pp. 787-800, July 2003), and then the above methods can be applied to generate a view synthesized image (Non-Patent Document 6: S. Shimizu, Y. Tonomura, H. Kimata, and Y. Ohtani, "Improved View Interpolation Prediction for Side Information in Multiview Distributed Video Coding," Proceedings of ICDSC2009, August 2009).
- The method of Non-Patent Document 7 (K. Yamamoto, M. Kitahara, H. Kimata, T. Yendo, T. Fujii, M. Tanimoto, S. Shimizu, K. Kamikura, and Y. Yashima, "Multiview Video Coding Using View Interpolation and Color Correction," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 17, No. 11, pp. 1436-1449, November 2007) can also be used.
- Next, the reliability setting unit 107 generates, for each pixel of the viewpoint composite image, a reliability ρ indicating how accurately the composition for the pixel can be realized (step Sa3).
- Here, the reliability ρ is a real number from 0 to 1, with larger values indicating higher reliability; however, any definition may be used as long as values of 0 or more express the reliability, with larger values being more reliable.
- For example, the reliability may be expressed by an 8-bit integer of 1 or more.
- That is, the reliability ρ may be any value as long as it can indicate how accurately the composition is performed.
- For example, the reliability can be represented using the following formula (1) or formula (2).
- max is a function that returns the maximum value for a given set.
- The other function is expressed by the following formula (3).
- Besides these, the pixel values of the corresponding pixels of the reference viewpoint frames may be clustered, and for the largest cluster, the variance of the pixel values, or the difference between the maximum value and the minimum value, may be calculated and used.
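- The variance-based idea above can be sketched as follows; this is an illustrative Python sketch, not the patent's formulas (1) to (3), and the mapping function and the `scale` parameter are assumptions:

```python
import numpy as np

def reliability(ref_values, scale=0.1):
    """Illustrative sketch: map the spread of the corresponding pixel values
    across the reference viewpoint frames to a value in (0, 1]; a smaller
    spread means a more reliable synthesis. `scale` is a hypothetical
    tuning parameter."""
    v = np.asarray(ref_values, dtype=np.float64)
    spread = float(v.max() - v.min())  # could also use v.var()
    return 1.0 / (1.0 + scale * spread)
```

With this definition, identical corresponding pixels give a reliability of 1, and the reliability decreases as the corresponding pixel values disagree.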
- As another method, it may be assumed that the error at the corresponding points between the viewpoints follows a normal distribution or a Laplace distribution; with the average value and the variance of the distribution as parameters, the reliability may be defined using the probability value corresponding to the error amount of each pixel obtained by diff in the above formula (4).
- a distribution model, an average value, and a variance value may be determined in advance, or information on the used model may be encoded and transmitted. In general, if the subject is completely diffusely reflected, the average value of the distribution can be considered as 0 theoretically, so the model may be simplified.
- It is also possible to estimate an error distribution from the change in the error amount when the depth is changed slightly, and to use as the reliability the probability that the error falls within a certain range when the error occurrence probability follows that distribution.
- In the definition using the error distribution model and the pixel values of the corresponding pixels on the reference viewpoint frames at the time of generating the viewpoint composite image, the probability that the situation represented by those pixel values occurs, assuming that the error follows the estimated distribution, is used as the reliability.
- In addition, when a method called Belief Propagation (the above-mentioned Non-Patent Document 5) is used to estimate the parallax (depth) required for viewpoint synthesis, a probability distribution of the parallax (depth) is obtained for each pixel, so that probability value can be used as the reliability.
- Besides Belief Propagation, any depth estimation algorithm that internally calculates the certainty of the solution for each pixel of the viewpoint composite image can provide information usable as the reliability.
- a part of the processing for obtaining corresponding point information and depth information may be the same as part of the reliability calculation. In such a case, it is possible to reduce the amount of calculation by simultaneously performing the viewpoint composite image generation and the reliability calculation.
- Next, the encoding target frame is divided into blocks, and the video signal of the encoding target frame is encoded while searching for corresponding points and generating a predicted image for each region (steps Sa4 to Sa12).
- That is, with the encoding target block index represented by blk and the total number of encoding target blocks represented by numBlks, blk is initialized to 0 (step Sa4), and the following processing (steps Sa5 to Sa10) is repeated, adding 1 to blk each time (step Sa11), until blk reaches numBlks (step Sa12).
- Note that these processes can also be performed as part of the process repeated for each encoding target block. For example, this corresponds to the case where depth information for the encoding target block is given.
- the corresponding region search unit 108 finds a corresponding block on the reference frame corresponding to the block blk using the viewpoint synthesized image (step Sa5).
- the reference frame is a local decoded image obtained by decoding data that has already been encoded.
- the local decoded image data is data stored in the decoded image memory 112.
- the reason for using the local decoded image is to prevent the occurrence of coding distortion called drift by using the same data that can be acquired at the same timing on the decoding side. If the generation of such encoding distortion is allowed, an input frame encoded before the encoding target frame may be used instead of the local decoded image. In the first embodiment, an image taken with the same camera as the encoding target frame and taken at a different time from the encoding target frame is used. However, any frame may be used even if it is a frame shot by a camera different from the encoding target frame as long as the frame is processed before the encoding target frame.
- The process for obtaining the corresponding block is a process for obtaining, on the local decoded images stored in the decoded image memory 112, a corresponding block that maximizes the degree of matching or minimizes the degree of divergence, using the viewpoint composite image Syn[blk] as a template.
- a matching cost indicating the degree of divergence is used.
- Specific examples of the matching cost indicating the degree of divergence include the following formula (5) and formula (6).
- vec is a vector between corresponding blocks
- t is an index value indicating one of the local decoded images Dec stored in the decoded image memory 112.
- Here, DCT denotes the two-dimensional DCT (Discrete Cosine Transform).
- ||X|| represents the norm of X.
- the process for obtaining a block that minimizes the matching cost is to obtain a set of (best_vec, best_t) represented by the following formula (9).
- argmin indicates a process for obtaining a parameter that minimizes a given function.
- The set of parameters to be derived is the set given below argmin.
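- The search of formula (9) can be illustrated by the following Python sketch: an exhaustive search minimizing a reliability-weighted SAD in the spirit of formula (5). All names and the rectangular search window are illustrative assumptions:

```python
import numpy as np

def weighted_sad(syn, rho, dec, y, x, h, w, dy, dx):
    """Reliability-weighted SAD between the synthesized template syn and
    the candidate block displaced by (dy, dx) in decoded frame dec."""
    cand = dec[y + dy:y + dy + h, x + dx:x + dx + w]
    return float(np.sum(rho * np.abs(syn - cand)))

def best_match(syn, rho, decs, y, x, search_range=1):
    """Exhaustive argmin over frames t and displacements vec, in the
    spirit of formula (9). Returns (best_vec, best_t, best_cost)."""
    h, w = syn.shape
    best = (None, None, float("inf"))
    for t, dec in enumerate(decs):
        for dy in range(-search_range, search_range + 1):
            for dx in range(-search_range, search_range + 1):
                # candidate block must lie fully inside the frame
                if not (0 <= y + dy and y + dy + h <= dec.shape[0]
                        and 0 <= x + dx and x + dx + w <= dec.shape[1]):
                    continue
                c = weighted_sad(syn, rho, dec, y, x, h, w, dy, dx)
                if c < best[2]:
                    best = ((dy, dx), t, c)
    return best
```

In practice, as noted below, the number of searched frames, the search range, the search order, and early termination all strongly affect the calculation cost.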
- Any method may be used as the method for determining the number of frames to be searched, the search range, the search order, and censoring.
- the search range and the truncation method greatly affect the calculation cost.
- In order to reduce the calculation cost, there is a method of appropriately setting a search center. As one example, the corresponding point represented by the motion vector used in the corresponding region on the reference viewpoint frame may be set as the search center.
- The method for determining the frames to be searched may be determined in advance; for example, the most recently encoded frame may be the search target.
- Alternatively, information indicating which frame is targeted may be encoded and notified to the decoding side. In this case, the decoding side needs to have a mechanism for decoding information such as an index value indicating the search target frame and determining the search target frame based on the decoded information.
- When the corresponding block is determined, the motion compensation prediction unit 109 generates a predicted image Pred for the block blk (step Sa6).
- the simplest method is a method in which a pixel value of a corresponding block is used as a predicted image, and is represented by Expression (10).
- a prediction image is generated in consideration of continuity with adjacent blocks using a technique called overlap MC (MC: motion compensation) or a deblocking filter.
- Next, the residual signal Res represented by the difference between the encoding target frame Org and the predicted image Pred is generated by the prediction residual calculation unit 113, and the residual signal is encoded by the prediction residual encoding unit 110 (step Sa7).
- the encoded data output as a result of encoding is output from the multi-view video encoding apparatus 100 and is sent to the prediction residual decoding unit 111.
- Any method may be used for encoding the prediction residual.
- For example, in H.264, encoding is performed by sequentially performing frequency transformation such as DCT, quantization, binarization, and entropy encoding.
- the prediction residual decoding unit 111 decodes the input encoded data and obtains a decoded prediction residual DecRes (step Sa8).
- Here, a decoding method corresponding to the technique used in encoding is used.
- For example, in H.264, a decoded prediction residual is obtained by performing processing in the order of entropy decoding, inverse binarization, inverse quantization, and inverse frequency transformation such as IDCT (Inverse Discrete Cosine Transform).
- Next, the decoded image calculation unit 114 adds the prediction signal Pred to the obtained decoded prediction residual DecRes as indicated by formula (11), and generates a local decoded image Dec_cur[blk] (step Sa9).
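- Formula (11) amounts to adding the prediction signal and the decoded residual per pixel; a minimal sketch follows, where the clipping to the valid pixel range is an added assumption common in practical codecs, not part of the formula itself:

```python
import numpy as np

def reconstruct_block(pred, dec_res, bit_depth=8):
    """Local decoded block = prediction + decoded residual (formula (11)),
    clipped to the valid pixel range [0, 2^bit_depth - 1] (assumption)."""
    out = pred.astype(np.int32) + dec_res.astype(np.int32)
    return np.clip(out, 0, (1 << bit_depth) - 1)
```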
- the generated local decoded image is stored in the decoded image memory 112 for use in future prediction (step Sa10).
- In the corresponding block search in step Sa5 described above, one corresponding block is determined; however, a plurality of corresponding blocks may be determined and used for generating the predicted image.
- As a method for determining the number of blocks, a method of directly specifying the number in advance, a method of determining a condition relating to the matching cost and selecting all blocks satisfying the condition, and a method combining the two are conceivable.
- The combined method includes, for example, selecting blocks whose matching cost is less than a threshold value, in order from the smallest cost, up to a predetermined number.
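- The combined rule (a cost threshold plus an upper bound on the number of blocks) can be sketched as follows; the names are illustrative, and `costs` maps a candidate block to its matching cost:

```python
def select_candidates(costs, threshold, max_blocks):
    """Keep only candidate blocks whose matching cost is below threshold,
    ordered from smallest cost, up to max_blocks of them."""
    passing = [(c, b) for b, c in costs.items() if c < threshold]
    passing.sort()  # smallest matching cost first
    return [b for c, b in passing[:max_blocks]]
```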
- a method of encoding information indicating the number and transmitting the information to the decoding side is also conceivable.
- a method for generating a prediction signal from a plurality of candidates may be determined in advance, or information indicating which method is used may be encoded and transmitted.
- Although the search target frames described above do not include a frame at the same time as the encoding target frame, the already decoded area of the encoding target frame may also be included in the search target.
- In this method, a viewpoint composite image corresponding to the processed image is generated by a method similar to viewpoint synthesis prediction and viewpoint interpolation prediction, and a corresponding point with the reference image is searched for using the viewpoint composite image, whereby the motion vector is estimated.
- Non-Patent Document 8: J. Ascenso, C. Brites, and F. Pereira, “Improving frame interpolation with spatial motion smoothing for pixel domain distributed video coding,” 5th EURASIP Conference on Speech and Image Processing, Multimedia Communications and Services, July 2005. The same concept is adopted in H.264 described in Non-Patent Document 1 as the temporal direct mode.
- When the subject moves linearly, motion vector estimation can be performed with a certain degree of accuracy even by a method that assumes such subject motion.
- However, since the motion of a subject is generally nonlinear and difficult to model, it is difficult to estimate a motion vector with high accuracy by such a method.
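- The linear-motion assumption used by such methods (for example, the temporal direct mode) can be illustrated by scaling a collocated motion vector by the ratio of temporal distances; this is a textbook sketch, not the method of this invention:

```python
def scale_motion_vector(mv, t_cur, t_ref, t_col, t_col_ref):
    """Linear-motion sketch: scale the collocated block's motion vector
    (measured between times t_col and t_col_ref) to the interval between
    the current frame t_cur and its reference t_ref."""
    ratio = (t_cur - t_ref) / (t_col - t_col_ref)
    return (mv[0] * ratio, mv[1] * ratio)
```

This scaling is exact only when the subject moves at constant velocity, which is precisely the assumption that fails for general, nonlinear motion.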
- Non-Patent Document 9: S. Kamp, M. Evertz, and M. Wien, “Decoder side motion vector derivation for inter frame video coding,” ICIP 2008, pp. 1120-1123, October 2008.
- a method is described in which when a processed image is obtained in an adjacent area, a motion vector of the processed area is estimated by obtaining a corresponding area of the adjacent area.
- the motion vector can be estimated with a certain degree of accuracy by using this method.
- This method requires an image of an adjacent area, and a correct motion vector cannot be estimated unless the same subject is captured in the adjacent area.
- highly accurate motion vector estimation cannot be realized except in limited situations.
- video signals in a region for which motion is to be obtained are synthesized using inter-viewpoint correlation, and a corresponding region search is performed using the synthesis result. For this reason, it is not necessary to assume temporal regularity or spatial similarity for motion, and it is possible to perform highly accurate motion vector estimation for any video.
- An error may occur in the viewpoint synthesized image synthesized using the correlation between cameras. Therefore, in the present invention, the reliability indicating the certainty of the viewpoint synthesized image is set for each pixel of the viewpoint synthesized image, and the matching cost is weighted for each pixel based on the reliability.
- As information indicating the certainty of synthesis required for setting the reliability, the variance of the pixel values of the corresponding pixel group on the reference camera videos (videos shot by the reference cameras) used when synthesizing a certain pixel, or their difference values, can be used. Also, when a method called Belief Propagation (Non-Patent Document 5) is used to estimate the parallax and depth required when performing viewpoint synthesis, a probability distribution of the parallax and depth is obtained for each pixel, so that information may be used.
- FIG. 3 is a block diagram showing the configuration of the multi-view video encoding apparatus according to the second embodiment.
- The multi-view video encoding apparatus 200 includes an encoding target frame input unit 201, an encoding target image memory 202, a reference viewpoint frame input unit 203, a viewpoint synthesis unit 204, a viewpoint synthesized image memory 205, a motion estimation unit 206, a motion compensation prediction unit 207, an image encoding unit 208, an image decoding unit 209, a decoded image memory 210, a corresponding region search unit 211, a prediction vector generation unit 212, a vector information encoding unit 213, and a motion vector memory 214.
- the encoding target frame input unit 201 inputs a video frame to be encoded.
- the encoding target image memory 202 stores the input encoding target frame.
- the reference viewpoint frame input unit 203 inputs a video frame for a viewpoint different from the encoding target frame.
- the view synthesis unit 204 generates a view synthesized image for the encoding target frame using the input reference view frame.
- the viewpoint composite image memory 205 stores the generated viewpoint composite image.
- the motion estimation unit 206 estimates the motion between the encoding target frame and the reference frame for each encoding unit block of the encoding target frame.
- the motion compensation prediction unit 207 generates a motion compensated prediction image based on the motion estimation result.
- the image encoding unit 208 receives the motion compensated prediction image, predictively encodes the encoding target frame, and outputs encoded data.
- the image decoding unit 209 receives the motion compensated prediction image and the encoded data, decodes the encoding target frame, and outputs a decoded image.
- the decoded image memory 210 stores the decoded image of the encoding target frame.
- the corresponding area search unit 211 searches for an estimated vector indicating the corresponding block in the reference frame for motion compensated prediction for each coding unit block of the viewpoint synthesized image.
- the prediction vector generation unit 212 generates a prediction vector for the motion vector of the encoding target block from the motion vector and the estimation vector used for motion compensation in the adjacent block of the encoding target block.
- the vector information encoding unit 213 predictively encodes the motion vector using the generated prediction vector.
- the motion vector memory 214 stores motion vectors.
- FIG. 4 is a flowchart for explaining the operation of the multi-view video encoding apparatus 200 according to the second embodiment. The processing executed by the multi-view video encoding apparatus 200 according to the second embodiment will be described in detail according to this flowchart.
- First, the viewpoint synthesis unit 204 uses the reference viewpoint frames to synthesize an image as if shot from the same viewpoint and at the same time as the encoding target frame, and accumulates the generated viewpoint synthesized image Syn in the viewpoint synthesized image memory 205 (step Sb2).
- the process performed here is the same as step Sa2 in the first embodiment.
- Next, the encoding target frame is divided into blocks, and the video signal of the encoding target frame is encoded while searching for corresponding points and generating predicted images for each region (steps Sb3 to Sb14).
- That is, with the encoding target block index represented by blk and the total number of encoding target blocks represented by numBlks, blk is initialized to 0 (step Sb3), and the following processing (steps Sb4 to Sb12) is repeated, adding 1 to blk each time (step Sb13), until blk reaches numBlks (step Sb14).
- these processes can also be performed as part of a process that is repeated for each encoding target block. For example, this corresponds to the case where depth information for the encoding target block is given.
- the motion estimation unit 206 finds a block on the reference frame corresponding to the encoding target block Org [blk] (step Sb4). This process is called motion prediction, and an arbitrary method can be used.
- a two-dimensional vector indicating a displacement from the block blk used to represent the corresponding block is referred to as a motion vector, and is represented as mv in the second embodiment.
- the motion vector mv is stored in the motion vector memory 214 for use in later block processing.
- When the motion estimation is completed, the motion compensation prediction unit 207 generates a motion compensated prediction signal Pred[blk] for the encoding target block Org[blk] as represented by the following formula (12) (step Sb5).
- ref is an index indicating a reference frame.
- Here, an example of a prediction method using only one reference frame is shown; however, it is also possible to extend to a method using a plurality of reference frames, such as the bi-prediction used in H.264. When two reference frames are used, motion estimation is performed for each reference frame, and the prediction signal is generated using the average value of the two.
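- The two-reference-frame case mentioned above can be sketched as a rounded average of the two motion-compensated blocks; the rounding convention used here is an assumption:

```python
import numpy as np

def bi_prediction(block_ref0, block_ref1):
    """Prediction signal as the average of two motion-compensated blocks,
    one from each reference frame, with round-half-up (assumption)."""
    s = block_ref0.astype(np.int32) + block_ref1.astype(np.int32)
    return (s + 1) >> 1  # integer average with rounding
```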
- Next, the image encoding unit 208 predictively encodes the encoding target block Org[blk] using the motion compensated prediction signal Pred[blk]. Specifically, the residual signal Res represented by the difference between the encoding target block Org and the motion compensated prediction signal Pred is obtained and encoded (step Sb6). Any method may be used for encoding the residual signal. For example, in H.264, encoding is performed by sequentially performing frequency transformation such as DCT, quantization, binarization, and entropy encoding. This encoded data becomes part of the output of the multi-view video encoding apparatus 200 according to the second embodiment.
- the encoded data is decoded by the image decoding unit 209 for use in prediction when encoding subsequent frames.
- That is, the encoded prediction residual signal is decoded (step Sb7), and the motion compensated prediction signal Pred is added to the obtained decoded prediction residual signal DecRes, whereby the local decoded image Dec_cur[blk] is generated (step Sb8).
- the obtained local decoded image is stored in the decoded image memory 210 (step Sb9).
- For decoding, a decoding method corresponding to the technique used in encoding is used. For example, in H.264, a decoded prediction residual signal is obtained by performing processing in the order of entropy decoding, inverse binarization, inverse quantization, and inverse frequency transformation such as IDCT.
- the motion vector mv obtained by the motion estimation in step Sb4 and used in the motion compensation prediction in step Sb5 is encoded.
- the corresponding area search unit 211 finds a corresponding block on the reference frame corresponding to the viewpoint synthesized image Syn [blk] (step Sb10).
- a two-dimensional vector indicating the displacement from the block blk for representing the corresponding block is referred to as an estimated vector vec.
- the process here is the same as step Sa5 of the first embodiment.
- Note that the second embodiment shows an example in which the reliability ρ is not used; that is, ρ is 1 for all pixels, and the multiplication by ρ can be omitted.
- the reliability may be set and used as in the first embodiment.
- When the estimated vector is obtained, the prediction vector generation unit 212 generates a prediction vector pmv for the motion vector mv of the encoding target block, using the estimated vector and the motion vectors used for motion compensation in the blocks adjacent to the encoding target block, which are stored in the motion vector memory 214 (step Sb11).
- The optimal motion vector actually used in an adjacent region is, for that adjacent region, a more accurate vector than the motion vector estimated using the viewpoint composite image (that is, the estimated vector). Therefore, when there is spatial similarity, the amount of code of the difference vector that needs to be encoded can be reduced by generating the prediction vector using these vectors. However, if there is no spatial similarity with the adjacent regions, the difference vector may instead be increased. Therefore, in this embodiment, whether there is spatial similarity is determined using the motion vector estimated with the viewpoint composite image; if it is determined that there is spatial similarity, a prediction vector using the optimal vector group of the adjacent regions is generated, and if not, the motion vector estimated using the viewpoint synthesized image is used. By doing so, the amount of code of the difference vector, which must always be encoded, is reduced, and efficient multi-view video encoding is achieved.
- As a method of generating a prediction vector from the motion vector estimated using the viewpoint composite image and the optimal motion vector group used in the adjacent regions, an average value or a median value can be taken for each vector component. There is also a method in which, among the optimal motion vectors used in the adjacent regions, the vector having the smallest difference from the motion vector estimated using the viewpoint synthesized image is used as the prediction vector.
- As another prediction vector generation method, a vector may first be generated by taking the average value or median value for each vector component over only the optimal motion vector group used in the adjacent regions. If the difference between that vector and the motion vector estimated using the viewpoint composite image is equal to or greater than a separately defined threshold, the motion vector estimated using the viewpoint composite image is used as the prediction vector; if the difference is less than the threshold, the generated vector is used as the prediction vector. Conversely, there is also a method in which the generated vector is used as the prediction vector when the difference is equal to or greater than the threshold, and the motion vector estimated using the viewpoint synthesized image is used when the difference is less than the threshold. Which of these two methods is better depends on the accuracy with which the viewpoint composite image can be generated. Therefore, a method may be used in which the prediction vector is determined by the former algorithm when the viewpoint composite image is generated with high accuracy, and by the latter algorithm otherwise.
- the estimated vector vec may be used as the predicted vector pmv without using the motion vector of the adjacent block, or the motion vector of the adjacent block closest to the estimated vector vec may be used as the predicted vector pmv.
- the predicted vector pmv may be generated by taking the median value or average value of the estimated vector and the motion vector of the adjacent block for each component.
- a vector pmv ′ is generated from a median value or an average value of motion vectors of adjacent blocks, and a prediction vector pmv is determined according to a difference between the vector pmv ′ and the estimated vector vec.
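- The threshold-switching strategy described above can be sketched as follows; the component-wise median and the L1 distance are illustrative choices:

```python
def predict_vector(neighbor_mvs, est_vec, threshold):
    """Sketch: pmv' is the component-wise median of the adjacent blocks'
    motion vectors; if it differs from the estimated vector est_vec by at
    least `threshold` (L1 distance, as an assumption), fall back to
    est_vec as the prediction vector."""
    def median(vals):
        s = sorted(vals)
        n = len(s)
        return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2.0
    pmv_prime = (median([v[0] for v in neighbor_mvs]),
                 median([v[1] for v in neighbor_mvs]))
    dist = abs(pmv_prime[0] - est_vec[0]) + abs(pmv_prime[1] - est_vec[1])
    return pmv_prime if dist < threshold else est_vec
```

When the neighbors agree with the estimated vector (spatial similarity holds), the median of the neighbors is used; when they diverge strongly, the estimated vector itself serves as the prediction vector.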
- the motion vector mv is predictively encoded by the vector information encoding unit 213 (step Sb12). That is, the prediction residual vector represented by the difference between the motion vector mv and the prediction vector pmv is encoded.
- the encoding result is one of outputs from the multi-view video encoding apparatus 200.
- It is assumed here either that the reference frame is predetermined, or that, as in H.264, information indicating the used reference frame is encoded and transmitted so that the encoding side and the decoding side select the same reference frame.
- step Sb10 may be performed prior to step Sb4 to determine a decoded frame that minimizes the matching cost from among a plurality of candidates, and the determined frame may be used as a reference frame.
- In that case, it is possible to reduce the amount of code by switching the code table so that the code amount of the information indicating the frame that minimizes the matching cost becomes small.
- a motion vector for using temporal correlation is predicted using an encoding target viewpoint image obtained by viewpoint synthesis using inter-camera correlation.
- In the present embodiment, since correlation between cameras is used in motion vector generation and temporal correlation is used in video signal prediction, the two correlations can be used simultaneously.
- an error may occur in the viewpoint synthesized image synthesized using the inter-camera correlation.
- When a corresponding region search is performed using a template including such errors, the motion vector estimation accuracy is reduced due to the influence of the errors. Therefore, in the second embodiment as well, the reliability indicating the certainty of the composite image may be set for each pixel of the viewpoint composite image, and the matching cost may be weighted for each pixel based on the reliability. By doing so, it is possible to estimate motion vectors appropriately by emphasizing pixels that have been synthesized accurately, without being dragged by errors in viewpoint synthesis.
- When the viewpoint composite image can be generated with high accuracy, it is possible to generate a motion vector sufficient for motion compensation prediction by the method of the first embodiment.
- the viewpoint composite image cannot always be generated with high accuracy. Therefore, in the corresponding region search using the viewpoint composite image including an error, it is not always possible to find an optimal motion vector with sub-pixel accuracy from the viewpoint of coding efficiency. If an appropriate motion vector cannot be set, the amount of residual that must be encoded based on the result of motion compensation prediction increases, and efficient compression encoding cannot be realized.
- On the other hand, in a corresponding region search using the encoding target frame itself, it is always possible to find the optimum corresponding region from the viewpoint of encoding efficiency with arbitrary accuracy.
- Therefore, in the second embodiment, taking advantage of the fact that the motion vector estimated using the viewpoint synthesized image has a certain level of accuracy, the optimal motion vector is encoded using the difference from a prediction vector derived from the estimated motion vector. By doing so, it is possible to reduce the amount of code necessary for encoding the optimal motion vector while preventing an increase in the amount of residual that must be encoded. That is, according to the second embodiment, even when an error occurs in the viewpoint synthesized image, it is possible to reduce the amount of code of the motion vector while performing motion compensation prediction using an appropriate motion vector. Therefore, it is possible to realize more robust and efficient compression encoding.
- the difference in pixel values between corresponding regions may be used as a matching cost.
- Alternatively, the corresponding region search may be performed using a rate-distortion cost that can be evaluated by integrating the amount of code required for encoding the difference vector and the amount of the motion compensation prediction residual to be encoded.
- the encoding efficiency of multiview video encoding is higher when the latter cost function is used.
- However, when the rate-distortion cost is used, it is necessary to perform step Sb10 and step Sb11 before step Sb4 of the second embodiment. Since these two steps are independent of the processing of steps Sb4 to Sb9, the order may be changed.
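- The rate-distortion cost mentioned above can be sketched as J = D + λR, where D is the residual amount, R the bits spent on the difference vector, and λ a hypothetical Lagrange multiplier:

```python
def rd_cost(distortion, rate_bits, lmbda):
    # J = D + lambda * R: trade residual amount against vector code amount
    return distortion + lmbda * rate_bits

def best_by_rd(candidates, lmbda):
    """Pick the candidate vector minimizing the rate-distortion cost.
    `candidates` is a list of (vec, distortion, rate_bits) tuples."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lmbda))[0]
```

A larger λ favors vectors that are cheaper to encode even at some cost in residual, which is why the latter cost function tends to improve overall multi-view coding efficiency.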
- In Non-Patent Document 1, when a motion vector is encoded, spatial similarity is used, and encoding is performed on the difference between the motion vector and a prediction vector estimated from the motion vectors of the adjacent regions, thereby realizing efficient encoding. However, when a subject different from that of the adjacent regions is shown in the block being processed, the difference between the prediction vector generated assuming spatial similarity and the motion vector is large, and efficient encoding cannot be realized.
- In contrast, in the present embodiment, the video signal for the block being processed is obtained by inter-camera prediction, and a vector estimated based on that prediction is used as the prediction vector. By doing so, a prediction vector closer to the motion vector can be generated even when there is no spatial similarity.
- FIG. 5 is a block diagram showing the configuration of the multi-view video decoding apparatus according to the third embodiment.
- The multi-view video decoding apparatus 300 includes an encoded data input unit 301, an encoded data memory 302, a reference viewpoint frame input unit 303, a reference viewpoint image memory 304, a viewpoint synthesis unit 305, a viewpoint synthesized image memory 306, a reliability setting unit 307, a corresponding region search unit 308, a motion compensation prediction unit 309, a prediction residual decoding unit 310, a decoded image memory 311, and a decoded image calculation unit 312.
- the encoded data input unit 301 inputs encoded data of a video frame to be decoded.
- the encoded data memory 302 stores the input encoded data.
- the reference viewpoint frame input unit 303 inputs a video frame (reference viewpoint frame) for a viewpoint (reference viewpoint) different from the viewpoint (decoding target viewpoint) from which the decoding target frame was captured.
- the reference viewpoint image memory 304 stores the input reference viewpoint frame.
- the view synthesis unit 305 generates a view synthesized image for the decoding target frame using the reference view frame.
- the viewpoint composite image memory 306 stores the generated viewpoint composite image.
- the reliability setting unit 307 sets the reliability for each pixel of the generated viewpoint composite image.
- For each coding unit block of the viewpoint synthesized image, the corresponding region search unit 308 uses the reliability to search for a motion vector indicating the corresponding block in an already decoded frame that serves as a reference frame for motion compensation prediction and was captured from the same viewpoint as the decoding target frame.
- the motion compensation prediction unit 309 generates a motion compensated prediction image using the reference frame according to the determined corresponding block.
- the prediction residual decoding unit 310 decodes the prediction residual signal from the encoded data.
- the decoded image calculation unit 312 adds the decoded prediction residual signal and the motion compensated prediction image to calculate a decoded image of the decoding target frame.
- the decoded image memory 311 stores decoded images.
- FIG. 6 is a flowchart for explaining the operation of the multi-view video decoding apparatus 300 according to the third embodiment. The processing executed by the multi-view video decoding apparatus 300 according to the third embodiment will be described in detail according to this flowchart.
- the viewpoint synthesis unit 305 synthesizes an image shot at the same viewpoint as the decoding target frame from the information of the reference viewpoint frames, and stores the generated viewpoint synthesized image Syn in the viewpoint synthesized image memory 306 (step Sc2).
- the process here is the same as step Sa2 in the first embodiment.
- the reliability setting unit 307 generates, for each pixel of the viewpoint synthesized image, a reliability ρ indicating how accurately the synthesis for that pixel could be achieved (step Sc3).
- the process here is the same as step Sa3 in the first embodiment.
- depending on the definition of the reliability, part of the processing for obtaining corresponding point information and depth information may be the same as part of the reliability calculation. In such a case, the amount of calculation can be reduced by performing the viewpoint synthesized image generation and the reliability calculation simultaneously.
- the video signal of the decoding target frame is decoded while searching for corresponding points and generating a predicted image for each predetermined block (steps Sc4 to Sc10).
- that is, if the decoding target block index is represented by blk and the total number of decoding target blocks is represented by numBlks, blk is initialized to 0 (step Sc4) and the following processing (steps Sc5 to Sc8) is repeated, adding 1 to blk each time (step Sc9), until blk reaches numBlks (step Sc10).
- note that the viewpoint synthesis and reliability setting can also be performed as part of the processing repeated for each decoding target block; for example, this corresponds to a case where depth information is given for each decoding target block.
- the corresponding area search unit 308 finds a corresponding block on the reference frame corresponding to the block blk using the viewpoint synthesized image (step Sc5).
- This process is the same as step Sa5 of the first embodiment, and the matching cost, search range, and the like are the same as those used on the encoding side.
- the reference frame is a decoded image for which the decoding process has already been completed; this data is stored in the decoded image memory 311.
- an image taken with the same camera as the decoding target frame and taken at a different time from the decoding target frame is used.
- any frame may be used even if it is a frame shot by a camera different from the decoding target frame as long as the frame is processed before the decoding target frame.
- when the corresponding block is determined, the motion compensation prediction unit 309 generates a predicted image Pred for the block blk in the same manner as in step Sa6 of the first embodiment (step Sc6). Then, the prediction residual decoding unit 310 decodes the prediction residual from the input encoded data to obtain a decoded prediction residual DecRes (step Sc7). This process is the same as step Sa8 of the first embodiment; decoding is performed by the inverse of the process used to encode the prediction residual on the encoding side.
- finally, the decoded image calculation unit 312 adds the prediction signal Pred to the obtained decoded prediction residual DecRes to generate a decoded image Dec_cur[blk] for the block blk (step Sc8).
- the generated decoded image becomes an output of the multi-view video decoding apparatus 300 and is stored in the decoded image memory 311 for use in prediction in subsequent frames.
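- The reconstruction in step Sc8 can be sketched as follows. This is an illustrative outline only; the function name, the block-wise interface, and the clipping to the valid pixel range are assumptions (the text specifies only the addition of DecRes and Pred).

```python
import numpy as np

def reconstruct_block(pred, dec_res, bit_depth=8):
    # Step Sc8: Dec_cur[blk] = Pred + DecRes, clipped to the valid
    # pixel range (clipping is an assumption, not stated in the text).
    return np.clip(pred.astype(int) + dec_res, 0, (1 << bit_depth) - 1)
```

For example, a prediction pixel of 250 with a residual of +10 saturates at 255 rather than wrapping around.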
- FIG. 7 is a block diagram showing the configuration of the multi-view video decoding apparatus according to the fourth embodiment.
- the multi-view video decoding apparatus 400 includes an encoded data input unit 401, an encoded data memory 402, a reference viewpoint frame input unit 403, a view synthesis unit 404, a view synthesized image memory 405, a corresponding region search unit 406, a prediction.
- a vector generation unit 407, a motion vector decoding unit 408, a motion vector memory 409, a motion compensation prediction unit 410, an image decoding unit 411, and a decoded image memory 412 are provided.
- the encoded data input unit 401 inputs encoded data of a video frame to be decoded.
- the encoded data memory 402 stores the input encoded data.
- the reference viewpoint frame input unit 403 inputs a video frame for a viewpoint different from the decoding target frame.
- the view synthesis unit 404 generates a view synthesized image for the decoding target frame using the input reference view frame.
- the viewpoint composite image memory 405 stores the generated viewpoint composite image.
- the corresponding region search unit 406 searches for an estimated vector indicating the corresponding block in the reference frame for motion compensation prediction for each decoding unit block of the viewpoint synthesized image.
- the prediction vector generation unit 407 generates a prediction vector for the motion vector of the decoding target block from the motion vector and the estimation vector used for motion compensation in the adjacent block of the decoding target block.
- the motion vector decoding unit 408 decodes the motion vector that has been predictively encoded from the encoded data, using the generated prediction vector.
- the motion vector memory 409 stores motion vectors.
- the motion compensation prediction unit 410 generates a motion compensated prediction image based on the decoded motion vector.
- the image decoding unit 411 receives the motion compensated prediction image, decodes the decoding target frame that has been predictively encoded, and outputs the decoded image.
- the decoded image memory 412 stores the decoded image.
- FIG. 8 is a flowchart for explaining the operation of the multi-view video decoding apparatus 400 according to the fourth embodiment. The processing executed by the multi-view video decoding apparatus 400 according to the fourth embodiment will be described in detail according to this flowchart.
- the encoded data includes at least two types of data: the prediction residual of the video signal and the prediction residual of the motion vector used for video prediction.
- the viewpoint synthesis unit 404 uses the reference viewpoint frames to synthesize an image shot at the same viewpoint as the decoding target frame, and accumulates the generated viewpoint synthesized image Syn in the viewpoint synthesized image memory 405 (step Sd2).
- the processing performed here is the same as step Sb2 of the second embodiment.
- the video signal and the motion vector of the decoding target frame are decoded while searching for corresponding points and generating a predicted image for each predetermined block (steps Sd3 to Sd11).
- that is, if the decoding target block index is represented by blk and the total number of decoding target blocks is represented by numBlks, blk is initialized to 0 (step Sd3) and the following processing (steps Sd4 to Sd9) is repeated, adding 1 to blk each time (step Sd10), until blk reaches numBlks (step Sd11).
- note that the viewpoint synthesis can also be performed as part of the processing repeated for each decoding target block; for example, this corresponds to the case where depth information is given for the decoding target frame.
- the corresponding region search unit 406 finds a corresponding block on the reference frame corresponding to the viewpoint synthesized image Syn[blk] (step Sd4).
- a two-dimensional vector indicating the displacement from the block blk for representing the corresponding block is referred to as an estimated vector vec.
- the process here is the same as step Sb10 of the second embodiment.
- the fourth embodiment shows an example in which the reliability is not used. The reliability may be set and used as in the third embodiment.
- the prediction vector generation unit 407 generates a prediction vector pmv for the motion vector mv of the decoding target block, using the estimated vector and the motion vectors used in the adjacent blocks of the decoding target block that are stored in the motion vector memory 409 (step Sd5). The process here is the same as step Sb11 of the second embodiment.
- the motion vector decoding unit 408 decodes the motion vector mv in the decoding target block blk from the encoded data (step Sd6).
- since the motion vector mv was predictively encoded using the prediction vector pmv, the prediction residual vector dmv is decoded from the encoded data, and the motion vector mv is obtained by adding the prediction vector pmv to the prediction residual vector dmv (mv = pmv + dmv).
- the decoded motion vector mv is sent to the motion compensation prediction unit 410 and is accumulated in the motion vector memory 409, and is used when decoding the motion vector of the subsequent decoding target block.
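- The relationship mv = pmv + dmv described above can be sketched as follows. The component-wise median over the neighbouring vectors and the estimated vector is an assumption for illustration; the text only states that pmv is generated from these vectors (step Sd5), after which mv is recovered by adding dmv (step Sd6).

```python
import statistics

def predict_vector(neighbour_mvs, estimated_vec):
    # Hypothetical pmv: component-wise median over the motion vectors of
    # the adjacent blocks and the estimated vector from the search.
    cands = list(neighbour_mvs) + [estimated_vec]
    return (statistics.median(v[0] for v in cands),
            statistics.median(v[1] for v in cands))

def decode_motion_vector(dmv, neighbour_mvs, estimated_vec):
    # mv = pmv + dmv (step Sd6).
    pmv = predict_vector(neighbour_mvs, estimated_vec)
    return (pmv[0] + dmv[0], pmv[1] + dmv[1])
```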
- when the motion vector for the decoding target block is obtained, the motion compensation prediction unit 410 generates a motion compensated prediction signal Pred[blk] for the decoding target block (step Sd7). This process is the same as step Sb5 of the second embodiment.
- finally, the image decoding unit 411 decodes the decoding target frame that has been predictively encoded. Specifically, the prediction residual signal DecRes is decoded from the encoded data (step Sd8), and the motion compensated prediction signal Pred is added to the obtained decoded prediction residual DecRes to generate the decoded image Dec_cur[blk] for the block blk (step Sd9).
- the generated decoded image becomes an output of the multi-view video decoding device 400 and is stored in the decoded image memory 412 for use in prediction in the subsequent frames.
- in the embodiments described above, the viewpoint synthesized image and the reference frame are used as they are in the corresponding region search. However, when these frames contain noise such as film grain or coding distortion, the accuracy of the corresponding region search may be reduced. Since such noise can be assumed to consist of high-frequency components, its influence can be reduced by applying a low-pass filter to the frames used for the corresponding region search (the viewpoint synthesized image and the reference frame) before performing the search.
- as another method, exploiting the spatial correlation of motion vectors, an average value filter or a median filter can be applied to the motion vectors estimated for the blocks, which prevents erroneous motion vectors from being estimated due to noise.
- FIG. 9 is a block diagram showing the configuration of the motion vector estimation apparatus according to the fifth embodiment.
- the motion vector estimation apparatus 500 includes a reference viewpoint video input unit 501, a camera information input unit 502, a viewpoint synthesis unit 503, a low-pass filter unit 504, a corresponding region search unit 505, and a motion vector smoothing unit 506. It has.
- the reference viewpoint video input unit 501 inputs a video frame shot at a viewpoint (reference viewpoint) different from the processing target viewpoint at which the frame for which the motion vector is obtained is shot.
- the camera information input unit 502 inputs internal parameters indicating the focal length of the camera of the processing target viewpoint and the reference viewpoint, and external parameters indicating the position and orientation.
- the viewpoint synthesis unit 503 generates a viewpoint synthesized video for the processing target viewpoint using the reference viewpoint video.
- the low-pass filter unit 504 reduces noise included in the viewpoint synthesized video by applying a low-pass filter.
- the corresponding region search unit 505 searches, for each motion estimation unit block of a frame of the viewpoint synthesized video, for a motion vector indicating the corresponding block in another frame of the viewpoint synthesized video.
- the motion vector smoothing unit 506 spatially smoothes the motion vector so that the spatial correlation of the motion vector is increased.
- FIG. 10 is a flowchart for explaining the operation of the motion vector estimation apparatus 500 according to the fifth embodiment. The processing executed by the motion vector estimation apparatus 500 according to the fifth embodiment will be described in detail according to this flowchart.
- the camera information input unit 502 receives the internal parameters indicating the focal length of the camera of the processing target viewpoint and the reference viewpoint, and the external parameters indicating the position and orientation, and sends them to the viewpoint synthesis unit 503 (step Se1).
- n is an index indicating a reference view
- N is the number of reference views available here.
- t is an index indicating the shooting time of a frame; in this embodiment, an example is described in which a motion vector is estimated between a block of the frame at time T2 and a block of the frame at time T1.
- the viewpoint synthesis unit 503 uses the reference viewpoint frame and camera information to synthesize an image shot at the processing target viewpoint for each shooting time (step Se2).
- the process here is the same as step Sa2 in the first embodiment. However, here, viewpoint synthesized images Syn_t are synthesized for both the frame at time T1 and the frame at time T2.
- next, the low-pass filter unit 504 applies a low-pass filter to each viewpoint synthesized image, generating noise-reduced viewpoint synthesized images LPFSyn_t (step Se3).
- Any filter may be used as the low-pass filter, but a typical one is a mean value filter.
- the average value filter is a filter that replaces a pixel signal of a certain pixel with an average value of image signals of adjacent pixels.
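- The average value filter described above can be sketched as follows; the 3x3 window size and the edge-replication border handling are assumptions, since the text does not specify them.

```python
import numpy as np

def mean_filter(img, k=3):
    # Replace each pixel by the mean of its k x k neighbourhood,
    # replicating edge pixels at the image border.
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)
```

An isolated noise spike of amplitude 9 is attenuated to 1 by the 3x3 version, which illustrates why this acts as a low-pass filter.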
- next, the corresponding region search unit 505 divides the viewpoint synthesized image LPFSyn_T2, for which the motion vectors are to be estimated, into blocks, performs a corresponding region search for each region, and generates motion vectors (steps Se4 to Se7). That is, if the motion estimation unit block index is represented by blk and the total number of motion estimation unit blocks is represented by numBlks, blk is initialized to 0 (step Se4) and the process of searching, on the viewpoint synthesized image LPFSyn_T1, for the block corresponding to LPFSyn_T2[blk] (step Se5) is repeated, adding 1 to blk each time (step Se6), until blk reaches numBlks (step Se7).
- the corresponding region search process (step Se5) is the same as step Sa5 of the first embodiment, except that the frames used are different.
- that is, using the matching costs of Equations (5) to (8) with Syn replaced by LPFSyn_T2 and Dec_t replaced by LPFSyn_T1, the pair (best_vec, best_t) represented by Equation (9) is obtained. However, since the search range of t is only T1 in this embodiment, best_t is always T1.
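- The search of step Se5 can be sketched as a block-matching loop. SAD is used here as a stand-in for the matching costs of Equations (5) to (8), whose exact form is not reproduced; the block size and search range are illustrative parameters.

```python
import numpy as np

def estimate_motion(lpf_t2, lpf_t1, block=8, search=4):
    # For each block of LPFSyn_T2, find the displacement (dy, dx) into
    # LPFSyn_T1 that minimises the SAD matching cost (best_t is fixed to
    # T1 in this embodiment, so only the spatial offset is searched).
    h, w = lpf_t2.shape
    mvs = {}
    for by in range(0, h, block):
        for bx in range(0, w, block):
            tpl = lpf_t2[by:by + block, bx:bx + block]
            best, best_cost = (0, 0), float("inf")
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y <= h - block and 0 <= x <= w - block:
                        cost = np.abs(tpl - lpf_t1[y:y + block, x:x + block]).sum()
                        if cost < best_cost:
                            best_cost, best = cost, (dy, dx)
            mvs[(by // block, bx // block)] = best
    return mvs
```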
- the motion vector smoothing unit 506 smoothes the obtained motion vector set ⁇ MV blk ⁇ so that the spatial correlation is increased (step Se8).
- a set of smoothed vectors becomes an output of the motion vector estimation apparatus 500.
- any method may be used for smoothing the motion vector.
- the average value filter processing here is processing in which the motion vector of the block blk is set as a vector represented by the average value of the motion vectors of the blocks adjacent to the block blk. Since the motion vector here is two-dimensional information, a process for obtaining an average value in each dimension is performed.
- ‖v‖ represents the norm of v. Any norm may be used; typical choices are the L1 norm and the L2 norm. The L1 norm is the sum of the absolute values of the components of v, and the L2 norm is the square root of the sum of the squares of the components of v. w_i is a weight and may be set in any way; for example, a value determined by the following Equation (14) may be used.
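- The weighted vector median described here can be sketched as follows, under the standard formulation in which the output is the set member minimising the weighted sum of norm distances to the other members; the uniform default weights are an assumption.

```python
import numpy as np

def vector_median(vectors, weights=None, norm=1):
    # Return the vector v in the set X that minimises
    # sum_i w_i * ||v - v_i||, with the L1 norm by default
    # (norm=2 selects the Euclidean norm).
    X = np.asarray(vectors, dtype=float)
    w = np.ones(len(X)) if weights is None else np.asarray(weights, dtype=float)
    costs = [np.sum(w * np.linalg.norm(X - v, ord=norm, axis=1)) for v in X]
    return tuple(X[int(np.argmin(costs))])
```

Because the output is always one of the input vectors, an outlier among otherwise small vectors cannot leak into the result, unlike with an average value filter.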
- FIG. 11 is a block diagram showing the configuration of the motion vector estimation apparatus 500a in this case.
- the motion vector estimation apparatus 500a includes a reliability setting unit 507 in addition to the components included in the motion vector estimation apparatus 500 shown in FIG.
- the configuration of the reliability setting unit 507 is the same as, for example, the configuration of the reliability setting unit 107 illustrated in FIG.
- the motion vector estimation apparatus 500a is different from the motion vector estimation apparatus 500 in that a video is input instead of a frame (image).
- regarding the reliability, since the frame searched for the corresponding region is also a viewpoint synthesized image here, the reliability may instead be calculated and used for the viewpoint synthesized image serving as the search space. Furthermore, reliabilities may be calculated for both images and used simultaneously.
- in the latter case, the formulas for calculating the matching costs corresponding to Equations (5) to (8) are Equations (15) to (18) below. Note that ρ is the reliability with respect to the viewpoint synthesized image serving as the search space.
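- Since Equations (15) to (18) are not reproduced here, the following shows only one plausible shape for a matching cost that uses the reliability of the search-space image, or of both images simultaneously, as a per-pixel weight; it is a guess consistent with the surrounding description, not the patent's formula.

```python
import numpy as np

def weighted_sad(blk_a, blk_b, rel_a, rel_b=None):
    # Hypothetical reliability-weighted SAD: each pixel difference is
    # weighted by the reliability of one image or, when both are used
    # simultaneously, by the product of the two reliabilities.
    w = rel_a if rel_b is None else rel_a * rel_b
    return float(np.sum(w * np.abs(blk_a - blk_b)))
```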
- FIG. 12 is a block diagram showing the configuration of the multi-view video encoding apparatus according to the sixth embodiment.
- the multi-view video encoding apparatus 600 includes an encoding target frame input unit 601, an encoding target image memory 602, a reference viewpoint frame input unit 603, a reference viewpoint image memory 604, a viewpoint synthesis unit 605, a low-pass filter unit 606, a viewpoint synthesized image memory 607, a reliability setting unit 608, a corresponding region search unit 609, a motion vector smoothing unit 610, a motion compensation prediction unit 611, an image encoding unit 612, an image decoding unit 613, and a decoded image memory 614.
- the encoding target frame input unit 601 inputs a video frame to be encoded.
- the encoding target image memory 602 stores the input encoding target frame.
- the reference viewpoint frame input unit 603 inputs a video frame for a viewpoint different from the encoding target frame.
- the reference viewpoint image memory 604 stores the input reference viewpoint frame.
- the view synthesis unit 605 generates a view synthesized image for the encoding target frame and the reference frame using the reference view frame.
- the low-pass filter unit 606 applies a low-pass filter to reduce noise included in the viewpoint synthesized video.
- the viewpoint composite image memory 607 stores the viewpoint composite image subjected to the low-pass filter process.
- the reliability setting unit 608 sets the reliability for each pixel of the generated viewpoint composite image.
- the corresponding region search unit 609 searches, for each encoding unit block of the viewpoint synthesized image, for a motion vector indicating the corresponding block on a frame that serves as the reference frame for motion compensated prediction, was shot from the same viewpoint as the encoding target frame, and has already been encoded; the search uses the reliability together with the low-pass-filtered viewpoint synthesized image generated for the reference frame.
- the motion vector smoothing unit 610 spatially smoothes the motion vector so that the spatial correlation of the motion vector is increased.
- the motion compensation prediction unit 611 generates a motion compensated prediction image using the reference frame according to the determined corresponding block.
- the image encoding unit 612 receives the motion compensated prediction image, predictively encodes the encoding target frame, and outputs encoded data.
- the image decoding unit 613 receives the motion compensated prediction image and the encoded data, decodes the encoding target frame, and outputs a decoded image.
- the decoded image memory 614 stores the decoded image of the encoding target frame.
- FIG. 13 is a flowchart for explaining the operation of the multi-view video encoding apparatus 600 according to the sixth embodiment. The processing executed by the multi-view video encoding apparatus 600 according to the sixth embodiment will be described in detail according to this flowchart.
- n is an index indicating a reference viewpoint
- N is the number of reference viewpoints available here.
- t is an index indicating the shooting time of a frame, and takes either the shooting time T of the encoding target frame Org or one of the shooting times T1, T2, ..., Tm of the reference frames, where m indicates the number of reference frames.
- the viewpoint synthesis unit 605 synthesizes an image shot at the same viewpoint as the encoding target frame for each shooting time using the information of the reference viewpoint frame (step Sf2).
- the process here is the same as step Sa2 in the first embodiment.
- that is, viewpoint synthesized images Syn_t are synthesized for the frames at times T, T1, T2, ..., Tm.
- next, the low-pass filter unit 606 applies a low-pass filter to each viewpoint synthesized image, and the noise-reduced viewpoint synthesized images LPFSyn_t are generated and stored in the viewpoint synthesized image memory 607 (step Sf3).
- Any filter may be used as the low-pass filter, but a typical one is a mean value filter.
- the average value filter is a filter that sets an output pixel signal of a certain pixel as an average value of input image signals of adjacent pixels.
- next, the reliability setting unit 608 generates, for each pixel of the viewpoint synthesized image, a reliability ρ indicating how accurately the synthesis for that pixel could be achieved (step Sf4).
- the process here is the same as step Sa3 in the first embodiment.
- depending on the definition of the reliability, part of the processing for obtaining corresponding point information and depth information may be the same as part of the reliability calculation. In such a case, the amount of calculation can be reduced by performing the viewpoint synthesized image generation and the reliability calculation simultaneously.
- the encoding target frame is divided into blocks, and the corresponding region search unit 609 performs corresponding region search for each region (step Sf5).
- the index of the divided block is represented as blk.
- the corresponding region search process (step Sf5) is the same as step Sa5 of the first embodiment, except that the frames used are different. That is, the pair (best_vec, best_t) represented by Equation (9) is obtained using the matching costs of Equations (5) to (8) with Syn replaced by LPFSyn_T and Dec_t replaced by LPFSyn_t. However, in this embodiment, the search range of t is T1 to Tm.
- the motion vector smoothing unit 610 smoothes the obtained motion vector set ⁇ MV blk ⁇ so as to increase the spatial correlation (step Sf6).
- the processing here is the same as step Se8 of the fifth embodiment.
- note that the time interval and direction of the subject motion represented by a motion vector differ depending on the selected reference frame. The time direction of motion indicates whether the motion is a past motion or a future motion relative to the encoding target frame. For this reason, when performing the average value processing or the median processing, the calculation must use only motion vectors that have the same reference frame. That is, the average value is calculated using only the motion vectors of adjacent blocks that use the same reference frame. In the case of the vector median filter, the set X of motion vectors must be defined as the set of those motion vectors of the surrounding blocks that use the same reference frame as the motion vector MV_blk.
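- The restriction above can be sketched as follows: a vector is smoothed only over neighbours that share its reference frame. The dictionary-based field representation and the inclusion of the block's own vector in the average are illustrative assumptions.

```python
def average_same_reference(mv_field, ref_field, pos, neighbours):
    # Average-value smoothing restricted to motion vectors whose
    # reference frame index equals that of the block at `pos`.
    t = ref_field[pos]
    cands = [mv_field[pos]] + [mv_field[p] for p in neighbours
                               if p in mv_field and ref_field.get(p) == t]
    n = len(cands)
    return (sum(v[0] for v in cands) / n, sum(v[1] for v in cands) / n)
```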
- a motion compensation prediction signal Pred is generated by the motion compensation prediction unit 611 according to the obtained motion vector (step Sf7).
- the process here is the same as step Sa6 of the first embodiment.
- a motion compensated prediction signal for the entire frame is generated.
- next, the image encoding unit 612 predictively encodes the encoding target frame Org using the motion compensated prediction signal Pred. Specifically, the residual signal Res, represented by the difference between the encoding target frame Org and the motion compensated prediction signal Pred, is obtained and encoded (step Sf8). Any method may be used for encoding the residual signal; for example, in H.264, encoding is performed by sequentially applying a frequency transform such as DCT, quantization, binarization, and entropy coding. The resulting encoded data is the output of the multi-view video encoding apparatus 600 according to the sixth embodiment.
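- The lossy core of the H.264-style residual pipeline named above (frequency transform and quantization, with the lossless binarization and entropy-coding stages omitted) can be sketched as a round trip; the uniform quantizer step size is an illustrative parameter.

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis matrix (c @ c.T == identity).
    c = np.array([[np.cos(np.pi * (2 * x + 1) * u / (2 * n)) for x in range(n)]
                  for u in range(n)]) * np.sqrt(2.0 / n)
    c[0] /= np.sqrt(2.0)
    return c

def encode_decode_residual(res, q_step=8):
    # Transform the residual block, quantize the coefficients (the only
    # lossy step), then dequantize and inverse-transform.
    c = dct_matrix(res.shape[0])
    coeff = c @ res @ c.T
    levels = np.round(coeff / q_step)
    return c.T @ (levels * q_step) @ c
```

Because the transform is orthonormal, the pixel-domain RMS error of the round trip is bounded by half the quantizer step.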
- the data of the encoding result is decoded by the image decoding unit 613 for use in prediction when encoding subsequent frames.
- the encoded prediction residual signal is decoded (step Sf9), and the motion compensated prediction signal Pred is added to the obtained decoded prediction residual signal DecRes to generate the local decoded image Dec cur . (Step Sf10).
- the obtained local decoded image is stored in the decoded image memory 614.
- for the decoding, the inverse of the technique used in encoding is applied. In the case of H.264, the decoded prediction residual signal is obtained by performing entropy decoding, inverse binarization, inverse quantization, and an inverse frequency transform such as IDCT, in that order.
- the encoding and decoding processes may be performed on the entire frame at once, or for each block as in H.264. When these processes are performed for each block, the amount of temporary memory for accumulating the motion compensated prediction signal can be reduced by repeating steps Sf7, Sf8, Sf9, and Sf10 for each block.
- as described above, this embodiment does not use the reference frame itself when obtaining the corresponding region on the reference frame; instead, the viewpoint synthesized image generated for the reference frame is used to find the corresponding region.
- the view synthesized image Syn and the decoded image Dec are considered to be substantially equal. Therefore, even when the view synthesized image Syn is used, the effect of the present embodiment can be similarly obtained.
- in this case, the viewpoint synthesized images are continuously accumulated in the viewpoint synthesized image memory while the processed frames are accumulated in the decoded image memory. Since the processed frames stored in the decoded image memory are then not required for the corresponding region search, the corresponding region search no longer needs to be performed in synchronization with the encoding and decoding processes. As a result, parallel computation and the like become possible, and the overall computation time can be reduced.
- FIG. 14 is a block diagram showing the configuration of the multi-view video encoding apparatus according to the seventh embodiment.
- the multi-view video encoding apparatus 700 includes an encoding target frame input unit 701, an encoding target image memory 702, a motion estimation unit 703, a motion compensation prediction unit 704, an image encoding unit 705, an image decoding unit 706, a decoded image memory 707, a reference viewpoint frame input unit 708, a viewpoint synthesis unit 709, a low-pass filter unit 710, a viewpoint synthesized image memory 711, a corresponding region search unit 712, a vector smoothing unit 713, a prediction vector generation unit 714, a vector information encoding unit 715, and a motion vector memory 716.
- the encoding target frame input unit 701 inputs a video frame to be encoded.
- the encoding target image memory 702 stores the input encoding target frame.
- the motion estimation unit 703 estimates the motion between the encoding target frame and the reference frame for each encoding unit block of the encoding target frame.
- the motion compensation prediction unit 704 generates a motion compensated prediction image based on the motion estimation result.
- the image encoding unit 705 receives the motion compensated prediction image, predictively encodes the encoding target frame, and outputs encoded data.
- the image decoding unit 706 receives the motion compensated prediction image and the encoded data, decodes the encoding target frame, and outputs a decoded image.
- the decoded image memory 707 stores the decoded image of the encoding target frame.
- the reference viewpoint frame input unit 708 inputs a video frame for a viewpoint different from the encoding target frame.
- the view synthesis unit 709 generates a view synthesized image for the encoding target frame and the reference frame using the reference view frame.
- the low-pass filter unit 710 applies a low-pass filter to reduce noise included in the viewpoint synthesized video.
- the viewpoint composite image memory 711 stores the viewpoint composite image subjected to the low-pass filter process.
- the corresponding region search unit 712 searches, for each encoding unit block of the viewpoint synthesized image, for a vector indicating the corresponding block on a frame that serves as the reference frame for motion compensated prediction and was captured from the same viewpoint as the encoding target frame; the search uses the low-pass-filtered viewpoint synthesized image generated for the reference frame.
- the vector smoothing unit 713 generates an estimated vector by spatially smoothing the vector so that the spatial correlation of the obtained vector is increased.
- the prediction vector generation unit 714 generates a prediction vector for the motion vector of the encoding target block from the motion vector and the estimation vector used for motion compensation in the adjacent block.
- the vector information encoding unit 715 predictively encodes the motion vector using the generated prediction vector.
- the motion vector memory 716 stores motion vectors.
- FIG. 15 is a flowchart for explaining the operation of the multi-view video encoding apparatus 700 according to the seventh embodiment. The processing executed by the multi-view video encoding apparatus 700 according to the seventh embodiment will be described in detail according to this flowchart.
- the encoding target frame Org is input from the encoding target frame input unit 701 and stored in the encoding target image memory 702 (step Sg1).
- the encoding target frame is divided into blocks, and the video signal of the encoding target frame is encoded while performing motion compensation prediction for each region (steps Sg2 to Sg5).
- the encoding target block index is represented by blk.
- the motion estimation unit 703 finds a block on the reference frame corresponding to the encoding target block Org [blk] for each block blk (step Sg2). This process is called motion prediction and is the same as step Sb4 of the second embodiment.
- a two-dimensional vector indicating the displacement from the block blk used to represent the corresponding block is referred to as a motion vector and is represented as mv in the seventh embodiment.
- the motion vector mv is stored in the motion vector memory 716 for use in later block processing.
- when a reference frame is selected for each block as in H.264, information indicating the selected reference frame is also stored in the motion vector memory 716.
- when the motion estimation is completed, the motion compensation prediction unit 704 generates a motion compensated prediction signal Pred for the encoding target frame Org (step Sg3).
- the processing here is the same as step Sb5 of the second embodiment.
- the image encoding unit 705 predictively encodes the encoding target frame using the motion compensation prediction signal Pred (step Sg4).
- the processing here is the same as step Sb6 of the second embodiment.
- the encoding result data becomes part of the output of the multi-view video encoding apparatus 700 according to the seventh embodiment.
- the encoding result data is decoded by the image decoding unit 706 to be used for prediction when encoding subsequent frames (step Sg5).
- The process here is the same as steps Sb7 and Sb8 of the second embodiment.
- The locally decoded image Dec_cur obtained by this decoding is stored in the decoded image memory 707.
- steps Sg3 to Sg5 may be repeated for each block.
- In that case, since the motion-compensated prediction signal only needs to be held in units of blocks, the amount of temporarily used memory can be reduced.
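The per-block motion search of steps Sg2 to Sg5 can be illustrated with a minimal full-search sketch. This is illustrative only: the function names and the use of a plain SAD (sum of absolute differences) cost are our assumptions, not notation from the patent.

```python
def sad(cur, ref, bx, by, bs, dx, dy):
    """SAD between the bs-by-bs block at (bx, by) of cur and the block
    displaced by (dx, dy) in ref."""
    total = 0
    for y in range(by, by + bs):
        for x in range(bx, bx + bs):
            total += abs(cur[y][x] - ref[y + dy][x + dx])
    return total

def motion_search(cur, ref, bx, by, bs, search_range):
    """Full search within +/-search_range pixels; returns the best motion
    vector mv = (dx, dy) for the block at (bx, by)."""
    h, w = len(cur), len(cur[0])
    best_cost, best_mv = float('inf'), (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # keep the displaced block inside the reference frame
            if not (0 <= bx + dx and bx + dx + bs <= w and
                    0 <= by + dy and by + dy + bs <= h):
                continue
            cost = sad(cur, ref, bx, by, bs, dx, dy)
            if cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv
```

Real encoders restrict this search (e.g. diamond or hierarchical patterns), but the full search shows the cost-minimization that defines the vector.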
- the motion vector mv for generating the motion compensated prediction signal used for the encoding is encoded.
- n is an index indicating a reference viewpoint, and N is the number of reference viewpoints available here.
- t is an index indicating the shooting time of a frame, taking either the shooting time T of the encoding target frame Org or one of the shooting times T1, T2, ..., Tm of the reference frames, where m indicates the number of reference frames.
- the viewpoint synthesis unit 709 synthesizes an image shot at the same viewpoint as the encoding target frame for each shooting time using the information of the reference viewpoint frame (step Sg7).
- the process here is the same as step Sf2 of the sixth embodiment.
- Once the synthesis of the view-synthesized images Syn_t is completed, the low-pass filter unit 710 applies a low-pass filter to each view-synthesized image, generating noise-reduced view-synthesized images LPFSyn_t, which are stored in the view-synthesized image memory 711 (step Sg8).
- the process here is the same as step Sf3 of the sixth embodiment.
- Next, the view-synthesized image LPFSyn_T generated for the encoding target frame is divided into blocks, and the corresponding region search unit 712 performs a corresponding region search for each region (step Sg9).
- When dividing LPFSyn_T into blocks, the division uses the same block positions and sizes as the blocks for which motion-compensated prediction is performed in step Sg3.
- Specifically, the pair (best_vec, best_t) satisfying Expression (5) to Expression (8), together with Expression (9), is found using a matching cost in which Syn is replaced with LPFSyn_T and Dec is replaced with LPFSyn_t.
- Here, best_vec is obtained for each of T1 to Tm as t; that is, a set of best_vec values is obtained for each block.
- the reliability may be calculated and used as shown in the sixth embodiment.
- Next, the motion vector smoothing unit 713 smooths the obtained vector set {MV_blk} so as to increase its spatial correlation, thereby generating a set of estimated vectors {vec(blk, t)} (step Sg10).
- the processing here is the same as step Se8 of the fifth embodiment. Note that the smoothing process is performed at each photographing time of the reference frame.
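The smoothing step is specified only as increasing the spatial correlation of the vector field. One simple realization, offered here as an assumption rather than the patent's prescribed filter, is a component-wise median over each block's 3x3 neighborhood:

```python
def smooth_vectors(field):
    """Component-wise median over the 3x3 neighborhood of each block.
    `field` is a 2-D grid of (dx, dy) tuples; borders use only the
    neighbors that exist. Outliers are pulled toward their neighbors,
    raising the spatial correlation of the vector field."""
    h, w = len(field), len(field[0])
    out = [[(0, 0)] * w for _ in range(h)]
    for by in range(h):
        for bx in range(w):
            xs, ys = [], []
            for ny in range(max(0, by - 1), min(h, by + 2)):
                for nx in range(max(0, bx - 1), min(w, bx + 2)):
                    xs.append(field[ny][nx][0])
                    ys.append(field[ny][nx][1])
            xs.sort()
            ys.sort()
            out[by][bx] = (xs[len(xs) // 2], ys[len(ys) // 2])
    return out
```

Per the note above, such smoothing would be run separately for the vectors of each reference-frame shooting time.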
- Next, for each block, the prediction vector generation unit 714 generates a prediction vector pmv for the motion vector mv of the encoding target block, using the motion vectors used in the blocks adjacent to the processing block (stored in the motion vector memory 716) and the estimated vector of the processing block (step Sg11).
- the process here is the same as step Sb11 in the second embodiment.
- A prediction vector generation method that takes the reference frame of each vector into account may also be used; for example, the following method can be used.
- First, the reference frame of the motion vector of the processing block is compared with the reference frames of the motion vectors used in the blocks adjacent to the processing block, and the adjacent-block motion vectors whose reference frame matches that of the processing block are set as prediction vector candidates.
- If there is no candidate, the estimated vector with the matching reference frame in the processing block is set as the prediction vector.
- If there are multiple candidates, the candidate closest to that estimated vector is set as the prediction vector. Candidates more than a certain distance away from the estimated vector may be excluded, and if all candidates are excluded, the estimated vector with the matching reference frame in the processing block is set as the prediction vector.
- the following method may be used as a prediction vector generation method considering a vector reference frame.
- First, among the blocks surrounding the processing block, the set of blocks whose reference frame is the same as that of the processing block is defined.
- If this set is empty, the estimated vector with the matching reference frame in the processing block is set as the prediction vector.
- If the set is not empty, for each block in the set, the similarity between that block's estimated vector for the matching reference frame and the processing block's estimated vector for the matching reference frame is computed, and the motion vector of the block with the highest similarity is used as the prediction vector.
- If the similarity is below a certain value for all the blocks, the estimated vector with the matching reference frame in the processing block may be used as the prediction vector instead, or the average of the motion vectors of these blocks may be used as the prediction vector.
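The candidate-based prediction vector selection described above can be sketched as follows. This is a minimal illustration of the first variant: the function name, the L1 distance, and the fallbacks are our assumptions, and the optional distance-threshold exclusion step is omitted.

```python
def predict_vector(neighbors, target_ref, estimated):
    """Select a prediction vector pmv for the processing block.
    neighbors:  list of (mv, ref_frame) pairs from already-coded
                adjacent blocks.
    target_ref: reference frame of the motion vector being coded.
    estimated:  estimated vector of the processing block for that
                reference frame (from the synthesized views)."""
    # keep only neighbor vectors whose reference frame matches
    candidates = [mv for mv, ref in neighbors if ref == target_ref]
    if not candidates:
        # no neighbor shares the reference frame: fall back to the
        # estimated vector of the processing block
        return estimated
    # otherwise pick the candidate closest (L1) to the estimated vector
    return min(candidates,
               key=lambda mv: abs(mv[0] - estimated[0]) +
                              abs(mv[1] - estimated[1]))
```

Restricting candidates to vectors with the same reference frame keeps the residual dmv small when neighbors point into different frames.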
- the motion vector mv is predictively encoded by the vector information encoding unit 715 for each block (step Sg12).
- the process here is the same as step Sb12 of the second embodiment.
- the encoding result is one of outputs from the multi-view video encoding apparatus 700.
- Steps Sg11 and Sg12 are shown here, as an example, as being performed in units of frames.
- In that case, the encoding order in step Sg12 must be taken into account so that only already-encoded blocks are used as adjacent blocks; otherwise, decoding would require information that has not yet been decoded, making decoding impossible.
- step Sg11 and step Sg12 may be performed alternately for each block. In that case, it is possible to identify the encoded adjacent region without considering the encoding order.
- In addition, since the prediction vector then only needs to be held in units of blocks, the amount of temporarily used memory can be reduced.
- In the above description, a vector is generated for every reference frame in step Sg9.
- However, vectors may instead be generated only for the reference frame of the motion vector of the processing block, or only for the reference frames of the motion vectors of the processing block and its peripheral blocks. Doing so reduces the computational cost of step Sg9.
- In that case, the vector smoothing in step Sg10 must be performed using only motion vectors that have the same reference frame, as in step Sf6 of the sixth embodiment.
- FIG. 16 is a block diagram showing the configuration of the multi-view video decoding apparatus according to the eighth embodiment.
- The multi-view video decoding apparatus 800 includes an encoded data input unit 801, an encoded data memory 802, a reference viewpoint frame input unit 803, a reference viewpoint image memory 804, a viewpoint synthesis unit 805, a low-pass filter unit 806, a viewpoint composite image memory 807, a reliability setting unit 808, a corresponding region search unit 809, a motion vector smoothing unit 810, a motion compensation prediction unit 811, an image decoding unit 812, and a decoded image memory 813.
- the encoded data input unit 801 inputs encoded data of a video frame to be decoded.
- the encoded data memory 802 stores the input encoded data.
- the reference viewpoint frame input unit 803 inputs a video frame for a viewpoint different from the decoding target frame.
- the reference viewpoint image memory 804 stores the input reference viewpoint frame.
- the view synthesis unit 805 uses the reference view frame to generate a view synthesized image for the decoding target frame and the reference frame.
- the low pass filter unit 806 applies a low pass filter to reduce noise included in the viewpoint synthesized video.
- the viewpoint composite image memory 807 stores the viewpoint composite image subjected to the low-pass filter process.
- the reliability setting unit 808 sets the reliability for each pixel of the generated viewpoint composite image.
- For each decoding unit block of the viewpoint composite image, the corresponding region search unit 809 searches for a motion vector indicating the corresponding block on a reference frame for motion-compensated prediction, that is, a frame shot at the same viewpoint as the decoding target frame and already decoded, using the low-pass-filtered viewpoint composite image generated for the reference frame and the reliability.
- the motion vector smoothing unit 810 spatially smoothes the motion vector so that the spatial correlation of the motion vector is increased.
- the motion compensation prediction unit 811 generates a motion compensated prediction image using the reference frame according to the determined corresponding block.
- the image decoding unit 812 receives the motion compensated prediction image and the encoded data, decodes the decoding target frame, and outputs a decoded image.
- the decoded image memory 813 stores the decoded image of the decoding target frame.
- FIG. 17 is a flowchart for explaining the operation of the multi-view video decoding apparatus 800 according to the eighth embodiment. The processing executed by the multi-view video decoding apparatus 800 according to the eighth embodiment will be described in detail according to this flowchart.
- the viewpoint synthesis unit 805 synthesizes an image shot at the same viewpoint as the decoding target frame for each shooting time using the information of the reference viewpoint frame (step Sh2).
- The process here is the same as step Sf2 of the sixth embodiment; that is, a view-synthesized image Syn_t is synthesized for each of the times T, T1, T2, ..., Tm.
- Next, the low-pass filter unit 806 applies a low-pass filter to each view-synthesized image, generating noise-reduced view-synthesized images LPFSyn_t, which are stored in the viewpoint composite image memory 807 (step Sh3). The process here is the same as step Sf3 of the sixth embodiment.
- Any filter may be used as the low-pass filter, but a typical one is a mean value filter.
- The mean-value filter is a filter that sets the output signal of a pixel to the average of the input image signals of the pixels adjacent to it.
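As described, the mean-value filter averages the input pixels around each position; a direct sketch (3x3 neighborhood, borders using only the pixels that exist) follows. The function name and window size are our assumptions.

```python
def mean_filter(img):
    """3x3 box (mean-value) filter: each output pixel is the average of
    the input pixels in its neighborhood, which damps pixel-level noise
    such as film grain before the corresponding-region search."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[ny][nx]
                    for ny in range(max(0, y - 1), min(h, y + 2))
                    for nx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out
```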
- Next, the reliability setting unit 808 generates, for each pixel of the viewpoint composite image, a reliability ρ indicating how reliably the synthesis was achieved for that pixel (step Sh4).
- the process here is the same as step Sf4 of the sixth embodiment.
- Depending on the method, part of the processing for obtaining corresponding point information or depth information may be the same as part of the reliability calculation. In such a case, the amount of computation can be reduced by performing the viewpoint composite image generation and the reliability calculation simultaneously.
- the corresponding area search unit 809 searches for the corresponding area for each predetermined block (step Sh5).
- the index of a block is represented as blk. The process here is the same as step Sf5 of the sixth embodiment.
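As a concrete illustration of how the reliability can weight the matching cost in this search, a per-pixel reliability-weighted absolute-difference cost might look like the following. This is a sketch under our assumptions; it does not reproduce the patent's actual cost expressions.

```python
def weighted_sad(syn_blk, ref_blk, reliability):
    """Matching cost for the corresponding-region search: per-pixel
    absolute difference between the synthesized block and a candidate
    block on the reference frame, weighted by the synthesis reliability
    of each pixel. Unreliable pixels (weight near 0) barely influence
    which candidate wins. All arguments are same-sized 2-D lists."""
    cost = 0.0
    for row_s, row_r, row_w in zip(syn_blk, ref_blk, reliability):
        for s, r, w in zip(row_s, row_r, row_w):
            cost += w * abs(s - r)
    return cost
```

The search would evaluate this cost for each candidate displacement and keep the minimizer, exactly as an unweighted SAD search would.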
- the motion vector smoothing unit 810 smoothes the obtained motion vector set ⁇ MV blk ⁇ so as to increase the spatial correlation (step Sh6).
- the process here is the same as step Sf6 of the sixth embodiment.
- the motion compensation prediction signal Pred is generated by the motion compensation prediction unit 811 according to the obtained motion vector (step Sh7).
- the process here is the same as step Sf7 of the sixth embodiment.
- Next, the image decoding unit 812 decodes the decoding target frame (decoded image) Dec_cur from the input encoded data using the motion-compensated prediction signal Pred (step Sh8).
- the processing here is the same as the combination of step Sf9 and step Sf10 of the sixth embodiment, and decoding is performed by the reverse processing of the processing performed by the method used for encoding.
- the generated decoded image becomes an output of the multi-view video decoding apparatus 800 and is stored in the decoded image memory 813 for use in prediction in subsequent frames.
- The decoding process may be performed on the entire frame at once, or for each block as in H.264.
- the amount of temporary memory for accumulating the motion compensation prediction signal can be reduced by alternately performing Step Sh7 and Step Sh8 for each block.
- FIG. 18 is a block diagram showing the configuration of the multi-view video decoding apparatus according to the ninth embodiment.
- The multi-view video decoding apparatus 900 includes an encoded data input unit 901, an encoded data memory 902, a reference view frame input unit 903, a view synthesis unit 904, a low-pass filter unit 905, a view synthesized image memory 906, a corresponding region search unit 907, a vector smoothing unit 908, a prediction vector generation unit 909, a motion vector decoding unit 910, a motion vector memory 911, a motion compensation prediction unit 912, an image decoding unit 913, and a decoded image memory 914.
- the encoded data input unit 901 inputs encoded data of a video frame to be decoded.
- the encoded data memory 902 stores input encoded data.
- the reference view frame input unit 903 inputs a video frame for a reference view different from the decoding target frame.
- the view synthesis unit 904 generates a view synthesized image for the decoding target frame and the reference frame using the reference view frame.
- the low-pass filter unit 905 applies a low-pass filter to reduce noise included in the viewpoint synthesized video.
- the viewpoint composite image memory 906 stores the viewpoint composite image subjected to the low-pass filter process.
- For each decoding unit block of the viewpoint composite image, the corresponding region search unit 907 searches for a vector indicating the corresponding block on a reference frame for motion-compensated prediction, that is, a frame shot at the same viewpoint as the decoding target frame and already decoded, using the low-pass-filtered viewpoint composite image generated for the reference frame.
- the vector smoothing unit 908 spatially smoothes the vectors so as to increase the spatial correlation of the obtained vectors, and generates an estimated vector.
- the prediction vector generation unit 909 generates a prediction vector for the motion vector of the decoding target block from the motion vector and the estimation vector used for motion compensation in the adjacent block of the decoding target block.
- the motion vector decoding unit 910 decodes the motion vector that has been predictively encoded from the encoded data, using the generated prediction vector.
- the motion vector memory 911 stores the decoded motion vector.
- the motion compensation prediction unit 912 generates a motion compensated prediction image based on the decoded motion vector.
- the image decoding unit 913 receives the motion compensated prediction image, decodes the decoding target frame that has been predictively encoded, and outputs the decoded image.
- the decoded image memory 914 stores decoded images.
- FIG. 19 is a flowchart for explaining the operation of the multi-view video decoding apparatus 900 according to the ninth embodiment. The processing executed by the multi-view video decoding apparatus 900 according to the ninth embodiment will be described in detail according to this flowchart.
- the encoded data includes at least two types of data, ie, a prediction residual of a video signal and a prediction residual of a motion vector used for video prediction.
- the viewpoint synthesis unit 904 synthesizes an image shot at the same viewpoint as the decoding target frame for each shooting time using the information of the reference viewpoint frame (step Si2).
- the process here is the same as step Sg7 of the seventh embodiment.
- Next, the low-pass filter unit 905 applies a low-pass filter to each view-synthesized image, generating noise-reduced view-synthesized images LPFSyn_t, which are stored in the view-synthesized image memory 906 (step Si3).
- the process here is the same as step Sg8 of the seventh embodiment.
- Next, the view-synthesized image LPFSyn_T generated for the decoding target frame is divided into blocks, and the corresponding region search unit 907 performs a corresponding region search for each region (step Si4).
- the process here is the same as step Sg9 of the seventh embodiment.
- Although the present embodiment does not use the reliability of the view synthesis, the reliability may be calculated and used as in the sixth embodiment.
- Next, the vector smoothing unit 908 smooths the obtained vector set {MV_blk} so as to increase its spatial correlation, thereby generating a set of estimated vectors {vec(blk, t)} (step Si5).
- The process here is the same as step Sg10 of the seventh embodiment. Note that the smoothing process is performed for each shooting time of the reference frames.
- the video signal and motion vector of the decoding target frame are decoded for each predetermined block (steps Si6 to Si13).
- That is, with the decoding target block index denoted blk and the total number of decoding target blocks denoted numBlks, blk is initialized to 0 (step Si6), and the following processing (steps Si7 to Si11) is repeated, adding 1 to blk each time (step Si12), until blk reaches numBlks (step Si13).
- Next, the prediction vector generation unit 909 generates a prediction vector pmv for the motion vector mv of the decoding target block, using the motion vectors used in the blocks adjacent to the decoding target block (stored in the motion vector memory 911) and the estimated vector (step Si7).
- the process here is the same as step Sg11 of the seventh embodiment. However, in this embodiment, a prediction vector is generated only for the block blk, not the entire frame. The same method as that used at the time of encoding is used to generate a prediction vector.
- the motion vector decoding unit 910 decodes the motion vector mv in the decoding target block blk from the encoded data (step Si8).
- Since the motion vector mv was predictively encoded using the prediction vector pmv, the prediction residual vector dmv is decoded from the encoded data, and the motion vector mv is obtained by adding the prediction vector pmv to the prediction residual vector dmv.
- the decoded motion vector mv is sent to the motion compensation prediction unit 912 and stored in the motion vector memory 911, and is used when decoding the motion vector of the subsequent decoding target block.
- When the motion vector for the decoding target block is obtained, the motion compensation prediction unit 912 generates a motion-compensated prediction signal Pred[blk] for the decoding target block (step Si9). This process is the same as step Sg3 of the seventh embodiment.
- Finally, the image decoding unit 913 decodes the predictively encoded decoding target frame. Specifically, the prediction residual signal DecRes is decoded from the encoded data (step Si10), and the motion-compensated prediction signal Pred is added to the decoded prediction residual DecRes to generate the decoded image Dec_cur[blk] for the block blk (step Si11). The generated decoded image becomes an output of the multi-view video decoding apparatus 900 and is stored in the decoded image memory 914 for use in prediction of subsequent frames.
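The per-block vector decoding of steps Si6 to Si13 can be sketched as follows. Everything except the mv = pmv + dmv reconstruction is an assumption: the median fallback used to form pmv is only a placeholder for the prediction vector generation of step Si7, and the data layout is invented for illustration.

```python
def decode_motion_vectors(num_blks, coded_dmvs, estimated_vecs, neighbors_of):
    """Loop over decoding target blocks (steps Si6, Si12, Si13): for each
    block, form pmv from already-decoded neighbor vectors, falling back
    to the block's estimated vector when none exist (stand-in for Si7),
    then reconstruct mv = pmv + dmv (Si8) and store it for later blocks."""
    mv_memory = {}                                   # decoded vectors so far
    for blk in range(num_blks):
        prev = [mv_memory[n] for n in neighbors_of(blk) if n in mv_memory]
        if prev:
            # component-wise median of the decoded neighbor vectors
            xs = sorted(v[0] for v in prev)
            ys = sorted(v[1] for v in prev)
            pmv = (xs[len(xs) // 2], ys[len(ys) // 2])
        else:
            pmv = estimated_vecs[blk]                # fall back to estimate
        dmv = coded_dmvs[blk]                        # decoded residual vector
        mv_memory[blk] = (pmv[0] + dmv[0], pmv[1] + dmv[1])
    return mv_memory
```

Because each mv depends on previously decoded neighbors, the loop must visit blocks in the same order the encoder used, which is exactly why the encoding-order caveat above matters.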
- The low-pass filter processing and the motion vector smoothing processing applied to the viewpoint synthesized image prevent noise, such as film grain and coding distortion in the reference viewpoint frames and synthesis distortion introduced by the viewpoint synthesis, from lowering the accuracy of the corresponding region search.
- When the amount of such noise is small, the corresponding region can be obtained with high accuracy without performing low-pass filter processing or motion vector smoothing processing. In such a case, the total amount of computation can be reduced by omitting the low-pass filter processing and the motion vector smoothing processing of the fifth to ninth embodiments described above.
- In the embodiments described above, motion compensation prediction has been described, but the idea of the present invention can be applied to all inter-frame prediction. That is, if the reference frame is a frame shot by another camera, the corresponding region search estimates the parallax; if the reference frame is a frame shot by a different camera at a different time, a vector including both motion and parallax is estimated. Further, the present invention can be applied to cases where the reference region is determined within the same frame, as in fractal coding.
- In the embodiments described above, all blocks are encoded using inter-frame prediction, but encoding may be performed using a different prediction method for each block, as in H.264. In that case, the present invention applies only to the blocks that use inter-frame prediction.
- a block that performs inter-frame prediction can also be encoded while switching between the conventional method and the method of the present invention. In that case, it is necessary to convey information indicating which method is used to the decoding side by some method.
- the processing described above can also be realized by a computer and a software program. Further, the program can be provided by being recorded on a computer-readable recording medium or can be provided through a network.
- In the above, the multi-view video encoding device and the multi-view video decoding device have been mainly described, but the multi-view video encoding method and the multi-view video decoding method of the present invention can be realized by steps corresponding to the operations of the respective units of these devices.
- the present invention is used, for example, for encoding and decoding multi-view video. According to the present invention, it is possible to accurately estimate a motion vector even in a situation where a processed image cannot be obtained. Further, by using temporal correlation in video signal prediction, efficient multi-view video coding can be realized by simultaneously using inter-camera correlation and temporal correlation.
Abstract
Description
This application claims priority based on Japanese Patent Application No. 2010-037434, filed in Japan on February 23, 2010, the content of which is incorporated herein by reference.
First, the first embodiment of the present invention will be described.
FIG. 1 is a block diagram showing the configuration of the multi-view video encoding apparatus according to the first embodiment. As shown in FIG. 1, the multi-view video encoding apparatus 100 includes an encoding target frame input unit 101, an encoding target image memory 102, a reference viewpoint frame input unit 103, a reference viewpoint image memory 104, a viewpoint synthesis unit 105, a viewpoint synthesized image memory 106, a reliability setting unit 107, a corresponding region search unit 108, a motion compensation prediction unit 109, a prediction residual encoding unit 110, a prediction residual decoding unit 111, a decoded image memory 112, a prediction residual calculation unit 113, and a decoded image calculation unit 114.
Next, the second embodiment of the present invention will be described.
FIG. 3 is a block diagram showing the configuration of the multi-view video encoding apparatus according to the second embodiment. As shown in FIG. 3, the multi-view video encoding apparatus 200 includes an encoding target frame input unit 201, an encoding target image memory 202, a reference viewpoint frame input unit 203, a viewpoint synthesis unit 204, a viewpoint synthesized image memory 205, a motion estimation unit 206, a motion compensation prediction unit 207, an image encoding unit 208, an image decoding unit 209, a decoded image memory 210, a corresponding region search unit 211, a prediction vector generation unit 212, a vector information encoding unit 213, and a motion vector memory 214.
Next, the third embodiment of the present invention will be described.
FIG. 5 is a block diagram showing the configuration of the multi-view video decoding apparatus according to the third embodiment. As shown in FIG. 5, the multi-view video decoding apparatus 300 includes an encoded data input unit 301, an encoded data memory 302, a reference viewpoint frame input unit 303, a reference viewpoint image memory 304, a viewpoint synthesis unit 305, a viewpoint synthesized image memory 306, a reliability setting unit 307, a corresponding region search unit 308, a motion compensation prediction unit 309, a prediction residual decoding unit 310, a decoded image memory 311, and a decoded image calculation unit 312.
Next, the fourth embodiment of the present invention will be described.
FIG. 7 is a block diagram showing the configuration of the multi-view video decoding apparatus according to the fourth embodiment. In FIG. 7, the multi-view video decoding apparatus 400 includes an encoded data input unit 401, an encoded data memory 402, a reference viewpoint frame input unit 403, a viewpoint synthesis unit 404, a viewpoint synthesized image memory 405, a corresponding region search unit 406, a prediction vector generation unit 407, a motion vector decoding unit 408, a motion vector memory 409, a motion compensation prediction unit 410, an image decoding unit 411, and a decoded image memory 412.
Next, the fifth embodiment of the present invention will be described.
FIG. 9 is a block diagram showing the configuration of the motion vector estimation apparatus according to the fifth embodiment. As shown in FIG. 9, the motion vector estimation apparatus 500 includes a reference viewpoint video input unit 501, a camera information input unit 502, a viewpoint synthesis unit 503, a low-pass filter unit 504, a corresponding region search unit 505, and a motion vector smoothing unit 506.
Next, the sixth embodiment of the present invention will be described.
FIG. 12 is a block diagram showing the configuration of the multi-view video encoding apparatus according to the sixth embodiment. As shown in FIG. 12, the multi-view video encoding apparatus 600 includes an encoding target frame input unit 601, an encoding target image memory 602, a reference viewpoint frame input unit 603, a reference viewpoint image memory 604, a viewpoint synthesis unit 605, a low-pass filter unit 606, a viewpoint synthesized image memory 607, a reliability setting unit 608, a corresponding region search unit 609, a motion vector smoothing unit 610, a motion compensation prediction unit 611, an image encoding unit 612, an image decoding unit 613, and a decoded image memory 614.
Next, the seventh embodiment of the present invention will be described.
FIG. 14 is a block diagram showing the configuration of the multi-view video encoding apparatus according to the seventh embodiment. As shown in FIG. 14, the multi-view video encoding apparatus 700 includes an encoding target frame input unit 701, an encoding target image memory 702, a motion estimation unit 703, a motion compensation prediction unit 704, an image encoding unit 705, an image decoding unit 706, a decoded image memory 707, a reference viewpoint frame input unit 708, a viewpoint synthesis unit 709, a low-pass filter unit 710, a viewpoint synthesized image memory 711, a corresponding region search unit 712, a vector smoothing unit 713, a prediction vector generation unit 714, a vector information encoding unit 715, and a motion vector memory 716.
Next, the eighth embodiment of the present invention will be described.
FIG. 16 is a block diagram showing the configuration of the multi-view video decoding apparatus according to the eighth embodiment. As shown in FIG. 16, the multi-view video decoding apparatus 800 includes an encoded data input unit 801, an encoded data memory 802, a reference viewpoint frame input unit 803, a reference viewpoint image memory 804, a viewpoint synthesis unit 805, a low-pass filter unit 806, a viewpoint synthesized image memory 807, a reliability setting unit 808, a corresponding region search unit 809, a motion vector smoothing unit 810, a motion compensation prediction unit 811, an image decoding unit 812, and a decoded image memory 813.
Next, the ninth embodiment of the present invention will be described.
FIG. 18 is a block diagram showing the configuration of the multi-view video decoding apparatus according to the ninth embodiment. As shown in FIG. 18, the multi-view video decoding apparatus 900 includes an encoded data input unit 901, an encoded data memory 902, a reference viewpoint frame input unit 903, a viewpoint synthesis unit 904, a low-pass filter unit 905, a viewpoint synthesized image memory 906, a corresponding region search unit 907, a vector smoothing unit 908, a prediction vector generation unit 909, a motion vector decoding unit 910, a motion vector memory 911, a motion compensation prediction unit 912, an image decoding unit 913, and a decoded image memory 914.
101, 201 Encoding target frame input unit
102, 202 Encoding target image memory
103, 203 Reference viewpoint frame input unit
104 Reference viewpoint image memory
105, 204 Viewpoint synthesis unit
106, 205 Viewpoint synthesized image memory
107 Reliability setting unit
108, 211 Corresponding region search unit
109, 207 Motion compensation prediction unit
110 Prediction residual encoding unit
111 Prediction residual decoding unit
112, 210 Decoded image memory
113 Prediction residual calculation unit
114 Decoded image calculation unit
206 Motion estimation unit
208 Image encoding unit
209 Image decoding unit
212 Prediction vector generation unit
213 Vector information encoding unit
214 Motion vector memory
300, 400 Multi-view video decoding apparatus
301, 401 Encoded data input unit
302, 402 Encoded data memory
303, 403 Reference viewpoint frame input unit
304 Reference viewpoint image memory
305, 404 Viewpoint synthesis unit
306, 405 Viewpoint synthesized image memory
307 Reliability setting unit
308, 406 Corresponding region search unit
309, 410 Motion compensation prediction unit
310 Prediction residual decoding unit
311, 412 Decoded image memory
312 Decoded image calculation unit
407 Prediction vector generation unit
408 Motion vector decoding unit
409 Motion vector memory
411 Image decoding unit
500, 500a Motion vector estimation apparatus
600, 700 Multi-view video encoding apparatus
800, 900 Multi-view video decoding apparatus
Claims (19)
- A motion vector estimation method comprising: a view-synthesized image generation step of generating, from a reference camera video captured by a camera different from the processing camera that captured a processing image included in a multi-view video, a view-synthesized image for the time at which the processing image was captured, in accordance with the same settings as the processing camera; and a corresponding region estimation step of estimating a motion vector by searching, without using the processing image, for a corresponding region in a reference image captured by the processing camera, using the image signal on the view-synthesized image that corresponds to the processing region on the processing image.
- The motion vector estimation method according to claim 1, further comprising a reliability setting step of setting, for each pixel of the view-synthesized image, a reliability indicating the certainty of the view-synthesized image, wherein the corresponding region estimation step weights, based on the reliability, the matching cost used when searching for the corresponding region.
- A multi-view video encoding method for performing predictive encoding of a multi-view video, comprising: a view-synthesized image generation step of generating a view-synthesized image for an encoding target viewpoint of the multi-view video from an already-encoded reference viewpoint frame captured at the same time as an encoding target frame from a reference viewpoint different from the encoding target viewpoint; a motion vector estimation step of estimating a motion vector for each encoding unit block of the view-synthesized image by searching for a corresponding region on an already-encoded reference frame at the encoding target viewpoint; a motion-compensated prediction image generation step of generating a motion-compensated prediction image for the encoding target frame using the estimated motion vector and the reference frame; and a residual encoding step of encoding a difference signal between the encoding target frame and the motion-compensated prediction image.
- The multi-view video encoding method according to claim 3, further comprising a reliability setting step of setting, for each pixel of the view-synthesized image, a reliability indicating the certainty of the view-synthesized image, wherein the motion vector estimation step weights, based on the reliability, the matching cost of each pixel used when searching for the corresponding region.
- The multi-view video encoding method according to claim 3 or 4, further comprising: a motion search step of generating an optimal motion vector for each encoding unit block of the encoding target frame by searching for a corresponding region with respect to the reference frame; and a difference vector encoding step of encoding a difference vector between the motion vector and the optimal motion vector, wherein the motion-compensated prediction image generation step generates the motion-compensated prediction image using the optimal motion vector and the reference frame.
- The multi-view video encoding method according to claim 5, further comprising a prediction vector generation step of generating a prediction vector using the motion vector and a group of optimal motion vectors used in regions adjacent to the encoding target region, wherein the difference vector encoding step encodes a difference vector between the prediction vector and the optimal motion vector.
- A multi-view video decoding method for decoding encoded data of video for one viewpoint of a multi-view video, comprising: a view-synthesized image generation step of generating a view-synthesized image for a decoding target viewpoint from a reference viewpoint frame captured at the same time as a decoding target frame from a reference viewpoint different from the decoding target viewpoint; a motion vector estimation step of estimating a motion vector for each decoding unit block of the view-synthesized image by searching for a corresponding region on an already-decoded reference frame at the decoding target viewpoint; a motion-compensated prediction image generation step of generating a motion-compensated prediction image for the decoding target frame using the estimated motion vector and the reference frame; and an image decoding step of decoding, from the encoded data, the predictively encoded decoding target frame using the motion-compensated prediction image as a prediction signal.
- The multi-view video decoding method according to claim 7, further comprising a reliability setting step of setting, for each pixel of the view-synthesized image, a reliability indicating the certainty of the view-synthesized image, wherein the motion vector estimation step weights, based on the reliability, the matching cost of each pixel used when searching for the corresponding region.
- The multi-view video decoding method according to claim 7 or 8, further comprising a vector decoding step of decoding, from the encoded data, a predictively encoded optimal motion vector using the motion vector as a prediction vector, wherein the motion-compensated prediction image generation step generates the motion-compensated prediction image using the optimal motion vector and the reference frame.
- The multi-view video decoding method according to claim 9, further comprising a prediction vector generation step of generating an estimated prediction vector using the motion vector and a group of optimal motion vectors used in regions adjacent to the decoding target region, wherein the vector decoding step decodes the optimal motion vector using the estimated prediction vector as the prediction vector.
- A motion vector estimation apparatus comprising: view-synthesized image generation means for generating, from a reference camera video captured by a camera different from the processing camera that captured a processing image included in a multi-view video, a view-synthesized image for the time at which the processing image was captured, in accordance with the same settings as the processing camera; and corresponding region estimation means for estimating a motion vector by searching, without using the processing image, for a corresponding region in a reference image captured by the processing camera, using the image signal on the view-synthesized image that corresponds to the processing region on the processing image.
- The motion vector estimation apparatus according to claim 11, further comprising reliability setting means for setting, for each pixel of the view-synthesized image, a reliability indicating the certainty of the view-synthesized image, wherein the corresponding region estimation means weights, based on the reliability, the matching cost used when searching for the corresponding region.
- A multi-view video encoding apparatus for performing predictive encoding of a multi-view video, comprising: view-synthesized image generation means for generating a view-synthesized image for an encoding target viewpoint of the multi-view video from an already-encoded reference viewpoint frame captured at the same time as an encoding target frame from a reference viewpoint different from the encoding target viewpoint; motion vector estimation means for estimating a motion vector for each encoding unit block of the view-synthesized image by searching for a corresponding region on an already-encoded reference frame at the encoding target viewpoint; motion-compensated prediction image generation means for generating a motion-compensated prediction image for the encoding target frame using the estimated motion vector and the reference frame; and residual encoding means for encoding a difference signal between the encoding target frame and the motion-compensated prediction image.
- The multi-view video encoding apparatus according to claim 13, further comprising reliability setting means for setting, for each pixel of the view-synthesized image, a reliability indicating the certainty of the view-synthesized image, wherein the motion vector estimation means weights, based on the reliability, the matching cost of each pixel used when searching for the corresponding region.
- A multi-view video decoding apparatus for decoding encoded data of video for one viewpoint of a multi-view video, comprising: view-synthesized image generation means for generating a view-synthesized image for a decoding target viewpoint from a reference viewpoint frame captured at the same time as a decoding target frame from a reference viewpoint different from the decoding target viewpoint; motion vector estimation means for estimating a motion vector for each decoding unit block of the view-synthesized image by searching for a corresponding region on an already-decoded reference frame at the decoding target viewpoint; motion-compensated prediction image generation means for generating a motion-compensated prediction image for the decoding target frame using the estimated motion vector and the reference frame; and image decoding means for decoding, from the encoded data, the predictively encoded decoding target frame using the motion-compensated prediction image as a prediction signal.
- The multi-view video decoding apparatus according to claim 15, further comprising reliability setting means for setting, for each pixel of the view-synthesized image, a reliability indicating the certainty of the view-synthesized image, wherein the motion vector estimation means weights, based on the reliability, the matching cost of each pixel used when searching for the corresponding region.
- A motion vector estimation program that causes a computer of a motion vector estimation apparatus to execute: a view-synthesized image generation function of generating, from a reference camera video captured by a camera different from the processing camera that captured a processing image included in a multi-view video, a view-synthesized image for the time at which the processing image was captured, in accordance with the same settings as the processing camera; and a corresponding region estimation function of estimating a motion vector by searching, without using the processing image, for a corresponding region in a reference image captured by the processing camera, using the image signal on the view-synthesized image that corresponds to the processing region on the processing image.
- A multi-view video encoding program that causes a computer of a multi-view video encoding apparatus for performing predictive encoding of a multi-view video to execute: a view-synthesized image generation function of generating a view-synthesized image for an encoding target viewpoint of the multi-view video from an already-encoded reference viewpoint frame captured at the same time as an encoding target frame from a reference viewpoint different from the encoding target viewpoint; a motion vector estimation function of estimating a motion vector for each encoding unit block of the view-synthesized image by searching for a corresponding region on an already-encoded reference frame at the encoding target viewpoint; a motion-compensated prediction image generation function of generating a motion-compensated prediction image for the encoding target frame using the estimated motion vector and the reference frame; and a residual encoding function of encoding a difference signal between the encoding target frame and the motion-compensated prediction image.
- A multi-view video decoding program that causes a computer of a multi-view video decoding apparatus for decoding encoded data of video for one viewpoint of a multi-view video to execute: a view-synthesized image generation function of generating a view-synthesized image for a decoding target viewpoint from a reference viewpoint frame captured at the same time as a decoding target frame from a reference viewpoint different from the decoding target viewpoint; a motion vector estimation function of estimating a motion vector for each decoding unit block of the view-synthesized image by searching for a corresponding region on an already-decoded reference frame at the decoding target viewpoint; a motion-compensated prediction image generation function of generating a motion-compensated prediction image for the decoding target frame using the estimated motion vector and the reference frame; and an image decoding function of decoding, from the encoded data, the predictively encoded decoding target frame using the motion-compensated prediction image as a prediction signal.
Priority Applications (10)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
BR112012020856A BR112012020856A2 (pt) | 2010-02-23 | 2011-02-18 | ver anexo. |
CA2790406A CA2790406A1 (en) | 2010-02-23 | 2011-02-18 | Motion vector estimation method, multiview video encoding method, multiview video decoding method, motion vector estimation apparatus, multiview video encoding apparatus, multiview video decoding apparatus, motion vector estimation program, multiview video encoding program, and multiview video decoding program |
KR1020127021705A KR101451286B1 (ko) | 2010-02-23 | 2011-02-18 | 움직임 벡터 추정 방법, 다시점 영상 부호화 방법, 다시점 영상 복호 방법, 움직임 벡터 추정 장치, 다시점 영상 부호화 장치, 다시점 영상 복호 장치, 움직임 벡터 추정 프로그램, 다시점 영상 부호화 프로그램 및 다시점 영상 복호 프로그램 |
KR1020157010659A KR101623062B1 (ko) | 2010-02-23 | 2011-02-18 | 움직임 벡터 추정 방법, 다시점 영상 부호화 방법, 다시점 영상 복호 방법, 움직임 벡터 추정 장치, 다시점 영상 부호화 장치, 다시점 영상 복호 장치, 움직임 벡터 추정 프로그램, 다시점 영상 부호화 프로그램 및 다시점 영상 복호 프로그램 |
RU2012135491/08A RU2522309C2 (ru) | 2010-02-23 | 2011-02-18 | Способ оценки вектора движения, способ кодирования многовидового видеосигнала, способ декодирования многовидового видеосигнала, устройство оценки вектора движения, устройство кодирования многовидового видеосигнала, устройство декодирования многовидового видеосигнала. программа оценки вектора движения, программа кодирования многовидового видеосигнала и программа декодирования многовидового видеосигнала |
EP11747261.3A EP2541939A4 (en) | 2010-02-23 | 2011-02-18 | MOTION VECTOR ANALYSIS METHOD, METHOD FOR ENCRYPTION OF MORE VIEW IMAGES, METHOD OF DECODING OF MORE VIEW IMAGES, MOTION VECTOR MEASURING DEVICE, DEVICE FOR ENCRYPTION OF MORE VIEW IMAGES, DEVICE FOR DECODING OF MORE VIEW IMAGES, MOTION VECTOR ANALYSIS PROGRAM PROGRAM FOR ENCRYPTION OF MORE VIEW IMAGES AND PROGRAM FOR DECODING OF MORE VIEW IMAGES |
KR1020147015393A KR101545877B1 (ko) | 2010-02-23 | 2011-02-18 | 움직임 벡터 추정 방법, 다시점 영상 부호화 방법, 다시점 영상 복호 방법, 움직임 벡터 추정 장치, 다시점 영상 부호화 장치, 다시점 영상 복호 장치, 움직임 벡터 추정 프로그램, 다시점 영상 부호화 프로그램 및 다시점 영상 복호 프로그램 |
JP2012501760A JP5237500B2 (ja) | 2010-02-23 | 2011-02-18 | 動きベクトル推定方法、多視点映像符号化方法、多視点映像復号方法、動きベクトル推定装置、多視点映像符号化装置、多視点映像復号装置、動きベクトル推定プログラム、多視点映像符号化プログラム、及び多視点映像復号プログラム |
US13/580,128 US20120320986A1 (en) | 2010-02-23 | 2011-02-18 | Motion vector estimation method, multiview video encoding method, multiview video decoding method, motion vector estimation apparatus, multiview video encoding apparatus, multiview video decoding apparatus, motion vector estimation program, multiview video encoding program, and multiview video decoding program |
CN201180010256.5A CN103609119A (zh) | 2010-02-23 | 2011-02-18 | 运动向量推断方法、多视点视频编码方法、多视点视频解码方法、运动向量推断装置、多视点视频编码装置、多视点视频解码装置、运动向量推断程序、多视点视频编码程序及多视点视频解码程序 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2010-037434 | 2010-02-23 | ||
JP2010037434 | 2010-02-23 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2011105297A1 true WO2011105297A1 (ja) | 2011-09-01 |
Family
ID=44506709
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2011/053516 WO2011105297A1 (ja) | 2010-02-23 | 2011-02-18 | 動きベクトル推定方法、多視点映像符号化方法、多視点映像復号方法、動きベクトル推定装置、多視点映像符号化装置、多視点映像復号装置、動きベクトル推定プログラム、多視点映像符号化プログラム、及び多視点映像復号プログラム |
Country Status (10)
Country | Link |
---|---|
US (1) | US20120320986A1 (ja) |
EP (1) | EP2541939A4 (ja) |
JP (1) | JP5237500B2 (ja) |
KR (3) | KR101545877B1 (ja) |
CN (1) | CN103609119A (ja) |
BR (1) | BR112012020856A2 (ja) |
CA (1) | CA2790406A1 (ja) |
RU (1) | RU2522309C2 (ja) |
TW (1) | TWI461052B (ja) |
WO (1) | WO2011105297A1 (ja) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140029666A1 (en) * | 2012-07-30 | 2014-01-30 | Oki Electric Industry Co., Ltd. | Video image decoding apparatus and video image encoding system |
WO2014103967A1 (ja) * | 2012-12-27 | 2014-07-03 | 日本電信電話株式会社 | 画像符号化方法、画像復号方法、画像符号化装置、画像復号装置、画像符号化プログラム、画像復号プログラム及び記録媒体 |
WO2014168121A1 (ja) * | 2013-04-11 | 2014-10-16 | 日本電信電話株式会社 | 画像符号化方法、画像復号方法、画像符号化装置、画像復号装置、画像符号化プログラム、および画像復号プログラム |
CN104429079A (zh) * | 2012-07-09 | 2015-03-18 | 三菱电机株式会社 | 利用运动矢量预测列表处理用于视图合成的多视图视频的方法和系统 |
US9924197B2 (en) | 2012-12-27 | 2018-03-20 | Nippon Telegraph And Telephone Corporation | Image encoding method, image decoding method, image encoding apparatus, image decoding apparatus, image encoding program, and image decoding program |
JP2021533646A (ja) * | 2018-08-02 | 2021-12-02 | フェイスブック・テクノロジーズ・リミテッド・ライアビリティ・カンパニーFacebook Technologies, Llc | 深度情報を使用して2次元画像を外挿するためのシステムおよび方法 |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009032255A2 (en) * | 2007-09-04 | 2009-03-12 | The Regents Of The University Of California | Hierarchical motion vector processing method, software and devices |
RU2527737C2 (ru) * | 2010-02-24 | 2014-09-10 | Ниппон Телеграф Энд Телефон Корпорейшн | Способ кодирования многопроекционного видео, способ декодирования многопроекционного видео, устройство кодирования многопроекционного видео, устройство декодирования многопроекционного видео, и программа |
KR101289269B1 (ko) * | 2010-03-23 | 2013-07-24 | 한국전자통신연구원 | 영상 시스템에서 영상 디스플레이 장치 및 방법 |
EP2742688A1 (en) * | 2011-08-12 | 2014-06-18 | Telefonaktiebolaget LM Ericsson (PUBL) | Signaling of camera and/or depth parameters |
JP2013085157A (ja) * | 2011-10-12 | 2013-05-09 | Toshiba Corp | 画像処理装置、画像処理方法および画像処理システム |
TWI461066B (zh) * | 2011-11-03 | 2014-11-11 | Ind Tech Res Inst | 彈性調整估算搜尋範圍的移動估算方法及視差估算方法 |
TWI479897B (zh) * | 2011-12-27 | 2015-04-01 | Altek Corp | 具備三維去雜訊化功能之視訊編碼/解碼裝置及其控制方法 |
CN104065972B (zh) * | 2013-03-21 | 2018-09-28 | 乐金电子(中国)研究开发中心有限公司 | 一种深度图像编码方法、装置及编码器 |
KR20160002716A (ko) * | 2013-04-11 | 2016-01-08 | 엘지전자 주식회사 | 비디오 신호 처리 방법 및 장치 |
CN103546747B (zh) * | 2013-09-29 | 2016-11-23 | 北京航空航天大学 | 一种基于彩色视频编码模式的深度图序列分形编码方法 |
KR102250092B1 (ko) | 2013-10-14 | 2021-05-10 | 삼성전자주식회사 | 다시점 비디오 부호화 방법 및 장치, 다시점 비디오 복호화 방법 및 장치 |
CN106233716B (zh) * | 2014-04-22 | 2019-12-24 | 日本电信电话株式会社 | 动态错觉呈现装置、动态错觉呈现方法、程序 |
USD763550S1 (en) * | 2014-05-06 | 2016-08-16 | Ivivva Athletica Canada Inc. | Shirt |
US9939253B2 (en) * | 2014-05-22 | 2018-04-10 | Brain Corporation | Apparatus and methods for distance estimation using multiple image sensors |
TWI511530B (zh) | 2014-12-09 | 2015-12-01 | Univ Nat Kaohsiung 1St Univ Sc | Distributed video coding system and decoder for distributed video coding system |
KR102574479B1 (ko) * | 2017-09-13 | 2023-09-04 | 삼성전자주식회사 | 기본 움직임 벡터를 이용하여 움직임 벡터를 부호화하는 장치 및 방법, 및 복호화 장치 및 방법 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000232659A (ja) * | 1999-02-10 | 2000-08-22 | Meidensha Corp | 多視点における動画像の処理方法 |
WO2004114224A1 (ja) * | 2003-06-20 | 2004-12-29 | Nippon Telegraph And Telephone Corporation | 仮想視点画像生成方法及び3次元画像表示方法並びに装置 |
JP2008503973A (ja) * | 2004-06-25 | 2008-02-07 | エルジー エレクトロニクス インコーポレイティド | 多視点シーケンス符号化/復号化方法及びそのディスプレイ方法 |
JP2009213161A (ja) * | 2009-06-15 | 2009-09-17 | Nippon Telegr & Teleph Corp <Ntt> | 映像符号化方法、映像復号方法、映像符号化プログラム、映像復号プログラム及びそれらのプログラムを記録したコンピュータ読み取り可能な記録媒体 |
JP2010037434A (ja) | 2008-08-05 | 2010-02-18 | Sakura Color Prod Corp | 固形描画材 |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE69724672T2 (de) * | 1996-05-17 | 2004-07-15 | Matsushita Electric Industrial Co., Ltd., Kadoma | Videokodierungsverfahren zur Kodierung von Form- und Textursignalen unter Verwendung verschiedener Moden |
JP2002010265A (ja) * | 2000-06-20 | 2002-01-11 | Sony Corp | 送信装置およびその方法と受信装置およびその方法 |
US20020131500A1 (en) * | 2001-02-01 | 2002-09-19 | Gandhi Bhavan R. | Method for determining a motion vector for a video signal |
US6859494B2 (en) * | 2001-07-27 | 2005-02-22 | General Instrument Corporation | Methods and apparatus for sub-pixel motion estimation |
CN1565118A (zh) * | 2001-10-08 | 2005-01-12 | 皇家飞利浦电子股份有限公司 | 用于运动估计的装置和方法 |
EP1442428A2 (en) * | 2001-10-25 | 2004-08-04 | Koninklijke Philips Electronics N.V. | Method and apparatus for motion estimation |
JP4591657B2 (ja) * | 2003-12-22 | 2010-12-01 | キヤノン株式会社 | 動画像符号化装置及びその制御方法、プログラム |
TWI268715B (en) * | 2004-08-16 | 2006-12-11 | Nippon Telegraph & Telephone | Picture encoding method, picture decoding method, picture encoding apparatus, and picture decoding apparatus |
KR100944651B1 (ko) * | 2005-04-13 | 2010-03-04 | 가부시키가이샤 엔티티 도코모 | 동화상 부호화 장치, 동화상 복호 장치, 동화상 부호화 방법, 동화상 복호 방법, 동화상 부호화 프로그램을 기록한 기록 매체, 및 동화상 복호 프로그램을 기록한 기록 매체 |
US8385628B2 (en) * | 2006-09-20 | 2013-02-26 | Nippon Telegraph And Telephone Corporation | Image encoding and decoding method, apparatuses therefor, programs therefor, and storage media for storing the programs |
ES2439444T3 (es) * | 2006-10-30 | 2014-01-23 | Nippon Telegraph And Telephone Corporation | Método de codificación y método de descodificación de vídeo, aparatos para los mismos, programas para los mismos y medios de almacenamiento que almacenan los programas |
JP4999859B2 (ja) * | 2006-10-30 | 2012-08-15 | 日本電信電話株式会社 | 予測参照情報生成方法、動画像符号化及び復号方法、それらの装置、及びそれらのプログラム並びにプログラムを記録した記憶媒体 |
EP2061248A1 (en) * | 2007-11-13 | 2009-05-20 | IBBT vzw | Motion estimation and compensation process and device |
US9118428B2 (en) | 2009-11-04 | 2015-08-25 | At&T Intellectual Property I, L.P. | Geographic advertising using a scalable wireless geocast protocol |
2011
- 2011-02-18 KR KR1020147015393A patent/KR101545877B1/ko active IP Right Grant
- 2011-02-18 CA CA2790406A patent/CA2790406A1/en not_active Abandoned
- 2011-02-18 JP JP2012501760A patent/JP5237500B2/ja active Active
- 2011-02-18 RU RU2012135491/08A patent/RU2522309C2/ru active
- 2011-02-18 BR BR112012020856A patent/BR112012020856A2/pt not_active IP Right Cessation
- 2011-02-18 WO PCT/JP2011/053516 patent/WO2011105297A1/ja active Application Filing
- 2011-02-18 CN CN201180010256.5A patent/CN103609119A/zh active Pending
- 2011-02-18 KR KR1020127021705A patent/KR101451286B1/ko active IP Right Grant
- 2011-02-18 US US13/580,128 patent/US20120320986A1/en not_active Abandoned
- 2011-02-18 TW TW100105371A patent/TWI461052B/zh active
- 2011-02-18 KR KR1020157010659A patent/KR101623062B1/ko active IP Right Grant
- 2011-02-18 EP EP11747261.3A patent/EP2541939A4/en not_active Withdrawn
Non-Patent Citations (10)
Title |
---|
"Advanced video coding for generic audiovisual services", REC. ITU-T H.264, March 2009 (2009-03-01) |
J. ASCENSO; C. BRITES; F. PEREIRA: "Improving frame interpolation with spatial motion smoothing for pixel domain distributed video coding", 5TH EURASIP CONFERENCE ON SPEECH AND PICTURE PROCESSING, MULTIMEDIA COMMUNICATIONS AND SERVICES, July 2005 (2005-07-01) |
J. SUN; N. ZHENG; H. SHUM: "Stereo Matching Using Belief Propagation", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, vol. 25, no. 7, July 2003 (2003-07-01), pages 787 - 800 |
K. YAMAMOTO; M. KITAHARA; H. KIMATA; T. YENDO; T. FUJII; M. TANIMOTO; S. SHIMIZU; K. KAMIKURA; Y. YASHIMA: "Multiview Video Coding Using View Interpolation and Color Correction", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEM FOR VIDEO TECHNOLOGY, vol. 17, no. 11, November 2007 (2007-11-01), pages 1436 - 1449 |
S. KAMP; M. EVERTZ; M. WIEN: "Decoder side motion vector derivation for inter frame video coding", ICIP 2008, October 2008 (2008-10-01), pages 1120 - 1123 |
S. SHIMIZU; M. KITAHARA; H. KIMATA; K. KAMIKURA; Y. YASHIMA: "View Scalable Multiview Video Coding Using 3-D Warping with Depth Map", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEM FOR VIDEO TECHNOLOGY, vol. 17, no. 11, November 2007 (2007-11-01), pages 1485 - 1495 |
S. SHIMIZU; Y. TONOMURA; H. KIMATA; Y. OHTANI: "Improved View Interpolation Prediction for Side Information in Multiview Distributed Video Coding", PROCEEDINGS OF ICDSC2009, August 2009 (2009-08-01) |
S. YEA; A. VETRO: "View Synthesis Prediction for Rate-Overhead Reduction in FTV", PROCEEDINGS OF 3DTV-CON2008, May 2008 (2008-05-01), pages 145 - 148 |
See also references of EP2541939A4 * |
Y. MORI; N. FUKUSHIMA; T. FUJII; M. TANIMOTO: "View Generation with 3D Warping Using Depth Information for FTV", PROCEEDINGS OF 3DTV-CON2008, May 2008 (2008-05-01), pages 229 - 232 |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104429079A (zh) * | 2012-07-09 | 2015-03-18 | 三菱电机株式会社 | 利用运动矢量预测列表处理用于视图合成的多视图视频的方法和系统 |
CN104429079B (zh) * | 2012-07-09 | 2016-08-24 | 三菱电机株式会社 | 利用运动矢量预测列表处理用于视图合成的多视图视频的方法和系统 |
US20140029666A1 (en) * | 2012-07-30 | 2014-01-30 | Oki Electric Industry Co., Ltd. | Video image decoding apparatus and video image encoding system |
JP2014027600A (ja) * | 2012-07-30 | 2014-02-06 | Oki Electric Ind Co Ltd | 動画像復号装置及びプログラム、並びに、動画像符号化システム |
US9729871B2 (en) | 2012-07-30 | 2017-08-08 | Oki Electric Industry Co., Ltd. | Video image decoding apparatus and video image encoding system |
WO2014103967A1 (ja) * | 2012-12-27 | 2014-07-03 | 日本電信電話株式会社 | 画像符号化方法、画像復号方法、画像符号化装置、画像復号装置、画像符号化プログラム、画像復号プログラム及び記録媒体 |
JPWO2014103967A1 (ja) * | 2012-12-27 | 2017-01-12 | 日本電信電話株式会社 | 画像符号化方法、画像復号方法、画像符号化装置、画像復号装置、画像符号化プログラム及び画像復号プログラム |
US9924197B2 (en) | 2012-12-27 | 2018-03-20 | Nippon Telegraph And Telephone Corporation | Image encoding method, image decoding method, image encoding apparatus, image decoding apparatus, image encoding program, and image decoding program |
WO2014168121A1 (ja) * | 2013-04-11 | 2014-10-16 | 日本電信電話株式会社 | 画像符号化方法、画像復号方法、画像符号化装置、画像復号装置、画像符号化プログラム、および画像復号プログラム |
CN105075257A (zh) * | 2013-04-11 | 2015-11-18 | 日本电信电话株式会社 | 图像编码方法、图像解码方法、图像编码装置、图像解码装置、图像编码程序、以及图像解码程序 |
JP5926451B2 (ja) * | 2013-04-11 | 2016-05-25 | 日本電信電話株式会社 | 画像符号化方法、画像復号方法、画像符号化装置、画像復号装置、画像符号化プログラム、および画像復号プログラム |
JP2021533646A (ja) * | 2018-08-02 | 2021-12-02 | フェイスブック・テクノロジーズ・リミテッド・ライアビリティ・カンパニーFacebook Technologies, Llc | 深度情報を使用して2次元画像を外挿するためのシステムおよび方法 |
Also Published As
Publication number | Publication date |
---|---|
KR20120118043A (ko) | 2012-10-25 |
TWI461052B (zh) | 2014-11-11 |
BR112012020856A2 (pt) | 2019-09-24 |
RU2012135491A (ru) | 2014-03-27 |
TW201143370A (en) | 2011-12-01 |
KR101451286B1 (ko) | 2014-10-17 |
EP2541939A4 (en) | 2014-05-21 |
JPWO2011105297A1 (ja) | 2013-06-20 |
EP2541939A1 (en) | 2013-01-02 |
US20120320986A1 (en) | 2012-12-20 |
KR20150052878A (ko) | 2015-05-14 |
JP5237500B2 (ja) | 2013-07-17 |
KR101623062B1 (ko) | 2016-05-20 |
RU2522309C2 (ru) | 2014-07-10 |
KR20140089590A (ko) | 2014-07-15 |
KR101545877B1 (ko) | 2015-08-20 |
CA2790406A1 (en) | 2011-09-01 |
CN103609119A (zh) | 2014-02-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5237500B2 (ja) | 動きベクトル推定方法、多視点映像符号化方法、多視点映像復号方法、動きベクトル推定装置、多視点映像符号化装置、多視点映像復号装置、動きベクトル推定プログラム、多視点映像符号化プログラム、及び多視点映像復号プログラム | |
JP5303754B2 (ja) | 多視点映像符号化方法、多視点映像復号方法、多視点映像符号化装置、多視点映像復号装置、及びプログラム | |
JP5934375B2 (ja) | 画像符号化方法、画像復号方法、画像符号化装置、画像復号装置、画像符号化プログラム、画像復号プログラム及び記録媒体 | |
JP4999854B2 (ja) | 画像符号化方法及び復号方法、それらの装置、及びそれらのプログラム並びにプログラムを記録した記憶媒体 | |
JP5277257B2 (ja) | 動画像復号化方法および動画像符号化方法 | |
JP5883153B2 (ja) | 画像符号化方法、画像復号方法、画像符号化装置、画像復号装置、画像符号化プログラム、画像復号プログラム及び記録媒体 | |
JP6027143B2 (ja) | 画像符号化方法、画像復号方法、画像符号化装置、画像復号装置、画像符号化プログラム、および画像復号プログラム | |
JP6053200B2 (ja) | 画像符号化方法、画像復号方法、画像符号化装置、画像復号装置、画像符号化プログラム及び画像復号プログラム | |
JP5281632B2 (ja) | 多視点画像符号化方法,多視点画像復号方法,多視点画像符号化装置,多視点画像復号装置およびそれらのプログラム | |
JP4944046B2 (ja) | 映像符号化方法,復号方法,符号化装置,復号装置,それらのプログラムおよびコンピュータ読み取り可能な記録媒体 | |
JP4874578B2 (ja) | 画像符号化装置 | |
JP5706291B2 (ja) | 映像符号化方法,映像復号方法,映像符号化装置,映像復号装置およびそれらのプログラム |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 11747261 Country of ref document: EP Kind code of ref document: A1 |
|
DPE1 | Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101) | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2012501760 Country of ref document: JP |
|
ENP | Entry into the national phase |
Ref document number: 2790406 Country of ref document: CA |
|
ENP | Entry into the national phase |
Ref document number: 20127021705 Country of ref document: KR Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 13580128 Country of ref document: US Ref document number: 7218/CHENP/2012 Country of ref document: IN Ref document number: 2011747261 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2012135491 Country of ref document: RU |
|
REG | Reference to national code |
Ref country code: BR Ref legal event code: B01A Ref document number: 112012020856 Country of ref document: BR |
|
ENP | Entry into the national phase |
Ref document number: 112012020856 Country of ref document: BR Kind code of ref document: A2 Effective date: 20120820 |