WO2015098948A1 - Video encoding method, video decoding method, video encoding device, video decoding device, video encoding program, and video decoding program - Google Patents
- Publication number
- WO2015098948A1 (PCT/JP2014/084118)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- depth
- image
- viewpoint
- video
- encoding
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/204—Image signal generators using stereoscopic image cameras
- H04N13/239—Image signal generators using stereoscopic image cameras using two 2D image sensors having a relative position equal to or related to the interocular distance
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/271—Image signal generators wherein the generated image signals comprise depth maps or disparity maps
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/12—Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
- H04N19/122—Selection of transform size, e.g. 8x8 or 2x4x8 DCT; Selection of sub-band transforms of varying structure or type
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/167—Position within a video image, e.g. region of interest [ROI]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/537—Motion estimation other than block-based
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/547—Motion estimation performed in a transform domain
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/60—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
- H04N19/61—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N2013/0074—Stereoscopic image analysis
- H04N2013/0081—Depth or disparity estimation from stereoscopic image signals
Definitions
- the present invention relates to a video encoding method, a video decoding method, a video encoding device, a video decoding device, a video encoding program, and a video decoding program.
- This application claims priority based on Japanese Patent Application No. 2013-273523, filed in Japan on December 27, 2013, the contents of which are incorporated herein.
- a free viewpoint video is a video in which the user can freely specify the position and orientation of the camera in the shooting space (hereinafter, this position and orientation is referred to as the "viewpoint").
- the free viewpoint video is composed of a group of information necessary to generate videos from several specifiable viewpoints.
- the free viewpoint video may be referred to as a free viewpoint television, an arbitrary viewpoint video, an arbitrary viewpoint television, or the like.
- a free viewpoint video is expressed using various data formats.
- As the most general format, there is a method using a video together with a depth map (distance image) corresponding to each frame of the video (for example, Non-Patent Document 1).
- the depth map is a representation of the depth (distance) from the camera to the subject for each pixel.
- the depth map represents the three-dimensional position of the subject.
- Depth is proportional to the reciprocal of the parallax between two cameras (a camera pair) when certain conditions are met. For this reason, the depth map is sometimes referred to as a disparity map (parallax image).
- the depth is information stored in the Z buffer, and is sometimes called a Z image or a Z map.
- the coordinate value (Z value) along the Z axis of a three-dimensional coordinate system spanned over the space to be expressed may also be used as the depth.
- in many cases the Z axis is taken to coincide with the camera direction, but it need not.
- hereinafter, the distance and the Z value are not distinguished and are both referred to as "depth".
- An image representing depth as a pixel value is referred to as a “depth map”.
- When expressing the depth as a pixel value, there are several methods: using a value corresponding to the physical quantity directly as the pixel value, using a value obtained by quantizing the interval between the minimum and maximum values into a predetermined number of steps, and using a value obtained by quantizing the difference from the minimum depth value with a predetermined step width.
- the depth can be expressed with higher accuracy by using additional information such as a minimum value.
- methods for quantizing a physical quantity at equal intervals include quantizing the physical quantity as it is and quantizing its reciprocal. Since the reciprocal of the distance is proportional to the parallax, the former is often used when the distance needs to be expressed with high accuracy, and the latter when the parallax needs to be expressed with high accuracy.
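The two quantization conventions above can be sketched as follows (a minimal sketch; the function names and the 256-level range are illustrative assumptions, not the patent's own definitions):

```python
import numpy as np

def quantize_depth_linear(z, z_near, z_far, levels=256):
    """Quantize uniformly in distance z: high accuracy in distance."""
    q = np.round((z - z_near) / (z_far - z_near) * (levels - 1))
    return np.clip(q, 0, levels - 1).astype(np.uint8)

def quantize_depth_inverse(z, z_near, z_far, levels=256):
    """Quantize uniformly in 1/z: since 1/z is proportional to parallax,
    this gives high accuracy in disparity (nearest depth -> top code)."""
    q = np.round((1.0 / z - 1.0 / z_far) / (1.0 / z_near - 1.0 / z_far)
                 * (levels - 1))
    return np.clip(q, 0, levels - 1).astype(np.uint8)
```

Note how the two conventions assign the extreme code values oppositely: the linear form maps the nearest depth to 0, the inverse form maps it to the largest code.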
- hereinafter, an image in which depth is expressed is referred to as a "depth map" regardless of the pixel value conversion method or the quantization method. Since a depth map holds one value per pixel, it can be regarded as a grayscale image. A subject exists continuously in real space and cannot move instantaneously to a distant position; therefore, a depth map has spatial and temporal correlation, similarly to a video signal.
- accordingly, a depth map, or a video composed of consecutive depth maps, can be encoded efficiently, with its spatial and temporal redundancy removed, by an image coding scheme used for encoding image signals or a video coding scheme used for encoding video signals.
- hereinafter, the depth map and a video composed of consecutive depth maps are not distinguished and are both referred to as a "depth map".
- each frame of video is divided into processing unit blocks called macroblocks in order to realize efficient encoding using the feature that the subject is spatially and temporally continuous.
- the video signal is predicted spatially and temporally, and prediction information indicating a prediction method and a prediction residual are encoded.
- for spatial prediction, information indicating the direction of prediction constitutes the prediction information.
- for temporal prediction, information indicating the frame to be referred to and information indicating a position within that frame constitute the prediction information. Since spatial prediction is prediction within a frame, it is called intra-frame prediction, intra-picture prediction, or intra prediction.
- temporal prediction is also referred to as motion-compensated prediction because the video signal is predicted by compensating temporal changes of the video, that is, motion.
- in coding multi-view video, the video signal is likewise predicted by compensating changes between viewpoints, that is, parallax; this is called disparity-compensated prediction.
- Non-Patent Document 2 describes a method that obtains a disparity vector from the depth map for the region to be processed, uses the disparity vector to determine a corresponding region on an already encoded video of another viewpoint, and uses the video signal of that corresponding region as the predicted value of the video signal of the processing target region, thereby realizing efficient encoding.
- in Non-Patent Document 2, a highly accurate disparity vector is obtained by converting the values of the depth map.
- the method described in Non-Patent Document 2 can achieve highly efficient predictive coding.
- when converting depth into a disparity vector, the parallax is treated as proportional to the inverse of the depth; more specifically, the parallax is determined by the product of the reciprocal of the depth, the focal length of the camera, and the distance between the viewpoints.
- such a conversion gives a correct result when the two viewpoints have the same focal length and their orientations (camera optical axes) are three-dimensionally parallel, but gives an incorrect result in other situations.
- to perform an accurate conversion, as described in Non-Patent Document 1, a point on the image must be back-projected into three-dimensional space according to its depth to obtain a three-dimensional point, and that three-dimensional point must then be re-projected onto the image of the other viewpoint. However, such a conversion requires complicated calculation, and there is a problem that the amount of calculation increases; that is, the video encoding efficiency is low.
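The two conversions just described can be contrasted in a short sketch (the function names, intrinsic matrices K1/K2, and pose R, t are illustrative assumptions; with identical focal lengths and parallel optical axes, the general back-project/re-project path reduces to the simple product formula):

```python
import numpy as np

def disparity_parallel(z, focal, baseline):
    # Simple conversion, valid only when both viewpoints share the same
    # focal length and have parallel optical axes:
    # parallax = focal length * viewpoint distance / depth.
    return focal * baseline / z

def reproject(p, z, K1, K2, R, t):
    # General conversion: back-project pixel p = (x, y) at depth z from
    # view 1 into 3-D space, then project the 3-D point into view 2.
    X = z * np.linalg.inv(K1) @ np.array([p[0], p[1], 1.0])
    x2 = K2 @ (R @ X + t)
    return x2[:2] / x2[2]
```

For two identical, purely translated cameras the magnitude of the pixel shift returned by `reproject` equals `disparity_parallel`; the general path costs a matrix inverse and two matrix-vector products per point, which is the computational burden the text refers to.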
- The present invention aims to provide a video encoding method, video decoding method, video encoding device, video decoding device, video encoding program, and video decoding program that, in encoding free viewpoint video data having videos and depth maps for a plurality of viewpoints as components, can improve the accuracy of the disparity vector calculated from the depth map even when the viewpoint directions are not parallel, and can thereby improve the efficiency of video encoding.
- One aspect of the present invention is a video encoding device that, when encoding an encoding target image that is one frame of a multi-view video including videos of a plurality of different viewpoints, performs encoding while predicting between different viewpoints for each encoding target region (a region obtained by dividing the encoding target image), using a reference viewpoint image, which is an image for a reference viewpoint different from the viewpoint of the encoding target image, and a depth map for a subject in the multi-view video. The device includes: a representative depth setting unit that sets a representative depth from the depth map; a transformation matrix setting unit that sets, based on the representative depth, a transformation matrix that converts a position on the encoding target image into a position on the reference viewpoint image; a representative position setting unit that sets a representative position from positions in the encoding target region; a disparity information setting unit that sets, using the representative position and the transformation matrix, disparity information between the encoding target viewpoint and the reference viewpoint for the encoding target region; and a predicted image generation unit that generates a predicted image for the encoding target region using the disparity information.
- In one aspect of the present invention, the video encoding device further includes a depth region setting unit that sets a depth region, which is a corresponding region on the depth map, for the encoding target region, and the representative depth setting unit sets the representative depth from the depth map for the depth region.
- In one aspect of the present invention, the video encoding device further includes a depth reference disparity vector setting unit that sets a depth reference disparity vector, which is a disparity vector with respect to the depth map, for the encoding target region, and the depth region setting unit sets the region indicated by the depth reference disparity vector as the depth region.
- In one aspect of the present invention, the depth reference disparity vector setting unit sets the depth reference disparity vector using a disparity vector used when encoding a region adjacent to the encoding target region.
- In one aspect of the present invention, the representative depth setting unit sets, as the representative depth, the depth indicating the position closest to the viewpoint of the encoding target image among the depths in the depth region corresponding to the pixels at the four vertices of the encoding target region.
- One aspect of the present invention is a video decoding device that, when decoding a decoding target image from coded data of a multi-view video including videos of a plurality of different viewpoints, performs decoding while predicting between different viewpoints for each decoding target region (a region obtained by dividing the decoding target image), using a reference viewpoint image, which is an image for a reference viewpoint different from the viewpoint of the decoding target image, and a depth map for a subject in the multi-view video. The device includes: a representative depth setting unit that sets a representative depth from the depth map; a transformation matrix setting unit that sets, based on the representative depth, a transformation matrix that converts a position on the decoding target image into a position on the reference viewpoint image; a representative position setting unit that sets a representative position from positions in the decoding target region; a disparity information setting unit that sets, using the representative position and the transformation matrix, disparity information between the decoding target viewpoint and the reference viewpoint for the decoding target region; and a predicted image generation unit that generates a predicted image for the decoding target region using the disparity information.
- In one aspect of the present invention, the video decoding device further includes a depth region setting unit that sets a depth region, which is a corresponding region on the depth map, for the decoding target region, and the representative depth setting unit sets the representative depth from the depth map for the depth region.
- In one aspect of the present invention, the video decoding device further includes a depth reference disparity vector setting unit that sets a depth reference disparity vector, which is a disparity vector with respect to the depth map, for the decoding target region, and the depth region setting unit sets the region indicated by the depth reference disparity vector as the depth region.
- In one aspect of the present invention, the depth reference disparity vector setting unit sets the depth reference disparity vector using a disparity vector used when decoding a region adjacent to the decoding target region.
- In one aspect of the present invention, the representative depth setting unit sets, as the representative depth, the depth indicating the position closest to the viewpoint of the decoding target image among the depths in the depth region corresponding to the pixels at the four vertices of the decoding target region.
- One aspect of the present invention is a video encoding method that, when encoding an encoding target image that is one frame of a multi-view video including videos of a plurality of different viewpoints, performs encoding while predicting between different viewpoints for each encoding target region, using a reference viewpoint image, which is an image for a reference viewpoint different from the viewpoint of the encoding target image, and a depth map for a subject in the multi-view video. The method includes: a representative depth setting step of setting a representative depth from the depth map; a transformation matrix setting step of setting, based on the representative depth, a transformation matrix that converts a position on the encoding target image into a position on the reference viewpoint image; a representative position setting step of setting a representative position from positions in the encoding target region; a disparity information setting step of setting, using the representative position and the transformation matrix, disparity information between the encoding target viewpoint and the reference viewpoint for the encoding target region; and a predicted image generation step of generating a predicted image for the encoding target region using the disparity information.
- One aspect of the present invention is a video decoding method that, when decoding a decoding target image from coded data of a multi-view video including videos of a plurality of different viewpoints, performs decoding while predicting between different viewpoints for each decoding target region, using a reference viewpoint image, which is an image for a reference viewpoint different from the viewpoint of the decoding target image, and a depth map for a subject in the multi-view video. The method includes: a representative depth setting step of setting a representative depth from the depth map; and a transformation matrix setting step of setting, based on the representative depth, a transformation matrix that converts a position on the decoding target image into a position on the reference viewpoint image.
- One aspect of the present invention is a video encoding program for causing a computer to execute a video encoding method.
- One aspect of the present invention is a video decoding program for causing a computer to execute a video decoding method.
- the accuracy of the disparity vector calculated from the depth map can be improved even when the viewpoint directions are not parallel.
- the efficiency of video encoding can be improved.
- FIG. 3 is a block diagram illustrating an example of a hardware configuration when a video decoding device is configured by a computer and a software program in an embodiment of the present invention.
- this information may be the external parameters representing the positional relationship between camera A and camera B, or the internal parameters representing the projection of the camera onto the image plane; the necessary information may also be given in another format as long as it carries the same meaning.
- these camera parameters are described, for example, in: Olivier Faugeras, "Three-Dimensional Computer Vision", pp. 33-66, MIT Press, BCTC/UFF-006.37 F259 1993, ISBN: 0-262-06158-9. This document describes parameters indicating the positional relationship between a plurality of cameras and parameters indicating the projection of a camera onto the image plane.
- in the following description, information that can specify a position (such as a coordinate value or an index that can be associated with a coordinate value) is attached to an image, a video frame, or a depth map.
- data annotated with such position-specifying information denotes the video signal sampled at the pixel at that position, or the depth based on it.
- further, the coordinate value at a position shifted by a vector is represented by the sum of an index value that can be associated with a coordinate value and the vector.
- similarly, a block obtained by shifting a block by a vector is represented by the sum of an index value that can be associated with the block and the vector.
- FIG. 1 is a block diagram showing a configuration of a video encoding device in an embodiment of the present invention.
- the video encoding device 100 includes an encoding target image input unit 101, an encoding target image memory 102, a reference viewpoint image input unit 103, a reference viewpoint image memory 104, a depth map input unit 105, a disparity vector generation unit 106 (serving as the representative depth setting unit, transformation matrix setting unit, representative position setting unit, disparity information setting unit, depth region setting unit, and depth reference disparity vector setting unit), and an image encoding unit 107 (serving as the predicted image generation unit).
- the encoding target image input unit 101 inputs a video to be encoded into the encoding target image memory 102 for each frame.
- the video to be encoded is referred to as “encoding target image group”.
- a frame that is input and encoded is referred to as an “encoding target image”.
- the encoding target image input unit 101 inputs an encoding target image from the encoding target image group captured by the camera B for each frame.
- the viewpoint (camera B) that captured the encoding target image is referred to as an “encoding target viewpoint”.
- the encoding target image memory 102 stores the input encoding target image.
- the reference viewpoint image input unit 103 inputs, to the reference viewpoint image memory 104, video captured from a viewpoint (camera A) different from that of the encoding target image.
- this video is referred to when encoding the encoding target image.
- the viewpoint of an image that is referred to when an encoding target image is encoded is referred to as a “reference viewpoint”.
- An image from the reference viewpoint is referred to as a “reference viewpoint image”.
- the reference viewpoint image memory 104 stores the input reference viewpoint image.
- the depth map input unit 105 inputs, to the disparity vector generation unit 106, a depth map that is referred to when obtaining a disparity vector (information indicating disparity) based on a correspondence relationship between pixels between viewpoints.
- a depth map corresponding to an encoding target image is input, but a depth map at another viewpoint (such as a reference viewpoint) may be used.
- the depth map represents the three-dimensional position of the subject in the encoding target image for each pixel.
- the depth map can be expressed using, for example, a distance from the camera to the subject, a coordinate value of an axis that is not parallel to the image plane, or a parallax amount with respect to another camera (for example, camera A).
- the depth map is passed in the form of an image, but the depth map may not be passed in the form of an image as long as similar information can be obtained.
- the disparity vector generation unit 106 generates a disparity vector between a region included in the encoding target image and a region included in the reference viewpoint image associated with the encoding target image from the depth map.
- the image encoding unit 107 predictively encodes the encoding target image based on the generated disparity vector and the reference viewpoint image.
- FIG. 2 is a flowchart showing the operation of the video encoding device 100 according to an embodiment of the present invention.
- the encoding target image input unit 101 inputs the encoding target image Org to the encoding target image memory 102.
- the encoding target image memory 102 stores the encoding target image Org.
- the reference viewpoint image input unit 103 inputs the reference viewpoint image Ref to the reference viewpoint image memory 104.
- the reference viewpoint image memory 104 stores the reference viewpoint image Ref (step S101).
- the reference viewpoint image input here should be the same as the reference viewpoint image obtained on the decoding side, such as the decoded image of an already encoded reference viewpoint image. This is to suppress the occurrence of coding noise such as drift, by using exactly the same information as obtained on the decoding side.
- however, when the occurrence of such coding noise is allowed, a reference viewpoint image that can be obtained only on the encoding side, such as the reference viewpoint image before encoding, may be input.
- the encoding target image is divided into regions of a predetermined size, and the video signal of the encoding target image is encoded for each of the divided regions.
- an area obtained by dividing the encoding target image is referred to as an “encoding target area”.
- in this embodiment, the image is divided into processing unit blocks called macroblocks of 16×16 pixels, but it may be divided into blocks of other sizes as long as the division is the same as on the decoding side. Further, the entire encoding target image need not be divided into blocks of the same size; blocks of different sizes may be used for different regions (steps S102 to S107).
- the encoding target area index is represented as “blk”.
- the total number of encoding target areas in one frame of the encoding target image is represented as “numBlks”.
- blk is initialized with 0 (step S102).
- a depth map corresponding to the encoding target area blk (a depth area that is a corresponding area on the depth map) is set (step S103).
- This depth map is input by the depth map input unit 105. It is assumed that the input depth map is the same as the depth map obtained on the decoding side, such as a depth map that has already been encoded. This is to suppress the occurrence of coding noise such as drift by using the same depth map as that obtained on the decoding side. However, when such generation of encoding noise is allowed, a depth map that can be obtained only on the encoding side, such as a depth map before encoding, may be input.
- besides an already decoded depth map, a depth map estimated by applying stereo matching or the like to multi-view video decoded for a plurality of cameras, or a depth map estimated using decoded disparity vectors, motion vectors, or the like, can also be used, since the same depth map can be obtained on the decoding side.
- in this embodiment, the depth map corresponding to the encoding target region is input for each encoding target region.
- alternatively, the depth map for the entire encoding target image may be input and accumulated in advance,
- and the depth map for the encoding target region blk may then be set by referring to the accumulated depth map for each encoding target region.
- the depth map for the encoding target region blk may be set in any way, as long as the same method is used on the decoding side.
- for example, the depth map at the same position as the encoding target region blk in the encoding target image may be set, or the depth map at a position shifted by a predetermined or separately specified vector may be set.
- if the resolutions differ, a region scaled according to the resolution ratio may be set, or a depth map generated by up-sampling that scaled region according to the resolution ratio may be set.
- alternatively, the depth map at the same position as the encoding target region in a depth map corresponding to an image previously encoded for the encoding target viewpoint may be set.
- when using a depth map for a viewpoint (depth viewpoint) different from the encoding target viewpoint, the estimated parallax PDV (depth reference disparity vector) between the encoding target viewpoint and the depth viewpoint in the encoding target region blk is used, and the depth map at "blk + PDV" is set. If the resolutions of the encoding target image and the depth map differ, the position and size may be scaled according to the resolution ratio.
- the estimated parallax PDV between the encoding target viewpoint and the depth viewpoint in the encoding target region blk may be obtained using any method as long as it is the same method as that on the decoding side.
- as a specific example, the disparity vector used when encoding a peripheral region of the encoding target region blk, a global disparity vector set for the entire encoding target image or for a partial image including the encoding target region, or a disparity vector separately set and encoded for each encoding target region can be used.
- alternatively, disparity vectors used in other encoding target regions or in previously encoded images may be stored, and a stored disparity vector may be used.
- the disparity vector generation unit 106 generates a disparity vector of the encoding target region blk using the set depth map (step S104). Details of this processing will be described later.
- the image encoding unit 107 performs prediction using the disparity vector of the encoding target region blk and the reference viewpoint image stored in the reference viewpoint image memory 104, and encodes the video signal (pixel values) of the encoding target image in the encoding target region blk (step S105).
- the bit stream obtained as a result of encoding is the output of the video encoding apparatus 100.
- the image encoding unit 107 performs encoding using a general coding scheme such as MPEG-2 or H.264/AVC: a frequency transform such as the discrete cosine transform (DCT), quantization, binarization, and entropy encoding are applied in sequence to the difference signal between the video signal of the encoding target region blk and the predicted image.
- the image encoding unit 107 adds 1 to blk (step S106).
- the image encoding unit 107 determines whether blk is less than numBlks (step S107). If blk is less than numBlks (step S107: Yes), the image encoding unit 107 returns the process to step S103. On the other hand, if blk is not less than numBlks (step S107: No), the image encoding unit 107 ends the process.
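The per-block loop of steps S102 to S107 can be sketched as follows (a minimal sketch: `disparity_for_block` is a hypothetical stand-in for the disparity vector generation unit 106, and "encoding" is reduced to storing the raw prediction residual, whereas step S105 would actually transform, quantize, and entropy-code it):

```python
import numpy as np

def encode_frame(org, ref, disparity_for_block, block=16):
    """Loop over encoding target regions of the target image `org`,
    predicting each from the reference viewpoint image `ref`."""
    h, w = org.shape
    residuals = []
    for by in range(0, h, block):          # raster scan over target regions
        for bx in range(0, w, block):      # (blk = 0 .. numBlks - 1)
            dx, dy = disparity_for_block(by, bx)   # step S104
            # disparity-compensated prediction from the reference view
            pred = ref[by + dy:by + dy + block, bx + dx:bx + dx + block]
            # step S105, reduced here to the raw residual
            residuals.append(org[by:by + block, bx:bx + block] - pred)
    return residuals
```

With a correct disparity vector the residual of a block is zero, which is exactly what makes disparity-compensated prediction efficient to encode.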
- FIG. 3 is a flowchart showing a process (step S104) in which the disparity vector generation unit 106 generates a disparity vector according to an embodiment of the present invention.
- the representative pixel position pos and the representative depth rep are set from the depth map of the encoding target region blk (step S1403).
- the representative pixel position pos and the representative depth rep may be set using any method, but it is necessary to use the same method as that on the decoding side.
- for example, there is a method of setting a predetermined position in the encoding target region, such as the center or the upper left, as the representative pixel position, and a method of first obtaining the representative depth and then setting the position of a pixel in the encoding target region having the same depth as the representative depth as the representative pixel position.
- there is also a method of comparing the depths of pixels at predetermined positions and setting the position of the pixel whose depth satisfies a predetermined condition as the representative pixel position.
- specifically, the pixel located at the center of the encoding target region, the pixels located at four defined vertices of the encoding target region, or those four vertex pixels together with the center pixel are compared, and the pixel giving the maximum depth, the minimum depth, the median depth, or the like is selected.
- As the representative depth, for example, there is a method using the average value, median value, maximum value, or minimum value of the depth map of the encoding target region blk (depending on the definition of the depth, the maximum or minimum value indicates the depth closest to, or farthest from, the viewpoint of the encoding target image).
- an average value, a median value, a maximum value, or a minimum value of depth values based on some pixels may be used instead of all the pixels in the encoding target region.
- pixels located at the four vertices defined in the encoding target region, or pixels located at the four vertices and pixels located at the center may be used.
- Alternatively, a depth value at a predetermined position, such as the upper left or the center of the encoding target region, may be used.
- a transformation matrix H rep is obtained (step S1404).
- the transformation matrix is called a homography matrix, and gives a correspondence relationship between points on the image plane between viewpoints when it is assumed that a subject exists on a plane represented by a representative depth.
- the transformation matrix H rep may be obtained in any way. For example, it can be obtained using equation (1).
- R indicates a 3 ⁇ 3 rotation matrix between the encoding target viewpoint and the reference viewpoint.
- t indicates a translation vector between the encoding target viewpoint and the reference viewpoint.
- D rep indicates the representative depth.
- n (D rep ) represents a normal vector of a three-dimensional plane corresponding to the representative depth D rep at the encoding target viewpoint.
- d (D rep ) indicates the distance between the three-dimensional plane and the optical center of the encoding target viewpoint.
- The superscript T indicates vector transposition.
- Based on equation (2), the corresponding point q i on the reference viewpoint image is obtained.
- P t and P r indicate 3 ⁇ 4 camera matrices at the encoding target viewpoint and the reference viewpoint, respectively.
- The camera matrix here is given by A [R | T], where A is the matrix of camera internal parameters, R is the rotation matrix from the world coordinate system (an arbitrary common coordinate system independent of the cameras) to the camera coordinate system, and T is the column vector representing the translation from the world coordinate system to the camera coordinate system.
- The inverse matrix P −1 of the camera matrix P is the matrix corresponding to the inverse of the transformation given by the camera matrix P, and is represented by R −1 [A −1 | −T].
- d t (p i ) indicates the distance on the optical axis from the encoding target viewpoint to the subject at the point p i when the depth at the point p i on the encoding target image is the representative depth.
- s is an arbitrary real number; when there is no error in the camera parameters, s is equal to the distance d r (q i ) on the optical axis from the reference viewpoint to the subject at the point q i on the reference viewpoint image. Calculating equation (2) according to this definition yields the following equation (3).
- the subscripts of the internal parameter A, the rotation matrix R, and the translation vector t represent the camera, and t and r represent the encoding target viewpoint and the reference viewpoint, respectively.
- the transformation matrix H rep is obtained by solving the homogeneous equation obtained according to the equation (4).
- the (3, 3) component of the transformation matrix H rep is an arbitrary real number (for example, 1).
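A minimal numerical sketch of the plane-induced homography described here, assuming the widely used closed form H = A_r (R + t nᵀ / d) A_t⁻¹ with the plane written as nᵀX = d in the encoding-target camera coordinates. The function name, the sign convention, and the reference camera model X_r = R X + t are assumptions for illustration, not the patent's equation (1) itself.

```python
import numpy as np

def plane_homography(A_t, A_r, R, t, n, d):
    # Homography induced by the 3-D plane n^T X = d (n: unit normal,
    # d: distance, both in the encoding-target camera coordinates),
    # mapping pixels of the encoding-target view to the reference view.
    return A_r @ (R + np.outer(t, n) / d) @ np.linalg.inv(A_t)

# Consistency check: a point on the plane must land on the same pixel
# whether it is transferred through H or projected directly.
A = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])  # illustrative intrinsics
R = np.eye(3)
t = np.array([-0.1, 0.0, 0.0])                               # hypothetical baseline
H = plane_homography(A, A, R, t, np.array([0.0, 0.0, 1.0]), 5.0)

X = np.array([1.0, 2.0, 5.0])      # lies on the plane z = 5
p_t = (A @ X) / X[2]               # homogeneous pixel in the target view
q = H @ p_t
q /= q[2]                          # pixel transferred by the homography
q_direct = A @ (R @ X + t)
q_direct /= q_direct[2]            # pixel obtained by direct projection
```

The check illustrates the property stated above: the matrix gives exact pixel correspondences for any subject lying on the plane represented by the representative depth.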
- Since the transformation matrix H rep depends on the reference viewpoint and the depth, it may be obtained every time the representative depth is obtained. Alternatively, a transformation matrix may be obtained for each combination of reference viewpoint and representative depth before the processing for each encoding target region is started, and one transformation matrix may then be selected from the pre-computed set based on the reference viewpoint and the representative depth.
- cpos indicates the corresponding position on the reference viewpoint image.
- The difference cpos − pos indicates the desired disparity vector.
- That is, adding the disparity vector to a position at the encoding target viewpoint gives the corresponding position on the reference viewpoint.
- If, instead, the corresponding position is represented by subtracting the disparity vector from the position at the encoding target viewpoint, the disparity vector is "pos − cpos".
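Applying the transformation matrix to the representative pixel position in homogeneous coordinates and taking "cpos − pos", as described above, might look like the following sketch (the function name is illustrative):

```python
import numpy as np

def disparity_vector(H, pos):
    # Transfer the representative pixel position pos = (x, y) through the
    # transformation matrix H in homogeneous coordinates to obtain cpos,
    # then return the difference cpos - pos as the disparity vector.
    p = np.array([pos[0], pos[1], 1.0])
    q = H @ p
    cpos = q[:2] / q[2]            # dehomogenize the transferred point
    return cpos - np.asarray(pos, dtype=float)
```

For the opposite sign convention mentioned in the text, the caller would simply negate the result.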
- In the above description, one disparity vector is generated for the entire encoding target region blk.
- However, the encoding target region blk may be divided into a plurality of sub-regions, and a disparity vector may be generated for each sub-region.
- FIG. 4 is a flowchart illustrating processing for generating a disparity vector by dividing an encoding target region into sub-regions in an embodiment of the present invention.
- the disparity vector generation unit 106 divides the encoding target region blk (step S1401).
- numSBlks indicates the number of sub-regions in the encoding target region blk.
- the disparity vector generation unit 106 initializes the sub-region index “sblk” with 0 (step S1402).
- the disparity vector generation unit 106 sets a representative pixel position and a representative depth value (step S1403).
- the disparity vector generation unit 106 obtains a transformation matrix from the representative depth value (step S1404).
- the disparity vector generation unit 106 obtains a disparity vector for the reference viewpoint. That is, the disparity vector generation unit 106 obtains a disparity vector from the depth map of the sub-region sblk (step S1405).
- the disparity vector generation unit 106 adds 1 to sblk (step S1406).
- the disparity vector generation unit 106 determines whether sblk is less than numSBlks (step S1407). When sblk is less than numSBlks (step S1407: Yes), the disparity vector generation unit 106 returns the process to step S1403. That is, the disparity vector generation unit 106 repeats “steps S1403 to S1407” for obtaining a disparity vector from the depth map for each sub-region obtained by the division. On the other hand, when sblk is not less than numSBlks (step S1407: No), the disparity vector generation unit 106 ends the process.
- The encoding target region blk may be divided by any method as long as it is the same method as that on the decoding side. For example, the region may be divided into pieces of a predetermined size (4 pixels × 4 pixels or 8 pixels × 8 pixels), or it may be divided by analyzing the depth map of the encoding target region blk; for instance, the division may be performed by clustering based on the values of the depth map.
- The analysis may also be performed only on a specific set of pixels, such as a plurality of predetermined points and the center. Furthermore, each encoding target region may be divided into the same number of sub-regions, or different encoding target regions may be divided into different numbers of sub-regions.
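One possible realisation of the depth-based clustering mentioned above: fixed-size sub-blocks are labelled "near" or "far" by comparing each sub-block's mean depth with the mean depth of the whole region. The sub-block granularity and the two-way split are illustrative choices, not mandated by the text; any rule reproduced identically on the decoding side would do.

```python
import numpy as np

def split_by_depth(depth_block, sub=4):
    # Group sub x sub sub-blocks of the region into two clusters ("far" = 0,
    # "near" = 1) by comparing each sub-block's mean depth against the mean
    # depth of the entire encoding target region.
    h, w = depth_block.shape
    pivot = depth_block.mean()
    labels = np.zeros((h // sub, w // sub), dtype=int)
    for i in range(0, h, sub):
        for j in range(0, w, sub):
            labels[i // sub, j // sub] = int(
                depth_block[i:i + sub, j:j + sub].mean() > pivot)
    return labels
```

A separate disparity vector would then be generated per cluster, which matches the motivation for sub-region division: regions straddling a depth discontinuity get one vector for the foreground and one for the background.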
- FIG. 5 is a block diagram showing the configuration of the video decoding apparatus 200 in an embodiment of the present invention.
- the video decoding apparatus 200 includes a bit stream input unit 201, a bit stream memory 202, a reference viewpoint image input unit 203, a reference viewpoint image memory 204, a depth map input unit 205, and a disparity vector generation unit 206 (representative depth setting). Unit, a transformation matrix setting unit, a representative position setting unit, a parallax information setting unit, a depth region setting unit, a depth reference parallax vector setting unit), and an image decoding unit 207 (predicted image generation unit).
- the bit stream input unit 201 inputs the bit stream encoded by the video encoding device 100, that is, the bit stream of the video to be decoded, into the bit stream memory 202.
- the bit stream memory 202 stores a bit stream of video to be decoded.
- an image included in the video to be decoded is referred to as a “decoding target image”.
- the decoding target image is an image included in the video (decoding target image group) captured by the camera B.
- the viewpoint of the camera B that captured the decoding target image is referred to as a “decoding target viewpoint”.
- the reference viewpoint image input unit 203 inputs an image included in video captured from a viewpoint (camera A) different from the decoding target image to the reference viewpoint image memory 204.
- An image based on a viewpoint different from the decoding target image is an image that is referred to when the decoding target image is decoded.
- the viewpoint of an image that is referred to when decoding a decoding target image is referred to as a “reference viewpoint”.
- The image at the reference viewpoint is referred to as a "reference viewpoint image".
- the reference viewpoint image memory 204 stores the input reference viewpoint image.
- the depth map input unit 205 inputs, to the disparity vector generation unit 206, a depth map that is referred to when obtaining a disparity vector (information indicating disparity) based on the correspondence relationship between pixels between viewpoints.
- a depth map corresponding to a decoding target image is input, but a depth map at another viewpoint (such as a reference viewpoint) may be used.
- this depth map represents the three-dimensional position of the subject in the decoding target image for each pixel.
- the depth map can be expressed using, for example, a distance from the camera to the subject, a coordinate value of an axis that is not parallel to the image plane, or a parallax amount with respect to another camera (for example, camera A).
- the depth map is passed in the form of an image, but the depth map may not be passed in the form of an image as long as similar information can be obtained.
- the disparity vector generation unit 206 generates a disparity vector between a region included in the decoding target image and a region included in the reference viewpoint image associated with the decoding target image from the depth map.
- the image decoding unit 207 decodes the decoding target image from the bitstream based on the generated disparity vector and the reference viewpoint image.
- FIG. 6 is a flowchart showing the operation of the video decoding apparatus 200 in an embodiment of the present invention.
- the bit stream input unit 201 inputs a bit stream obtained by encoding the decoding target image to the bit stream memory 202.
- the bit stream memory 202 stores a bit stream obtained by encoding a decoding target image.
- the reference viewpoint image input unit 203 inputs the reference viewpoint image Ref to the reference viewpoint image memory 204.
- the reference viewpoint image memory 204 stores the reference viewpoint image Ref (step S201).
- the reference viewpoint image input here is the same as the reference viewpoint image used on the encoding side. This is to suppress the occurrence of encoding noise such as drift by using exactly the same information as the reference viewpoint image used at the time of encoding. However, when the generation of such encoding noise is allowed, a reference viewpoint image different from the reference viewpoint image used at the time of encoding may be input.
- the decoding target image is divided into regions of a predetermined size, and the video signal of the decoding target image is decoded from the bitstream for each divided region.
- an area obtained by dividing the decoding target image is referred to as a “decoding target area”.
- Here, the decoding target image is divided into processing-unit blocks called macroblocks of 16 pixels × 16 pixels, but it may be divided into blocks of other sizes as long as they are the same as those on the encoding side.
- the entire decoding target image may not be divided into the same size, but may be divided into blocks having different sizes for each region (steps S202 to S207).
- the decoding target area index is represented as “blk”.
- the total number of decoding target areas in one frame of the decoding target image is represented as “numBlks”.
- blk is initialized with 0 (step S202).
- a depth map of the decoding target area blk is set (step S203).
- This depth map is input by the depth map input unit 205.
- the input depth map is the same as the depth map used on the encoding side. This is to suppress the generation of encoding noise such as drift by using the same depth map as that used on the encoding side. However, when such generation of encoding noise is allowed, a depth map different from that on the encoding side may be input.
- As the same depth map as that used on the encoding side, in addition to a depth map separately decoded from the bitstream, a depth map estimated by applying stereo matching to the multi-view images decoded for a plurality of cameras, or a depth map estimated using decoded disparity vectors, motion vectors, or the like, can be used.
- Although the depth map corresponding to the decoding target area is input here for each decoding target area, the depth map used for the entire decoding target image may instead be input and stored in advance, and the depth map corresponding to the decoding target area blk may be set by referring to the stored depth map for each decoding target area.
- The depth map corresponding to the decoding target area blk may be set in any way.
- For example, a depth map at the same position as the decoding target region blk in the decoding target image may be set, or a depth map at a position shifted by a predetermined or separately designated vector may be set.
- If the resolutions differ, a region scaled according to the resolution ratio may be set, or a depth map generated by up-sampling according to the resolution ratio may be set. Also, the depth map at the same position as the decoding target area in a depth map corresponding to an image previously decoded for the decoding target viewpoint may be set.
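For the case where the depth map has a lower resolution than the decoding target image, the "up-sampling according to the resolution ratio" mentioned above could be as simple as nearest-neighbour replication. This is an illustrative choice (any method reproduced identically on the encoding side would do):

```python
import numpy as np

def upsample_depth(depth, ratio=2):
    # Nearest-neighbour up-sampling of a low-resolution depth map by an
    # integer resolution ratio, replicating each depth sample into a
    # ratio x ratio block.
    return np.repeat(np.repeat(depth, ratio, axis=0), ratio, axis=1)
```

Nearest-neighbour replication has the convenient property of not inventing intermediate depth values at object boundaries, which interpolating filters would.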
- the estimated parallax PDV between the decoding target viewpoint and the depth viewpoint in the decoding target region blk may be obtained using any method as long as it is the same method as that on the encoding side.
- the disparity vector used when decoding the peripheral region of the decoding target region blk, the global disparity vector set for the entire decoding target image or the partial image including the decoding target region, or for each decoding target region Separately set and encoded disparity vectors or the like can be used.
- the disparity vectors used in different decoding target areas or decoding target images decoded in the past may be stored, and the stored disparity vectors may be used.
- the disparity vector generation unit 206 generates a disparity vector in the decoding target area blk (step S204). This process is the same as step S104 described above, except that the encoding target area is replaced with the decoding target area and read.
- The image decoding unit 207 performs prediction using the disparity vector of the decoding target region blk and the reference viewpoint image stored in the reference viewpoint image memory 204, and decodes the video signal (pixel values) in the decoding target region blk from the bitstream (step S205).
- the obtained decoding target image is an output of the video decoding device 200.
- a method corresponding to the method used at the time of encoding is used for decoding the video signal.
- The image decoding unit 207 uses a method corresponding to a general coding scheme such as MPEG-2 or H.264/AVC.
- That is, decoding is performed by sequentially applying entropy decoding, inverse binarization, inverse quantization, and an inverse frequency transform such as the inverse discrete cosine transform (IDCT).
- the image decoding unit 207 adds 1 to blk (step S206).
- the image decoding unit 207 determines whether blk is less than numBlks (step S207). When blk is less than numBlks (step S207: Yes), the image decoding unit 207 returns the process to step S203. On the other hand, if blk is not less than numBlks (step S207: No), the image decoding unit 207 ends the process.
- In the above description, the disparity vector is generated for each region obtained by dividing the encoding target image or the decoding target image.
- However, disparity vectors may be generated and stored in advance for all the regions of the encoding target image or the decoding target image, and the stored disparity vector may be referred to for each region.
- In addition, a flag indicating whether or not to apply the process may be encoded or decoded, or whether or not to apply the process may be specified by some other means. For example, whether to apply the process may be expressed as one of the modes indicating the method of generating the predicted image for each region.
- the transformation matrix is always generated.
- the transformation matrix does not change unless the positional relationship between the encoding target viewpoint or the decoding target viewpoint and the reference viewpoint or the definition of depth (a three-dimensional plane corresponding to each depth) changes. For this reason, when a set of transformation matrices is obtained in advance, it is not necessary to recalculate the transformation matrix for each frame or region.
- For example, the positional relationship between the encoding target viewpoint and the reference viewpoint represented by separately given camera parameters may be compared, every time the encoding target image changes, with the positional relationship between those viewpoints represented by the camera parameters of the immediately preceding frame.
- When the positional relationship does not change, or the change is small, the transformation matrix set used for the immediately preceding frame may be used as it is, and a new transformation matrix set may be obtained only in the other cases.
- Similarly, on the decoding side, the positional relationship between the decoding target viewpoint and the reference viewpoint represented by separately given camera parameters may be compared, every time the decoding target image changes, with that represented by the camera parameters of the immediately preceding frame. When the positional relationship does not change, or the change is small, the transformation matrix set used for the immediately preceding frame may be used as it is, and a new transformation matrix set may be obtained only in the other cases.
- Moreover, instead of recomputing the entire set, transformation matrices may be obtained again only for the reference viewpoints whose positional relationship has changed compared to the immediately preceding frame and for the depths whose definition has changed.
- the decoding side may determine whether to recalculate the transformation matrix based on the transmitted information. Only one piece of information indicating whether recalculation is necessary may be set for the entire frame, may be set for each reference viewpoint, or may be set for each depth.
- In the above description, a transformation matrix is generated for each depth. Instead, one depth value may be set as a quantization depth for each separately defined interval of depth values, and a transformation matrix may be set for each quantization depth.
- Since the representative depth can take any depth value within the depth range, transformation matrices could in principle be required for all depth values; by quantizing in this way, the depth values that require a transformation matrix can be limited to the quantization depths.
- When a transformation matrix is needed after a representative depth has been obtained, the quantization depth is determined from the interval of depth values containing the representative depth, and the transformation matrix is obtained using that quantization depth. In particular, when one quantization depth is set for the entire depth range, the transformation matrix is unique for each reference viewpoint.
- the quantization interval and the quantization depth may be set in any manner as long as the method is the same as that on the decoding side.
- the depth range may be divided equally and the median value may be set as the quantization depth.
- the interval and the quantization depth may be determined according to the depth distribution in the depth map.
- Furthermore, the encoding side may determine the quantization method (the intervals and the quantization depths) and transmit it, and the decoding side may obtain the quantization method by decoding it from the bitstream. Note that, particularly when one quantization depth is set for the entire depth map, the quantization depth value itself may be encoded or decoded instead of the quantization method.
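The equal-interval quantization with the interval median as the quantization depth, described above, can be sketched as follows (function names are illustrative):

```python
import numpy as np

def quantization_depths(d_min, d_max, num_bins):
    # Divide the depth range [d_min, d_max] into equal intervals and use
    # each interval's median (midpoint) as its quantization depth.
    edges = np.linspace(d_min, d_max, num_bins + 1)
    return edges[:-1], (edges[:-1] + edges[1:]) / 2.0

def quantize_depth(rep_depth, d_min, d_max, num_bins):
    # Map a representative depth onto the quantization depth of the
    # interval containing it, so that a transformation matrix needs to be
    # prepared only once per quantization depth.
    starts, centers = quantization_depths(d_min, d_max, num_bins)
    idx = min(int((rep_depth - d_min) / (d_max - d_min) * num_bins),
              num_bins - 1)
    return centers[idx]
```

With, say, four bins over an 8-bit depth range, at most four transformation matrices per reference viewpoint are ever needed, instead of one per possible representative depth value.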
- In the above description, the transformation matrix is also generated on the decoding side using the camera parameters or the like.
- However, a transformation matrix obtained by calculation on the encoding side may instead be encoded and transmitted.
- In that case, the decoding side obtains the transformation matrix by decoding it from the bitstream, rather than generating it from the camera parameters or the like.
- In the above description, the transformation matrix is always used.
- However, the camera parameters may be checked first: if the viewpoint directions are parallel between the viewpoints, a lookup table may be generated and the conversion from depth to disparity vector performed according to that table, with the method of the present invention used only when the directions are not parallel. It is also possible to perform this check only on the encoding side and to encode information indicating which method is used; in that case, the decoding side decodes that information and decides which method to use.
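For the parallel-viewpoint shortcut mentioned above, a lookup table can map each depth level directly to a horizontal disparity via focal × baseline / Z. The inverse-depth mapping `d2z` below is a hypothetical 8-bit depth definition (the text leaves the depth-to-distance mapping open), and all names are illustrative.

```python
import numpy as np

def disparity_lookup_table(focal, baseline, depth_to_distance):
    # For parallel viewpoints the disparity is purely horizontal and equals
    # focal * baseline / Z, so it can be tabulated once per depth level
    # instead of applying a transformation matrix per region.
    return np.array([focal * baseline / depth_to_distance(level)
                     for level in range(256)])

# Hypothetical inverse-depth coding: level 255 maps to the near plane
# z_near, level 0 to the far plane z_far.
z_near, z_far = 1.0, 100.0
def d2z(level):
    return 1.0 / (level / 255.0 * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far)

table = disparity_lookup_table(focal=500.0, baseline=0.1,
                               depth_to_distance=d2z)
```

Once built, converting a representative depth to a disparity vector is a single table read, which is why the table is attractive whenever the camera geometry permits it.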
- In the above description, one disparity vector is set for each region (an encoding target region or a decoding target region, or their sub-regions) obtained by dividing the encoding target image or the decoding target image.
- However, two or more disparity vectors may be set.
- For example, a plurality of disparity vectors may be generated by selecting a plurality of representative pixels or a plurality of representative depths for one region.
- Specifically, disparity vectors for both the foreground and the background may be set by using the maximum value and the minimum value as two representative depths.
- In the above description, a homography matrix is used as the transformation matrix.
- However, another matrix may be used.
- For example, a simplified matrix may be used instead of a strict homography matrix.
- An affine transformation matrix, a projection matrix, a matrix generated by combining a plurality of transformation matrices, or the like may also be used.
- the same transformation matrix is used for encoding and decoding.
- FIG. 7 is a block diagram showing an example of a hardware configuration when the video encoding apparatus 100 is configured by a computer and a software program in an embodiment of the present invention.
- The system includes a CPU (Central Processing Unit) 50, a memory 51, an encoding target image input unit 52, a reference viewpoint image input unit 53, a depth map input unit 54, a program storage device 55, and a bit stream output unit 56. Each unit is communicably connected via a bus.
- the CPU 50 executes a program.
- the memory 51 is a RAM (Random Access Memory) or the like in which programs and data accessed by the CPU 50 are stored.
- the encoding target image input unit 52 inputs an encoding target video signal from the camera B or the like to the CPU 50.
- the encoding target image input unit 52 may be a storage unit such as a disk device that stores a video signal.
- the reference viewpoint image input unit 53 inputs a video signal from a reference viewpoint such as the camera A to the CPU 50.
- the reference viewpoint image input unit 53 may be a storage unit such as a disk device that stores a video signal.
- the depth map input unit 54 inputs a depth map at a viewpoint where a subject is photographed by a depth camera or the like to the CPU 50.
- the depth map input unit 54 may be a storage unit such as a disk device that stores the depth map.
- the program storage device 55 stores a video encoding program 551 that is a software program that causes the CPU 50 to execute a video image encoding process.
- the bit stream output unit 56 outputs a bit stream generated by the CPU 50 executing the video encoding program 551 loaded from the program storage device 55 to the memory 51 via, for example, a network.
- the bit stream output unit 56 may be a storage unit such as a disk device that stores the bit stream.
- the encoding target image input unit 101 corresponds to the encoding target image input unit 52.
- the encoding target image memory 102 corresponds to the memory 51.
- the reference viewpoint image input unit 103 corresponds to the reference viewpoint image input unit 53.
- the reference viewpoint image memory 104 corresponds to the memory 51.
- the depth map input unit 105 corresponds to the depth map input unit 54.
- the disparity vector generation unit 106 corresponds to the CPU 50.
- the image encoding unit 107 corresponds to the CPU 50.
- FIG. 8 is a block diagram showing an example of a hardware configuration when the video decoding apparatus 200 is configured by a computer and a software program in one embodiment of the present invention.
- the system includes a CPU 60, a memory 61, a bit stream input unit 62, a reference viewpoint image input unit 63, a depth map input unit 64, a program storage device 65, and a decoding target image output unit 66. Each unit is communicably connected via a bus.
- the CPU 60 executes a program.
- the memory 61 is a RAM or the like in which programs and data accessed by the CPU 60 are stored.
- the bit stream input unit 62 inputs the bit stream encoded by the video encoding device 100 to the CPU 60.
- the bit stream input unit 62 may be a storage unit such as a disk device that stores the bit stream.
- the reference viewpoint image input unit 63 inputs a video signal from a reference viewpoint such as the camera A to the CPU 60.
- the reference viewpoint image input unit 63 may be a storage unit such as a disk device that stores a video signal.
- the depth map input unit 64 inputs a depth map at a viewpoint where a subject is photographed by a depth camera or the like to the CPU 60.
- the depth map input unit 64 may be a storage unit such as a disk device that stores depth information.
- the program storage device 65 stores a video decoding program 651 that is a software program that causes the CPU 60 to execute video decoding processing.
- the decoding target image output unit 66 outputs the decoding target image obtained by decoding the bitstream by the CPU 60 executing the video decoding program 651 loaded in the memory 61 to a playback device or the like.
- the decoding target image output unit 66 may be a storage unit such as a disk device that stores a video signal.
- the bit stream input unit 201 corresponds to the bit stream input unit 62.
- the bit stream memory 202 corresponds to the memory 61.
- the reference viewpoint image input unit 203 corresponds to the reference viewpoint image input unit 63.
- the reference viewpoint image memory 204 corresponds to the memory 61.
- the depth map input unit 205 corresponds to the depth map input unit 64.
- the disparity vector generation unit 206 corresponds to the CPU 60.
- the image decoding unit 207 corresponds to the CPU 60.
- the video encoding device 100 or the video decoding device 200 in the above-described embodiment may be realized by a computer.
- a program for realizing this function may be recorded on a computer-readable recording medium, and the program recorded on this recording medium may be read into a computer system and executed.
- the “computer system” includes hardware such as an OS (Operating System) and peripheral devices.
- The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM (Read Only Memory), or a CD (Compact Disc)-ROM, or a storage device such as a hard disk built into a computer system.
- Furthermore, the "computer-readable recording medium" may include a medium that dynamically holds the program for a short time, such as a communication line used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line, and a medium that holds the program for a certain period of time, such as the volatile memory inside a computer system serving as a server or a client in that case.
- the program may be a program for realizing a part of the functions described above, and may be a program capable of realizing the functions described above in combination with a program already recorded in a computer system.
- In addition, the video encoding device 100 and the video decoding device 200 may be realized using a programmable logic device such as an FPGA (Field Programmable Gate Array).
- the present invention can be applied to encoding and decoding of a free viewpoint video, for example.
- According to the present invention, in encoding free-viewpoint video data having videos and depth maps for a plurality of viewpoints as constituent elements, the accuracy of the disparity vector calculated from the depth map can be improved even when the viewpoint directions are not parallel, and thus the efficiency of video encoding can be improved.
- Video decoding apparatus, 201 ... Bit stream input unit, 202 ... Bit stream memory, 203 ... Reference viewpoint image input unit, 204 ... Reference viewpoint image memory, 205 ... Depth map input unit, 206 ... Disparity vector generation unit, 207 ... Image decoding unit, 551 ... Video encoding program, 651 ... Video decoding program
Description
This application claims priority based on Japanese Patent Application No. 2013-273523 filed in Japan on December 27, 2013, the contents of which are incorporated herein by reference.
FIG. 1 is a block diagram showing the configuration of a video encoding apparatus in an embodiment of the present invention. The video encoding apparatus 100 includes an encoding target image input unit 101, an encoding target image memory 102, a reference viewpoint image input unit 103, a reference viewpoint image memory 104, a depth map input unit 105, a disparity vector generation unit 106 (representative depth setting unit, transformation matrix setting unit, representative position setting unit, disparity information setting unit, depth region setting unit, depth reference disparity vector setting unit), and an image encoding unit 107 (predicted image generation unit).
FIG. 2 is a flowchart showing the operation of the video encoding apparatus 100 in an embodiment of the present invention.
The encoding target image input unit 101 inputs the encoding target image Org to the encoding target image memory 102. The encoding target image memory 102 stores the encoding target image Org. The reference viewpoint image input unit 103 inputs the reference viewpoint image Ref to the reference viewpoint image memory 104. The reference viewpoint image memory 104 stores the reference viewpoint image Ref (step S101).
In the process repeated for each encoding target region, first, a depth map corresponding to the encoding target region blk (a depth region, which is the corresponding region on the depth map) is set (step S103).
画像符号化部107は、blkがnumBlks未満であるか否か、を判定する(ステップS107)。blkがnumBlks未満である場合(ステップS107:Yes)、画像符号化部107は、ステップS103に処理を戻す。一方、blkがnumBlks未満でない場合(ステップS107:No)、画像符号化部107は、処理を終了する。
視差ベクトルを生成する処理では、まず、符号化対象領域blkのデプスマップから、代表画素位置pos及び代表デプスrepを設定する(ステップS1403)。代表画素位置pos及び代表デプスrepをどのような方法を用いて設定しても構わないが、復号側と同じ方法を用いる必要がある。
dt(pi)は、符号化対象画像上の点piにおけるデプスが代表デプスであるとしたときの、符号化対象視点から点piにおける被写体までの光軸上の距離を示す。
sは任意の実数であるが、カメラパラメータの誤差がない場合、sは参照視点の画像上の点qiにおける参照視点から点qiにおける被写体までの光軸上の距離dr(qi)と等しい。
また、上記定義に従い式(2)を計算すると、次の式(3)となる。なお、内部パラメータA、回転行列R、並進ベクトルtの添え字はカメラを表し、tとrはそれぞれ符号化対象視点と参照視点を表す。
視差ベクトル生成部106は、符号化対象領域blkを分割する(ステップS1401)。
numSBlksは、符号化対象領域blk内のサブ領域数を示す。視差ベクトル生成部106は、サブ領域インデックス「sblk」を、0で初期化する(ステップS1402)。
視差ベクトル生成部106は、代表デプス値から変換行列を求める(ステップS1404)。
視差ベクトル生成部106は、参照視点に対する視差ベクトルを求める。つまり、視差ベクトル生成部106は、サブ領域sblkのデプスマップから、視差ベクトルを求める(ステップS1405)。
視差ベクトル生成部106は、sblkがnumSBlks未満であるか否かを判定する(ステップS1407)。sblkがnumSBlks未満である場合(ステップS1407:Yes)、視差ベクトル生成部106は、ステップS1403に処理を戻す。つまり、視差ベクトル生成部106は、分割によって得られたサブ領域ごとに、デプスマップから視差ベクトルを求める「ステップS1403~S1407」を繰り返す。一方、sblkがnumSBlks未満でない場合(ステップS1407:No)、視差ベクトル生成部106は、処理を終了する。
FIG. 5 is a block diagram showing the configuration of a video decoding apparatus 200 according to an embodiment of the present invention. The video decoding apparatus 200 includes a bit stream input unit 201, a bit stream memory 202, a reference viewpoint image input unit 203, a reference viewpoint image memory 204, a depth map input unit 205, a disparity vector generation unit 206 (representative depth setting unit, transformation matrix setting unit, representative position setting unit, disparity information setting unit, depth region setting unit, depth reference disparity vector setting unit), and an image decoding unit 207 (predicted image generation unit).
FIG. 6 is a flowchart showing the operation of the video decoding apparatus 200 according to an embodiment of the present invention.
The bit stream input unit 201 inputs a bit stream obtained by encoding the decoding target image into the bit stream memory 202. The bit stream memory 202 stores the bit stream obtained by encoding the decoding target image. The reference viewpoint image input unit 203 inputs a reference viewpoint image Ref into the reference viewpoint image memory 204. The reference viewpoint image memory 204 stores the reference viewpoint image Ref (step S201).
In the processing repeated for each decoding target region, first, the depth map for the decoding target region blk is set (step S203).
The image decoding unit 207 determines whether blk is less than numBlks (step S207). If blk is less than numBlks (step S207: Yes), the image decoding unit 207 returns the processing to step S203. On the other hand, if blk is not less than numBlks (step S207: No), the image decoding unit 207 ends the processing.
FIG. 7 is a block diagram showing an example of a hardware configuration in which the video encoding apparatus 100 according to an embodiment of the present invention is implemented by a computer and a software program. The system includes a CPU (Central Processing Unit) 50, a memory 51, an encoding target image input unit 52, a reference viewpoint image input unit 53, a depth map input unit 54, a program storage device 55, and a bit stream output unit 56. The units are communicably connected via a bus.
Claims (14)
- A video encoding apparatus that, when encoding an encoding target image which is one frame of a multi-view video composed of videos of a plurality of different viewpoints, performs encoding for each encoding target region, which is a region obtained by dividing the encoding target image, while predicting between different viewpoints, using a reference viewpoint image, which is an image for a reference viewpoint different from the viewpoint of the encoding target image, and a depth map for an object in the multi-view video, the video encoding apparatus comprising:
a representative depth setting unit that sets a representative depth from the depth map;
a transformation matrix setting unit that sets, based on the representative depth, a transformation matrix that transforms a position on the encoding target image into a position on the reference viewpoint image;
a representative position setting unit that sets a representative position from positions within the encoding target region;
a disparity information setting unit that sets, using the representative position and the transformation matrix, disparity information between the viewpoint of the encoding target and the reference viewpoint for the encoding target region; and
a predicted image generation unit that generates a predicted image for the encoding target region using the disparity information.
- The video encoding apparatus according to claim 1, further comprising a depth region setting unit that sets, for the encoding target region, a depth region which is the corresponding region on the depth map,
wherein the representative depth setting unit sets the representative depth from the depth map for the depth region.
- The video encoding apparatus according to claim 2, further comprising a depth reference disparity vector setting unit that sets, for the encoding target region, a depth reference disparity vector which is a disparity vector with respect to the depth map,
wherein the depth region setting unit sets the region indicated by the depth reference disparity vector as the depth region.
- The video encoding apparatus according to claim 3, wherein the depth reference disparity vector setting unit sets the depth reference disparity vector using a disparity vector used when encoding a region adjacent to the encoding target region.
- The video encoding apparatus according to any one of claims 2 to 4, wherein the representative depth setting unit sets, as the representative depth, the depth indicating the position closest to the viewpoint of the encoding target image among the depths in the depth region corresponding to the pixels at the four vertices of the encoding target region.
- A video decoding apparatus that, when decoding a decoding target image from encoded data of a multi-view video composed of videos of a plurality of different viewpoints, performs decoding for each decoding target region, which is a region obtained by dividing the decoding target image, while predicting between different viewpoints, using a reference viewpoint image, which is an image for a reference viewpoint different from the viewpoint of the decoding target image, and a depth map for an object in the multi-view video, the video decoding apparatus comprising:
a representative depth setting unit that sets a representative depth from the depth map;
a transformation matrix setting unit that sets, based on the representative depth, a transformation matrix that transforms a position on the decoding target image into a position on the reference viewpoint image;
a representative position setting unit that sets a representative position from positions within the decoding target region;
a disparity information setting unit that sets, using the representative position and the transformation matrix, disparity information between the viewpoint of the decoding target and the reference viewpoint for the decoding target region; and
a predicted image generation unit that generates a predicted image for the decoding target region using the disparity information.
- The video decoding apparatus according to claim 6, further comprising a depth region setting unit that sets, for the decoding target region, a depth region which is the corresponding region on the depth map,
wherein the representative depth setting unit sets the representative depth from the depth map for the depth region.
- The video decoding apparatus according to claim 7, further comprising a depth reference disparity vector setting unit that sets, for the decoding target region, a depth reference disparity vector which is a disparity vector with respect to the depth map,
wherein the depth region setting unit sets the region indicated by the depth reference disparity vector as the depth region.
- The video decoding apparatus according to claim 8, wherein the depth reference disparity vector setting unit sets the depth reference disparity vector using a disparity vector used when decoding a region adjacent to the decoding target region.
- The video decoding apparatus according to any one of claims 7 to 9, wherein the representative depth setting unit sets, as the representative depth, the depth indicating the position closest to the viewpoint of the decoding target image among the depths in the depth region corresponding to the pixels at the four vertices of the decoding target region.
- A video encoding method that, when encoding an encoding target image which is one frame of a multi-view video composed of videos of a plurality of different viewpoints, performs encoding for each encoding target region, which is a region obtained by dividing the encoding target image, while predicting between different viewpoints, using a reference viewpoint image, which is an image for a reference viewpoint different from the viewpoint of the encoding target image, and a depth map for an object in the multi-view video, the video encoding method comprising:
a representative depth setting step of setting a representative depth from the depth map;
a transformation matrix setting step of setting, based on the representative depth, a transformation matrix that transforms a position on the encoding target image into a position on the reference viewpoint image;
a representative position setting step of setting a representative position from positions within the encoding target region;
a disparity information setting step of setting, using the representative position and the transformation matrix, disparity information between the viewpoint of the encoding target and the reference viewpoint for the encoding target region; and
a predicted image generation step of generating a predicted image for the encoding target region using the disparity information.
- A video decoding method that, when decoding a decoding target image from encoded data of a multi-view video composed of videos of a plurality of different viewpoints, performs decoding for each decoding target region, which is a region obtained by dividing the decoding target image, while predicting between different viewpoints, using a reference viewpoint image, which is an image for a reference viewpoint different from the viewpoint of the decoding target image, and a depth map for an object in the multi-view video, the video decoding method comprising:
a representative depth setting step of setting a representative depth from the depth map;
a transformation matrix setting step of setting, based on the representative depth, a transformation matrix that transforms a position on the decoding target image into a position on the reference viewpoint image;
a representative position setting step of setting a representative position from positions within the decoding target region;
a disparity information setting step of setting, using the representative position and the transformation matrix, disparity information between the viewpoint of the decoding target and the reference viewpoint for the decoding target region; and
a predicted image generation step of generating a predicted image for the decoding target region using the disparity information.
- A video encoding program for causing a computer to execute the video encoding method according to claim 11.
- A video decoding program for causing a computer to execute the video decoding method according to claim 12.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/105,450 US20160316224A1 (en) | 2013-12-27 | 2014-12-24 | Video Encoding Method, Video Decoding Method, Video Encoding Apparatus, Video Decoding Apparatus, Video Encoding Program, And Video Decoding Program |
JP2015554948A JP6232076B2 (ja) | 2013-12-27 | 2014-12-24 | Video encoding method, video decoding method, video encoding apparatus, video decoding apparatus, video encoding program, and video decoding program |
CN201480070358.XA CN106134197A (zh) | 2013-12-27 | 2014-12-24 | Video encoding method, video decoding method, video encoding apparatus, video decoding apparatus, video encoding program, and video decoding program |
KR1020167016393A KR20160086941A (ko) | 2013-12-27 | 2014-12-24 | Video encoding method, video decoding method, video encoding apparatus, video decoding apparatus, video encoding program, and video decoding program |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2013273523 | 2013-12-27 | ||
JP2013-273523 | 2013-12-27 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015098948A1 true WO2015098948A1 (ja) | 2015-07-02 |
Family
ID=53478799
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2014/084118 WO2015098948A1 (ja) | 2014-12-24 | Video encoding method, video decoding method, video encoding apparatus, video decoding apparatus, video encoding program, and video decoding program |
Country Status (5)
Country | Link |
---|---|
US (1) | US20160316224A1 (ja) |
JP (1) | JP6232076B2 (ja) |
KR (1) | KR20160086941A (ja) |
CN (1) | CN106134197A (ja) |
WO (1) | WO2015098948A1 (ja) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102098322 | 2017-09-07 | 2020-04-07 | Method and apparatus for motion estimation in depth image coding using plane modeling, and non-transitory computer-readable recording medium |
US10645417B2 (en) * | 2017-10-09 | 2020-05-05 | Google Llc | Video coding using parameterized motion model |
FR3075540A1 (fr) * | 2017-12-15 | 2019-06-21 | Orange | Methods and devices for encoding and decoding a multi-view video sequence representative of an omnidirectional video |
KR102074929 | 2018-10-05 | 2020-02-07 | Method and apparatus for plane detection using a depth image, and non-transitory computer-readable recording medium |
CN110012310B (zh) * | 2019-03-28 | 2020-09-25 | 北京大学深圳研究生院 | Free-viewpoint-based encoding and decoding method and apparatus |
KR102224272 | 2019-04-24 | 2021-03-08 | Method and apparatus for plane detection using a depth image, and non-transitory computer-readable recording medium |
CN111954032A (zh) * | 2019-05-17 | 2020-11-17 | 阿里巴巴集团控股有限公司 | Video processing method and apparatus, electronic device, and storage medium |
CN111189460B (zh) * | 2019-12-31 | 2022-08-23 | 广州展讯信息科技有限公司 | Video synthesis and conversion method and apparatus containing high-precision map trajectories |
CN111163319B (zh) * | 2020-01-10 | 2023-09-15 | 上海大学 | A video encoding method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007036800A (ja) * | 2005-07-28 | 2007-02-08 | Nippon Telegr & Teleph Corp <Ntt> | Video encoding method, video decoding method, video encoding program, video decoding program, and computer-readable recording medium storing these programs |
JP2009116532A (ja) * | 2007-11-05 | 2009-05-28 | Nippon Telegr & Teleph Corp <Ntt> | Virtual viewpoint image generation method and virtual viewpoint image generation apparatus |
JP2013030898A (ja) * | 2011-07-27 | 2013-02-07 | Nippon Telegr & Teleph Corp <Ntt> | Image transmission method, image transmission apparatus, image sending apparatus, image receiving apparatus, image sending program, and image receiving program |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8898178B2 (en) * | 2011-12-15 | 2014-11-25 | Microsoft Corporation | Solution monitoring system |
2014
- 2014-12-24 WO PCT/JP2014/084118 patent/WO2015098948A1/ja active Application Filing
- 2014-12-24 KR KR1020167016393A patent/KR20160086941A/ko not_active Application Discontinuation
- 2014-12-24 CN CN201480070358.XA patent/CN106134197A/zh active Pending
- 2014-12-24 JP JP2015554948A patent/JP6232076B2/ja active Active
- 2014-12-24 US US15/105,450 patent/US20160316224A1/en not_active Abandoned
Non-Patent Citations (3)
Title |
---|
GERHARD TECH ET AL.: "3D-HEVC Test Model 1", JOINT COLLABORATIVE TEAM ON 3D VIDEO CODING EXTENSION DEVELOPMENT OF ITU-T SG 16 WP3 AND ISO/IEC JTC1/SC29/WG11 JCT3V-A1005_D0, ITU-T, 20 September 2012 (2012-09-20), pages 12 - 21 * |
JIAN-LIANG LIN ET AL.: "3D-CE5.h related: Simplification on disparity vector derivation for HEVC-based 3D video coding", JOINT COLLABORATIVE TEAM ON 3D VIDEO CODING EXTENSION DEVELOPMENT OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11 JCT2-A0047, ITU-T, 20 July 2012 (2012-07-20), pages 1 - 3 * |
SHIN'YA SHIMIZU ET AL.: "Efficient Multi-view Video Coding using Multi-view Depth Map", THE JOURNAL OF THE INSTITUTE OF IMAGE INFORMATION AND TELEVISION ENGINEERS, THE INSTITUTE OF IMAGE INFORMATION AND TELEVISION ENGINEERS, vol. 63, no. 4, 1 April 2009 (2009-04-01), pages 524 - 532 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2022519462A (ja) * | 2019-01-18 | 2022-03-24 | Sony Group Corporation | Point cloud coding using homography transform |
JP7371691B2 (ja) | 2019-01-18 | 2023-10-31 | Sony Group Corporation | Point cloud coding using homography transform |
Also Published As
Publication number | Publication date |
---|---|
JP6232076B2 (ja) | 2017-11-22 |
JPWO2015098948A1 (ja) | 2017-03-23 |
CN106134197A (zh) | 2016-11-16 |
KR20160086941A (ko) | 2016-07-20 |
US20160316224A1 (en) | 2016-10-27 |
Legal Events

Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 14873795; Country of ref document: EP; Kind code of ref document: A1 |
| DPE2 | Request for preliminary examination filed before expiration of 19th month from priority date (pct application filed from 20040101) | |
| ENP | Entry into the national phase | Ref document number: 2015554948; Country of ref document: JP; Kind code of ref document: A |
| WWE | Wipo information: entry into national phase | Ref document number: 15105450; Country of ref document: US |
| ENP | Entry into the national phase | Ref document number: 20167016393; Country of ref document: KR; Kind code of ref document: A |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 14873795; Country of ref document: EP; Kind code of ref document: A1 |