WO2014050827A1 - Image encoding method, image decoding method, image encoding device, image decoding device, image encoding program, image decoding program, and recording medium - Google Patents
Image encoding method, image decoding method, image encoding device, image decoding device, image encoding program, image decoding program, and recording medium
- Publication number
- WO2014050827A1 (PCT/JP2013/075735)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- depth map
- image
- viewpoint
- virtual
- encoding
Classifications
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
- H04N13/111—Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation
- H04N13/161—Encoding, multiplexing or demultiplexing different image signal components
- H04N19/521—Processing of motion vectors for estimating the reliability of the determined motion vectors or motion vector field, e.g. for smoothing the motion vector field or for correcting motion vectors
- H04N19/59—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
Definitions
- The present invention relates to an image encoding method, an image decoding method, an image encoding device, an image decoding device, an image encoding program, an image decoding program, and a recording medium that encode and decode multi-view images.
- This application claims priority based on Japanese Patent Application No. 2012-212154, filed in Japan on September 25, 2012, the contents of which are incorporated herein.
- A multi-view image is composed of a plurality of images obtained by photographing the same subject and background with a plurality of cameras; moving images taken by a plurality of cameras are called multi-view moving images (or multi-view videos).
- In the following description, an image (moving image) taken by one camera is referred to as a "two-dimensional image (moving image)", and a group of two-dimensional images (two-dimensional moving images) obtained by photographing the same subject and background from a plurality of different positions and orientations (hereinafter referred to as viewpoints) is referred to as a "multi-view image (multi-view moving image)".
- A two-dimensional moving image has a strong correlation in the time direction, and coding efficiency can be increased by exploiting that correlation. Likewise, frames of a multi-view image captured by different cameras at the same time have a strong correlation between cameras, and coding efficiency can be increased by exploiting this correlation as well.
- H.264 is an international encoding standard in which high-efficiency encoding is performed using techniques such as motion-compensated prediction, orthogonal transform, quantization, and entropy encoding; in H.264, encoding using temporal correlation with a plurality of past or future frames is possible.
- The details of the motion-compensated prediction technique used in H.264 are described in Non-Patent Document 1, for example. An outline of the technique is given below.
- H.264 motion-compensated prediction divides the encoding target frame into blocks of various sizes and allows each block to have a different motion vector and a different reference frame. Using a different motion vector for each block realizes highly accurate prediction that compensates for the distinct motion of each subject, while using a different reference frame for each block realizes highly accurate prediction that accounts for occlusions caused by temporal change.
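- As a concrete illustration of such block-based motion compensation, the following sketch builds a predicted frame from per-block motion data. It is a minimal NumPy sketch, not the H.264 syntax; the motion_info structure, the function name, and the assumption that the frame dimensions are multiples of the block size are hypothetical.

```python
import numpy as np

def motion_compensated_prediction(ref_frames, motion_info, block_size, height, width):
    """Build a predicted frame from per-block motion vectors and reference
    frame indices. motion_info maps (by, bx) block origins to a tuple
    (ref_idx, dy, dx). Assumes height and width are multiples of block_size."""
    pred = np.zeros((height, width), dtype=ref_frames[0].dtype)
    for by in range(0, height, block_size):
        for bx in range(0, width, block_size):
            ref_idx, dy, dx = motion_info[(by, bx)]
            ref = ref_frames[ref_idx]
            # Clip the motion-shifted sampling positions to the frame boundary.
            ys = np.clip(np.arange(by + dy, by + dy + block_size), 0, height - 1)
            xs = np.clip(np.arange(bx + dx, bx + dx + block_size), 0, width - 1)
            pred[by:by + block_size, bx:bx + block_size] = ref[np.ix_(ys, xs)]
    return pred
```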
- The difference between multi-view image encoding and multi-view video encoding is that a multi-view video has, in addition to the correlation between cameras, a correlation in the time direction.
- The same method can be used to exploit the correlation between cameras in either case; therefore, the method used in encoding multi-view videos is described here.
- FIG. 13 is a conceptual diagram showing parallax generated between cameras.
- In FIG. 13, the image planes of cameras whose optical axes are parallel are viewed from directly above. The positions at which the same point on the subject is projected onto the image planes of different cameras are generally called corresponding points.
- In parallax-compensated prediction, each pixel value of the encoding target frame is predicted from the reference frame based on this correspondence, and the prediction residual and the disparity information indicating the correspondence are encoded. Since the parallax changes with the camera pair and position in question, the disparity information must be encoded for each region in which parallax-compensated prediction is performed. In fact, in the H.264 multi-view encoding method, a vector representing the disparity information is encoded for each block that uses parallax-compensated prediction.
- By using camera parameters, the correspondence given by the disparity information can, based on epipolar geometric constraints, be represented by a one-dimensional quantity indicating the three-dimensional position of the subject instead of a two-dimensional vector.
- Various expressions exist for the information indicating the three-dimensional position of the subject, but the distance from a reference camera to the subject, or the coordinate value on an axis not parallel to the image plane of the camera, is often used. In some cases the reciprocal of the distance is used instead of the distance. Since the reciprocal of the distance is proportional to the parallax, there are also cases where two reference cameras are set and the three-dimensional position is expressed as the amount of parallax between the images taken by those cameras. Because there is no essential difference in physical meaning whichever expression is used, the information indicating the three-dimensional position is hereinafter referred to as depth without distinguishing among these expressions.
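- As a concrete instance of this proportionality: for two parallel cameras with focal length f and baseline B, a point at depth Z along the optical axis yields a disparity d between the two images given by the standard stereo relation below (stated only for illustration; the text does not commit to a particular formula):

```latex
d = \frac{f\,B}{Z} \qquad\Longrightarrow\qquad d \propto \frac{1}{Z}
```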
- FIG. 14 is a conceptual diagram of epipolar geometric constraints.
- the point on the image of another camera corresponding to the point on the image of one camera is constrained on a straight line called an epipolar line.
- the corresponding point is uniquely determined on the epipolar line.
- For example, the corresponding point in the second camera image for the subject projected at position m in the first camera image is projected at position m′ on the epipolar line when the subject's position in real space is M′, and at position m″ on the epipolar line when the subject's position in real space is M″.
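- The behavior in the figure can be made concrete with a small pinhole-camera sketch: given a pixel m of the first camera and a hypothesized depth, the projection into the second camera slides along the epipolar line as the depth varies. This is a textbook construction under assumed camera parameters (K1, K2, R, t), not code from this document; lens distortion is ignored.

```python
import numpy as np

def corresponding_point(m, depth, K1, K2, R, t):
    """Project pixel m = (u, v) of camera 1, assumed to lie at `depth` along
    camera 1's optical axis, into camera 2. K1 and K2 are 3x3 intrinsic
    matrices; (R, t) transforms camera-1 coordinates into camera-2
    coordinates. Varying `depth` traces the epipolar line in camera 2."""
    u, v = m
    # Back-project the pixel to a 3D point in camera-1 coordinates.
    X1 = depth * (np.linalg.inv(K1) @ np.array([u, v, 1.0]))
    # Transform into camera-2 coordinates and project with the pinhole model.
    p = K2 @ (R @ X1 + t)
    return p[:2] / p[2]  # corresponding point (u', v') in camera 2
```

- Evaluating this at the depths of M′ and M″ yields the positions m′ and m″ of the figure.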
- In Non-Patent Document 2, by using this property, the predicted image for the encoding target frame is synthesized from the reference frame according to the three-dimensional information of each subject given by a depth map (distance image) for the reference frame, thereby generating a highly accurate predicted image and realizing efficient multi-view video encoding.
- A predicted image generated based on this depth is called a viewpoint composite image, a viewpoint interpolation image, or a parallax compensation image.
- By converting a depth map for a reference frame into a depth map for the frame to be encoded and obtaining corresponding points using the converted depth map, a viewpoint composite image can be generated for only the necessary regions. Accordingly, when an image or a moving image is encoded or decoded while the method of generating the predicted image is switched for each region of the frame to be encoded or decoded, the amount of processing for generating the viewpoint composite image and the amount of memory for temporarily storing it are reduced.
- An object of the present invention is to provide an image encoding method, an image decoding method, an image encoding device, an image decoding device, an image encoding program, an image decoding program, and a recording medium that make this possible.
- The present invention is an image encoding method that, when encoding a multi-view image that is an image of a plurality of viewpoints, performs encoding while predicting an image between viewpoints using an already-encoded reference viewpoint image for a viewpoint different from the viewpoint of the encoding target image and a reference viewpoint depth map that is a depth map of the subject in the reference viewpoint image. The method includes a virtual depth map generating step of generating a virtual depth map that is a depth map of the subject in the encoding target image and has a resolution lower than that of the encoding target image, and an inter-viewpoint image prediction step of performing image prediction between viewpoints by generating a parallax compensation image for the encoding target image from the virtual depth map and the reference viewpoint image.
- The image encoding method of the present invention may further include a same-resolution depth map generating step of generating, from the reference viewpoint depth map, a same-resolution depth map having the same resolution as the encoding target image; in the virtual depth map generating step, the virtual depth map is then generated by reducing the same-resolution depth map.
- In the virtual depth map generating step, the virtual depth map may be generated by selecting, for each pixel of the virtual depth map, the depth indicating the position closest to the viewpoint among the depths of the corresponding plurality of pixels in the same-resolution depth map.
- The image encoding method of the present invention may further include a reduced depth map generation step of generating a reduced depth map of the subject in the reference viewpoint image by reducing the reference viewpoint depth map; in the virtual depth map generating step, the virtual depth map is then generated from the reduced depth map.
- In the reduced depth map generation step, the reference viewpoint depth map may be reduced only in either the vertical direction or the horizontal direction.
- In the reduced depth map generation step of the image encoding method of the present invention, the reduced depth map may be generated by selecting, for each of its pixels, the depth indicating the position closest to the viewpoint among the depths of the corresponding plurality of pixels of the reference viewpoint depth map.
- The image encoding method of the present invention may further include a sample pixel selection step of selecting some sample pixels from the pixels of the reference viewpoint depth map; in the virtual depth map generation step, the virtual depth map is then generated by converting the reference viewpoint depth map only at the sample pixels.
- The image encoding method of the present invention may further include a region dividing step of dividing the reference viewpoint depth map into partial regions according to the resolution ratio between the reference viewpoint depth map and the virtual depth map; in the sample pixel selection step, a sample pixel is then selected for each partial region.
- In the region dividing step, the shape of each partial region may be determined in accordance with the resolution ratio between the reference viewpoint depth map and the virtual depth map.
- In the sample pixel selection step of the image encoding method of the present invention, either the pixel having the depth indicating the position closest to the viewpoint or the pixel having the depth indicating the position farthest from the viewpoint may be selected as the sample pixel for each partial region.
- Alternatively, both the pixel having the depth indicating the position closest to the viewpoint and the pixel having the depth indicating the position farthest from the viewpoint may be selected as sample pixels for each partial region.
- The present invention is also an image decoding method that, when decoding a decoding target image from code data of a multi-view image that is an image of a plurality of viewpoints, performs decoding while predicting an image between viewpoints using an already-decoded reference viewpoint image for a viewpoint different from the viewpoint of the decoding target image and a reference viewpoint depth map that is a depth map of the subject in the reference viewpoint image. The method includes a virtual depth map generating step of generating a virtual depth map that is a depth map of the subject in the decoding target image and has a resolution lower than that of the decoding target image, and an inter-viewpoint image prediction step of performing image prediction between viewpoints by generating a parallax compensation image for the decoding target image from the virtual depth map and the reference viewpoint image.
- The image decoding method of the present invention may further include a same-resolution depth map generating step of generating, from the reference viewpoint depth map, a same-resolution depth map having the same resolution as the decoding target image; in the virtual depth map generating step, the virtual depth map is then generated by reducing the same-resolution depth map.
- In the virtual depth map generation step of the image decoding method of the present invention, the virtual depth map may be generated by selecting, for each pixel of the virtual depth map, the depth indicating the position closest to the viewpoint among the depths of the corresponding plurality of pixels in the same-resolution depth map.
- The image decoding method of the present invention may further include a reduced depth map generation step of generating a reduced depth map of the subject in the reference viewpoint image by reducing the reference viewpoint depth map; in the virtual depth map generation step, the virtual depth map is then generated from the reduced depth map.
- In the reduced depth map generation step, the reference viewpoint depth map may be reduced only in either the vertical direction or the horizontal direction.
- In the reduced depth map generation step of the image decoding method of the present invention, the reduced depth map may be generated by selecting, for each of its pixels, the depth indicating the position closest to the viewpoint among the depths of the corresponding plurality of pixels of the reference viewpoint depth map.
- The image decoding method of the present invention may further include a sample pixel selection step of selecting some sample pixels from the pixels of the reference viewpoint depth map; in the virtual depth map generation step, the virtual depth map is then generated by converting the reference viewpoint depth map only at the sample pixels.
- The image decoding method of the present invention may further include a region dividing step of dividing the reference viewpoint depth map into partial regions according to the resolution ratio between the reference viewpoint depth map and the virtual depth map; in the sample pixel selection step, a sample pixel is then selected for each partial region.
- In the region dividing step, the shape of each partial region may be determined according to the resolution ratio between the reference viewpoint depth map and the virtual depth map.
- In the sample pixel selection step of the image decoding method of the present invention, either the pixel having the depth indicating the position closest to the viewpoint or the pixel having the depth indicating the position farthest from the viewpoint may be selected as the sample pixel for each partial region.
- Alternatively, both the pixel having the depth indicating the position closest to the viewpoint and the pixel having the depth indicating the position farthest from the viewpoint may be selected as sample pixels for each partial region.
- The present invention is also an image encoding device that, when encoding a multi-view image that is an image of a plurality of viewpoints, performs encoding while predicting an image between viewpoints using an already-encoded reference viewpoint image for a viewpoint different from the viewpoint of the encoding target image and a reference viewpoint depth map that is a depth map of the subject in the reference viewpoint image. The device includes a virtual depth map generating unit that generates a virtual depth map that is a depth map of the subject in the encoding target image and has a resolution lower than that of the encoding target image, and an inter-viewpoint image prediction unit that performs image prediction between viewpoints by generating a parallax compensation image for the encoding target image from the virtual depth map and the reference viewpoint image.
- The image encoding device of the present invention may further include a reduced depth map generation unit that generates a reduced depth map of the subject in the reference viewpoint image by reducing the reference viewpoint depth map; the virtual depth map generation unit then generates the virtual depth map by converting the reduced depth map.
- The image encoding device of the present invention may further include a sample pixel selection unit that selects some sample pixels from the pixels of the reference viewpoint depth map; the virtual depth map generation unit then generates the virtual depth map by converting the reference viewpoint depth map only at the sample pixels.
- The present invention is also an image decoding device that, when decoding a decoding target image from code data of a multi-view image that is an image of a plurality of viewpoints, performs decoding while predicting an image between viewpoints using an already-decoded reference viewpoint image for a viewpoint different from the viewpoint of the decoding target image and a reference viewpoint depth map that is a depth map of the subject in the reference viewpoint image. The device includes a virtual depth map generating unit that generates a virtual depth map that is a depth map of the subject in the decoding target image and has a resolution lower than that of the decoding target image, and an inter-viewpoint image prediction unit that performs image prediction between viewpoints by generating a parallax compensation image for the decoding target image from the virtual depth map and the reference viewpoint image.
- The image decoding device of the present invention may further include a reduced depth map generation unit that generates a reduced depth map of the subject in the reference viewpoint image by reducing the reference viewpoint depth map; the virtual depth map generation unit then generates the virtual depth map by converting the reduced depth map.
- The image decoding device of the present invention may further include a sample pixel selection unit that selects some sample pixels from the pixels of the reference viewpoint depth map; the virtual depth map generation unit then generates the virtual depth map by converting the reference viewpoint depth map only at the sample pixels.
- the present invention is an image encoding program for causing a computer to execute the image encoding method.
- the present invention is an image decoding program for causing a computer to execute the image decoding method.
- the present invention is a computer-readable recording medium on which the image encoding program is recorded.
- the present invention is a computer-readable recording medium on which the image decoding program is recorded.
- According to the present invention, when generating a viewpoint synthesized image of the processing target frame, the viewpoint synthesized image can be generated with a small amount of computation without significantly reducing its quality.
- FIG. 1 is a block diagram showing the configuration of the image encoding device in one embodiment of the present invention.
- FIGS. 2 and 3 are flowcharts showing the operation of the image encoding device shown in FIG. 1.
- FIGS. 4 to 7 are flowcharts showing the processing operation of the process (step S3), shown in FIGS. 2 and 3, that converts the reference camera depth map.
- FIG. 8 is a block diagram showing the configuration of the image decoding device in one embodiment of the present invention.
- FIGS. 9 and 10 are flowcharts showing the operation of the image decoding device shown in FIG. 8.
- FIG. 11 is a block diagram showing the hardware configuration when the image encoding device is configured by a computer and a software program.
- FIG. 12 is a block diagram showing the hardware configuration when the image decoding device is configured by a computer and a software program.
- FIG. 13 is a conceptual diagram showing the parallax that arises between cameras.
- FIG. 14 is a conceptual diagram of the epipolar geometric constraint.
- In the following embodiment, the encoding of a multi-view image captured by two cameras, a first camera (referred to as camera A) and a second camera (referred to as camera B), is described.
- It is assumed that the information necessary for obtaining the parallax from the depth information is given separately. Specifically, this information consists of extrinsic parameters representing the positional relationship between camera A and camera B, or intrinsic parameters representing the projection onto the image plane by each camera; other information may be given instead, as long as the parallax can be obtained from it.
- In the following, when information that can specify a position is appended between the symbols [] to an image, video frame, or depth map, it denotes the image signal sampled at the pixel at that position, or the corresponding depth. Further, depth is assumed to be information whose value becomes smaller as the distance from the camera increases (that is, as the parallax becomes smaller); if the relationship between depth values and distance from the camera is defined in the opposite way, the descriptions of the magnitude of depth values must be read accordingly.
- FIG. 1 is a block diagram showing a configuration of an image encoding device according to this embodiment.
- The image encoding device 100 includes an encoding target image input unit 101, an encoding target image memory 102, a reference camera image input unit 103, a reference camera image memory 104, a reference camera depth map input unit 105, a depth map conversion unit 106, a virtual depth map memory 107, a viewpoint composite image generation unit 108, and an image encoding unit 109.
- the encoding target image input unit 101 inputs an image to be encoded.
- the image to be encoded is referred to as an encoding target image.
- an image of camera B is input.
- a camera that captures an encoding target image (camera B in this case) is referred to as an encoding target camera.
- the encoding target image memory 102 stores the input encoding target image.
- the reference camera image input unit 103 inputs a reference camera image that becomes a reference image when generating a viewpoint composite image (parallax compensation image).
- an image of camera A is input.
- the reference camera image memory 104 stores the input reference camera image.
- the reference camera depth map input unit 105 inputs a depth map for the reference camera image.
- the depth map for the reference camera image is referred to as a reference camera depth map.
- the depth map represents the three-dimensional position of the subject shown in each pixel of the corresponding image. Any information may be used as long as the three-dimensional position can be obtained by information such as separately provided camera parameters. For example, a distance from the camera to the subject, a coordinate value with respect to an axis that is not parallel to the image plane, and a parallax amount with respect to another camera (for example, camera B) can be used.
- the depth map is assumed to be passed in the form of an image here, but it may not be in the form of an image as long as similar information can be obtained.
- the camera corresponding to the reference camera depth map is referred to as a reference camera.
- the depth map conversion unit 106 uses the reference camera depth map to generate a depth map of a subject captured in the encoding target image and having a lower resolution than that of the encoding target image.
- the generated depth map can be considered as a depth map for an image captured by a low-resolution camera at the same position and orientation as the encoding target camera.
- the depth map generated here is referred to as a virtual depth map.
- the virtual depth map memory 107 stores the generated virtual depth map.
- the viewpoint composite image generation unit 108 generates a viewpoint composite image for the encoding target image using the correspondence relationship between the pixel of the encoding target image obtained from the virtual depth map and the pixel of the reference camera image.
- the image encoding unit 109 performs predictive encoding on the encoding target image using the viewpoint synthesized image, and outputs a bit stream that is encoded data.
- FIG. 2 is a flowchart showing the operation of the image coding apparatus 100 shown in FIG.
- the encoding target image input unit 101 inputs an encoding target image and stores the input encoding target image in the encoding target image memory 102 (step S1).
- the reference camera image input unit 103 inputs a reference camera image and stores the input reference camera image in the reference camera image memory 104.
- the reference camera depth map input unit 105 inputs the reference camera depth map, and outputs the input reference camera depth map to the depth map conversion unit 106 (step S2).
- Note that the reference camera image and the reference camera depth map input in step S2 must be the same as those obtained on the decoding side, for example, ones that have already been decoded. This is to suppress the occurrence of coding noise such as drift by using exactly the same information as that obtained by the decoding device. However, when the occurrence of such coding noise is allowed, data obtainable only on the encoding side, such as the pre-encoding originals, may be input.
- As the reference camera depth map, besides one that has already been decoded, a depth map estimated by applying stereo matching or the like to multi-view images decoded for a plurality of cameras, or a depth map estimated using decoded disparity vectors, motion vectors, and the like, can also be used, as long as the same map can be obtained on the decoding side.
- Next, the depth map conversion unit 106 generates a virtual depth map based on the reference camera depth map output from the reference camera depth map input unit 105, and stores the generated virtual depth map in the virtual depth map memory 107 (step S3).
- Any resolution may be set for the virtual depth map; for example, a resolution obtained by applying a predetermined reduction rate to that of the encoding target image may be set. Details of the processing here will be described later.
- Next, the viewpoint composite image generation unit 108 generates a viewpoint composite image for the encoding target image using the reference camera image stored in the reference camera image memory 104 and the virtual depth map stored in the virtual depth map memory 107, and outputs the generated viewpoint composite image to the image encoding unit 109 (step S4).
- For the processing here, any method may be used as long as it synthesizes the image of the encoding target camera using a depth map for the encoding target camera whose resolution is lower than that of the encoding target image and an image captured by a camera different from the encoding target camera.
- For example, first, one pixel of the virtual depth map is selected, the corresponding region on the encoding target image is identified, and the corresponding region on the reference camera image is obtained from the depth value.
- Then, the pixel value of the reference camera image in that corresponding region is obtained, and the obtained pixel value is assigned as the pixel value of the viewpoint composite image in the region identified on the encoding target image.
- Where no valid correspondence is obtained, a predetermined pixel value may be assigned, or the pixel value of the nearest pixel within the frame may be assigned; however it is done, the pixel value must be determined in the same way as on the decoding side.
- Furthermore, a filter such as a low-pass filter may be applied after the viewpoint composite image for one frame is obtained.
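- The per-pixel procedure above can be summarized in the following sketch, which assumes one-dimensionally parallel cameras so that the correspondence is a pure horizontal shift. The scale factor, the depth_to_disparity helper, the disparity sign (reference camera to the left of the encoding target camera), and the gray hole-filling value are illustrative assumptions, not prescriptions of this text.

```python
import numpy as np

def synthesize_view(ref_image, virtual_depth, scale, depth_to_disparity):
    """Generate a viewpoint composite (parallax compensation) image for the
    encoding target from a reference camera image of the same resolution as
    the target and a low-resolution virtual depth map. One virtual-depth
    pixel covers a scale x scale region of the target image."""
    dh, dw = virtual_depth.shape
    H, W = dh * scale, dw * scale
    synth = np.full((H, W), 128, dtype=ref_image.dtype)  # hole fallback value
    for y in range(dh):
        for x in range(dw):
            # One disparity per virtual-depth pixel, applied to its region.
            disp = int(round(depth_to_disparity(virtual_depth[y, x])))
            for dy in range(scale):
                for dx in range(scale):
                    ty, tx = y * scale + dy, x * scale + dx
                    rx = tx + disp  # reference camera assumed left of target
                    if 0 <= rx < W:
                        synth[ty, tx] = ref_image[ty, rx]
    return synth
```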
- the image encoding unit 109 predictively encodes the encoding target image using the viewpoint composite image as the predicted image and outputs the encoding result (step S5).
- the bit stream obtained as a result of encoding is the output of the image encoding apparatus 100. Note that any method may be used for encoding as long as decoding is possible on the decoding side.
- In general video or image encoding such as MPEG-2, H.264, and JPEG, an image is divided into blocks of a predetermined size, and for each block a difference signal between the encoding target image and the predicted image is generated; a frequency transform such as the DCT (Discrete Cosine Transform) is then applied to the difference image, and the resulting values are encoded by sequentially applying quantization, binarization, and entropy coding.
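- As a schematic illustration of this block pipeline (residual, frequency transform, quantization), the sketch below uses a single flat quantization step and omits the binarization and entropy coding stages; it is not the actual H.264 or JPEG design, and the function and parameter names are hypothetical.

```python
import numpy as np
from scipy.fftpack import dct

def transform_and_quantize(target, predicted, block=8, qstep=16):
    """Per-block difference signal -> 2D DCT -> uniform quantization.
    Assumes the image dimensions are multiples of the block size."""
    resid = target.astype(np.float64) - predicted.astype(np.float64)
    H, W = resid.shape
    coeffs = np.empty_like(resid)
    for y in range(0, H, block):
        for x in range(0, W, block):
            b = resid[y:y + block, x:x + block]
            # Separable 2D DCT-II with orthonormal scaling.
            c = dct(dct(b, norm='ortho', axis=0), norm='ortho', axis=1)
            coeffs[y:y + block, x:x + block] = np.round(c / qstep)
    return coeffs
```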
- FIG. 3 is a flowchart showing an operation of encoding the encoding target image by alternately repeating the viewpoint composite image generation processing and the encoding target image encoding processing for each block.
- In FIG. 3, the same parts as those in the processing operation shown in FIG. 2 are given the same reference numerals. In the following, the index of the block that is the unit of the predictive encoding process is denoted blk, and the number of blocks in the encoding target image is denoted numBlks.
- the encoding target image input unit 101 inputs an encoding target image and stores the input encoding target image in the encoding target image memory 102 (step S1).
- the reference camera image input unit 103 inputs a reference camera image and stores the input reference camera image in the reference camera image memory 104.
- the reference camera depth map input unit 105 inputs the reference camera depth map, and outputs the input reference camera depth map to the depth map conversion unit 106 (step S2).
- Next, the depth map conversion unit 106 generates a virtual depth map based on the reference camera depth map output from the reference camera depth map input unit 105, and stores the generated virtual depth map in the virtual depth map memory 107 (step S3). Then, the viewpoint composite image generation unit 108 substitutes 0 for the variable blk (step S6).
- the viewpoint composite image generation unit 108 generates a viewpoint composite image for the block blk from the reference camera image stored in the reference camera image memory 104 and the virtual depth map stored in the virtual depth map memory 107. Then, the generated viewpoint composite image is output to the image encoding unit 109 (step S4a). Subsequently, after obtaining the viewpoint composite image, the image encoding unit 109 predictively encodes the encoding target image for the block blk using the viewpoint composite image as the prediction image and outputs the encoding result (step S5a).
- FIGS. 4 to 6 are flowcharts showing the processing operation of the process (step S3), shown in FIGS. 2 and 3, that converts the reference camera depth map.
- three different methods will be described as methods for generating a virtual depth map from a reference depth map. Any method may be used, but it is necessary to use the same method as that on the decoding side.
- information indicating the method used may be encoded and notified to the decoding side.
- In the first method (FIG. 4), first, the depth map conversion unit 106 synthesizes a depth map for the encoding target image from the reference camera depth map (step S21); that is, the resolution of the depth map obtained here is the same as that of the encoding target image. Any method can be used for this processing as long as the same processing can be executed on the decoding side. For example, the method of Reference 2, Y. Mori, N. Fukushima, T. Fujii, and M. Tanimoto, "View Generation with 3D Warping Using Depth Information for FTV," In Proceedings of 3DTV-CON2008, pp. 229-232, May 2008, may be used.
- Alternatively, a virtual depth map for this region (the encoding target image) may be generated directly: for each pixel of the reference camera depth map, the corresponding point on the virtual depth map is obtained using the depth value of that pixel, and the depth value converted for that corresponding point is assigned.
- the converted depth value is obtained by converting a depth value for the reference camera depth map into a depth value for the virtual depth map.
- Note that the corresponding points are not necessarily obtained at integer pixel positions of the virtual depth map; therefore, continuity is assumed between the positions on the virtual depth map corresponding to adjacent pixels on the reference camera depth map, and the depth value for each pixel of the virtual depth map is generated by interpolation.
- continuity is assumed only for pixels adjacent to each other on the reference camera depth map when the change in depth value is within a predetermined range. This is because different subjects are considered to appear in pixels with greatly different depth values, and continuity of the subject in real space cannot be assumed.
- one or a plurality of integer pixel positions may be obtained from the obtained corresponding points, and a converted depth value may be assigned to a pixel at the integer pixel position. In this case, it is not necessary to interpolate depth values, and the amount of calculation can be reduced.
- In some cases, a subject that appears in one part of the reference camera image is occluded by a subject that appears in another region of the reference camera image and does not appear in the encoding target image; therefore, when using this method, depth values must be assigned to corresponding points with the occlusion relationship taken into account.
- Note that the order of processing the pixels of the reference camera depth map can be determined according to the positional relationship between the encoding target camera and the reference camera, and by processing in that order, the virtual depth map can be generated by simply always overwriting at the obtained corresponding points, without considering the occlusion relationship.
- Specifically, when the reference camera is located to the left of the encoding target camera, the pixels of the reference camera depth map are processed in each row in scan order from left to right, and when the reference camera is located to the right, they are processed in each row from right to left. Eliminating the need to consider the occlusion relationship reduces the amount of computation.
- Since the fields of view of the reference camera and the encoding target camera differ, an effective depth can be obtained only for the area common to both; for the area where no effective depth is obtained, an estimated depth value may be assigned using the method described in Patent Document 1, or the area may be left without an effective value.
- When the synthesis of the depth map for the encoding target image is completed, the depth map conversion unit 106 generates a virtual depth map of the target resolution by reducing the synthesized depth map (step S22).
- Any method may be used to reduce the depth map. For example, for each pixel of the virtual depth map, a plurality of corresponding pixels can be set in the synthesized depth map, and the average, median, or mode of the depth values of those pixels can be used as the depth value of the virtual depth map.
- At that time, a weight may be computed according to the distance between pixels and used in obtaining the average or median. Note that regions left without an effective value in step S21 are excluded from the computation of the average and the like.
- As another method, for each pixel of the virtual depth map, a plurality of corresponding pixels can be set in the synthesized depth map and the depth value indicating the position closest to the camera among them can be selected. Because this improves the prediction efficiency for subjects in the foreground, which are subjectively more important, subjectively superior encoding can be realized with a small code amount.
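- Both reduction rules above (averaging and selecting the depth closest to the camera) amount to pooling over the pixels that map to one virtual-depth pixel. A minimal sketch, assuming integer reduction factors and this document's convention that a larger depth value means closer to the camera:

```python
import numpy as np

def reduce_depth_map(depth, ry, rx, mode="nearest"):
    """Reduce a depth map by factors (ry, rx). mode="nearest" takes, for each
    output pixel, the depth closest to the camera among its source pixels
    (the maximum under this document's convention); mode="mean" is the
    averaging alternative. Assumes dimensions are multiples of (ry, rx)."""
    H, W = depth.shape
    blocks = depth.reshape(H // ry, ry, W // rx, rx)
    if mode == "nearest":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))
```

- Reducing only in the vertical direction, as recommended below for horizontally arranged cameras, corresponds to rx = 1.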
- If an effective depth was not obtained for part of the area in step S21, an estimated depth value may finally be assigned, using the method described in Patent Document 1, to the area of the generated virtual depth map for which no effective depth was obtained.
- In the second method (FIG. 5), first, the depth map conversion unit 106 reduces the reference camera depth map (step S31). Any reduction method may be used as long as the same processing can be performed on the decoding side; for example, the same method as in step S22 described above may be used. The resolution after reduction may be any resolution as long as the decoding side can reduce to the same resolution; for example, it may be a resolution determined by a predetermined reduction rate, or the same resolution as the virtual depth map. However, the resolution of the reduced depth map must be equal to or higher than that of the virtual depth map.
- When reduction is performed only in either the vertical or the horizontal direction, any method may be used to determine which direction is reduced; for example, the direction may be determined in advance, or it may be determined according to the positional relationship between the encoding target camera and the reference camera.
- As a method of determining the direction according to the positional relationship between the encoding target camera and the reference camera, there is a method of choosing as the reduction direction a direction as different as possible from the direction in which the parallax arises. That is, when the encoding target camera and the reference camera are arranged side by side horizontally in parallel, reduction is performed only in the vertical direction. Determining the direction in this way allows the next step to use high-precision parallax, so a high-quality virtual depth map can be generated.
- the depth map conversion unit 106 synthesizes a virtual depth map from the reduced depth map (step S32).
- the processing here is the same as step S21 except that the resolution of the depth map is different.
- Since the resolution of the reduced depth map differs from the resolution of the virtual depth map, the corresponding pixel on the virtual depth map is obtained for each pixel of the reduced depth map; in this case, a plurality of pixels of the reduced depth map may correspond to a single pixel of the virtual depth map.
- In that case, a higher-quality virtual depth map can be generated by assigning the depth value of the pixel with the smallest error at fractional-pixel accuracy, and the prediction efficiency for subjects in the subjectively more important foreground can be improved by selecting, from the group of corresponding pixels, the depth value indicating the position closest to the camera.
- In the third method (FIG. 6), first, the depth map conversion unit 106 sets a plurality of sample pixels from the pixels of the reference camera depth map (step S41).
- Any sample pixel selection method may be used as long as the decoding side can realize the same selection.
- the reference camera depth map may be divided into a plurality of regions according to the ratio of the resolution of the reference camera depth map and the virtual depth map, and sample pixels may be selected according to a certain rule for each region.
- A fixed rule is, for example, to select the pixel at a specific position in each region, the pixel whose depth indicates the position farthest from the camera, or the pixel whose depth indicates the position closest to the camera.
- A plurality of pixels may also be selected for each region; for example, the four pixels at the four corners of the region, the two pixels consisting of the pixel whose depth indicates the position farthest from the camera and the pixel whose depth indicates the position closest to the camera, or the top three pixels in order of depth indicating proximity to the camera may be used as sample pixels.
- When determining the regions, the positional relationship between the encoding target camera and the reference camera may be used; for example, a width of a plurality of pixels set according to the resolution ratio may be used only in the direction as different as possible from the direction in which the parallax occurs, with a width of one pixel in the other direction (the direction in which the parallax occurs). Furthermore, by selecting sample pixels at a density equal to or higher than the resolution of the virtual depth map, the number of pixels for which no effective depth is obtained can be reduced, and a high-quality virtual depth map can be generated in the next step. One of these selection rules is sketched below.
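- The sketch takes, per partial region, the pixel whose depth indicates the position closest to (or farthest from) the camera; the rectangular region shape and the larger-is-closer depth convention are illustrative assumptions.

```python
import numpy as np

def select_sample_pixels(ref_depth, region_h, region_w, rule="nearest"):
    """Pick one sample pixel per (region_h x region_w) partial region of the
    reference camera depth map and return its (y, x) coordinates.
    rule="nearest" picks the depth closest to the camera (maximum value
    under this document's convention), rule="farthest" the opposite."""
    H, W = ref_depth.shape
    samples = []
    for y0 in range(0, H, region_h):
        for x0 in range(0, W, region_w):
            region = ref_depth[y0:y0 + region_h, x0:x0 + region_w]
            flat = np.argmax(region) if rule == "nearest" else np.argmin(region)
            dy, dx = np.unravel_index(flat, region.shape)
            samples.append((y0 + dy, x0 + dx))
    return samples
```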
- the depth map conversion unit 106 synthesizes a virtual depth map using only the sample pixels of the reference camera depth map (step S42). This process is the same as step S32 except that the synthesis is performed using some pixels.
- a virtual depth map may be directly generated from the reference camera depth map.
- the processing in that case is the same as when the reduction ratio is set to 1 in the second method or when all the pixels of the reference camera depth map are set as sample pixels in the third method.
- That the camera arrangement is one-dimensionally parallel means that the theoretical projection planes of the cameras lie on the same plane and their optical axes are parallel to each other.
- Here, it is assumed that the cameras are installed side by side in the horizontal direction and that the reference camera is located to the left of the encoding target camera.
- In this case, the epipolar line for a pixel on a horizontal line of the image plane is the horizontal line at the same height; for this reason, parallax exists only in the horizontal direction.
- Furthermore, since the projection planes lie on the same plane, when depth is expressed as a coordinate value along the coordinate axis in the optical-axis direction, the axis defining the depth coincides between the cameras.
- FIG. 7 is a flowchart showing an operation of generating a virtual depth map from the reference camera depth map.
- In the following, the reference camera depth map is denoted RDepth and the virtual depth map VDepth. Since the camera arrangement is one-dimensionally parallel, the reference camera depth map is converted line by line to generate the virtual depth map. That is, with h the index of a line of the virtual depth map and Height the number of its lines, the depth map conversion unit 106 initializes h to 0 (step S51) and then repeats the following processing (steps S52 to S64), adding 1 to h (step S65), until h reaches Height (step S66).
- For each line, the depth map conversion unit 106 synthesizes one line of the virtual depth map from the reference camera depth map (steps S52 to S62). After that, it determines whether there is a region on the line for which no depth could be generated from the reference camera depth map (step S63); if such a region exists, a depth is generated for it (step S64). Any method may be used for this; for example, the most recently generated depth (VDepth[last]) may be assigned to all pixels of the region for which no depth was generated.
- Specifically, first, the depth map conversion unit 106 determines a sample pixel set S corresponding to line h of the virtual depth map (step S52). At this time, since the camera arrangement is one-dimensionally parallel, if the ratio of the number of lines of the reference camera depth map to that of the virtual depth map is N:1, the sample pixel set is selected from lines N×h to N×(h+1)−1 of the reference camera depth map.
- any method may be used to determine the sample pixel set. For example, a pixel having a depth value indicating that it is closest to the camera for each pixel column (a set of pixels in the vertical direction) may be selected as the sample pixel. In addition, one pixel may be selected as a sample pixel for each of a plurality of columns instead of for each column. The column width at this time may be determined based on the ratio of the number of columns of the reference camera depth map and the virtual depth map.
- Next, the pixel position last on the virtual depth map obtained by warping the sample pixel processed immediately before is initialized to (h, −1) (step S53).
- the depth map conversion unit 106 repeats the process of warping the depth of the reference camera depth map for each pixel included in the sample pixel set. That is, while removing the processed sample pixels from the sample pixel set (step S61), the following processing (steps S54 to S60) is repeated until the sample pixel set becomes an empty set (step S62).
- In the processing repeated for each sample pixel, the depth map conversion unit 106 first selects from the sample pixel set, as the sample pixel p to be processed, the pixel located leftmost on the reference camera depth map (step S54). Next, the depth map conversion unit 106 obtains the point cp corresponding to the sample pixel p on the virtual depth map from the value of the reference camera depth map at p (step S55). It then checks whether the corresponding point cp exists within the frame of the virtual depth map (step S56); when the corresponding point is outside the frame, the processing for the sample pixel p ends without doing anything.
- When the corresponding point cp is within the frame, the depth map conversion unit 106 assigns the depth of pixel p of the reference camera depth map to the pixel of the virtual depth map at the corresponding point cp (step S57).
- Next, the depth map conversion unit 106 determines whether other pixels exist between the position last, which was assigned the depth of the previous sample pixel, and the position cp, which was assigned the depth of the current sample pixel (step S58). When such pixels exist, it generates depths for the pixels between last and cp (step S59); any method may be used for this, for example, linear interpolation between the depths at last and cp.
- Finally, the depth map conversion unit 106 updates last to cp (step S60), and the processing for the sample pixel p is finished.
- the processing operation illustrated in FIG. 7 is processing when the reference camera is installed on the left side of the encoding target camera.
- When the positional relationship between the reference camera and the encoding target camera is reversed, the order of the pixels to be processed and the conditions for determining pixel positions may simply be reversed. Specifically, in step S53, last is initialized to (h, Width); in step S54, the pixel p of the sample pixel set located rightmost on the reference camera depth map is selected as the sample pixel to be processed; in step S63, it is determined whether pixels exist to the left of last; and in step S64, depths are generated to the left of last. Note that Width is the number of pixels in the horizontal direction of the virtual depth map.
- The processing operation shown in FIG. 7 applies when the camera arrangement is one-dimensionally parallel, but even when the camera arrangement is one-dimensionally convergent, the same processing flow can be applied depending on the definition of the depth.
- Specifically, the same processing flow can be applied when the coordinate axes representing the depth are the same in the reference camera depth map and the virtual depth map. When the depth-defining axes differ, the value of the reference camera depth map is not assigned directly to the virtual depth map; instead, the three-dimensional position represented by the depth of the reference camera depth map is converted according to the depth-defining axis of the virtual depth map, and the converted value is assigned, after which essentially the same flow can be applied. A sketch of the parallel case follows.
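- Putting the steps of FIG. 7 together for the one-dimensionally parallel case, one line of the virtual depth map can be synthesized roughly as in the sketch below: the left-to-right scan with overwriting implements steps S54 to S60 for a reference camera to the left of the encoding target camera, linear interpolation fills gaps between warped points (step S59), and the trailing region reuses the last generated depth (step S64). The depth_to_disparity helper and the disparity sign are assumptions for illustration, not the patent's notation.

```python
import numpy as np

def warp_depth_line(samples, width, depth_to_disparity):
    """Synthesize one line of the virtual depth map from sample pixels of the
    reference camera depth map. `samples` is a list of (x_ref, depth) pairs
    ordered left to right; depth_to_disparity converts a depth value into a
    disparity measured in virtual-map pixels."""
    line = np.zeros(width)
    last = -1  # position of the previously warped sample (step S53)
    for x_ref, d in samples:
        cp = x_ref - int(round(depth_to_disparity(d)))  # corresponding point
        if not (0 <= cp < width):
            continue  # outside the frame (step S56)
        line[cp] = d  # later, i.e. right-hand, samples simply overwrite
        if last >= 0 and cp - last > 1:
            # Linearly interpolate the gap between last and cp (step S59).
            line[last + 1:cp] = np.linspace(line[last], d, cp - last + 1)[1:-1]
        last = cp
    if last >= 0:
        line[last + 1:] = line[last]  # region with no depth yet (step S64)
    return line
```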
- FIG. 8 is a block diagram showing the configuration of the image decoding apparatus according to this embodiment.
- The image decoding device 200 includes a code data input unit 201, a code data memory 202, a reference camera image input unit 203, a reference camera image memory 204, a reference camera depth map input unit 205, a depth map conversion unit 206, a virtual depth map memory 207, a viewpoint composite image generation unit 208, and an image decoding unit 209.
- the code data input unit 201 inputs code data of an image to be decoded.
- the image to be decoded is referred to as a decoding target image.
- Here, the decoding target image is the image of camera B.
- a camera that captures a decoding target image (camera B in this case) is referred to as a decoding target camera.
- The code data memory 202 stores the input code data of the decoding target image.
- the reference camera image input unit 203 inputs a reference camera image that becomes a reference image when generating a viewpoint composite image (parallax compensation image).
- the image of camera A is input.
- the reference camera image memory 204 stores the input reference camera image.
- the reference camera depth map input unit 205 inputs a depth map for the reference camera image.
- the depth map for the reference camera image is referred to as a reference camera depth map.
- the depth map represents the three-dimensional position of the subject shown in each pixel of the corresponding image. Any information may be used as long as the three-dimensional position can be obtained by information such as separately provided camera parameters. For example, a distance from the camera to the subject, a coordinate value with respect to an axis that is not parallel to the image plane, and a parallax amount with respect to another camera (for example, camera B) can be used.
- the depth map is assumed to be passed in the form of an image here, but it may not be in the form of an image as long as similar information can be obtained.
- a camera corresponding to the reference camera depth map is referred to as a reference camera.
- the depth map conversion unit 206 uses the reference camera depth map to generate a depth map of a subject captured in the decoding target image and having a lower resolution than the decoding target image.
- the generated depth map can be considered as a depth map for an image captured by a low-resolution camera at the same position and orientation as the decoding target camera.
- the depth map generated here is referred to as a virtual depth map.
- the virtual depth map memory 207 stores the generated virtual depth map.
- the viewpoint composite image generation unit 208 generates a viewpoint composite image for the decoding target image using the correspondence relationship between the pixel of the decoding target image obtained from the virtual depth map and the pixel of the reference camera image.
- the image decoding unit 209 decodes the decoding target image from the code data using the viewpoint synthesized image and outputs the decoded image.
- FIG. 9 is a flowchart showing the operation of the image decoding apparatus 200 shown in FIG.
- the code data input unit 201 inputs code data of a decoding target image, and stores the input code data in the code data memory 202 (step S71).
- the reference camera image input unit 203 inputs a reference camera image and stores the input reference camera image in the reference camera image memory 204.
- the reference camera depth map input unit 205 inputs the reference camera depth map, and outputs the input reference camera depth map to the depth map conversion unit 206 (step S72).
- Note that the reference camera image and the reference camera depth map input in step S72 must be the same as those used on the encoding side. This is to suppress the occurrence of encoding noise such as drift by using exactly the same information as that used in the encoding device. However, when the occurrence of such encoding noise is allowed, data different from that used at the time of encoding may be input.
- As the reference camera depth map, besides one decoded separately, a depth map estimated by applying stereo matching to multi-view images decoded for a plurality of cameras, or a depth map estimated using decoded disparity vectors, motion vectors, and the like, may be used.
- the depth map conversion unit 206 generates a virtual depth map from the reference camera depth map, and stores the generated virtual depth map in the virtual depth map memory 207 (step S73).
- the processing here is the same as step S3 shown in FIG. 2 except that encoding and decoding are different, such as an encoding target image and a decoding target image.
- the viewpoint composite image generation unit 208 generates a viewpoint composite image for the decoding target image from the reference camera image and the virtual depth map, and the generated viewpoint composite image is an image. It outputs to the decoding part 209 (step S74).
- the process here is the same as step S4 shown in FIG. 2 except that encoding and decoding are different, such as an encoding target image and a decoding target image.
- the image decoding unit 209 decodes the decoding target image from the code data and outputs the decoding result while using the viewpoint composite image as the predicted image (step S75).
- the decoded image obtained as a result of decoding is the output of the image decoding apparatus 200. Note that any method may be used for decoding as long as the code data (bit stream) can be correctly decoded. In general, a method corresponding to the method used at the time of encoding is used.
- In general decoding, the image is divided into blocks of a predetermined size; for each block, entropy decoding, inverse binarization, inverse quantization, and the like are performed, then an inverse frequency transform such as the IDCT (Inverse Discrete Cosine Transform) is applied to obtain the prediction residual signal, the predicted image is added to the prediction residual, and the result is clipped to the pixel value range to obtain the decoded image.
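- Mirroring the encoding-side sketch given earlier, the per-block decoding path (inverse quantization, IDCT, addition of the predicted image, clipping to the pixel value range) might look as follows; again a schematic with a flat quantization step and 8-bit pixels assumed:

```python
import numpy as np
from scipy.fftpack import idct

def decode_block(coeffs, predicted, qstep=16):
    """Dequantize, apply the 2D IDCT to recover the prediction residual,
    add the predicted (viewpoint composite) block, and clip to 8 bits."""
    resid = idct(idct(coeffs * qstep, norm='ortho', axis=0), norm='ortho', axis=1)
    return np.clip(np.round(resid + predicted), 0, 255).astype(np.uint8)
```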
- the decoding target image may be decoded by alternately repeating the viewpoint composite image generation processing (step S74) and the decoding target image decoding processing (step S75) for each block.
- FIG. 10 is a flowchart illustrating an operation of decoding the decoding target image by alternately repeating the viewpoint composite image generation processing and the decoding target image decoding processing for each block.
- In FIG. 10, the index of the block that is the unit of the decoding process is denoted blk, and the number of blocks in the decoding target image is denoted numBlks.
- the code data input unit 201 inputs code data of a decoding target image, and stores the input code data in the code data memory 202 (step S71).
- the reference camera image input unit 203 inputs a reference camera image and stores the input reference camera image in the reference camera image memory 204.
- the reference camera depth map input unit 205 inputs the reference camera depth map, and outputs the input reference camera depth map to the depth map conversion unit 206 (step S72).
- The depth map conversion unit 206 generates a virtual depth map from the reference camera depth map and stores it in the virtual depth map memory 207 (step S73). Then, the viewpoint synthesized image generation unit 208 initializes the variable blk to 0 (step S76).
- The viewpoint synthesized image generation unit 208 generates a viewpoint synthesized image for the block blk from the reference camera image and the virtual depth map, and outputs the generated viewpoint synthesized image to the image decoding unit 209 (step S74a).
- The image decoding unit 209 decodes the decoding target image for the block blk from the code data, using the viewpoint synthesized image as the predicted image, and outputs the decoding result (step S75a). These two steps are repeated, incrementing blk, until blk reaches numBlks.
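Tying steps S76, S74a, and S75a together, the per-block loop could be sketched as below, reusing the illustrative `synthesize_view` and `decode_block` helpers from the earlier sketches. For brevity the sketch warps the full image and slices out block blk; a real decoder would synthesize only that block.

```python
import numpy as np

def decode_frame(coeff_blocks, ref_image, virtual_depth, block_size=8):
    """Hedged sketch of the FIG. 10 flow: for blk = 0 .. numBlks - 1,
    generate the viewpoint synthesized image for block blk (step S74a),
    then decode block blk with it as the predicted image (step S75a)."""
    h, w = virtual_depth.shape
    decoded = np.zeros((h, w), dtype=np.uint8)
    blk, num_blks = 0, len(coeff_blocks)        # step S76: blk starts at 0
    for by in range(0, h, block_size):
        for bx in range(0, w, block_size):
            # Step S74a: synthesize the prediction for block blk.
            pred = synthesize_view(ref_image, virtual_depth)[
                by:by + block_size, bx:bx + block_size]
            # Step S75a: decode block blk using pred as the predicted image.
            decoded[by:by + block_size, bx:bx + block_size] = \
                decode_block(coeff_blocks[blk], pred)
            blk += 1                             # repeat until blk == numBlks
    assert blk == num_blks  # every block consumed
    return decoded
```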
- In the above, the process in which all the pixels in one frame are encoded and decoded using the viewpoint synthesized image as the predicted image has been described.
- However, encoding or decoding may also be performed using the intra-frame predictive coding or motion-compensated predictive coding employed in H.264/AVC and the like. In that case, information indicating which method was used for prediction must be encoded and decoded for each pixel.
- Alternatively, encoding or decoding may be performed using a different prediction method for each block instead of for each pixel.
- Also, in the above, the process of encoding and decoding one frame has been described, but the embodiment of the present invention can be applied to video coding by repeating the process for a plurality of frames.
- Furthermore, the embodiment can be applied to only some frames or some blocks of a video.
- In the above, the configurations and processing operations of the image encoding apparatus and the image decoding apparatus have been described.
- The image encoding method and the image decoding method of the present invention can be realized by processing operations corresponding to the operations of the respective units of these apparatuses.
- FIG. 11 is a block diagram showing a hardware configuration when the above-described image encoding device is configured by a computer and a software program.
- The system shown in FIG. 11 comprises, connected via a bus: a CPU (Central Processing Unit) 50 that executes the program; a memory 51, such as a RAM (Random Access Memory), that stores the program and the data accessed by the CPU 50; an encoding target image input unit 52 that inputs an encoding target image signal from a camera or the like (this may be a storage unit, such as a disk device, that stores the image signal); a reference camera image input unit 53 that inputs a reference target image signal from a camera or the like (this, too, may be a storage unit, such as a disk device, that stores the image signal); a reference camera depth map input unit 54 that inputs, from a depth camera or the like, a depth map for a camera whose position and orientation differ from those of the camera that captured the encoding target image (this may be a storage unit, such as a disk device, that stores the depth map); a program storage device 55 that stores an image encoding program 551, the software program that causes the CPU 50 to perform the image encoding process described above; and a code data output unit 56 that outputs, for example via a network, the code data generated by the CPU 50 executing the image encoding program loaded in the memory 51 (this may be a storage unit, such as a disk device, that stores the code data).
- FIG. 12 is a block diagram showing a hardware configuration when the above-described image decoding apparatus is configured by a computer and a software program.
- The system shown in FIG. 12 comprises, connected by a bus: a CPU 60 that executes the program; a memory 61, such as a RAM, that stores the program and the data accessed by the CPU 60; a code data input unit 62 that inputs code data encoded by the image encoding apparatus according to the present method (this may be a storage unit, such as a disk device, that stores the code data); a reference camera image input unit 63 that inputs a reference target image signal from a camera or the like (this may be a storage unit, such as a disk device, that stores the image signal); a reference camera depth map input unit 64 that inputs, from a depth camera or the like, a depth map for a camera whose position and orientation differ from those of the camera that captured the decoding target (this may be a storage unit, such as a disk device, that stores the depth information); a program storage device 65 that stores an image decoding program 651, the software program that causes the CPU 60 to perform the image decoding process described above; and a decoding target image output unit 66 that outputs, to a playback device or the like, the decoding target image obtained by decoding the code data through the CPU 60 executing the image decoding program 651 loaded in the memory 61 (this may be a storage unit, such as a disk device, that stores the image signal).
- The image encoding process and the image decoding process may also be performed by recording a program for realizing the functions of the processing units of the image encoding apparatus shown in FIG. 1 and the image decoding apparatus shown in FIG. 8 on a computer-readable recording medium, and having a computer system read and execute the program recorded on the recording medium.
- The "computer system" here includes an OS (Operating System) and hardware such as peripheral devices.
- The "computer system" also includes a WWW (World Wide Web) system provided with a homepage providing environment (or display environment).
- The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM (Read Only Memory), or a CD (Compact Disc)-ROM, or to a storage device such as a hard disk built into a computer system. Furthermore, the "computer-readable recording medium" also includes a medium that holds the program for a certain period of time, such as a volatile memory (RAM) inside a computer system serving as a server or a client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.
- The program may be transmitted from a computer system that stores the program in a storage device or the like to another computer system via a transmission medium, or by transmission waves in the transmission medium.
- The "transmission medium" that transmits the program refers to a medium having a function of transmitting information, such as a network (communication network) like the Internet or a communication line (communication wire) like a telephone line.
- The program may realize only part of the functions described above. Further, the program may be a so-called difference file (difference program) that realizes the above-described functions in combination with a program already recorded in the computer system.
- The present invention can be applied to applications in which it is essential to achieve high coding efficiency with a small amount of computation when performing disparity-compensated prediction on an encoding (decoding) target image using a depth map that represents the three-dimensional position of a subject for a reference frame.
- DESCRIPTION OF REFERENCE SYMBOLS: 100: image encoding apparatus; 101: encoding target image input unit; 102: encoding target image memory; 103: reference camera image input unit; 104: reference camera image memory; 105: reference camera depth map input unit; 106: depth map conversion unit; 107: virtual depth map memory; 108: viewpoint synthesized image generation unit; 109: image encoding unit; 200: image decoding apparatus; 201: code data input unit; 202: code data memory; 203: reference camera image input unit; 204: reference camera image memory; 205: reference camera depth map input unit; 206: depth map conversion unit; 207: virtual depth map memory; 208: viewpoint synthesized image generation unit; 209: image decoding unit
Description
This application claims priority based on Japanese Patent Application No. 2012-211154, filed in Japan on September 25, 2012, the contents of which are incorporated herein by reference.
Claims (32)
- 1. An image encoding method for encoding a multi-view image, which consists of images from a plurality of viewpoints, while predicting images between viewpoints using an already-encoded reference viewpoint image for a viewpoint different from that of an encoding target image and a reference viewpoint depth map that is a depth map of a subject in the reference viewpoint image, the method comprising: a virtual depth map generation step of generating a virtual depth map that has a lower resolution than the encoding target image and is a depth map of the subject in the encoding target image; and an inter-view image prediction step of performing image prediction between viewpoints by generating a disparity-compensated image for the encoding target image from the virtual depth map and the reference viewpoint image.
- 2. The image encoding method according to claim 1, further comprising a same-resolution depth map generation step of generating, from the reference viewpoint depth map, a same-resolution depth map having the same resolution as the encoding target image, wherein the virtual depth map generation step generates the virtual depth map by reducing the same-resolution depth map.
- 3. The image encoding method according to claim 2, wherein the virtual depth map generation step generates the virtual depth map by selecting, for each pixel of the virtual depth map, the depth indicating the position closest to the viewpoint among the depths of the corresponding plurality of pixels in the same-resolution depth map.
- 4. The image encoding method according to claim 1, further comprising a reduced depth map generation step of generating a reduced depth map of the subject in the reference viewpoint image by reducing the reference viewpoint depth map, wherein the virtual depth map generation step generates the virtual depth map from the reduced depth map.
- 5. The image encoding method according to claim 4, wherein the reduced depth map generation step reduces the reference viewpoint depth map in only one of the vertical direction and the horizontal direction.
- 6. The image encoding method according to claim 4 or 5, wherein the reduced depth map generation step generates the virtual depth map by selecting, for each pixel of the reduced depth map, the depth indicating the position closest to the viewpoint among the depths of the corresponding plurality of pixels in the reference viewpoint depth map.
- 7. The image encoding method according to claim 1, further comprising a sample pixel selection step of selecting some sample pixels from the pixels of the reference viewpoint depth map, wherein the virtual depth map generation step generates the virtual depth map by converting the reference viewpoint depth map corresponding to the sample pixels.
- 8. The image encoding method according to claim 7, further comprising a region division step of dividing the reference viewpoint depth map into partial regions according to the ratio of the resolutions of the reference viewpoint depth map and the virtual depth map, wherein the sample pixel selection step selects the sample pixels for each of the partial regions.
- 9. The image encoding method according to claim 8, wherein the region division step determines the shape of the partial regions according to the ratio of the resolutions of the reference viewpoint depth map and the virtual depth map.
- 10. The image encoding method according to claim 8 or 9, wherein the sample pixel selection step selects, for each of the partial regions, either a pixel having the depth indicating the position closest to the viewpoint or a pixel having the depth indicating the position farthest from the viewpoint as the sample pixel.
- 11. The image encoding method according to claim 8 or 9, wherein the sample pixel selection step selects, for each of the partial regions, both a pixel having the depth indicating the position closest to the viewpoint and a pixel having the depth indicating the position farthest from the viewpoint as the sample pixels.
- 12. An image decoding method for decoding a decoding target image from code data of a multi-view image, which consists of images from a plurality of viewpoints, while predicting images between viewpoints using an already-decoded reference viewpoint image for a viewpoint different from that of the decoding target image and a reference viewpoint depth map that is a depth map of a subject in the reference viewpoint image, the method comprising: a virtual depth map generation step of generating a virtual depth map that has a lower resolution than the decoding target image and is a depth map of the subject in the decoding target image; and an inter-view image prediction step of performing image prediction between viewpoints by generating a disparity-compensated image for the decoding target image from the virtual depth map and the reference viewpoint image.
- 13. The image decoding method according to claim 12, further comprising a same-resolution depth map generation step of generating, from the reference viewpoint depth map, a same-resolution depth map having the same resolution as the decoding target image, wherein the virtual depth map generation step generates the virtual depth map by reducing the same-resolution depth map.
- 14. The image decoding method according to claim 13, wherein the virtual depth map generation step generates the virtual depth map by selecting, for each pixel of the virtual depth map, the depth indicating the position closest to the viewpoint among the depths of the corresponding plurality of pixels in the same-resolution depth map.
- 15. The image decoding method according to claim 12, further comprising a reduced depth map generation step of generating a reduced depth map of the subject in the reference viewpoint image by reducing the reference viewpoint depth map, wherein the virtual depth map generation step generates the virtual depth map from the reduced depth map.
- 16. The image decoding method according to claim 15, wherein the reduced depth map generation step reduces the reference viewpoint depth map in only one of the vertical direction and the horizontal direction.
- 17. The image decoding method according to claim 15 or 16, wherein the reduced depth map generation step generates the virtual depth map by selecting, for each pixel of the reduced depth map, the depth indicating the position closest to the viewpoint among the depths of the corresponding plurality of pixels in the reference viewpoint depth map.
- 18. The image decoding method according to claim 12, further comprising a sample pixel selection step of selecting some sample pixels from the pixels of the reference viewpoint depth map, wherein the virtual depth map generation step generates the virtual depth map by converting the reference viewpoint depth map corresponding to the sample pixels.
- 19. The image decoding method according to claim 18, further comprising a region division step of dividing the reference viewpoint depth map into partial regions according to the ratio of the resolutions of the reference viewpoint depth map and the virtual depth map, wherein the sample pixel selection step selects sample pixels for each of the partial regions.
- 20. The image decoding method according to claim 19, wherein the region division step determines the shape of the partial regions according to the ratio of the resolutions of the reference viewpoint depth map and the virtual depth map.
- 21. The image decoding method according to claim 19 or 20, wherein the sample pixel selection step selects, for each of the partial regions, either a pixel having the depth indicating the position closest to the viewpoint or a pixel having the depth indicating the position farthest from the viewpoint as the sample pixel.
- 22. The image decoding method according to claim 19 or 20, wherein the sample pixel selection step selects, for each of the partial regions, both a pixel having the depth indicating the position closest to the viewpoint and a pixel having the depth indicating the position farthest from the viewpoint as the sample pixels.
- 23. An image encoding apparatus that encodes a multi-view image, which consists of images from a plurality of viewpoints, while predicting images between viewpoints using an already-encoded reference viewpoint image for a viewpoint different from that of an encoding target image and a reference viewpoint depth map that is a depth map of a subject in the reference viewpoint image, the apparatus comprising: a virtual depth map generation unit that generates a virtual depth map that has a lower resolution than the encoding target image and is a depth map of the subject in the encoding target image; and an inter-view image prediction unit that performs image prediction between viewpoints by generating a disparity-compensated image for the encoding target image from the virtual depth map and the reference viewpoint image.
- 24. The image encoding apparatus according to claim 23, further comprising a reduced depth map generation unit that generates a reduced depth map of the subject in the reference viewpoint image by reducing the reference viewpoint depth map, wherein the virtual depth map generation unit generates the virtual depth map by converting the reduced depth map.
- 25. The image encoding apparatus according to claim 23, further comprising a sample pixel selection unit that selects some sample pixels from the pixels of the reference viewpoint depth map, wherein the virtual depth map generation unit generates the virtual depth map by converting the reference viewpoint depth map corresponding to the sample pixels.
- 26. An image decoding apparatus that decodes a decoding target image from code data of a multi-view image, which consists of images from a plurality of viewpoints, while predicting images between viewpoints using an already-decoded reference viewpoint image for a viewpoint different from that of the decoding target image and a reference viewpoint depth map that is a depth map of a subject in the reference viewpoint image, the apparatus comprising: a virtual depth map generation unit that generates a virtual depth map that has a lower resolution than the decoding target image and is a depth map of the subject in the decoding target image; and an inter-view image prediction unit that performs image prediction between viewpoints by generating a disparity-compensated image for the decoding target image from the virtual depth map and the reference viewpoint image.
- 27. The image decoding apparatus according to claim 26, further comprising a reduced depth map generation unit that generates a reduced depth map of the subject in the reference viewpoint image by reducing the reference viewpoint depth map, wherein the virtual depth map generation unit generates the virtual depth map by converting the reduced depth map.
- 28. The image decoding apparatus according to claim 26, further comprising a sample pixel selection unit that selects some sample pixels from the pixels of the reference viewpoint depth map, wherein the virtual depth map generation unit generates the virtual depth map by converting the reference viewpoint depth map corresponding to the sample pixels.
- 29. An image encoding program for causing a computer to execute the image encoding method according to any one of claims 1 to 11.
- 30. An image decoding program for causing a computer to execute the image decoding method according to any one of claims 12 to 22.
- 31. A computer-readable recording medium on which the image encoding program according to claim 29 is recorded.
- 32. A computer-readable recording medium on which the image decoding program according to claim 30 is recorded.
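To make the depth map reduction recited in claims 2 and 3 (and, on the decoder side, claims 13 and 14) concrete: a minimal sketch of selecting, per low-resolution pixel, the depth indicating the position closest to the viewpoint, assuming larger stored values mean closer to the viewpoint and a uniform reduction factor. The name `reduce_depth_min` and the parameter `factor` are hypothetical.

```python
import numpy as np

def reduce_depth_min(same_res_depth, factor=2):
    """Sketch of the virtual depth map generation of claims 2-3: for each
    pixel of the lower-resolution virtual depth map, select, among the
    corresponding factor-by-factor pixels of the same-resolution depth
    map, the depth indicating the position closest to the viewpoint
    (assumed here to be the largest stored value)."""
    h, w = same_res_depth.shape
    h2, w2 = (h // factor) * factor, (w // factor) * factor
    blocks = same_res_depth[:h2, :w2].reshape(
        h2 // factor, factor, w2 // factor, factor)
    return blocks.max(axis=(1, 3))  # one closest-to-viewpoint depth per block
```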
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020157002048A KR101648094B1 (ko) | 2012-09-25 | 2013-09-24 | 화상 부호화 방법, 화상 복호 방법, 화상 부호화 장치, 화상 복호 장치, 화상 부호화 프로그램, 화상 복호 프로그램 및 기록매체 |
JP2014538497A JP5883153B2 (ja) | 2012-09-25 | 2013-09-24 | 画像符号化方法、画像復号方法、画像符号化装置、画像復号装置、画像符号化プログラム、画像復号プログラム及び記録媒体 |
US14/430,433 US20150249839A1 (en) | 2012-09-25 | 2013-09-24 | Picture encoding method, picture decoding method, picture encoding apparatus, picture decoding apparatus, picture encoding program, picture decoding program, and recording media |
CN201380044060.7A CN104871534A (zh) | 2012-09-25 | 2013-09-24 | 图像编码方法、图像解码方法、图像编码装置、图像解码装置、图像编码程序、图像解码程序以及记录介质 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2012211154 | 2012-09-25 | ||
JP2012-211154 | 2012-09-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014050827A1 true WO2014050827A1 (ja) | 2014-04-03 |
Family
ID=50388224
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2013/075735 WO2014050827A1 (ja) | 2012-09-25 | 2013-09-24 | 画像符号化方法、画像復号方法、画像符号化装置、画像復号装置、画像符号化プログラム、画像復号プログラム及び記録媒体 |
Country Status (5)
Country | Link |
---|---|
US (1) | US20150249839A1 (ja) |
JP (1) | JP5883153B2 (ja) |
KR (1) | KR101648094B1 (ja) |
CN (1) | CN104871534A (ja) |
WO (1) | WO2014050827A1 (ja) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3171598A1 (en) * | 2015-11-19 | 2017-05-24 | Thomson Licensing | Methods and devices for encoding and decoding a matrix of views obtained from light-field data, corresponding computer program and non-transitory program storage device |
KR101876007B1 (ko) * | 2015-12-28 | 2018-07-06 | 전자부품연구원 | 분산 및 병렬 프로그래밍 기반의 실시간 초다시점 3d 콘텐츠 합성 시스템 및 방법 |
US11457196B2 (en) | 2019-08-28 | 2022-09-27 | Snap Inc. | Effects for 3D data in a messaging system |
US11189104B2 (en) | 2019-08-28 | 2021-11-30 | Snap Inc. | Generating 3D data in a messaging system |
US11410401B2 (en) | 2019-08-28 | 2022-08-09 | Snap Inc. | Beautification techniques for 3D data in a messaging system |
US11488359B2 (en) | 2019-08-28 | 2022-11-01 | Snap Inc. | Providing 3D data for messages in a messaging system |
CN110519607B (zh) * | 2019-09-27 | 2022-05-20 | 腾讯科技(深圳)有限公司 | 视频解码方法及装置,视频编码方法及装置 |
US11410387B1 (en) * | 2020-01-17 | 2022-08-09 | Facebook Technologies, Llc. | Systems, methods, and media for generating visualization of physical environment in artificial reality |
US10950034B1 (en) | 2020-01-27 | 2021-03-16 | Facebook Technologies, Llc | Systems, methods, and media for generating visualization of physical environment in artificial reality |
CN113271464B (zh) * | 2021-05-11 | 2022-11-18 | 北京奇艺世纪科技有限公司 | 视频编码方法、解码方法及相关装置 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH09289638A (ja) * | 1996-04-23 | 1997-11-04 | Nec Corp | 3次元画像符号化復号方式 |
JP2000215311A (ja) * | 1999-01-21 | 2000-08-04 | Nippon Telegr & Teleph Corp <Ntt> | 仮想視点画像生成方法およびその装置 |
JP2010021844A (ja) * | 2008-07-11 | 2010-01-28 | Nippon Telegr & Teleph Corp <Ntt> | 多視点画像符号化方法,復号方法,符号化装置,復号装置,符号化プログラム,復号プログラムおよびコンピュータ読み取り可能な記録媒体 |
JP4698831B2 (ja) * | 1997-12-05 | 2011-06-08 | ダイナミック ディジタル デプス リサーチ プロプライエタリー リミテッド | 画像変換および符号化技術 |
JP4999853B2 (ja) * | 2006-09-20 | 2012-08-15 | 日本電信電話株式会社 | 画像符号化方法及び復号方法、それらの装置、及びそれらのプログラム並びにプログラムを記録した記憶媒体 |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5587552B2 (ja) * | 2005-10-19 | 2014-09-10 | トムソン ライセンシング | スケーラブル映像符号化を用いた多視点映像符号化 |
US20080278582A1 (en) * | 2007-05-07 | 2008-11-13 | Sentinel Ave Llc | Video Fusion Display Systems |
KR100918862B1 (ko) * | 2007-10-19 | 2009-09-28 | 광주과학기술원 | 참조영상을 이용한 깊이영상 생성방법 및 그 장치, 생성된깊이영상을 부호화/복호화하는 방법 및 이를 위한인코더/디코더, 그리고 상기 방법에 따라 생성되는 영상을기록하는 기록매체 |
WO2011046607A2 (en) * | 2009-10-14 | 2011-04-21 | Thomson Licensing | Filtering and edge encoding |
KR20120068540A (ko) * | 2010-12-17 | 2012-06-27 | 한국전자통신연구원 | 병렬 처리를 이용한 다시점 영상 컨텐츠 생성 장치 및 방법 |
US9398313B2 (en) | 2010-12-29 | 2016-07-19 | Nokia Technologies Oy | Depth map coding |
- 2013-09-24: CN, application CN201380044060.7A, publication CN104871534A (active, Pending)
- 2013-09-24: US, application US14/430,433, publication US20150249839A1 (not active, Abandoned)
- 2013-09-24: WO, application PCT/JP2013/075735, publication WO2014050827A1 (active, Application Filing)
- 2013-09-24: JP, application JP2014538497A, publication JP5883153B2 (active, Active)
- 2013-09-24: KR, application KR1020157002048A, publication KR101648094B1 (active, IP Right Grant)
Also Published As
Publication number | Publication date |
---|---|
JPWO2014050827A1 (ja) | 2016-08-22 |
US20150249839A1 (en) | 2015-09-03 |
KR101648094B1 (ko) | 2016-08-12 |
JP5883153B2 (ja) | 2016-03-09 |
CN104871534A (zh) | 2015-08-26 |
KR20150034205A (ko) | 2015-04-02 |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 13841497; Country of ref document: EP; Kind code of ref document: A1
 | DPE1 | Request for preliminary examination filed after expiration of 19th month from priority date (PCT application filed from 20040101) |
 | ENP | Entry into the national phase | Ref document number: 2014538497; Country of ref document: JP; Kind code of ref document: A
 | ENP | Entry into the national phase | Ref document number: 20157002048; Country of ref document: KR; Kind code of ref document: A
 | WWE | WIPO information: entry into national phase | Ref document number: 14430433; Country of ref document: US
 | NENP | Non-entry into the national phase | Ref country code: DE
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 13841497; Country of ref document: EP; Kind code of ref document: A1