WO2014050830A1 - Image encoding method, image decoding method, image encoding device, image decoding device, image encoding program, image decoding program, and recording medium - Google Patents
Image encoding method, image decoding method, image encoding device, image decoding device, image encoding program, image decoding program, and recording medium
- Publication number
- WO2014050830A1 (PCT/JP2013/075753)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- depth
- depth map
- subject
- pixel
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/111—Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/161—Encoding, multiplexing or demultiplexing different image signal components
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/105—Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/144—Processing image signals for flicker reduction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/513—Processing of motion vectors
- H04N19/517—Processing of motion vectors by encoding
- H04N19/52—Processing of motion vectors by encoding by predictive encoding
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/577—Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
Definitions
- The present invention relates to an image encoding method, an image decoding method, an image encoding device, an image decoding device, an image encoding program, an image decoding program, and a recording medium for encoding and decoding multi-view images.
- This application claims priority based on Japanese Patent Application No. 2012-211155, filed in Japan on September 25, 2012, the contents of which are incorporated herein.
- A multi-view image is composed of a plurality of images obtained by photographing the same subject and background with a plurality of cameras, and moving images taken by a plurality of cameras are called multi-view moving images (or multi-view videos).
- In the following description, an image (moving image) taken by one camera is referred to as a "two-dimensional image (moving image)", and a group of two-dimensional images (two-dimensional moving images) obtained by photographing the same subject and background with a plurality of cameras at different positions and orientations (hereinafter referred to as viewpoints) is referred to as a "multi-view image (multi-view moving image)".
- A two-dimensional moving image has a strong correlation in the time direction, and the encoding efficiency can be increased by exploiting that correlation. On the other hand, when the cameras are synchronized, the frames of a multi-view image captured at the same time show the subject and background in exactly the same state from different positions, so there is a strong correlation between cameras, and the encoding efficiency can be increased by exploiting this correlation.
- In H.264, an international encoding standard, high-efficiency encoding is performed using techniques such as motion compensation prediction, orthogonal transform, quantization, and entropy encoding.
- For example, H.264 allows encoding that uses temporal correlation with a plurality of past or future frames.
- The details of the motion compensation prediction technique used in H.264 are described in Non-Patent Document 1, for example. An outline follows.
- H.264 motion compensation prediction divides the encoding target frame into blocks of various sizes and allows each block to have a different motion vector and a different reference frame. Using a different motion vector for each block achieves highly accurate prediction that compensates for the different motion of each subject, while using a different reference frame for each block achieves highly accurate prediction that takes into account occlusions caused by temporal changes.
- The difference between the multi-view image encoding method and the multi-view moving image encoding method is that a multi-view moving image has a correlation in the time direction in addition to the correlation between cameras. However, the same method can be used to exploit the correlation between cameras in either case. Therefore, a method used in encoding multi-view moving images is described here.
- FIG. 21 is a conceptual diagram showing parallax generated between cameras.
- FIG. 21 shows the image planes of cameras with parallel optical axes viewed from directly above. The positions at which the same point on a subject is projected onto the image planes of different cameras are generally called corresponding points.
- In parallax compensation prediction, each pixel value of the encoding target frame is predicted from the reference frame based on this correspondence relationship, and the prediction residual and the disparity information indicating the correspondence relationship are encoded. Since the parallax changes with the camera pair and position concerned, it is necessary to encode the disparity information for each region in which parallax compensation prediction is performed. In fact, in the H.264 multi-view encoding scheme, a vector representing the disparity information is encoded for each block that uses parallax compensation prediction.
- By using camera parameters, the correspondence given by the disparity information can be represented, based on epipolar geometric constraints, by a one-dimensional quantity indicating the three-dimensional position of the subject instead of by a two-dimensional vector.
- There are various expressions for the information indicating the three-dimensional position of the subject, but the distance from a reference camera to the subject, or the coordinate value on an axis not parallel to the image plane of the camera, is often used.
- In some cases the reciprocal of the distance is used instead of the distance. Since the reciprocal of the distance is information proportional to the parallax, there are also cases where two reference cameras are set and the position is expressed as the amount of parallax between the images taken by those cameras. Because there is no essential difference in physical meaning whatever representation is used, in the following, the information indicating the three-dimensional position is expressed as depth, without distinguishing between representations.
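- As a concrete illustration of the proportionality between parallax and the reciprocal of the distance noted above, the following sketch converts a distance into a horizontal disparity for two cameras in a one-dimensional parallel arrangement. It is a minimal sketch assuming a pinhole camera model; the focal length and baseline values are hypothetical and not taken from this disclosure.

```python
def depth_to_disparity(z, f, baseline):
    """Horizontal disparity (in pixels) between two parallel cameras
    for a subject at distance z from the camera plane.

    Assumes pinhole cameras with identical focal length f (in pixels)
    and a horizontal baseline (in the same units as z); the disparity
    is proportional to the reciprocal of the distance, as noted above.
    """
    if z <= 0:
        raise ValueError("distance must be positive")
    return f * baseline / z

# Example with hypothetical parameters: f = 1000 px, baseline = 0.1 m,
# subject at 2 m -> disparity of 50 px.
print(depth_to_disparity(2.0, 1000.0, 0.1))
```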
- FIG. 22 is a conceptual diagram of epipolar geometric constraints.
- According to the epipolar geometric constraint, the point on the image of another camera that corresponds to a point on the image of one camera is constrained to a straight line called an epipolar line. If the depth of the pixel is obtained, the corresponding point is uniquely determined on the epipolar line.
- For example, as shown in FIG. 22, the corresponding point in the image of the second camera for a subject projected at position m in the image of the first camera is projected at position m′ on the epipolar line when the position of the subject in real space is M′, and at position m″ on the epipolar line when the position of the subject in real space is M″.
- In Non-Patent Document 2, this property is used to synthesize the predicted image for the encoding target frame from the reference frame according to the three-dimensional information of each subject given by the depth map (distance image) for the reference frame, thereby generating a highly accurate predicted image and realizing efficient multi-view moving image encoding.
- A predicted image generated based on this depth is called a viewpoint composite image, a viewpoint interpolation image, or a parallax compensation image.
- In Patent Document 1, a depth map for the reference frame (reference depth map) is converted into a depth map for the encoding target frame (virtual depth map), and the corresponding pixel on the reference frame is obtained from each pixel of the encoding target frame using the converted depth map (virtual depth map).
- In this way, by generating the viewpoint composite image only for a designated region of the encoding target frame, the amount of processing and the amount of memory required can be reduced compared with the case where a viewpoint composite image is always generated for the whole frame.
- FIG. 11 is an explanatory diagram showing a situation in which an occlusion area OCC occurs. For the occlusion area OCC, there is no corresponding depth information on the depth map for the reference frame; because no depth information is obtained, a situation arises in which the viewpoint composite image cannot be generated there.
- In Patent Document 1, a method is also provided in which the depth map (virtual depth map) for the encoding target frame obtained by the conversion is corrected on the assumption of continuity in real space, so that depth information is generated for the occlusion area OCC as well.
- Since the occlusion area OCC is an area shielded by a surrounding object, the correction assuming continuity in real space gives, as the depth of the occlusion area OCC, either the depth of the background object OBJ-B around the occlusion area or a depth that smoothly connects the foreground object OBJ-F and the background object OBJ-B.
- FIG. 13 shows the depth map obtained when the depth of the surrounding background object OBJ-B is given to the occlusion area OCC (that is, when depth is given to the occlusion area OCC assuming the continuity of the background object).
- In this case, the depth value of the background object OBJ-B is given as the depth value in the occlusion area OCC of the encoding target frame. Therefore, when the viewpoint composite image is generated using the generated virtual depth map, the corresponding points for the occlusion area OCC fall where the background object OBJ-B is shielded by the foreground object OBJ-F in the reference frame, so the pixel value of the foreground object OBJ-F is erroneously assigned, as shown in FIG. 19.
- FIG. 19 is an explanatory diagram illustrating the viewpoint composite image generated for an encoding target frame including the occlusion area OCC when continuity of the background object is assumed in the occlusion area OCC.
- FIG. 14 shows the depth map obtained when a depth that smoothly connects the foreground object OBJ-F and the background object OBJ-B is given to the occlusion area OCC (that is, when depth is given to the occlusion area OCC assuming the continuity of the subject). In this case, a depth value that changes continuously from a depth value indicating that the subject is close to the viewpoint to a depth value indicating that it is far from the viewpoint is given as the depth value in the occlusion area OCC of the encoding target frame.
- When the viewpoint composite image is generated using such a virtual depth map, as shown in FIG. 20, the pixels of the occlusion area OCC are each associated with a pixel lying between the foreground object OBJ-F and the background object OBJ-B on the reference frame.
- FIG. 20 is an explanatory diagram showing the viewpoint composite image generated for an encoding target frame including the occlusion area OCC in a situation where a depth that smoothly connects the foreground object OBJ-F and the background object OBJ-B is given to the occlusion area OCC.
- The pixel value of the occlusion area OCC is in this case obtained by interpolating between a pixel of the foreground object OBJ-F and a pixel of the background object OBJ-B. As a result, the pixels in the occlusion area OCC end up having values in which the foreground object OBJ-F and the background object OBJ-B are mixed.
- For such an occlusion area, as represented by Non-Patent Document 3, it is possible to generate a viewpoint composite image by performing in-painting using the viewpoint composite image obtained in the area surrounding the occlusion area. However, since in-painting requires that a viewpoint composite image be generated for the area surrounding the occlusion area, the effect of Patent Document 1, namely that the amount of processing and the amount of temporary memory can be reduced by generating the viewpoint composite image only for a designated region of the encoding target frame, cannot be obtained.
- The present invention has been made in view of such circumstances, and an object thereof is to provide an image encoding method, an image decoding method, an image encoding device, an image decoding device, an image encoding program, an image decoding program, and a recording medium capable of realizing high encoding efficiency and reductions in memory capacity and calculation amount while suppressing degradation in the quality of the viewpoint composite image when the viewpoint composite image of a frame to be encoded or decoded is generated using a depth map for a reference frame.
- The present invention is an image encoding method that, when encoding a multi-view image, performs encoding while predicting images between viewpoints using an already-encoded reference image for a viewpoint different from the viewpoint of the encoding target image and a reference depth map that is a depth map of the subjects in the reference image, the method comprising: a depth map conversion step of converting the reference depth map into a virtual depth map that is a depth map of the subjects in the encoding target image; an occlusion area depth generation step of generating a depth value for an occlusion area, in which no depth value exists in the reference depth map owing to the anteroposterior relationship of the subjects, by assigning a depth value that yields a correspondence to an area on the same subject as the subject shielded in the reference image; and an inter-view image prediction step of performing image prediction between viewpoints by generating a parallax compensation image for the encoding target image from the reference image and the virtual depth map after the depth of the occlusion area has been generated.
- In the occlusion area depth generation step of the image encoding method of the present invention, the depth value of the occlusion area may be generated by assuming continuity of the subject that shields the occlusion area on the reference depth map.
- The image encoding method of the present invention may further include an occlusion occurrence pixel boundary determination step of determining the pixel boundary on the reference depth map at which the occlusion area occurs. In that case, in the occlusion area depth generation step, for each set of pixels of the reference depth map adjacent to the determined pixel boundary, the depth value of the occlusion area may be generated by assuming that a subject exists continuously from the same depth value as the pixel having a depth value indicating that it is close to the viewpoint, at the position of that pixel, to the same depth value as the pixel having a depth value indicating that it is far from the viewpoint, and by converting the depth of the assumed subject into a depth on the encoding target image.
- The image encoding method of the present invention may further include a subject area determination step of determining, on the virtual depth map, the subject area corresponding to the area that shields the occlusion area on the reference depth map, and a subject area extension step of extending the subject area in the direction of the occlusion area. In that case, in the occlusion area depth generation step, the depth value of the occlusion area may be generated by smoothly interpolating the depth values between the pixels generated by the extension and the pixels that are adjacent to the occlusion area in the direction opposite to the subject area.
- In the depth map conversion step of the image encoding method of the present invention, a corresponding pixel on the virtual depth map may be obtained for each reference pixel of the reference depth map, and the conversion into the virtual depth map may be performed by assigning, to the corresponding pixel, a depth indicating the same three-dimensional position as the depth of the reference pixel.
- The present invention is also an image decoding method that, when decoding a decoding target image of a multi-view image, performs decoding while predicting images between viewpoints using an already-decoded reference image and a reference depth map that is a depth map of the subjects in the reference image, the method comprising: a depth map conversion step of converting the reference depth map into a virtual depth map that is a depth map of the subjects in the decoding target image; an occlusion area depth generation step of generating a depth value for an occlusion area, in which no depth value exists in the reference depth map owing to the anteroposterior relationship of the subjects, by assigning a depth value that yields a correspondence to an area on the same subject as the subject shielded in the reference image; and an inter-view image prediction step of performing image prediction between viewpoints by generating a parallax compensation image for the decoding target image from the reference image and the virtual depth map after the depth of the occlusion area has been generated.
- In the occlusion area depth generation step of the image decoding method of the present invention, the depth value of the occlusion area may be generated by assuming continuity of the subject that shields the occlusion area on the reference depth map.
- The image decoding method of the present invention may further include an occlusion occurrence pixel boundary determination step of determining the pixel boundary on the reference depth map at which the occlusion area occurs. In that case, in the occlusion area depth generation step, for each set of pixels of the reference depth map adjacent to the determined pixel boundary, the depth value of the occlusion area may be generated by assuming that a subject exists continuously from the same depth value as the pixel having a depth value indicating that it is close to the viewpoint, at the position of that pixel, to the same depth value as the pixel having a depth value indicating that it is far from the viewpoint, and by converting the depth of the assumed subject into a depth on the decoding target image.
- The image decoding method of the present invention may further include a subject area determination step of determining, on the virtual depth map, the subject area corresponding to the area that shields the occlusion area on the reference depth map, and a subject area extension step of extending the subject area in the direction of the occlusion area. In that case, in the occlusion area depth generation step, the depth value of the occlusion area may be generated by smoothly interpolating the depth values between the pixels generated by the extension and the pixels that are adjacent to the occlusion area in the direction opposite to the subject area.
- In the depth map conversion step of the image decoding method of the present invention, a corresponding pixel on the virtual depth map may be obtained for each reference pixel of the reference depth map, and the conversion into the virtual depth map may be performed by assigning, to the corresponding pixel, a depth indicating the same three-dimensional position as the depth of the reference pixel.
- The present invention is also an image encoding apparatus that, when encoding a multi-view image, performs encoding while predicting images between viewpoints using an already-encoded reference image for a viewpoint different from the viewpoint of the encoding target image and a reference depth map that is a depth map of the subjects in the reference image, the apparatus comprising: a depth map conversion unit that converts the reference depth map into a virtual depth map that is a depth map of the subjects in the encoding target image; an occlusion area depth generation unit that generates a depth value for an occlusion area, in which no depth value exists in the reference depth map owing to the anteroposterior relationship of the subjects, by assigning a depth value that yields a correspondence to an area on the same subject as the subject shielded in the reference image; and an inter-view image prediction unit that performs image prediction between viewpoints by generating a parallax compensation image for the encoding target image from the reference image and the virtual depth map after the depth value of the occlusion area has been generated.
- In the image encoding apparatus of the present invention, the occlusion area depth generation unit may generate the depth value of the occlusion area by assuming continuity of the subject that shields the occlusion area on the reference depth map.
- The present invention is also an image decoding apparatus that, when decoding a decoding target image of a multi-view image, performs decoding while predicting images between viewpoints using an already-decoded reference image and a reference depth map that is a depth map of the subjects in the reference image, the apparatus comprising: a depth map conversion unit that converts the reference depth map into a virtual depth map that is a depth map of the subjects in the decoding target image; an occlusion area depth generation unit that generates a depth value for an occlusion area, in which no depth value exists in the reference depth map owing to the anteroposterior relationship of the subjects, by assigning a depth value that yields a correspondence to an area on the same subject as the subject shielded in the reference image; and an inter-view image prediction unit that performs image prediction between viewpoints by generating a parallax compensation image for the decoding target image from the reference image and the virtual depth map after the depth value of the occlusion area has been generated.
- In the image decoding apparatus of the present invention, the occlusion area depth generation unit may generate the depth value of the occlusion area by assuming continuity of the subject that shields the occlusion area on the reference depth map.
- The present invention is also an image encoding program for causing a computer to execute the image encoding method.
- The present invention is also an image decoding program for causing a computer to execute the image decoding method.
- The present invention is also a computer-readable recording medium on which the image encoding program is recorded.
- The present invention is also a computer-readable recording medium on which the image decoding program is recorded.
- According to the present invention, when a viewpoint composite image of a frame to be encoded or decoded is generated using a depth map for a reference frame, the effect is obtained that high encoding efficiency can be realized and the memory capacity and calculation amount can be reduced while suppressing degradation in the quality of the viewpoint composite image.
- FIG. 3 is a flowchart showing another example of the operation of encoding the encoding target image in the image encoding device shown in FIG. 1.
- FIG. 4 is a flowchart showing the processing operation of the reference camera depth map conversion process shown in FIGS. 2 and 3.
- FIG. 5 is a flowchart showing the operation in which the depth map conversion unit generates a virtual depth map from the reference camera depth map.
- FIG. 7 is a flowchart showing the operation of the image decoding device shown in FIG. 6.
- FIG. 16 is a cross-sectional view showing the process of another embodiment of the present invention, which creates a virtual depth map of an encoding target frame including an occlusion area on the assumption of the continuity of the subject after extending the foreground object.
- FIG. 17 is a cross-sectional view showing the process of an embodiment of the present invention, which generates a parallax compensation image of an encoding target frame including an occlusion area using the virtual depth map shown in FIG. 15.
- FIG. 20 is a cross-sectional view showing another example of conventional processing, which generates a parallax compensation image of an encoding target frame including an occlusion area using the virtual depth map shown in FIG. 14.
- FIG. 21 is a cross-sectional view showing the parallax that arises between cameras (viewpoints).
- FIG. 22 is a conceptual diagram for explaining epipolar geometric constraints.
- In the following description, it is assumed that a multi-view image captured by two cameras, a first camera (referred to as camera A) and a second camera (referred to as camera B), is encoded, and that the image of camera B is encoded or decoded using the image of camera A as the reference image.
- It is assumed that the information necessary for obtaining the parallax from the depth information is given separately. Specifically, this information consists of the external parameters representing the positional relationship between camera A and camera B and the internal parameters representing the projection onto the image plane by each camera, but other forms may be used as long as the parallax can be obtained from the depth information.
- In the following, information that can specify a position is appended between the symbols [ ] to an image, video frame, or depth map to denote the image signal sampled at the pixel at that position or the depth corresponding thereto. Further, the depth is assumed to be information whose value is smaller the greater the distance from the camera (that is, the smaller the parallax). When the relationship between the magnitude of the depth and the distance from the camera is defined the other way around, the descriptions of the magnitude of depth values must be read accordingly.
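- The following sketch illustrates one common way of realizing this convention, in which the reciprocal of the distance is quantized so that larger depth values mean a smaller distance from the camera. The inverse-distance quantization and the 8-bit range are assumptions for illustration, not something mandated by this description.

```python
def depth_index_to_distance(d, z_near, z_far, levels=256):
    """Convert a quantized depth value d (0 .. levels-1) to a distance.

    Larger depth values indicate a subject closer to the camera, as in
    the convention above. The inverse-distance quantization between the
    assumed clipping planes z_near and z_far is a common choice and is
    used here only for illustration.
    """
    inv_z = (d / (levels - 1)) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return 1.0 / inv_z

print(depth_index_to_distance(255, 0.5, 10.0))  # -> 0.5 (closest)
print(depth_index_to_distance(0, 0.5, 10.0))    # -> 10.0 (farthest)
```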
- FIG. 1 is a block diagram showing a configuration of an image encoding device according to this embodiment.
- The image encoding device 100 includes an encoding target image input unit 101, an encoding target image memory 102, a reference camera image input unit 103, a reference camera image memory 104, a reference camera depth map input unit 105, a depth map conversion unit 106, a virtual depth map memory 107, a viewpoint composite image generation unit 108, and an image encoding unit 109.
- the encoding target image input unit 101 inputs an image to be encoded.
- the image to be encoded is referred to as an encoding target image.
- an image of camera B is input.
- a camera that captures an encoding target image (camera B in this case) is referred to as an encoding target camera.
- the encoding target image memory 102 stores the input encoding target image.
- the reference camera image input unit 103 inputs an image to be a reference image when generating a viewpoint composite image (parallax compensation image).
- an image of camera A is input.
- the reference camera image memory 104 stores the input reference image.
- the reference camera depth map input unit 105 inputs a depth map for the reference image.
- the depth map for the reference image is referred to as a reference camera depth map or a reference depth map.
- Here, the depth map represents the three-dimensional position of the subject shown in each pixel of the corresponding image. Any information may be used as long as the three-dimensional position can be obtained from it together with information such as separately provided camera parameters. For example, the distance from the camera to the subject, a coordinate value with respect to an axis not parallel to the image plane, or the amount of parallax with respect to another camera (for example, camera B) can be used. Further, although the depth map is assumed here to be passed in the form of an image, it need not be passed in the form of an image as long as equivalent information can be obtained.
- the camera corresponding to the reference camera depth map is referred to as a reference camera.
- the depth map conversion unit 106 generates a depth map for the encoding target image using the reference camera depth map (reference depth map).
- the depth map generated for the encoding target image is referred to as a virtual depth map.
- the virtual depth map memory 107 stores the generated virtual depth map.
- The viewpoint composite image generation unit 108 uses the virtual depth map obtained from the virtual depth map memory 107 to obtain the correspondence relationship between the pixels of the encoding target image and the pixels of the reference camera image, and generates a viewpoint composite image for the encoding target image.
- the image encoding unit 109 performs predictive encoding on the encoding target image using the viewpoint synthesized image, and outputs a bit stream that is encoded data.
- FIG. 2 is a flowchart showing the operation of the image coding apparatus 100 shown in FIG.
- the encoding target image input unit 101 inputs an encoding target image and stores it in the encoding target image memory 102 (step S1).
- the reference camera image input unit 103 inputs a reference camera image and stores it in the reference camera image memory 104.
- the reference camera depth map input unit 105 inputs the reference camera depth map and outputs it to the depth map conversion unit 106 (step S2).
- Note that the reference camera image and the reference camera depth map input in step S2 must be the same as those obtained on the decoding side, such as ones obtained by decoding already-encoded data. This is to suppress the occurrence of coding noise such as drift by using exactly the same information as that obtained by the decoding device. However, when the generation of such coding noise is permitted, data that can be obtained only on the encoding side, such as data before encoding, may be input.
- As the reference camera depth map, in addition to one obtained by decoding, a depth map estimated by applying stereo matching or the like to multi-view images decoded for a plurality of cameras, or a depth map estimated using decoded disparity vectors, motion vectors, or the like, can also be used, as long as the same depth map can be obtained on the decoding side.
- the depth map conversion unit 106 generates a virtual depth map from the reference camera depth map and stores it in the virtual depth map memory 107 (step S3). Details of the processing here will be described later.
- Next, the viewpoint composite image generation unit 108 generates a viewpoint composite image for the encoding target image from the reference camera image stored in the reference camera image memory 104 and the virtual depth map stored in the virtual depth map memory 107, and outputs it to the image encoding unit 109 (step S4).
- For the processing here, any method may be used as long as it synthesizes an image of the encoding target camera using a depth map for the encoding target image and an image captured by a camera different from the encoding target camera. For pixels for which no corresponding point is obtained, a predetermined pixel value may be assigned, or the pixel value of the nearest pixel within the frame, or of the nearest pixel within the frame in the epipolar line direction, may be assigned. Furthermore, after a viewpoint composite image for one frame is obtained, a filter such as a low-pass filter may be applied. A sketch of such per-pixel synthesis follows.
- the image encoding unit 109 predictively encodes the encoding target image using the viewpoint composite image as a predicted image and outputs the encoded image (step S5).
- the bit stream obtained as a result of encoding is the output of the image encoding apparatus 100. Note that any method may be used for encoding as long as decoding is possible on the decoding side.
- In general moving image encoding or image encoding such as MPEG-2, H.264, and JPEG, an image is divided into blocks of a predetermined size, and for each block a difference signal between the encoding target image and the predicted image is generated; the difference image is then subjected to a frequency transform such as the DCT (discrete cosine transform), and the resulting values are encoded by sequentially applying quantization, binarization, and entropy encoding.
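- As a concrete illustration of the block-based residual coding just described, the following sketch transforms and quantizes the prediction residual of one block. The flat quantization step is hypothetical, and the binarization and entropy coding stages are omitted; this is a sketch, not an H.264- or JPEG-conformant implementation.

```python
import numpy as np
from scipy.fft import dctn

def encode_block(orig_block, pred_block, qstep=16):
    """Transform and quantize the prediction residual of one block.

    orig_block and pred_block are 2-D arrays of the same size (e.g. 8x8,
    with the predicted block taken from the viewpoint composite image).
    Returns quantized DCT coefficients; a real codec would follow this
    with binarization and entropy coding.
    """
    residual = orig_block.astype(np.float64) - pred_block.astype(np.float64)
    coeffs = dctn(residual, norm="ortho")             # frequency transform (DCT)
    return np.round(coeffs / qstep).astype(np.int32)  # uniform quantization
```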
- FIG. 3 is a flowchart showing the operation of encoding the encoding target image by alternately repeating the viewpoint composite image generation process and the encoding process for each block. In FIG. 3, the same parts as those in the processing operation shown in FIG. 2 are given the same reference numerals, and their description is omitted. In the processing operation shown in FIG. 3, the index of the block that is the unit of the predictive encoding process is denoted blk, and the number of blocks in the encoding target image is denoted numBlks.
- the encoding target image input unit 101 inputs an encoding target image and stores it in the encoding target image memory 102 (step S1).
- the reference camera image input unit 103 inputs a reference camera image and stores it in the reference camera image memory 104.
- the reference camera depth map input unit 105 inputs the reference camera depth map and outputs it to the depth map conversion unit 106 (step S2).
- the depth map conversion unit 106 generates a virtual depth map based on the reference camera depth map output from the reference camera depth map input unit 105, and stores it in the virtual depth map memory 107 (step S3). Then, the viewpoint composite image generation unit 108 substitutes 0 for the variable blk (step S6).
- Next, the viewpoint composite image generation unit 108 generates a viewpoint composite image for the block blk from the reference camera image stored in the reference camera image memory 104 and the virtual depth map stored in the virtual depth map memory 107, and outputs it to the image encoding unit 109 (step S4a).
- the image encoding unit 109 predictively encodes the encoding target image for the block blk, using the viewpoint composite image as the prediction image (step S5a).
- FIG. 4 is a flowchart showing the processing operation of the reference camera depth map conversion process (step S3) shown in FIGS.
- Here, a virtual depth map is generated from the reference camera depth map in three steps; in each step, depth values are generated for a different region of the virtual depth map.
- First, the depth map conversion unit 106 generates the virtual depth map for the area that appears in both the encoding target image and the reference camera depth map (step S21). The depth information for this area is included in the reference camera depth map and should also exist in the virtual depth map, so it can be obtained by converting the reference camera depth map.
- Any processing may be used, but for example, the method described in Non-Patent Document 3 may be used.
- For example, the three-dimensional position of each pixel may be obtained from the reference camera depth map, the three-dimensional model of the subject space restored, and the depth obtained when the restored model is observed from the encoding target camera; the virtual depth map for this area can thereby be generated.
- As another method, for each pixel of the reference camera depth map, a corresponding point on the virtual depth map may be obtained using the depth value of that pixel, and the virtual depth map may be generated by assigning the converted depth value to the corresponding point. Here, the converted depth value is the value obtained by converting a depth value for the reference camera depth map into a depth value for the virtual depth map.
- Since the corresponding points are not necessarily obtained at integer pixel positions of the virtual depth map, it is necessary to interpolate and generate a depth value for each pixel of the virtual depth map by assuming continuity, on the virtual depth map, with the pixels that are adjacent on the reference camera depth map. Note that continuity is assumed only when the change in depth value between pixels adjacent on the reference camera depth map is within a predetermined range. This is because pixels with greatly different depth values are considered to show different subjects, so continuity of the subject in real space cannot be assumed.
- Alternatively, one or more integer pixel positions may be determined from the obtained corresponding point and the converted depth value assigned to those pixels. In this case, it is not necessary to interpolate depth values, and the amount of calculation can be reduced.
- Note that when depth values are assigned in this way, the order in which the pixels of the reference camera depth map are processed can be determined according to the positional relationship between the encoding target camera and the reference camera, and by performing the processing in the determined order, the virtual depth map can be generated by simply always overwriting at the obtained corresponding points, without checking the anteroposterior relationship of the subjects. Specifically, when the encoding target camera is positioned to the right of the reference camera, the pixels of the reference camera depth map are processed in each row in order from left to right; when the encoding target camera is positioned to the left of the reference camera, they are processed in each row in order from right to left. Eliminating the need to check the anteroposterior relationship reduces the amount of calculation.
- The area of the virtual depth map for which no depth has been generated by the above processing is an area that does not appear in the reference camera depth map. As shown in FIG. 11 (an explanatory diagram showing a situation in which the occlusion area OCC occurs), there are two types of such areas: an area not captured because of the anteroposterior relationship of the subjects (occlusion area OCC), and an area not captured because it corresponds to the outside of the frame of the reference camera depth map (out-of-frame area OUT). Next, therefore, the depth map conversion unit 106 generates a depth for the occlusion area OCC (step S22).
- The first method for generating the depth for the occlusion area OCC is to assign the same depth value as that of the foreground object OBJ-F around the occlusion area OCC. A depth value to be assigned may be determined for each pixel included in the occlusion area OCC, or one depth value may be determined for a plurality of pixels, such as for each line of the occlusion area OCC or for the entire occlusion area OCC.
- Specifically, the depth value to be assigned is determined from the depth values of the pixels of the foreground object OBJ-F identified around the occlusion area OCC. For example, one depth value is determined as the average, median, maximum, or most frequently appearing value of the depth values of those pixels, and the determined depth value is assigned to all the pixels included in the set of pixels to which the same depth is to be assigned.
- Note that exactly the same depth value need not be given throughout the occlusion area OCC: the depth value may be modified so as to change smoothly, rather than taking the same value over a plurality of lines of the occlusion area OCC far from the foreground object OBJ-F. In that case, the depth value is changed so as to increase or decrease monotonically from the pixels near the foreground object OBJ-F toward the pixels far from it.
- The second method for generating the depth for the occlusion area OCC is to assign a depth value that yields a correspondence to the pixels on the reference depth map belonging to the background object OBJ-B around the occlusion area OCC. Specifically, for each pixel of the occlusion area OCC, among the depth values that are larger than the depth value of the background object and whose corresponding point falls in the area of the background object OBJ-B on the reference camera depth map, the minimum depth value is obtained and assigned as the depth value of the virtual depth map.
- FIG. 12 is an explanatory diagram showing an operation of generating a depth for the occlusion area OCC.
- In the operation of FIG. 12, the boundary B between the pixels of the foreground object OBJ-F and the pixels of the background object OBJ-B on the reference camera depth map, at which the occlusion area OCC occurs in the virtual depth map, is first obtained (S12-1). Next, the pixels of the foreground object OBJ-F adjacent to the obtained boundary are extended by one pixel E in the direction of the adjacent background object OBJ-B (S12-2). The pixel obtained by this extension has two depth values: the depth value of the original background object OBJ-B pixel and the depth value of the adjacent foreground object OBJ-F pixel. Then, assuming that a subject exists continuously between these two depths, the assumed depth of the subject is converted into a depth on the encoding target image to determine the depth values of the pixels of the occlusion area OCC, and the virtual depth map is generated (S12-4).
- Note that the final processing corresponds to obtaining corresponding points on the virtual depth map for the pixel obtained by the extension a plurality of times while changing the depth value. In practice, it suffices to obtain the two corresponding points given by the depth value of the original background object OBJ-B pixel and the depth value of the adjacent foreground object OBJ-F pixel, and to obtain the depth values for the pixels of the occlusion area OCC by linear interpolation between these corresponding points, as sketched below.
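- The following sketch fills one line of the occlusion area OCC in this way: the two depth values of the extended pixel give two corresponding points on the virtual depth map, and the depths between them are linearly interpolated. The `disparity` helper, which maps a depth value to a signed horizontal parallax, is hypothetical, and the sketch assumes that the corresponding point for the foreground depth lies to the left of that for the background depth, as in the camera arrangement of this description.

```python
def fill_occlusion_line(vdepth_line, x_fg, d_fg, d_bg, disparity):
    """Fill one line of the occlusion area OCC of the virtual depth map.

    x_fg: column, on the reference depth map, of the foreground pixel
    adjacent to the boundary B (after the one-pixel extension E).
    d_fg, d_bg: depth values of the adjacent foreground object OBJ-F
    pixel and of the original background object OBJ-B pixel.
    """
    x_near = x_fg + disparity(d_fg)  # corresponding point for the foreground depth
    x_far = x_fg + disparity(d_bg)   # corresponding point for the background depth
    span = max(x_far - x_near, 1)
    for x in range(x_near, x_far + 1):
        t = (x - x_near) / span
        # depth changes continuously from the foreground depth to the
        # background depth across the occlusion area (cf. FIG. 16)
        vdepth_line[x] = (1 - t) * d_fg + t * d_bg
```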
- As described above, the occlusion area OCC is an area shielded by the foreground object OBJ-F. Taking such a structure in real space into consideration, conventional methods assign a depth value assuming the continuity of the background object OBJ-B around the occlusion area, as shown in FIG. 13, or assign a depth value obtained by interpolating between the foreground object OBJ-F and the background object OBJ-B in the surrounding area, as shown in FIG. 14.
- FIG. 13 is an explanatory diagram showing the operation of assigning a depth value assuming the continuity of the background object OBJ-B around the occlusion area OCC. FIG. 14 is an explanatory diagram showing the operation of assigning a depth value obtained by interpolating between the foreground object OBJ-F and the background object OBJ-B in the surrounding area.
- In contrast, the first method for generating the depth for the occlusion area OCC described above is a process that assumes the continuity of the foreground object OBJ-F, ignoring the structure in real space, as shown in FIG. 15.
- FIG. 15 is an explanatory diagram showing the processing operation assuming the continuity of the foreground object OBJ-F. In FIG. 15, the virtual depth map of the encoding target frame is created by giving the depth value of the foreground object OBJ-F as the depth value of the occlusion area OCC.
- The second method is a process that changes the shape of the subject, as shown in FIG. 16.
- FIG. 16 is an explanatory diagram showing the processing operation that changes the shape of the subject. In FIG. 16, the virtual depth map of the encoding target frame is created by extending the foreground object OBJ-F as shown in S12-2 of FIG. 12 and then giving, as the depth values of the occlusion area OCC, the depths of a subject assumed to be continuous as shown in S12-4. That is, the occlusion area OCC in FIG. 16 is given depth values that change continuously in the rightward direction in FIG. 16, from a depth value indicating that the subject is close to the viewpoint to a depth value indicating that it is far from the viewpoint.
- Note that neither method generates depth values for the occlusion area OCC that are consistent with the structure represented by the reference camera depth map.
- When corresponding points are obtained for each pixel of the encoding target image using the virtual depth maps of FIGS. 15 and 16 generated in this way and a viewpoint composite image is synthesized, the pixel value of the background object OBJ-B is assigned to the pixels of the occlusion area OCC, as shown in FIGS. 17 and 18. In contrast, with the conventional virtual depth maps of FIGS. 13 and 14, the pixel value of the foreground object OBJ-F is assigned to the pixels of the occlusion area OCC, as shown in FIG. 19, or a pixel value interpolated from both is assigned, as shown in FIG. 20.
- FIGS. 19 and 20 are explanatory diagrams showing that the pixel value of the foreground object OBJ-F, or an interpolated pixel value, is assigned. Since the occlusion area OCC is an area shielded by the foreground object OBJ-F, the background object OBJ-B should be assumed to exist there. Therefore, the above-described methods can generate a higher-quality viewpoint composite image than the conventional methods.
- Note that, when generating the viewpoint composite image, the depth value of the virtual depth map for a pixel of the encoding target image and the depth value of the reference camera depth map for the corresponding point on the reference camera image may be compared to determine whether occlusion by the foreground object OBJ-F has occurred (that is, whether the difference between the depth values is small), and a pixel value may be generated from the reference camera image only when the difference is small; this prevents an erroneous viewpoint composite image from being generated. A sketch of this test follows.
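- A minimal sketch of this consistency test, under the same assumptions as the earlier sketches; the tolerance on the quantized depth values is a hypothetical parameter.

```python
def consistent_sample(ref_img, ref_depth, virtual_depth, y, x, dv, tol=2):
    """Sample the reference image only when the depths agree.

    dv is the disparity derived from the virtual depth of the target
    pixel (y, x). The virtual depth is compared with the reference
    depth at the corresponding point (y, x + dv); a large difference
    means the point is shielded by the foreground object OBJ-F in the
    reference view, so no pixel value is generated (None).
    """
    xr = x + dv
    if not (0 <= xr < ref_img.shape[1]):
        return None
    if abs(int(virtual_depth[y, x]) - int(ref_depth[y, xr])) > tol:
        return None  # occluded in the reference view
    return ref_img[y, xr]
```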
- Next, when the generation of the depth for the occlusion area OCC is completed, the depth map conversion unit 106 generates a depth for the out-of-frame area OUT (step S23). One depth value may be assigned to each continuous out-of-frame area OUT, or one depth value may be assigned per line. Specifically, there is a method of assigning, as the depth value, the minimum depth value of the pixels adjacent to the out-of-frame area OUT, or an arbitrary depth value smaller than that minimum value.
- Note that when no depth is generated for the out-of-frame area OUT, it is necessary to use, in step S4 or step S4a, a viewpoint composite image generation method in which no corresponding point is obtained for pixels to which no valid depth value is given and either no pixel value or a default pixel value is assigned to such pixels.
- Next, a detailed processing operation of the depth map conversion unit 106 will be described for the case where the camera arrangement is one-dimensional parallel. A one-dimensional parallel camera arrangement means that the theoretical projection planes of the cameras lie on the same plane and their optical axes are parallel to each other. Here, the cameras are installed side by side in the horizontal direction, and the reference camera is located to the left of the encoding target camera. In this case, the epipolar line for a pixel on a horizontal line of the image plane is the horizontal line at the same height, so parallax always exists only in the horizontal direction. Furthermore, since the projection planes lie on the same plane, when depth is expressed as a coordinate value along a coordinate axis in the optical axis direction, the definition axis of depth coincides between the cameras.
- FIG. 5 is a flowchart showing the operation in which the depth map conversion unit 106 generates a virtual depth map from the reference camera depth map. Here, the reference camera depth map is denoted RDepth and the virtual depth map is denoted VDepth. Since the camera arrangement is one-dimensional parallel, the reference camera depth map is converted line by line to generate the virtual depth map. That is, letting h be the index of a line of the reference camera depth map and Height be the number of lines of the reference camera depth map, the depth map conversion unit 106 initializes h to 0 (step S31) and then repeats the following processing (steps S32 to S44) while adding 1 to h (step S45) until h reaches Height (step S46).
- For each line, the depth map conversion unit 106 first warps the depth of the reference camera depth map (steps S32 to S42) and then generates a depth for the out-of-frame area OUT (steps S43 to S44), thereby generating the virtual depth map for one line.
- The process of warping the depth of the reference camera depth map is performed for each pixel of the reference camera depth map. That is, letting w be the index of the horizontal pixel position and Width be the number of pixels in one line, the depth map conversion unit 106 initializes w to 0 and initializes lastW, the position on the virtual depth map to which the depth of the preceding pixel was warped, to -1 (step S32), and then repeats the following processing (steps S33 to S40) while adding 1 to w (step S41) until w reaches Width (step S42).
- In the per-pixel processing, the depth map conversion unit 106 first obtains, from the value of the reference camera depth map, the parallax dv of the pixel (h, w) with respect to the virtual depth map (step S33). The computation here depends on the definition of depth. Note that the parallax dv is a signed quantity with a direction, and it indicates that the pixel (h, w) of the reference camera depth map corresponds to the pixel (h, w + dv) on the virtual depth map.
- Next, the depth map conversion unit 106 checks whether the corresponding pixel on the virtual depth map exists within the frame (step S34). Owing to the constraint imposed by the positional relationship of the cameras, it suffices here to check whether w + dv is negative. When w + dv is negative, there is no corresponding pixel, so the depth for the pixel (h, w) of the reference camera depth map is not warped, and the processing for the pixel (h, w) is terminated.
- When the corresponding pixel exists within the frame, the depth map conversion unit 106 warps the depth for the pixel (h, w) of the reference camera depth map to the corresponding pixel (h, w + dv) of the virtual depth map (step S35).
- Next, the depth map conversion unit 106 checks the positional relationship between the position to which the depth of the preceding pixel was warped and the position of the current warp (step S36). Specifically, it determines whether the left-right order of the preceding pixel and the current pixel on the reference camera depth map is preserved on the virtual depth map.
- If the positional relationship is reversed, it is determined that the pixel processed this time captures a subject closer to the camera than the pixel processed immediately before; lastW is updated to w + dv without any special processing (step S40), and the processing for the pixel (h, w) is terminated.
- If the positional relationship is not reversed, the depth map conversion unit 106 generates depths for the pixels of the virtual depth map existing between the position lastW, to which the depth of the preceding pixel was warped, and the position w + dv of the current warp. In this process, the depth map conversion unit 106 first checks whether the same subject appears in the pixel warped immediately before and the pixel warped this time (step S37). Any method may be used for this determination; here, the determination is made on the assumption that the change in depth within the same subject is small, owing to the continuity of the subject in real space. Specifically, it is determined whether the difference in parallax, obtained from the difference between the position to which the depth of the preceding pixel was warped and the position of the current warp, is smaller than a predetermined threshold.
- If the difference is smaller than the threshold, the depth map conversion unit 106 determines that the same subject appears in the two pixels, and interpolates the depths for the pixels of the virtual depth map existing between the position lastW and the position w + dv, assuming the continuity of the subject (step S38). Any method may be used for the depth interpolation; for example, the depth at lastW and the depth at w + dv may be linearly interpolated, or the same depth, either the depth at lastW or the depth at w + dv, may be assigned to all the pixels in between.
- On the other hand, if the difference is equal to or larger than the threshold, the depth map conversion unit 106 determines that different subjects appear in the two pixels. From the positional relationship, it can be determined that the pixel processed immediately before captures a subject closer to the camera than the pixel processed this time; that is, the area between the two pixels is the occlusion area OCC, and a depth is then generated for the occlusion area OCC (step S39). As described above, there are several methods for generating the depth of the occlusion area OCC. When the depth value of the foreground object OBJ-F around the occlusion area OCC is assigned by the first method described above, the depth VDepth[h, lastW] of the pixel processed immediately before is assigned.
- When a depth that smoothly connects the subjects is generated by the second method described above, VDepth[h, lastW] is copied to VDepth[h, lastW + 1], and the depths from (h, lastW + 1) to (h, w + dv) are generated by linearly interpolating between VDepth[h, lastW + 1] and VDepth[h, w + dv].
- When the generation of the depth for the occlusion area OCC is completed, lastW is updated to w + dv (step S40), and the processing for the pixel (h, w) is terminated.
- When the processing for one line of the reference camera depth map is completed, the depth map conversion unit 106 checks the warping result of the reference camera depth map and determines whether the out-of-frame area OUT exists (step S43). If there is no out-of-frame area OUT, the processing for the line ends without doing anything. On the other hand, if the out-of-frame area OUT exists, the depth map conversion unit 106 generates a depth for the out-of-frame area OUT (step S44). Any method may be used; for example, the depth VDepth[h, lastW] warped last may be assigned to all the pixels of the out-of-frame area OUT.
- The processing operation shown in FIG. 5 is for the case where the reference camera is installed on the left side of the encoding target camera. When the positional relationship between the reference camera and the encoding target camera is reversed, the order of the pixels to be processed and the pixel-position determination conditions are reversed. Specifically, in step S32, w is initialized to Width - 1 and lastW to Width; then, while w is decremented by 1 (step S41), the processing (steps S33 to S40) is repeated until w becomes less than 0 (step S42).
- The processing operation shown in FIG. 5 is for the case where the camera arrangement is one-dimensional parallel, but even when the camera arrangement is one-dimensional convergent, a similar processing operation can be applied depending on the definition of depth. In particular, if the definition axis of depth coincides between the cameras, the same processing operation can be applied as it is. If the definition axis of depth differs, the value of the reference camera depth map cannot be assigned directly to the virtual depth map; instead, the three-dimensional position represented by the depth of the reference camera depth map is first converted in accordance with the definition axis of the depth of the virtual depth map and then assigned, after which basically the same processing operation can be applied.
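- The following sketch condenses the per-line warping of FIG. 5 (steps S32 to S44) for the arrangement above, with the reference camera to the left of the encoding target camera. The `disparity` helper returning the signed parallax dv for a depth value is hypothetical; occlusion depths are generated here by the first method (copying the foreground depth), the same-subject interpolation simply repeats the current depth, and out-of-frame pixels receive the last warped depth. It is a sketch under these assumptions, not a bit-exact rendition of the flowchart.

```python
import numpy as np

def convert_depth_line(rdepth_row, disparity, occ_threshold=2):
    """Warp one line of RDepth to one line of VDepth (cf. FIG. 5)."""
    width = rdepth_row.shape[0]
    vdepth_row = np.zeros(width, dtype=rdepth_row.dtype)
    last_w = -1                                   # step S32
    for w in range(width):                        # loop of steps S41/S42
        dv = disparity(rdepth_row[w])             # step S33 (dv <= 0 here)
        wt = w + dv
        if wt < 0:                                # step S34: no corresponding pixel
            continue
        if wt > last_w:                           # step S36: order preserved
            if last_w >= 0 and wt - last_w >= occ_threshold:
                fill = vdepth_row[last_w]         # step S39: occlusion area OCC,
                                                  # first method (foreground depth)
            else:
                fill = rdepth_row[w]              # step S38: same subject
            vdepth_row[max(last_w + 1, 0):wt] = fill
        vdepth_row[wt] = rdepth_row[w]            # step S35: warp the depth
        last_w = wt                               # step S40
    if last_w >= 0:                               # steps S43/S44: area OUT
        vdepth_row[last_w + 1:] = vdepth_row[last_w]
    return vdepth_row
```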
- FIG. 6 is a block diagram showing the configuration of the image decoding apparatus according to this embodiment.
- The image decoding apparatus 200 includes a code data input unit 201, a code data memory 202, a reference camera image input unit 203, a reference camera image memory 204, a reference camera depth map input unit 205, a depth map conversion unit 206, a virtual depth map memory 207, a viewpoint composite image generation unit 208, and an image decoding unit 209.
- the code data input unit 201 inputs code data of an image to be decoded.
- the image to be decoded is referred to as a decoding target image.
- Here, it is assumed that the image of camera B is decoded.
- a camera that captures a decoding target image (camera B in this case) is referred to as a decoding target camera.
- The code data memory 202 stores the input code data of the decoding target image.
- the reference camera image input unit 203 inputs an image to be a reference image when generating a viewpoint composite image (parallax compensation image).
- the image of camera A is input.
- the reference camera image memory 204 stores the input reference image.
- the reference camera depth map input unit 205 inputs a depth map for the reference image.
- the depth map for the reference image is referred to as a reference camera depth map.
- Here, the depth map represents the three-dimensional position of the subject shown in each pixel of the corresponding image. Any information may be used as long as the three-dimensional position can be obtained from it together with information such as separately provided camera parameters. For example, the distance from the camera to the subject, a coordinate value with respect to an axis not parallel to the image plane, or the amount of parallax with respect to another camera (for example, camera B) can be used. Further, although the depth map is assumed here to be passed in the form of an image, it need not be passed in the form of an image as long as equivalent information can be obtained.
- a camera corresponding to the reference camera depth map is referred to as a reference camera.
- the depth map conversion unit 206 uses the reference camera depth map to generate a depth map for the decoding target image.
- the depth map generated for the decoding target image is referred to as a virtual depth map.
- the virtual depth map memory 207 stores the generated virtual depth map.
- the viewpoint composite image generation unit 208 generates a viewpoint composite image for the decoding target image using the correspondence relationship between the pixel of the decoding target image obtained from the virtual depth map and the pixel of the reference camera image.
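- a minimal sketch of such a generation step, assuming a one-dimensional parallel camera arrangement and the distance-based depth/disparity conversion above (the shift's sign convention and nearest-neighbor rounding are illustrative choices, not the patent's own procedure):

```python
import numpy as np

def synthesize_view(ref_image, vdepth, focal_px, baseline):
    """Build a viewpoint composite image by disparity-compensated copying.

    For each pixel of the decoding target image, the virtual depth map
    gives a disparity, and the corresponding pixel of the reference
    camera image is copied in.
    """
    height, width = vdepth.shape
    synth = np.zeros_like(ref_image)
    for h in range(height):
        for w in range(width):
            d = int(round(focal_px * baseline / vdepth[h, w]))
            wr = w + d  # corresponding column in the reference image
            if 0 <= wr < width:
                synth[h, w] = ref_image[h, wr]
    return synth
```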
- the image decoding unit 209 decodes the decoding target image from the code data using the viewpoint synthesized image and outputs the decoded image.
- FIG. 7 is a flowchart showing the operation of the image decoding apparatus 200 shown in FIG.
- the code data input unit 201 inputs code data of a decoding target image and stores the code data in the code data memory 202 (step S51).
- the reference camera image input unit 203 inputs a reference image and stores it in the reference camera image memory 204.
- the reference camera depth map input unit 205 inputs the reference camera depth map and outputs it to the depth map conversion unit 206 (step S52).
- the reference camera image and the reference camera depth map input in step S52 are the same as those used on the encoding side. This is to suppress encoding noise such as drift by using exactly the same information as that used in the encoding apparatus. However, if such encoding noise is tolerable, information different from that used at the time of encoding may be input.
- as the reference camera depth map, besides a separately decoded depth map, a depth map estimated by applying stereo matching to multi-view images decoded for a plurality of cameras, or a depth map estimated using decoded disparity vectors, motion vectors, or the like, may also be used.
- the depth map conversion unit 206 converts the reference camera depth map to generate a virtual depth map, and stores it in the virtual depth map memory 207 (step S53).
- the processing here is the same as step S3 shown in FIG. 2 except that the encoding target image and the decoding target image are different in encoding and decoding.
- the viewpoint composite image generation unit 208 generates a viewpoint composite image for the decoding target image from the reference camera image stored in the reference camera image memory 204 and the virtual depth map stored in the virtual depth map memory 207, and outputs it to the image decoding unit 209 (step S54).
- the processing here is the same as step S4 shown in FIG. 2 except that the encoding target image and the decoding target image are different in encoding and decoding.
- the image decoding unit 209 decodes the decoding target image from the code data and outputs the decoded image while using the viewpoint synthesized image as the predicted image (step S55).
- the decoded image obtained as a result of this decoding is the output of the image decoding device 200.
- any method may be used for decoding as long as the code data (bit stream) can be correctly decoded. In general, a method corresponding to the method used at the time of encoding is used.
- when a general video or still-image coding scheme such as MPEG-2, H.264/AVC, or JPEG is used, the image is divided into blocks of a predetermined size and, for each block, entropy decoding, inverse binarization, inverse quantization, and an inverse frequency transform such as the IDCT are applied to obtain a prediction residual signal; the predicted image is then added to the residual, and the result is clipped to the pixel value range to obtain the decoded image.
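- a generic sketch of the tail of that pipeline (the residual is assumed to be the output of the entropy decoding, inverse binarization, inverse quantization, and inverse transform stages, which are not shown):

```python
import numpy as np

def reconstruct_block(residual, predicted, bit_depth=8):
    """Reconstruct one decoded block: add the prediction, then clip.

    Widening to int32 avoids wrap-around before clipping to the
    pixel value range [0, 2**bit_depth - 1].
    """
    recon = predicted.astype(np.int32) + residual.astype(np.int32)
    return np.clip(recon, 0, (1 << bit_depth) - 1).astype(np.uint8)
```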
- FIG. 8 is a flowchart illustrating an operation of decoding the decoding target image by alternately repeating the viewpoint composite image generation processing and the decoding target image decoding processing for each block.
- in FIG. 8, the same parts as those of the processing operation shown in FIG. 7 are denoted by the same step numbers. The index of the block that is the unit of the decoding process is denoted by blk, and the number of blocks in the decoding target image is denoted by numBlks.
- the code data input unit 201 inputs code data of a decoding target image and stores it in the code data memory 202 (step S51).
- the reference camera image input unit 203 inputs a reference image and stores it in the reference camera image memory 204.
- the reference camera depth map input unit 205 inputs the reference camera depth map and outputs it to the depth map conversion unit 206 (step S52).
- the depth map conversion unit 206 generates a virtual depth map from the reference camera depth map and stores it in the virtual depth map memory 207 (step S53). Then, the viewpoint composite image generation unit 208 initializes the variable blk to 0 (step S56).
- the viewpoint composite image generation unit 208 generates a viewpoint composite image for the block blk from the reference camera image and the virtual depth map, and outputs the viewpoint composite image to the image decoding unit 209 (step S54a).
- the image decoding unit 209 decodes and outputs the decoding target image for the block blk from the code data while using the viewpoint synthesized image as the predicted image (step S55a). These two steps are then repeated while incrementing blk by 1 until blk reaches numBlks.
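- a hypothetical sketch of this per-block alternation; synth_block and decode_block are assumed helper callables standing in for the viewpoint composite image generation unit 208 (step S54a) and the image decoding unit 209 (step S55a), not APIs defined by the text:

```python
def decode_blockwise(code_data, ref_image, vdepth, num_blks, synth_block, decode_block):
    """Alternate per-block view synthesis and decoding, as in FIG. 8."""
    blocks = []
    for blk in range(num_blks):                        # blk = 0, 1, ..., numBlks - 1
        pred = synth_block(ref_image, vdepth, blk)     # step S54a
        blocks.append(decode_block(code_data, pred, blk))  # step S55a
    return blocks
```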
- in the above, the process of encoding and decoding all the pixels in one frame using viewpoint-synthesis prediction has been described. However, the process may be applied to only some of the pixels, and the remaining pixels may be encoded or decoded using the intra-frame predictive coding or motion-compensated predictive coding used in H.264/AVC or the like. In that case, information indicating which method was used for prediction must be encoded and decoded for each pixel. A different prediction scheme may also be used for each block instead of for each pixel.
- when viewpoint-synthesis prediction is used for only some of the pixels, the viewpoint composite image may be generated only for those pixels. By doing so, the amount of computation required for the viewpoint synthesis processing can be reduced.
- in the above, the process of encoding and decoding one frame has been described, but the technique can also be applied to video coding by repeating the process for a plurality of frames. It can likewise be applied to only some frames or some blocks of a video. Furthermore, although the configurations and processing operations of the image encoding device and the image decoding device have been described above, the image encoding method and the image decoding method of the present invention can be realized by processing operations corresponding to the operations of the respective units of these devices.
- FIG. 9 is a block diagram showing a hardware configuration when the above-described image encoding device is configured by a computer and a software program.
- the system shown in FIG. 9 includes a CPU 50, a memory 51 such as a RAM, an encoding target image input unit 52, a reference camera image input unit 53, a reference camera depth map input unit 54, a program storage device 55, and a multiplexed code data output unit 56, all of which are connected by a bus.
- the CPU 50 executes a program.
- a memory 51 such as a RAM stores programs and data accessed by the CPU 50.
- An encoding target image input unit 52 (which may be a storage unit that stores an image signal from a disk device or the like) inputs an encoding target image signal from a camera or the like.
- a reference camera image input unit 53 (which may be a storage unit that stores an image signal from a disk device or the like) inputs an image signal to be referenced from a camera or the like.
- a reference camera depth map input unit 54 (which may be a storage unit that stores a depth map on a disk device or the like) inputs, from a depth camera or the like, a depth map for a camera whose position and orientation differ from those of the camera that captured the encoding target image.
- the program storage device 55 stores an image encoding program 551 that is a software program that causes the CPU 50 to execute the image encoding processing described as the first embodiment.
- the multiplexed code data output unit 56 (which may be a storage unit that stores multiplexed code data by a disk device or the like) receives code data generated by the CPU 50 executing the image encoding program 551 loaded in the memory 51. For example, output via a network.
- FIG. 10 is a block diagram showing a hardware configuration when the above-described image decoding apparatus is configured by a computer and a software program.
- the system shown in FIG. 10 includes a CPU 60, a memory 61 such as a RAM, a code data input unit 62, a reference camera image input unit 63, a reference camera depth map input unit 64, a program storage device 65, and a decoding target image output unit 66, all of which are connected by a bus.
- the CPU 60 executes a program.
- a memory 61 such as a RAM stores programs and data accessed by the CPU 60.
- a code data input unit 62 (which may be a storage unit that stores an image signal from a disk device or the like) inputs code data encoded by the image encoding device according to this method.
- a reference camera image input unit 63 (which may be a storage unit that stores an image signal from a disk device or the like) inputs an image signal to be referenced from a camera or the like.
- a reference camera depth map input unit 64 (which may be a storage unit that stores depth information on a disk device or the like) inputs, from a depth camera or the like, a depth map for a camera whose position and orientation differ from those of the camera that captured the decoding target.
- the program storage device 65 stores an image decoding program 651 that is a software program that causes the CPU 60 to execute the image decoding processing described as the second embodiment.
- the decoding target image output unit 66 (which may be a storage unit that stores an image signal on a disk device or the like) outputs, to a playback device or the like, the decoding target image obtained when the CPU 60 executes the image decoding program 651 loaded in the memory 61 and decodes the code data.
- a program for realizing the functions of the processing units in the image encoding device shown in FIG. 1 and the image decoding device shown in FIG. 6 may be recorded on a computer-readable recording medium, and the image encoding processing and the image decoding processing may be performed by causing a computer system to read and execute the program recorded on the recording medium.
- the “computer system” includes an OS and hardware such as peripheral devices.
- the “computer system” includes a WWW system having a homepage providing environment (or display environment).
- the “computer-readable recording medium” refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk incorporated in a computer system.
- the “computer-readable recording medium” also includes a medium that holds the program for a certain period of time, such as a volatile memory (RAM) inside a computer system that serves as a server or client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.
- the program may be transmitted from a computer system storing the program in a storage device or the like to another computer system via a transmission medium or by a transmission wave in the transmission medium.
- the “transmission medium” for transmitting the program refers to a medium having a function of transmitting information, such as a network (communication network) such as the Internet or a communication line (communication line) such as a telephone line.
- the program may be one for realizing a part of the functions described above. Furthermore, it may be a so-called difference file (difference program) that realizes the above-described functions in combination with a program already recorded in the computer system.
- DESCRIPTION OF SYMBOLS: 100: image encoding device; 101: encoding target image input unit; 102: encoding target image memory; 103: reference camera image input unit; 104: reference camera image memory; 105: reference camera depth map input unit; 106: depth map conversion unit; 107: virtual depth map memory; 108: viewpoint composite image generation unit; 109: image encoding unit; 200: image decoding device; 201: code data input unit; 202: code data memory; 203: reference camera image input unit; 204: reference camera image memory; 205: reference camera depth map input unit; 206: depth map conversion unit; 207: virtual depth map memory; 208: viewpoint composite image generation unit; 209: image decoding unit
Description
This application claims priority based on Japanese Patent Application No. 2012-211155, filed in Japan on September 25, 2012, the contents of which are incorporated herein by reference.
It is assumed that the information needed to obtain disparity from the depth information is given separately. Specifically, this consists of extrinsic parameters representing the positional relationship between camera A and camera B and intrinsic parameters representing the projection of each camera onto its image plane, but information in other forms may be given as long as disparity can be obtained from the depth information. A detailed explanation of these camera parameters can be found, for example, in Oliver Faugeras, "Three-Dimensional Computer Vision", MIT Press; BCTC/UFF-006.37 F259 1993, ISBN: 0-262-06158-9, which describes parameters indicating the positional relationship among multiple cameras and parameters representing the projection of a camera onto the image plane.
FIG. 4 is a flowchart showing the processing operation of the reference camera depth map conversion processing (step S3) shown in FIGS. 2 and 3. In this processing, a virtual depth map is generated from the reference camera depth map in three steps, each of which generates depth values for a different region of the virtual depth map.
However, when the optical axes of the encoding target camera and the reference camera lie on the same plane, the order in which the pixels of the reference camera depth map are processed can be determined according to the positional relationship between the encoding target camera and the reference camera, and by processing in that order and always overwriting the obtained corresponding points, the virtual depth map can be generated without considering the front-to-back relationship of subjects. Specifically, when the encoding target camera is located to the right of the reference camera, the pixels of the reference camera depth map are processed in each row in left-to-right scan order, and when the encoding target camera is located to the left of the reference camera, the pixels are processed in each row in right-to-left scan order, which removes the need to consider the front-to-back relationship. Since the front-to-back relationship no longer has to be considered, the amount of computation can be reduced.
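A rough Python sketch of this scan-order trick follows; disparity_of is an assumed helper mapping a depth value to a signed integer disparity, and the sketch is not the patent's own pseudocode:

```python
import numpy as np

UNSET = -1.0

def warp_depth_overwrite(ref_depth, disparity_of, target_is_right_of_reference):
    """Forward-warp a reference depth map with simple overwrite.

    Scanning each row left to right when the encoding target camera is to
    the right of the reference camera (right to left otherwise) lets every
    corresponding point overwrite whatever is already there, so no
    explicit front-to-back depth test is needed.
    """
    height, width = ref_depth.shape
    vdepth = np.full((height, width), UNSET, dtype=ref_depth.dtype)
    cols = range(width) if target_is_right_of_reference else range(width - 1, -1, -1)
    for h in range(height):
        for w in cols:
            wt = w + disparity_of(ref_depth[h, w])
            if 0 <= wt < width:
                vdepth[h, wt] = ref_depth[h, w]  # always overwrite
    return vdepth
```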
First, the boundary B between pixels of the foreground object OBJ-F and pixels of the background object OBJ-B on the reference camera depth map, at which the occlusion region OCC occurs in the virtual depth map, is obtained (S12-1). Next, the pixels of the foreground object OBJ-F adjacent to the obtained boundary are extended by one pixel E toward the adjacent background object OBJ-B (S12-2). At this time, the pixel obtained by the extension has two depth values: the depth value of the original background object OBJ-B pixel and the depth value of the adjacent foreground object OBJ-F pixel.
Next, it is assumed (assumption A) that the foreground object OBJ-F and the background object OBJ-B are continuous at that pixel E (S12-3), and the virtual depth map is generated (S12-4). That is, at the position of pixel E on the reference camera depth map, the depth values for the pixels of the occlusion region OCC are determined by assuming that the subject exists continuously from the same depth value as the pixel whose depth value indicates being close to the reference camera to the same depth value as the pixel whose depth value indicates being far from the reference camera, and converting the depth of the assumed subject into depth on the encoding target image.
FIG. 13 is an explanatory diagram showing the operation of assigning depth values for the background object OBJ-B around the occlusion region OCC under the assumption that the background object OBJ-B is continuous. Alternatively, as shown in FIG. 14, depth values interpolated between the foreground object OBJ-F and the background object OBJ-B of the surrounding region may be assigned, taking into account the continuity of the subject as seen from the reference camera.
FIG. 14 is an explanatory diagram showing the operation of assigning depth values interpolated between the foreground object OBJ-F and the background object OBJ-B of the surrounding region.
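A minimal sketch of the FIG. 14 variant follows; the row index and the two column indices bounding the occlusion region are illustrative assumptions, and linear interpolation stands in for whatever smooth interpolation is used:

```python
def interpolate_occlusion_row(vdepth, h, w_fg, w_bg):
    """Fill one row of the occlusion region OCC by smooth interpolation.

    Depth values are interpolated linearly between the extended
    foreground pixel at column w_fg and the background pixel at column
    w_bg on the opposite side of OCC.
    """
    n = w_bg - w_fg
    step = 1 if n > 0 else -1
    for i in range(1, abs(n)):
        t = i / abs(n)
        vdepth[h, w_fg + i * step] = (1 - t) * vdepth[h, w_fg] + t * vdepth[h, w_bg]
    return vdepth
```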
In FIG. 15, the virtual depth map of the encoding target frame is created by giving the occlusion region OCC the depth value of the foreground object OBJ-F.
The second method, too, is a process that changes the shape of the object, as shown in FIG. 16. FIG. 16 is an explanatory diagram showing the processing operation of changing the shape of the object.
In FIG. 16, the virtual depth map of the encoding target frame is created by extending the foreground object OBJ-F into the occlusion region OCC as shown in S12-2 of FIG. 12, and then giving it the depth values of a subject assumed to be continuous as shown in S12-4. That is, the occlusion region OCC of FIG. 16 is given depth values that change continuously in the rightward direction of FIG. 16, from a depth value indicating being close to the viewpoint to a depth value indicating being far from it.
These assumptions contradict the reference camera depth map given for the reference camera. Indeed, under such assumptions, it can be confirmed that depth value contradictions I1 and I2 occur at the pixels enclosed by the dashed ellipses in FIGS. 15 and 16, respectively. In the case of FIG. 15, the depth value of the foreground object OBJ-F exists in the assumed subject space at a position where, according to the reference camera depth map, the depth value of the background object OBJ-B should exist. In the case of FIG. 16, the depth value of an object connecting the foreground object OBJ-F and the background object OBJ-B exists in the assumed subject space at a position where, according to the reference camera depth map, the depth value of the background object OBJ-B should exist.
Note that the disparity dv is a vector quantity having the direction of the disparity, and indicates that pixel (h, w) of the reference camera depth map corresponds to pixel (h, w+dv) on the virtual depth map.
Specifically, it is determined whether or not the disparity difference, obtained as the difference between the position to which the depth of the immediately preceding pixel was warped and the position to which the current warping was performed, is smaller than a predetermined threshold value.
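A hedged sketch of that test, under the assumption that the two warped positions are compared within the same row:

```python
def gap_opens(prev_warp_w, cur_warp_w, threshold):
    """Test whether an occlusion gap opens between two consecutively warped pixels.

    The disparity difference is taken as the distance between the position
    the previous pixel's depth warped to and the position warped to this
    time; if it is not smaller than a predetermined threshold, a gap
    (occlusion) is considered to occur.
    """
    return abs(cur_warp_w - prev_warp_w) >= threshold
```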
Claims (18)
- 1. An image encoding method for encoding a multi-view image, which consists of images from a plurality of viewpoints, while predicting images between viewpoints using an already encoded reference image for a viewpoint different from that of the encoding target image and a reference depth map that is a depth map of the subjects in the reference image, the method comprising: a depth map conversion step of converting the reference depth map into a virtual depth map that is a depth map of the subjects in the encoding target image; an occlusion region depth generation step of generating depth values for an occlusion region, caused by the front-to-back relationship of the subjects, for which no depth values exist in the reference depth map, by assigning depth values that establish a correspondence to a region on the same subject as the subject that is occluded in the reference image; and an inter-view image prediction step of performing image prediction between viewpoints by generating a disparity-compensated image for the encoding target image from the reference image and the virtual depth map after the depth values of the occlusion region have been generated.
- 2. The image encoding method according to claim 1, wherein the occlusion region depth generation step generates the depth values of the occlusion region by assuming continuity of the subject that occludes the occlusion region on the reference depth map.
- 3. The image encoding method according to claim 1, further comprising an occlusion-generating pixel boundary determination step of determining the pixel boundary on the reference depth map that corresponds to the occlusion region, wherein the occlusion region depth generation step generates the depth values of the occlusion region by assuming, for each pair of pixels of the reference depth map adjacent to the occlusion-generating pixel boundary, that the subject exists continuously at the position of the pixel whose depth value indicates being closer to the viewpoint on the reference depth map, from the same depth value as the pixel whose depth value indicates being closer to the viewpoint to the same depth value as the pixel whose depth value indicates being farther from the viewpoint, and converting the depth of the assumed subject into depth on the encoding target image.
- 4. The image encoding method according to claim 1, further comprising: a subject region determination step of determining the subject region on the virtual depth map that corresponds to the region occluding the occlusion region on the reference depth map; and a subject region extension step of extending the pixels of the subject region in the direction of the occlusion region, wherein the occlusion region depth generation step generates the depth values of the occlusion region by smoothly interpolating depth values between the pixels generated by the extension and the pixels adjacent to the occlusion region on the side opposite to the subject region.
- 5. The image encoding method according to any one of claims 1 to 4, wherein the depth map conversion step performs the conversion into the virtual depth map by obtaining, for each reference pixel of the reference depth map, the corresponding pixel on the virtual depth map and assigning to that corresponding pixel a depth indicating the same three-dimensional position as the depth of the reference pixel.
- 6. An image decoding method for decoding a decoding target image of a multi-view image while predicting images between viewpoints using an already decoded reference image and a reference depth map that is a depth map of the subjects in the reference image, the method comprising: a depth map conversion step of converting the reference depth map into a virtual depth map that is a depth map of the subjects in the decoding target image; an occlusion region depth generation step of generating depth values for an occlusion region, caused by the front-to-back relationship of the subjects, for which no depth values exist in the reference depth map, by assigning depth values that establish a correspondence to a region on the same subject as the subject that is occluded in the reference image; and an inter-view image prediction step of performing image prediction between viewpoints by generating a disparity-compensated image for the decoding target image from the reference image and the virtual depth map after the depth values of the occlusion region have been generated.
- 7. The image decoding method according to claim 6, wherein the occlusion region depth generation step generates the depth values of the occlusion region by assuming continuity of the subject that occludes the occlusion region on the reference depth map.
- 8. The image decoding method according to claim 6, further comprising an occlusion-generating pixel boundary determination step of determining the pixel boundary on the reference depth map that corresponds to the occlusion region, wherein the occlusion region depth generation step generates the depth values of the occlusion region by assuming, for each pair of pixels of the reference depth map adjacent to the occlusion-generating pixel boundary, that the subject exists continuously at the position of the pixel whose depth value indicates being closer to the viewpoint on the reference depth map, from the same depth value as the pixel whose depth value indicates being closer to the viewpoint to the same depth value as the pixel whose depth value indicates being farther from the viewpoint, and converting the depth of the assumed subject into depth on the decoding target image.
- 9. The image decoding method according to claim 6, further comprising: a subject region determination step of determining the subject region on the virtual depth map that corresponds to the region occluding the occlusion region on the reference depth map; and a subject region extension step of extending the pixels of the subject region in the direction of the occlusion region, wherein the occlusion region depth generation step generates the depth values of the occlusion region by smoothly interpolating depth values between the pixels generated by the extension and the pixels adjacent to the occlusion region on the side opposite to the subject region.
- 10. The image decoding method according to any one of claims 6 to 9, wherein the depth map conversion step performs the conversion into the virtual depth map by obtaining, for each reference pixel of the reference depth map, the corresponding pixel on the virtual depth map and assigning to that corresponding pixel a depth indicating the same three-dimensional position as the depth of the reference pixel.
- 11. An image encoding apparatus that encodes a multi-view image, which consists of images from a plurality of viewpoints, while predicting images between viewpoints using an already encoded reference image for a viewpoint different from that of the encoding target image and a reference depth map that is a depth map of the subjects in the reference image, the apparatus comprising: a depth map conversion unit that converts the reference depth map into a virtual depth map that is a depth map of the subjects in the encoding target image; an occlusion region depth generation unit that generates depth values for an occlusion region, caused by the front-to-back relationship of the subjects, for which no depth values exist in the reference depth map, by assigning depth values that establish a correspondence to a region on the same subject as the subject that is occluded in the reference image; and an inter-view image prediction unit that performs image prediction between viewpoints by generating a disparity-compensated image for the encoding target image from the reference image and the virtual depth map after the depth values of the occlusion region have been generated.
- 12. The image encoding apparatus according to claim 11, wherein the occlusion region depth generation unit generates the depth values of the occlusion region by assuming continuity of the subject that occludes the occlusion region on the reference depth map.
- 13. An image decoding apparatus that decodes a decoding target image of a multi-view image while predicting images between viewpoints using an already decoded reference image and a reference depth map that is a depth map of the subjects in the reference image, the apparatus comprising: a depth map conversion unit that converts the reference depth map into a virtual depth map that is a depth map of the subjects in the decoding target image; an occlusion region depth generation unit that generates depth values for an occlusion region, caused by the front-to-back relationship of the subjects, for which no depth values exist in the reference depth map, by assigning depth values that establish a correspondence to a region on the same subject as the subject that is occluded in the reference image; and an inter-view image prediction unit that performs image prediction between viewpoints by generating a disparity-compensated image for the decoding target image from the reference image and the virtual depth map after the depth values of the occlusion region have been generated.
- 14. The image decoding apparatus according to claim 13, wherein the occlusion region depth generation unit generates the depth values of the occlusion region by assuming continuity of the subject that occludes the occlusion region on the reference depth map.
- 15. An image encoding program for causing a computer to execute the image encoding method according to any one of claims 1 to 5.
- 16. An image decoding program for causing a computer to execute the image decoding method according to any one of claims 6 to 10.
- 17. A computer-readable recording medium on which the image encoding program according to claim 15 is recorded.
- 18. A computer-readable recording medium on which the image decoding program according to claim 16 is recorded.
Priority Applications (4)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201380049370.8A | 2012-09-25 | 2013-09-24 | Image encoding method, image decoding method, image encoding device, image decoding device, image encoding program, image decoding program, and recording medium (published as CN104662897A) |
| JP2014538499A | 2012-09-25 | 2013-09-24 | Image encoding method, image decoding method, image encoding device, image decoding device, image encoding program, image decoding program, and recording medium (published as JP5934375B2) |
| US14/430,492 | 2012-09-25 | 2013-09-24 | Picture encoding method, picture decoding method, picture encoding apparatus, picture decoding apparatus, picture encoding program, picture decoding program and recording medium (published as US20150245062A1) |
| KR20157006802 | 2012-09-25 | 2013-09-24 | Image encoding method, image decoding method, image encoding device, image decoding device, image encoding program, image decoding program, and recording medium (published as KR20150046154A) |
Applications Claiming Priority (2)

| Application Number | Priority Date | Filing Date |
|---|---|---|
| JP2012-211155 | 2012-09-25 | |
| JP2012211155 | 2012-09-25 | |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| WO2014050830A1 | 2014-04-03 |
Family ID: 50388227
Also Published As

| Publication Number | Publication Date |
|---|---|
| JPWO2014050830A1 | 2016-08-22 |
| KR20150046154A | 2015-04-29 |
| JP5934375B2 | 2016-06-15 |
| CN104662897A | 2015-05-27 |
| US20150245062A1 | 2015-08-27 |
Legal Events

| Code | Title | Details |
|---|---|---|
| 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 13841166; Country: EP; Kind code: A1 |
| ENP | Entry into the national phase | Ref document number: 2014538499; Country: JP; Kind code: A |
| WWE | WIPO information: entry into national phase | Ref document number: 1020157006802; Country: KR |
| WWE | WIPO information: entry into national phase | Ref document number: 14430492; Country: US |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: PCT application non-entry in European phase | Ref document number: 13841166; Country: EP; Kind code: A1 |