JP2014099716A

JP2014099716A - Image encoder and control method of the same

Info

Publication number: JP2014099716A
Application number: JP2012249645A
Authority: JP
Inventors: Hideyuki Matsui; 秀往松井
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2012-11-13
Filing date: 2012-11-13
Publication date: 2014-05-29

Abstract

PROBLEM TO BE SOLVED: To generate a predicted image with high precision, and thereby encoded data with a high compression ratio, even when an occlusion part and a non-occlusion part coexist in a block to be encoded. SOLUTION: A mask image generation section 101 uses distance image data to generate mask data indicating whether each pixel of the image data of a target viewpoint belongs to an occlusion part or a non-occlusion part. A disparity vector search section 103 refers to the mask data and determines whether the block of interest is a mixed block in which an occlusion part and a non-occlusion part coexist. When the block is determined to be a mixed block, the image data of the block of interest is separated into an occlusion region and a non-occlusion region by referring to the mask data, and a prediction vector is obtained for each region. A parallax compensation prediction section 104 extracts predicted image data using the individual vectors and integrates them to generate the predicted image data used for predictive encoding of the block of interest.

Description

The present invention relates to an image encoding technique.

Methods are known in which video is captured from multiple viewpoints and the resulting multi-viewpoint video is used for video representations such as stereoscopic viewing and free-viewpoint video synthesis. Realizing such representations requires a technique for storing multi-viewpoint video. The simplest approach is to apply a single-viewpoint image compression technique, as in an ordinary digital camera, once per viewpoint, but the amount of data is then proportional to the number of viewpoints.

In view of this, encoding methods that reduce the amount of data by exploiting the correlation between multi-viewpoint videos have been studied. A representative example is H.264/MPEG-4 AVC Multiview Video Coding (hereinafter, MVC). In MVC, the video of a target viewpoint is predicted from the video of a reference viewpoint by parallax compensation prediction, and the difference between the target viewpoint video and the predicted image is encoded. This reduces the amount of data compared with encoding the target viewpoint video itself. Parallax compensation prediction applies motion compensation prediction, a compression technique that exploits the correlation between frames along the time axis of ordinary video, to videos from different viewpoints.

3D Video Coding (hereinafter, 3DV) is also being standardized. Whereas MVC encodes multi-viewpoint video, 3DV additionally encodes a distance image for each viewpoint, specifically to support free-viewpoint video synthesis. Free-viewpoint video synthesis is possible even when only the multi-viewpoint video is encoded, provided the decoder estimates distance from that video. Acquiring the distance on the encoder side and encoding the distance image, however, has the advantage of reducing the load on the decoder. Transmitting distance images naturally adds their data to the total, but exploiting the correlation between the multi-viewpoint video and the distance images is expected to reduce the amount of data compared with encoding the two separately.

The method described in Patent Document 1 is known as a technique that uses a distance image to reduce the code amount of multi-viewpoint video. On the premise that a color image and a distance image of a reference viewpoint have already been encoded, Patent Document 1 discloses a method for reducing the amount of data when encoding the color image of a target viewpoint. Specifically, the color image of the target viewpoint is predicted from the color image of the reference viewpoint, the distance image of the reference viewpoint, and the positions and orientations of the cameras at the respective viewpoints. No prediction is available for pixels that are visible from the target viewpoint but not from the reference viewpoint (occlusion), so the predicted values of such pixels are estimated from adjacent pixels.

Patent Document 1: Japanese Patent No. 4414379

The problem with MVC is as follows. MVC applies parallax compensation prediction to blocks of variable size, from 16 × 16 pixels down to 4 × 4 pixels. The smaller the block, the smaller the code amount of the difference between the predicted image and the original image. However, one disparity vector is assigned to each block, so smaller blocks increase the code amount of the disparity vectors; there is a trade-off. The standard does not define how the block size is chosen, but one approach is to encode with several block sizes, compare the resulting amounts of data, and adopt the size with the best coding efficiency. Although occlusion and non-occlusion parts are not explicitly identified in this process, the block size can be expected to become small in regions where the two are mixed. This is because an occlusion part and a non-occlusion part belong to scene content at different depths, so a single disparity vector is unlikely to yield a good predicted image. In other words, MVC has the problem that the block size must become small in order to generate an accurate predicted image near an occlusion.

Next, the problem with the method described in Patent Document 1 is explained with reference to FIG. 3. FIGS. 3A and 3B are images of the same subject captured by two cameras placed side by side. The halftone-dot portion and the hatched portion in FIGS. 3A and 3B are background with mutually different texture patterns, and a round object is placed in the foreground. FIG. 3A is the image captured by the left camera and FIG. 3B the image captured by the right camera. FIG. 3C is a distance image indicating the distance from the camera to the subject shown in FIG. 3A; the grid portion indicates the background distance and the horizontal-line portion the foreground distance, and the foreground and background regions are each assumed to lie at a single distance. Consider first encoding FIGS. 3A and 3C, then generating a predicted image of FIG. 3B from FIGS. 3A and 3C and encoding the difference. Using the per-pixel distance given by FIG. 3C and the positional relationship between the two cameras, each pixel of FIG. 3A is projected so as to obtain the image as seen from the position of the camera that captured FIG. 3B. The result is the image in FIG. 3D, in which the white portion 301 is the occlusion. To obtain high coding efficiency, this occlusion part must be filled in so that the predicted image is close to FIG. 3B. Patent Document 1 describes filling the occlusion part from adjacent pixels. However, when the occlusion part is part of a texture, as in FIG. 3D, filling it from adjacent pixels alone is unlikely to predict the texture adequately. The method of Patent Document 1 therefore has the problem that the accuracy of the predicted image falls when, for example, the occlusion part is a textured region.

The present invention has been made in view of this problem, and aims to provide a technique for efficiently predictively encoding an image of another viewpoint from an image and a distance image of a reference viewpoint.

To solve this problem, an image encoding device of the present invention has, for example, the following configuration. That is,
an image encoding device that predictively encodes image data of a target viewpoint different from a reference viewpoint, based on image data of the reference viewpoint and distance image data of the reference viewpoint, comprises:
identification data generating means for generating, using the distance image data, identification data that identifies whether each pixel in the image data of the target viewpoint belongs to an occlusion part or a non-occlusion part;
input means for inputting, from the image data of the target viewpoint, block image data in units of blocks of a preset size;
determination means for determining, by referring to the identification data generated by the identification data generating means, whether the block image data of interest input by the input means is a mixed block in which an occlusion part and a non-occlusion part coexist;
predicted image generation means which, when the determination means determines that the block is a non-mixed block, generates predicted block image data for the block image data of interest by referring to the image data of the reference viewpoint, and which, when the determination means determines that the block is a mixed block, refers to the identification data to separate the block image data of interest into an occlusion region composed of occlusion pixels and a non-occlusion region composed of non-occlusion pixels, generates predicted region image data for each region by referring to the image data of the reference viewpoint, and integrates the resulting predicted region image data to generate predicted block image data for the block image data of interest; and
encoding means for encoding the difference between the predicted block image data generated by the predicted image generation means and the block image data of interest.

According to the present invention, even when an occlusion part and a non-occlusion part coexist in a block serving as the unit of encoding, a predicted image is generated with high accuracy, which makes it possible to generate encoded data with a higher compression ratio than before.

A block diagram of the first embodiment.
A flowchart of the first embodiment.
A diagram showing examples of images processed in the first embodiment.
A diagram showing an example system configuration of the first embodiment.
A diagram showing an example of the region division processing of the first embodiment.
A diagram showing an example of the region integration of the first embodiment.
A diagram showing an example of the region boundary correction in the first embodiment.
A flowchart showing details of the mixed-block predicted image generation method in the first embodiment.
A block diagram of a computer.
A flowchart of the second embodiment.
A flowchart of the third embodiment.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings. Each embodiment described below is an example of a concrete implementation of the present invention, and is one specific embodiment of the configurations recited in the claims.

[First Embodiment]
This embodiment assumes a system capable of stereoscopic viewing and free-viewpoint image synthesis, and describes an example in which, from among multi-viewpoint images, the image of one viewpoint and its distance image are used to predictively encode the image of another, target viewpoint. The following description therefore uses two images and one distance image taken at the viewpoint of one of them.

First, free-viewpoint image synthesis and distance images are outlined. Free-viewpoint image synthesis is a technique in which an image is captured from a certain viewpoint with a photographing device such as a camera, and that image is used to synthesize the image that would be seen from another viewpoint where no image was actually captured. One known approach uses a distance image, that is, an image storing distance information for each pixel of the captured viewpoint. With a distance image, the coordinates of the photographed subject in three-dimensional space can be determined, so it is possible to estimate where the subject would appear in the image when seen from another viewpoint.

Methods for acquiring a distance image fall broadly into active and passive methods. An active method acquires distance by irradiating the subject with a laser or the like; a representative example is the time-of-flight method, which estimates distance from the time the laser takes to make the round trip. A passive method acquires distance from captured images; a representative example is stereo matching, which uses images captured from two different viewpoints.

When free-viewpoint image synthesis uses an actively acquired distance image, that distance image must be encoded in addition to the captured multi-viewpoint images. When the distance image is acquired passively, it can be derived from the multi-viewpoint images, so encoding it is unnecessary. However, if the distance image is acquired passively and encoded at capture time so that it need not be derived again at playback, for example to reduce the processing load during playback, the distance image must still be encoded.

Distance images used for free-viewpoint image synthesis are often encoded as parallax. Parallax is a quantity indicating where a point seen from one viewpoint appears in the image captured from another viewpoint when the same subject (point) is photographed from two different viewpoints. It is generally expressed as the number of pixels by which points showing the same subject (hereinafter, corresponding points) are displaced between the images. More specifically, suppose two cameras with parallel optical axes are placed side by side along the horizontal axis at an interval of b (mm) and photograph a point z (mm) away from the photographing plane (the plane perpendicular to the optical axes that passes through both cameras). The parallax d (pix) between the corresponding points in the two captured images is then given by equation (1).
d = bfW / (zC) ... (Equation 1)
Here, f is the focal length of the camera (mm), W is the width of the captured image (pix), and C is the width of the image sensor (mm). When the cameras are placed exactly side by side, the vertical parallax of corresponding points is guaranteed to be 0 (the epipolar constraint). The parallax described here therefore represents the horizontal displacement between corresponding points.
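Equation (1) can be evaluated directly. The following is a minimal sketch; the function name and the numeric values are illustrative assumptions and do not come from the source.

```python
def disparity_pixels(b_mm, f_mm, width_px, z_mm, sensor_width_mm):
    """Equation (1): d = b*f*W / (z*C), returned in pixels."""
    if z_mm <= 0 or sensor_width_mm <= 0:
        raise ValueError("z and C must be positive")
    return (b_mm * f_mm * width_px) / (z_mm * sensor_width_mm)


# Illustrative values only: 60 mm baseline, 30 mm focal length,
# 1920-pixel-wide image, 36 mm sensor, subject 2 m from the cameras.
d = disparity_pixels(b_mm=60.0, f_mm=30.0, width_px=1920,
                     z_mm=2000.0, sensor_width_mm=36.0)
print(d)  # 48.0
```

Because d is inversely proportional to z, doubling the subject distance halves the parallax, which is why near foreground objects shift more between the two views than the background does.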

This concludes the outline of free-viewpoint image synthesis and distance images. The present embodiment is described next.

FIG. 4 shows the system assumed in the first embodiment. It shows the configuration of each part, and the flow of data, from the capture of two images by the photographing devices to the display of the two images and of the free-viewpoint image synthesis result.

First, the photographing units 420 and 421 capture color images. The images are captured by two digital cameras placed side by side, left and right, with parallel optical axes. The left-viewpoint image is denoted I1 and the right-viewpoint image I2.

The distance image acquisition unit 422 acquires a distance image S1 for the image I1. In practice, therefore, the viewpoint of the distance image acquisition unit 422 is close to that of the photographing unit 420. In the illustrated system, the distance image is obtained by an active method that measures the distance to the subject. It can also be obtained from I1 and I2 by stereo matching, and when 3D CG images are handled, the distance can be taken from the 3D model.

Next, the predictive encoding unit 423 predictively encodes I2 using the image I1 and the distance image S1. The prediction method used in the predictive encoding unit 423 is the distinguishing feature of this embodiment and is described in detail later. Predictive encoding yields prediction information for generating a predicted image of I2, and code data for the difference between I2 and the predicted image. The prediction information is preferably encoded losslessly and the difference lossily.

The encoding unit 424 encodes the image I1, preferably with a lossless method such as PNG. I1 may instead be encoded lossily, for example with JPEG, but in that case the predictive encoding unit 423 should be given not I1 itself but the degraded image obtained by lossily encoding and then decoding I1.

The encoding unit 425 encodes the distance image S1, preferably with a lossless method such as ZIP.

Next, the transmission unit 426 transmits the resulting code data, together with information specifying the positions and orientations of the cameras used for capture, to the reception unit 427. It is convenient to combine the pieces of encoded data and the capture information into a single file for transmission.

On receiving the encoded data, the reception unit 427 separates it and supplies each part to the corresponding one of the decoding units 428 to 430. The decoding unit 428 decodes the encoded data of the image I1, and the decoding unit 430 decodes the encoded data of the distance image S1. The decoding unit 429 decodes the encoded prediction difference data. It then generates a predicted image from the decoded image I1, the decoded distance image S1, and the capture information, and applies the prediction difference data to that predicted image to decode (generate) the image I2. Because only the image I2 is lossily encoded in this embodiment, the output of the decoding unit 429 is denoted I2' to distinguish it from the image before degradation.

At this point the captured color images I1 and I2' can be displayed, and stereoscopic viewing is possible with these two images. In addition, when the free-viewpoint image synthesis unit 431 performs free-viewpoint image synthesis using the color image I1 and the distance image S1, an image I3 of a viewpoint not captured by the photographing devices can also be displayed.

This concludes the overview of the system assumed in this embodiment. The processing of the predictive encoding unit 423, which is the distinguishing feature of this embodiment, is described in detail below.

The processing performed by the predictive encoding unit 423 is described in detail using the flowchart of FIG. 2. As is clear from FIG. 4, the inputs to the predictive encoding unit 423 are the color images I1 and I2 and the distance image S1 of the color image I1. In the following, FIGS. 3A and 3B are taken as the color images I1 and I2, respectively, and FIG. 3C as the distance image S1.

In S201 of FIG. 2, the distance image S1 is first projected to estimate the distance image at the viewpoint of the color image I2 (identification data generating means). As an example, FIG. 3E shows the distance image obtained by projecting the distance image of FIG. 3C. The white region 302 is the occlusion part, which cannot be seen from the viewpoint of the color image I1. Holes with no value can also appear in the projected distance image outside the occlusion part, for example because of quantization error in the distance image. To suppress this effect, small holes of about one pixel are preferably filled with the median (or mean) of the surrounding pixels.
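The projection and hole filling in S201 can be sketched for a single image row as follows. This is a simplified illustration under several assumptions not stated in the source: the distance image is stored as integer horizontal parallax, the target camera lies to the right of the reference camera (so a pixel at x moves to x - d), and small holes are filled from the two horizontal neighbours only. All names are hypothetical.

```python
from statistics import median

HOLE = None  # marker for "no value after projection"


def warp_disparity_row(row):
    """Project one row of reference-view parallax values into the target view."""
    out = [HOLE] * len(row)
    for x, d in enumerate(row):
        tx = x - int(round(d))  # assumed sign: target camera is to the right
        if 0 <= tx < len(row):
            # When two pixels land on the same position, the nearer one
            # (larger parallax) hides the farther one.
            if out[tx] is HOLE or d > out[tx]:
                out[tx] = d
    return out


def fill_small_holes(row):
    """Fill holes one pixel wide with the median of their horizontal neighbours."""
    filled = list(row)
    for x in range(1, len(row) - 1):
        if row[x] is HOLE and row[x - 1] is not HOLE and row[x + 1] is not HOLE:
            filled[x] = median([row[x - 1], row[x + 1]])
    return filled


# Background parallax 1, foreground object with parallax 4.
reference_row = [1, 1, 1, 1, 1, 4, 4, 4, 1, 1, 1, 1]
warped = fill_small_holes(warp_disparity_row(reference_row))
print(warped)
```

In this example the three consecutive holes to the right of the foreground object are left unfilled. They are wider than one pixel and correspond to background that the reference camera could not see, which is the occlusion part.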

Next, identification data indicating whether each pixel belongs to the occlusion part or the non-occlusion part is generated from the distance image of FIG. 3E. Because this data is also used, as described later, to separate the occlusion part from the non-occlusion part, the two-dimensional array of identification data is handled like an ordinary image but is hereinafter called a "mask image" to distinguish it from one. FIG. 3F shows an example of the mask image, in which the white portion is the occlusion part and the hatched portion is the non-occlusion part. The mask image only needs to distinguish the occlusion part from the non-occlusion part, so a binary image with 1 bit per pixel is sufficient.
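A minimal sketch of mask image generation follows. It assumes the projected distance image marks missing values with None, and it packs each mask row into 1 bit per pixel; both the marker and the bit order (most significant bit first) are illustrative choices rather than details given in the source.

```python
def make_mask(projected):
    """Return a mask image: 1 = occlusion (no projected value), 0 = non-occlusion."""
    return [[1 if value is None else 0 for value in row] for row in projected]


def pack_row(bits):
    """Pack one mask row into bytes, 1 bit per pixel, most significant bit first."""
    packed = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            packed[i // 8] |= 0x80 >> (i % 8)
    return bytes(packed)


projected = [
    [1, 4, 4, None, None, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
]
mask = make_mask(projected)
packed_rows = [pack_row(row) for row in mask]
```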

The process then proceeds to S202. In S202 to S212, the color image I2 is encoded in units of blocks containing a predetermined number of pixels. In this embodiment, the color image of FIG. 3B is encoded in units of block image data of 8 × 8 pixels.

In S202, the mask image generated in S201 is referenced to determine whether the block to be encoded in the color image I2 is a block in which an occlusion part and a non-occlusion part are mixed.
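The determination in S202 amounts to counting occlusion pixels inside the block. The sketch below assumes the mask image is a list of rows with 1 for occlusion and 0 for non-occlusion; the function name and the three labels are hypothetical.

```python
def classify_block(mask, top, left, size=8):
    """Classify a size x size block as 'non-occlusion', 'occlusion', or 'mixed'."""
    pixels = [mask[y][x]
              for y in range(top, top + size)
              for x in range(left, left + size)]
    occluded = sum(pixels)
    if occluded == 0:
        return "non-occlusion"
    if occluded == len(pixels):
        return "occlusion"
    return "mixed"


# 8 x 16 mask: the last four columns are occluded.
mask = [[0] * 12 + [1] * 4 for _ in range(8)]
left_block = classify_block(mask, 0, 0)   # entirely non-occlusion
right_block = classify_block(mask, 0, 8)  # half occluded
```

Only the "mixed" result triggers the separate handling described for mixed blocks; the other two results are both non-mixed blocks.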

The operation when the block is determined in S202 to be a non-mixed block is described first. For example, the block 303 in FIG. 3B is determined to be a non-mixed block because the corresponding block 305 of the mask image in FIG. 3F consists entirely of the non-occlusion part. As described below, a non-mixed block is processed in basically the same way as parallax compensation prediction in MVC.

First, the process proceeds to S203 and a disparity vector search is performed. The search looks in the image of FIG. 3A for the block (predicted block image data) to be used to predict the block 303 of FIG. 3B. In S204, the information specifying the position of that block is encoded as a disparity vector, preferably with a lossless method such as entropy coding.

As the criterion for choosing the prediction block in S203, the candidate block that minimizes the sum of absolute differences in color between its pixels and the corresponding pixels of the block to be encoded is selected.

In S203, when the block to be encoded consists only of the non-occlusion part, the search range for candidate prediction blocks is limited according to the epipolar constraint described above. That is, when the cameras are arranged horizontally as in this embodiment, the search range consists of blocks displaced only in the horizontal direction from the position of the block to be encoded. In this case, the disparity vector can be encoded as one-dimensional position information indicating the horizontal position. When all pixels in the block are occlusion pixels, the epipolar constraint is ignored and the corresponding block is searched for over the entire image. In this case, the disparity vector is two-dimensional position information.
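The search in S203 can be sketched as an exhaustive search that minimizes the sum of absolute differences (SAD). This is an illustrative sketch only: it assumes single-channel (grayscale) images stored as lists of rows, an exhaustive scan of every candidate position, and a vector defined as the reference block position minus the target block position. None of these details are specified in the source.

```python
def sad(target, ref, ty, tx, ry, rx, size):
    """Sum of absolute differences between a target block and a reference block."""
    return sum(abs(target[ty + j][tx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))


def search_disparity_vector(target, ref, ty, tx, size, horizontal_only):
    """Return (vector, cost); the vector is 1-D when the epipolar constraint applies."""
    height, width = len(ref), len(ref[0])
    # Epipolar constraint: only the same row is searched for non-occlusion blocks.
    rows = [ty] if horizontal_only else range(height - size + 1)
    best = None
    for ry in rows:
        for rx in range(width - size + 1):
            cost = sad(target, ref, ty, tx, ry, rx, size)
            if best is None or cost < best[0]:
                best = (cost, rx - tx, ry - ty)
    cost, dx, dy = best
    return ((dx,) if horizontal_only else (dx, dy)), cost


# Synthetic test data: the target view is the reference view shifted by 2 pixels.
ref = [[(x * 7 + y * 13) % 31 for x in range(8)] for y in range(4)]
target = [[ref[y][x + 2] if x + 2 < 8 else 0 for x in range(8)] for y in range(4)]

vec_1d, cost_1d = search_disparity_vector(target, ref, 0, 1, 2, horizontal_only=True)
vec_2d, cost_2d = search_disparity_vector(target, ref, 0, 1, 2, horizontal_only=False)
```

Restricting the search to one row both shortens the search and lets the vector be coded with a single component, which is the benefit the text attributes to the epipolar constraint.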

Next, a predicted image is generated in S205. For a non-mixed block, the block specified by the disparity vector found in S203 is used as the predicted image.

Next, in S212, the per-pixel difference between the predicted image of the block generated in S205 and the block to be encoded is encoded. The difference is encoded by applying a discrete cosine transform (hereinafter, DCT), quantizing the transform coefficients, and entropy coding the result. The difference signal may instead be encoded with another lossy method such as JPEG 2000, or with a lossless method such as ZIP. Then, in S213, it is determined whether all blocks have been processed. If not, the position of the block image data of interest is advanced and the processing from S202 onward is repeated.
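The first two stages of S212 (DCT and quantization of the difference) can be sketched as below. The orthonormal DCT-II and the single uniform quantization step `qstep` are assumptions made for illustration; the source does not specify the transform normalization or the quantizer, and the entropy coding stage is omitted.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block (direct, unoptimized form)."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def encode_residual(target_block, predicted_block, qstep=8):
    """Compute the difference, transform it, and quantize the coefficients."""
    residual = [[t - p for t, p in zip(t_row, p_row)]
                for t_row, p_row in zip(target_block, predicted_block)]
    coeffs = dct_2d(residual)
    return [[int(round(c / qstep)) for c in row] for row in coeffs]


# A prediction that is uniformly 16 levels too dark leaves only a DC coefficient.
target_block = [[116] * 4 for _ in range(4)]
predicted_block = [[100] * 4 for _ in range(4)]
quantized = encode_residual(target_block, predicted_block)
```

The better the prediction, the more quantized coefficients become zero, which is why improving prediction accuracy translates into a smaller code amount after entropy coding.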

This concludes the description of the processing performed when the block is determined in S202 to be a non-mixed block. Next, the processing performed when the target block image data is determined in S202 to be a mixed block is described.

The basic idea of encoding a block determined to be a mixed block is to assign a separate prediction block to each of two regions, namely the occlusion region composed of the pixels determined to be occlusion pixels and the non-occlusion region composed of the pixels determined to be non-occlusion pixels, and to integrate the two predictions, thereby improving prediction accuracy. In other words, two disparity vectors are assigned to one block.

The following description uses block 304 in FIG. 3(b) as an example of a mixed block. Block 304 is determined to be a mixed block in S202 because the corresponding block 306 of the mask image in FIG. 3(f) contains both a hatched portion and a white portion, that is, both an occlusion part and a non-occlusion part.

In S206, the target block determined to be a mixed block is divided into an occlusion part and a non-occlusion part. An example of the division is described with reference to FIG. 5. Blocks 304 and 306 in FIG. 5 are the same blocks as 304 in FIG. 3(b) and 306 in FIG. 3(f), respectively. In S206, the encoding target block is divided (separated) according to the mask image into a non-occlusion region 501 composed of non-occlusion pixels and an occlusion region 502 composed of occlusion pixels. White areas 503 and 504 in FIG. 5 are pixels that have no value; in an implementation, a value that a pixel cannot take, such as a negative value, may be stored there.
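The separation in S206 can be sketched as below. The sentinel `NO_VALUE = -1` follows the suggestion above of storing a negative value, while the convention that a mask entry of 1 means "occlusion" and the function name `split_block` are assumptions made for illustration.

```python
NO_VALUE = -1  # a value a pixel cannot take, marking "no pixel here"

def split_block(block, mask):
    """Split a block into (non_occlusion_region, occlusion_region).

    block and mask are equally sized lists of rows; mask[y][x] is truthy
    for occlusion pixels (assumed convention).
    """
    non_occ = [[NO_VALUE if m else p for p, m in zip(prow, mrow)]
               for prow, mrow in zip(block, mask)]
    occ = [[p if m else NO_VALUE for p, m in zip(prow, mrow)]
           for prow, mrow in zip(block, mask)]
    return non_occ, occ
```

Both outputs keep the original block geometry, so the later search and merge steps can address pixels by the same coordinates.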

Next, in S207, a corresponding block for predicting the occlusion region is searched for in color image I1, which in the present embodiment means searching within FIG. 3(a). A feature of this search is that pixels in block 502 that have no value are ignored when evaluating block similarity; in this way, predicted region image data can be searched for separately for the occlusion region and the non-occlusion region. In the present embodiment, the similarity between blocks is evaluated by the sum of absolute differences over the pixels in the block that are occlusion pixels. Because the subject in an occlusion region is considered not to appear in the reference viewpoint image, it is desirable to search the entire image rather than only along the epipolar constraint, which presupposes that the same object is captured by both cameras. In this case, the disparity vector is encoded as two-dimensional data. To reduce the processing load, the search may instead be restricted to the epipolar constraint. The prediction accuracy for the occlusion region is then likely to fall, but the data representing the disparity vector is kept to one dimension.
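A minimal sketch of the masked search in S207 follows. It assumes the region has been stored with a sentinel `NO_VALUE` for pixels that have no value, and that the caller supplies the list of displacements to test; the names `masked_sad` and `search_region` are hypothetical.

```python
NO_VALUE = -1  # sentinel for pixels that do not belong to the region

def masked_sad(region, candidate):
    """Sum of absolute differences over pixels that have a value."""
    return sum(abs(r - c)
               for rrow, crow in zip(region, candidate)
               for r, c in zip(rrow, crow)
               if r != NO_VALUE)

def search_region(region, reference, bx, by, offsets):
    """Return (best_offset, best_cost) among the given (dx, dy) offsets.

    Offsets are assumed to keep the candidate block inside the reference.
    """
    n = len(region)
    best = None
    for dx, dy in offsets:
        x, y = bx + dx, by + dy
        candidate = [row[x:x + n] for row in reference[y:y + n]]
        cost = masked_sad(region, candidate)
        if best is None or cost < best[0]:
            best = (cost, (dx, dy))
    return best[1], best[0]
```

Passing offsets with `dy` fixed to 0 gives the reduced-load variant restricted to the epipolar constraint, while a full grid of offsets gives the two-dimensional search preferred for occlusion regions.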

The effect of S207 is supplemented here. The image in FIG. 3(a) should not contain the object that exists in the occlusion region. However, there are thought to be many scenes in which the occlusion region has the same texture as its surroundings. The images in FIGS. 3(a) and 3(b) of the present embodiment are such an example: the background is textured and a person or the like appears in the foreground. In such a case, a block of similar color can be found in the image captured from the other viewpoint even for the occlusion region, so the occlusion region can be predicted with high accuracy. Next, in S208, the disparity vector of the block containing the occlusion region obtained in S207 is encoded. As in S204, lossless encoding is desirable.

Next, in S209, a corresponding block for predicting the non-occlusion region 501 is searched for in FIG. 3(a). S209 basically performs the processing of S207, which obtained the prediction block for the occlusion region, with the occlusion region replaced by the non-occlusion region. However, because the corresponding block of the non-occlusion region is considered to lie on the epipolar constraint, the disparity vector search is limited to one dimension in the horizontal direction. As a result of the search, block 307 in FIG. 3(a) is obtained as the prediction block containing the non-occlusion region. Next, in S210, the disparity vector indicating the position of the prediction block containing the non-occlusion region is encoded in the same manner as in S204.

Next, in S211, a predicted image of the mixed block is generated. The details of S211 are described with reference to the flowchart of FIG. 8.

First, in S801, a block that predicts the occlusion region in the mixed block and a block that predicts the non-occlusion region are created. FIG. 6 shows an example of creating the prediction block for block 304 in FIG. 3(b). Blocks 307 and 308 in FIG. 6 are identical to blocks 307 and 308 in FIG. 3(a), and are the blocks that predict the non-occlusion region and the occlusion region of mixed block 304, respectively. Mask 306 is identical to block 306 in FIG. 3(f). To obtain the prediction block, the region used for prediction is first cut out with the mask from each of the prediction block for the non-occlusion region and the prediction block for the occlusion region, as shown in FIG. 6 (blocks 601 and 602 in FIG. 6). The two are then integrated (block 603 in FIG. 6).
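The cut-out and integration of S801 amount to a per-pixel selection driven by the mask, as in the sketch below. It assumes that a truthy mask entry marks an occlusion pixel; `merge_predictions` is a hypothetical name.

```python
def merge_predictions(pred_non_occ, pred_occ, mask):
    """Build the mixed-block prediction (block 603 in the text).

    For each pixel, take the occlusion prediction where the mask marks
    occlusion and the non-occlusion prediction elsewhere.
    """
    return [[o if m else n for n, o, m in zip(nrow, orow, mrow)]
            for nrow, orow, mrow in zip(pred_non_occ, pred_occ, mask)]
```

Because the decoder can regenerate the same mask from the distance image, only the two disparity vectors need to be transmitted for it to rebuild this merged prediction.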

Block 603 obtained in FIG. 6 is considered to function adequately as a prediction of block 304 in FIG. 3(b). In the present embodiment, however, to raise its accuracy as a prediction block further, the boundary between the occlusion region and the non-occlusion region is additionally corrected through S802 to S804. An example of boundary correction is described using an enlarged view of the pixels on line 604, which is one line of block 603.

First, the need for boundary correction is explained. FIG. 7(b) is an enlarged view of line 604. FIG. 7(a) is an enlarged view of the line that line 604 predicts (hereinafter, the prediction target line), which is one line of the prediction target block 304 in FIG. 3(b). Naturally, the closer the color of each pixel in FIG. 7(b) is to that in FIG. 7(a), the higher the prediction accuracy. Straight line 701 in FIG. 7(a) indicates the boundary between the non-occlusion part and the occlusion part, as do the straight lines in FIGS. 7(b) to 7(e). Pixels 702 and 703 in FIG. 7(a) are the boundary pixels of the prediction target line, and have a color intermediate between the pixel on the non-occlusion side of the boundary (black) and the pixel on the occlusion side (halftone dots). In contrast, boundary pixels 704 and 705 in FIG. 7(b), which predict this line, have different colors. Boundary pixel 704 was at the boundary when block 307 in FIG. 6 was cut out with mask 306, so it has a color intermediate between the foreground (black) and the background (hatched) in block 307. Boundary pixel 705 was cut out of block 308 in FIG. 6 with mask 306, so it has the color of the region shown with halftone dots. As described above, pixels 704 and 705, which predict boundary pixels 702 and 703, are not expected to achieve sufficient prediction accuracy.

To improve the prediction accuracy of the boundary pixels described above, boundary pixels 704 and 705 in FIG. 7(b) are first deleted in S802. The result is shown in FIG. 7(c).

Next, in S803, the boundary pixels deleted in S802 are doubly interpolated. Double interpolation means filling in the deleted pixel values using pixels that have the color of the pixels adjacent to the deleted boundary pixels, such as pixels 706 and 707 in FIG. 7(c). The result is shown in FIG. 7(d).

Next, in S804, the doubly interpolated pixels are blended to determine the color that predicts the one pixel on each side of the boundary. The result is shown in FIG. 7(e). Pixels 708 and 709 in FIG. 7(e) are the pixels filled in by blending. The blend may simply be the average of the two doubly interpolated colors.
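One possible reading of S802 to S804 on a single line of pixels is sketched below. It assumes that "double interpolation" gives each deleted boundary pixel two candidate colors, one from the neighbor on each side of the boundary, and that blending averages them; this interpretation, the function name `correct_boundary`, and the `weight` parameter are assumptions rather than details confirmed by the text.

```python
def correct_boundary(line, b, weight=0.5):
    """Correct the two pixels straddling a region boundary on one line.

    line   : list of pixel values from the merged prediction block
    b      : index of the first pixel on the far side of the boundary,
             so line[b - 1] and line[b] are the boundary pixels
    weight : blend weight for the near-side color (0.5 = plain average)
    """
    if b < 2 or b > len(line) - 2:
        return list(line)  # not enough neighbors to interpolate from
    near = line[b - 2]     # color adjacent to the deleted pixel line[b - 1]
    far = line[b + 1]      # color adjacent to the deleted pixel line[b]
    out = list(line)
    # S802: the boundary pixels are discarded; S803: both positions receive
    # both neighbor colors; S804: the two colors are blended.
    blended = weight * near + (1 - weight) * far
    out[b - 1] = blended
    out[b] = blended
    return out
```

With the default weight, both boundary pixels take the midpoint of the two region colors, which approximates the intermediate color that boundary pixels have in the prediction target line.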

The weighting coefficient used for blending may be estimated from the prediction target image or the like. Techniques for estimating the coefficient that blends two regions are widely known as matting, and a known method may be used. When the blend coefficient is obtained from the prediction target image, it must be encoded, because the prediction target image is not available on the decoding side.

Next, the difference encoding in S212 is performed in the same manner as for a non-mixed block. Finally, in S213, it is determined whether all blocks of the prediction target image have been processed. If not, the process returns to S202 and an unprocessed block is encoded. When all blocks have been processed, the processing ends.

The above is the detailed processing of the predictive encoding unit 423 in the present embodiment. FIG. 1 shows an example of a specific block configuration of the predictive encoding unit 423. The role of each block in FIG. 1 is described below.

As shown in FIG. 1, the predictive encoding unit 423 receives the encoding target image (color image I2) in units of blocks. The reference viewpoint color image (color image I1) and the reference viewpoint distance image (distance image S1) are also input. The output is encoded data consisting of the prediction information for predicting from the reference viewpoint color image and the difference information. The image memory 102 stores the reference viewpoint color image (color image I1).

The mask image generation unit 101 generates a mask image at the viewpoint position of the imaging unit 421 from the reference viewpoint distance image and preset position (distance) information on the two imaging units 420 and 421. That is, it performs the processing of S201 in FIG. 2.

The block attribute determination unit 108 determines whether the encoding target block is a mixed block by referring to the corresponding block of the mask image. That is, it performs the processing of S202.

The disparity vector search unit 103 refers to the mask image generated by the mask image generation unit 101, determines whether the target pixel block is a block in which an occlusion part and a non-occlusion part are mixed, and, based on that determination, searches the image memory 102 for disparity vectors. That is, for a non-mixed block it searches for a single disparity vector, and for a mixed block it searches for two disparity vectors, one per region. It thus performs the processing of S203, S206, S207, and S209.

The disparity compensation prediction unit 104 generates a predicted image using the disparity vectors obtained by the disparity vector search unit 103. That is, it performs the processing of S205 and S211.

The DCT unit 105, the quantization unit 106, and the entropy encoding unit 107 encode the difference image. That is, they perform the processing of S212. The entropy encoding unit 107 also encodes the disparity vectors, and thus also performs the processing of S204, S208, and S210.

FIG. 1 shows merely one example of the embodiment; several of the processing units may of course be integrated into a single processing unit, or divided further.

As described above, according to the present embodiment, a predicted image for each block, the unit of encoding, of an image at a viewpoint different from the reference viewpoint can be generated with high accuracy from the reference image data obtained by the imaging device at the reference viewpoint among the multiple viewpoints and the distance image data for that reference image. This makes it possible to raise encoding efficiency.

[Modification of First Embodiment]
Each unit shown in FIG. 1 may be configured as hardware, but may also be implemented as software (a computer program). In this case, the software is installed in the memory of a general-purpose computer such as a PC (personal computer). When the CPU of the computer executes the installed software, the computer realizes the functions of the image processing apparatus described above. That is, the computer can be applied to the image processing apparatus described above. An example of the hardware configuration of a computer applicable to the predictive encoding apparatus according to the first embodiment is described with reference to the block diagram of FIG. 9.

The CPU 1501 controls the entire computer using computer programs and data stored in the RAM 1502 and the ROM 1503, and executes each of the processes described above as being performed by the multi-view image encoding apparatus.

The RAM 1502 is an example of a computer-readable storage medium. The RAM 1502 has an area for temporarily storing computer programs and data loaded from the external storage device 1507, the storage medium drive 1508, and the network interface 1510. The RAM 1502 also has a work area used when the CPU 1501 executes various processes. That is, the RAM 1502 can provide various areas as appropriate. The ROM 1503 is an example of a computer-readable storage medium and stores computer setting data, a boot program, and the like.

The keyboard 1504 and the mouse 1505 are operated by the operator of the computer to input various instructions to the CPU 1501. The display device 1506 is configured by a CRT, a liquid crystal screen, or the like, and can display the results of processing by the CPU 1501 as images, text, and so on. For example, it can display the image to be encoded and the encoded result.

The external storage device 1507 is an example of a computer-readable storage medium and is a large-capacity information storage device typified by a hard disk drive. The external storage device 1507 stores an OS (operating system), the computer programs and data that cause the CPU 1501 to realize each process shown in FIG. 1, the various tables described above, databases, and the like. The computer programs and data stored in the external storage device 1507 are loaded into the RAM 1502 as appropriate under the control of the CPU 1501 and are then processed by the CPU 1501.

The storage medium drive 1508 reads computer programs and data recorded on a storage medium such as a CD-ROM or DVD-ROM and outputs them to the external storage device 1507 or the RAM 1502. Some or all of the information described as being stored in the external storage device 1507 may instead be recorded on this storage medium and read by the storage medium drive 1508.

The I/F 1509 is an interface for inputting the encoding target color image, the reference viewpoint color image, and the reference viewpoint distance image from outside; one example is USB (Universal Serial Bus). Reference numeral 1510 denotes a bus that connects the units described above.

In the above configuration, when the computer is powered on, the CPU 1501 loads the OS from the external storage device 1507 into the RAM 1502 in accordance with the boot program stored in the ROM 1503. As a result, information can be input via the keyboard 1504 and the mouse 1505, and a GUI can be displayed on the display device 1506. When the user operates the keyboard 1504 or the mouse 1505 to input an instruction to start the predictive encoding application program stored in the external storage device 1507, the CPU 1501 loads the program into the RAM 1502 and executes it. As a result, the computer functions as a predictive encoding apparatus.

The predictive encoding application program executed by the CPU 1501 basically includes functions corresponding to the respective units shown in FIG. 1. The encoding result is stored in the external storage device 1507. As will be clear from the following description, this computer is equally applicable to the image processing apparatuses according to the subsequent embodiments.

[Second Embodiment]
The second embodiment shows a modification of the method of encoding the occlusion part in the first embodiment.

FIG. 10 shows a flowchart of the processing performed by the predictive encoding unit 423 in the second embodiment. The flowchart of FIG. 10 is the flowchart of FIG. 2 for the predictive encoding unit of the first embodiment with the prediction method for the occlusion part of a mixed block changed. Accordingly, apart from S1007, S1008, S1011, S1014, and S1015, which relate to creating the predicted image of the occlusion part, the steps are the same as the identically named steps in the flowchart of FIG. 2. The description below is limited to the parts that differ from the first embodiment.

First, in the flowchart of FIG. 10, the processing up to S1006 is completed by the same procedure as in the first embodiment. That is, when a block in which occlusion and non-occlusion are mixed is input as the encoding target block, it is divided into an occlusion part and a non-occlusion part by the region division in S1006. In the second embodiment as well, one disparity vector is first obtained for the occlusion part in S1007, as in the first embodiment.

Next, in the present embodiment, the error determination processing in S1014 obtains the error between the predicted image of the occlusion part in the block, obtained in S1007, and the prediction target image. The error is evaluated as the sum of absolute differences of the colors of the occlusion pixels. If the error is equal to or greater than a threshold (here, the number of pixels in the block × 5), the process proceeds to S1015. The designer may set this threshold freely, for example according to the degradation in image quality that the system can tolerate.

In S1015, the occlusion part in the block is divided. The block division splits the 8×8 pixel block into four 4×4 pixel blocks (hereinafter, small blocks). As in H.264, the block may instead be divided into blocks of different sizes.

Once the block has been divided, the disparity vector search is performed again in S1007. This time, one disparity vector is searched for each small block after division. However, when all pixels in a small block belong to the non-occlusion part, no disparity vector needs to be assigned to that small block.

Next, the process proceeds to S1014. In the present embodiment, no division below 4×4 pixel blocks is performed, so division is terminated and the process proceeds to S1008.
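The error test of S1014 and the division of S1015 can be sketched as follows. The threshold of "pixels in the block × 5" and the 8×8 to 4×4 split are taken from the description; the function names and the convention that truthy mask entries mark occlusion pixels are assumptions.

```python
def needs_split(occlusion_sad, block_size=8, per_pixel_limit=5):
    """S1014: split when the occlusion-part SAD reaches the threshold."""
    return occlusion_sad >= block_size * block_size * per_pixel_limit

def split_into_small_blocks(block):
    """S1015: cut a square block into four quadrants, in raster order."""
    half = len(block) // 2
    return [[row[x:x + half] for row in block[y:y + half]]
            for y in (0, half) for x in (0, half)]

def needs_vector(small_mask):
    """A small block with no occlusion pixel gets no disparity vector."""
    return any(m for row in small_mask for m in row)
```

Applying `split_into_small_blocks` to the mask as well as to the pixel block lets `needs_vector` skip the small blocks that lie entirely in the non-occlusion part, which limits the growth in disparity vector data.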

When finer division is performed, the predicted images of the small blocks are combined to generate the predicted image of the occlusion part of the encoding target block, and the prediction error of that occlusion part is obtained again. If the prediction error is equal to or greater than the threshold, the process proceeds to S1015 again. Although more block division raises prediction accuracy, it also increases the amount of disparity vector information, so it is desirable to stop at a suitable level of division.

Next, in S1008, the disparity vectors of the occlusion part are encoded. Apart from the fact that there are multiple disparity vectors, they are encoded by the same procedure as in the first embodiment.

In S1011, a predicted image of the mixed block is generated. The basic procedure is the same as S211 in the first embodiment. The only difference is that the prediction block for the occlusion part is generated by combining the predicted images of the individual small blocks. This concludes the description of the processing of the second embodiment.

The second embodiment has shown a method of dividing the block of the occlusion part. Because block division is added, the amount of disparity vector information is larger than in the first embodiment. However, because block division is performed after the occlusion part and the non-occlusion part have been separated, more accurate prediction is likely to be achieved with fewer divisions than in a method that divides blocks without separating the regions.

Although the second embodiment has shown a method of dividing the occlusion part in the block into smaller blocks, processing that divides the non-occlusion part into smaller blocks can easily be added as well.

[Third Embodiment]
In the first embodiment, a predicted image was generated from a disparity vector for each of the non-occlusion part and the occlusion part. However, as the example in FIG. 3(d) shows, for the non-occlusion part a predicted image can be generated without assigning a disparity vector to each block, by performing projective transformation using the distance image of the reference viewpoint. Whereas the first embodiment used disparity vectors to generate the predicted image of the non-occlusion part, the present embodiment shows an example that uses projective transformation based on the distance image.

FIG. 11 is a flowchart of the processing of the predictive encoding unit 423 in the present embodiment. The details of the processing are described below, limited to the parts that differ from the flowchart of FIG. 2 for the first embodiment. As in the first embodiment, the description uses an example in which the color image in FIG. 3(b) is predictively encoded using the color image in FIG. 3(a) and the distance image at that viewpoint in FIG. 3(c).

First, in S1101, a mask image is generated in the same manner as in S201. Next, in S1114, a predicted image of the non-occlusion part is generated by projective transformation. FIG. 3(d) is an example generated from FIGS. 3(a) and 3(c).

Next, as in the first embodiment, the processing moves to block units. First, in S1102, the attribute of the block is determined. In the present embodiment, the block is determined to be one of three types: a non-occlusion-only block, an occlusion-only block, or a mixed block.
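The three-way attribute determination of S1102 can be sketched as a simple scan of the mask block. The string labels and the convention that truthy mask entries mark occlusion pixels are illustrative assumptions.

```python
def classify_block(mask_block):
    """Return the attribute of a block from its mask (S1102)."""
    flags = [bool(m) for row in mask_block for m in row]
    if all(flags):
        return "occlusion_only"
    if not any(flags):
        return "non_occlusion_only"
    return "mixed"
```

The returned label then selects the branch that follows: the projected image as-is, a disparity vector search, or the combined handling for mixed blocks.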

When the block is determined in S1102 to be a non-occlusion-only block, a predicted image for it is generated in S1115. As shown in FIG. 3(d), the predicted image of a non-occlusion-only block is obtained by projective transformation, so it is used as is.

When the block is determined in S1102 to be an occlusion-only block, the processing of S1103 to S1105 is performed. This processing is the same as S203 to S205 in the flowchart of FIG. 2 for the first embodiment.

When the block is determined in S1102 to be a mixed block, the process proceeds to S1106. S1106 to S1108 perform the same processing as S206 to S208 in FIG. 2. The processing up to this point makes it possible to predict the occlusion part in the mixed block.

Next, in S1111, a predicted image of the mixed block is generated. The occlusion part can be predicted using the disparity vector obtained in S1108. The non-occlusion part can be predicted from the pixels at the same positions in the predicted image created by projective transformation, shown in FIG. 3(d). As in the first embodiment, the occlusion part and the non-occlusion part are integrated and the boundary pixels are corrected in accordance with the flowchart of FIG. 8.

This concludes the description of predicted image generation according to the block attribute determined in S1102. The subsequent S1112 and S1113 may be the same as S212 and S213 of the first embodiment. The above is the processing of the third embodiment.

(Other Embodiments)
The present invention can also be realized by the following processing: software (a program) that implements the functions of the above-described embodiments is supplied to a system or apparatus via a network or various storage media, and a computer (or a CPU, MPU, or the like) of the system or apparatus reads and executes the program.

Claims

1. An image encoding device that predictively encodes image data of a target viewpoint, different from a reference viewpoint, based on image data of the reference viewpoint and distance image data of the reference viewpoint, the device comprising:
identification data generating means for generating, using the distance image data, identification data for identifying whether each pixel in the image data of the target viewpoint belongs to an occlusion part or a non-occlusion part;
input means for inputting block image data, in units of blocks of a preset size, from the image data of the target viewpoint;
determination means for determining, by referring to the identification data generated by the identification data generating means, whether the block image data of interest input by the input means is a mixed block in which an occlusion part and a non-occlusion part are mixed;
predicted image generation means for generating, when the determination means determines that the block is a non-mixed block, predicted block image data for the block image data of interest with reference to the image data of the reference viewpoint, and,
when the determination means determines that the block is a mixed block, separating the block image data of interest, with reference to the identification data, into an occlusion region composed of occlusion pixels and a non-occlusion region composed of non-occlusion pixels, generating predicted region image data corresponding to each region with reference to the image data of the reference viewpoint, and integrating the obtained predicted region image data to generate predicted block image data for the block image data of interest; and
encoding means for encoding a difference between the predicted block image data generated by the predicted image generation means and the block image data of interest.

2. The image encoding device according to claim 1, wherein, when the block image data of interest is the mixed block, the predicted image generation means deletes pixels at the boundary position between the predicted region image data of the occlusion region and the predicted region image data of the non-occlusion region, and generates the predicted block image data for the block image data of interest by interpolating the deleted pixel values with reference to the pixel values adjacent to the deleted pixels in each predicted region image.

3. The image encoding device according to claim 1, wherein the predicted image generation means includes:
error determination means for determining, when the block image data of interest is the mixed block, whether a difference between the occlusion region image data in the block image data of interest and the predicted region image data obtained by the search is equal to or greater than a preset threshold;
division means for dividing the block image data of interest into a plurality of small blocks smaller than the block when the error determination means determines that the difference is equal to or greater than the threshold; and
means for searching the image data of the reference viewpoint for a predicted image for each region composed of occlusion pixels in each divided small block, and integrating the predicted images thus found to generate the predicted region image data of the occlusion region in the block image data of interest.

4. The image encoding device according to any one of claims 1 to 3, wherein the determination means determines whether the block image data of interest is a non-occlusion single block composed only of non-occlusion pixels, an occlusion single block composed only of occlusion pixels, or a mixed block in which occlusion pixels and non-occlusion pixels are mixed, and
when the block image data of interest is a non-occlusion single block composed only of non-occlusion pixels, the predicted image generation means generates the predicted block image data for the block image data of interest by projection conversion using the distance image data.

5. The image encoding device according to claim 1, further comprising means for encoding the image data of the reference viewpoint and the distance image data.

6. A control method of an image encoding device that predictively encodes image data of a target viewpoint, different from a reference viewpoint, based on image data of the reference viewpoint and distance image data of the reference viewpoint, the method comprising:
an identification data generation step of generating, using the distance image data, identification data for identifying whether each pixel in the image data of the target viewpoint belongs to an occlusion part or a non-occlusion part;
an input step of inputting block image data, in units of blocks of a preset size, from the image data of the target viewpoint;
a determination step of determining, by referring to the identification data generated by the identification data generating means, whether the block image data of interest input by the input means is a mixed block in which an occlusion part and a non-occlusion part are mixed;
a predicted image generation step in which predicted image generation means,
when it is determined in the determination step that the block is a non-mixed block, generates predicted block image data for the block image data of interest with reference to the image data of the reference viewpoint, and,
when it is determined in the determination step that the block is a mixed block, separates the block image data of interest, with reference to the identification data, into an occlusion region composed of occlusion pixels and a non-occlusion region composed of non-occlusion pixels, generates predicted region image data corresponding to each region with reference to the image data of the reference viewpoint, and integrates the obtained predicted region image data to generate predicted block image data for the block image data of interest; and
an encoding step in which encoding means encodes a difference between the predicted block image data generated in the predicted image generation step and the block image data of interest.

7. A program that, when read and executed by a computer, causes the computer to function as the image encoding device according to claim 1.

8. A computer-readable storage medium storing the program according to claim 7.