JP5759357B2

JP5759357B2 - Video encoding method, video decoding method, video encoding device, video decoding device, video encoding program, and video decoding program

Info

Publication number: JP5759357B2
Application number: JP2011272316A
Authority: JP
Inventors: 信哉志水; 木全　英明; 英明木全; 志織杉本; 明小島
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2011-12-13
Filing date: 2011-12-13
Publication date: 2015-08-05
Anticipated expiration: 2031-12-13
Also published as: JP2013126006A

Description

The present invention relates to a video encoding method, a video decoding method, a video encoding device, a video decoding device, a video encoding program, and a video decoding program.

A free viewpoint video is a video in which the user can freely specify the position and orientation of the camera within the shooting space (hereinafter referred to as the viewpoint). Because the user may designate an arbitrary viewpoint, it is not realistic to hold video for every possible viewpoint; a free viewpoint video therefore consists of the set of information needed to generate the video for a designated viewpoint. A free viewpoint video is also sometimes called free viewpoint television, arbitrary viewpoint video, or arbitrary viewpoint television.

A free viewpoint video can be expressed in various data formats. The most common format uses a video together with a depth map (distance image) for each frame of that video (see, for example, Non-Patent Document 1). Here, a depth map expresses, for each pixel, the depth (distance) from the shooting position of the camera to the subject, and thus expresses the three-dimensional position of the subject. Because depth is proportional to the reciprocal of the parallax between two cameras, a depth map is sometimes called a disparity map (parallax image). In the field of computer graphics, depth is the information stored in the Z buffer, so a depth map is also sometimes called a Z image or Z map. Besides the distance from the camera to the subject, the coordinate value along the Z axis of a three-dimensional coordinate system set up over the space to be expressed may also be used as the depth. In general, the horizontal direction of a captured image is taken as the X axis and the vertical direction as the Y axis, so the Z axis coincides with the orientation of the camera; however, the Z axis may not coincide with the camera orientation, for example when a common coordinate system is used for a plurality of cameras. In the following, distance and Z value are both referred to as depth without distinction, and an image that expresses depth as pixel values is referred to as a depth map. Strictly speaking, however, a disparity map requires that a reference camera pair be set.

There are three methods of expressing depth as a pixel value: using the value corresponding to the physical quantity directly as the pixel value, using a value obtained by quantizing the range between a minimum value and a maximum value into a certain number of levels, and using a value obtained by quantizing the difference from the minimum value with a certain step width. When the range to be expressed is limited, depth can be expressed with higher accuracy by using additional information such as the minimum value. When quantizing at equal intervals, there are also two options: quantizing the physical quantity as it is, or quantizing the reciprocal of the physical quantity. Because the reciprocal of the distance is proportional to the parallax, the former is often used when the distance must be expressed with high accuracy, and the latter when the parallax must be expressed with high accuracy. In the following, any representation of depth as an image is referred to as a depth map, regardless of how the depth is converted to pixel values or quantized.
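The two equal-interval quantization options described above can be sketched as follows. This is only an illustration: the function names, the 256-level default, and the convention that the nearest distance maps to the largest value in the inverse case are assumptions, not details taken from this document.

```python
def quantize_linear(z, z_min, z_max, levels=256):
    """Quantize the distance z itself at equal intervals between z_min and z_max."""
    t = (z - z_min) / (z_max - z_min)
    return round(t * (levels - 1))


def quantize_inverse(z, z_min, z_max, levels=256):
    """Quantize 1/z at equal intervals (assumed convention: nearest distance -> largest value)."""
    t = (1.0 / z - 1.0 / z_max) / (1.0 / z_min - 1.0 / z_max)
    return round(t * (levels - 1))


# Between 1 m and 2 m the inverse scheme spends far more levels than the
# linear one, which reflects its finer resolution in parallax near the camera.
near_span_linear = quantize_linear(2.0, 1.0, 10.0) - quantize_linear(1.0, 1.0, 10.0)
near_span_inverse = quantize_inverse(1.0, 1.0, 10.0) - quantize_inverse(2.0, 1.0, 10.0)
```

Both functions need `z_min` and `z_max` to be known to whoever reads the pixel values, which corresponds to the "additional information such as the minimum value" mentioned in the text.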

Because a depth map is expressed as an image in which each pixel has a single value, it can be regarded as a grayscale image. In addition, because subjects exist continuously in real space and cannot move instantaneously to a distant position, a depth map can be said to have spatial and temporal correlation, just like an image signal. Consequently, a depth map, or a video made up of consecutive depth maps, can be encoded efficiently, with spatial and temporal redundancy removed, by the image coding and video coding methods used for ordinary image and video signals. In the following, a depth map and a video of depth maps are both referred to as a depth map without distinction.

General video coding is described here. To achieve efficient coding by exploiting the fact that subjects are spatially and temporally continuous, video coding divides each frame of the video into processing unit blocks called macroblocks, predicts the video signal of each macroblock spatially or temporally, and encodes prediction information indicating the prediction method together with the prediction residual. When the video signal is predicted spatially, the prediction information is, for example, information indicating the direction of spatial prediction; when it is predicted temporally, the prediction information is, for example, information indicating the frame to be referenced and information indicating the position within that frame.

Spatial prediction is prediction within a frame, and is therefore called intra-frame prediction (intra-screen prediction, intra prediction). The intra-frame prediction used in recent video coding methods, typified by H.264/AVC, uses information indicating the direction of spatial correlation between the image in the block to be predicted and its adjacent blocks, such as the continuity of texture across adjacent blocks. Specifically, one direction is selected for the block to be predicted from among a plurality of directions prepared in advance, and a predicted image is generated according to that direction from the pixel values of already encoded (or already decoded) pixels adjacent to the block. Setting the selectable prediction directions more finely allows the correlation to be exploited with higher accuracy. In H.264/AVC, up to eight directions are available, although the number depends on the size of the block to be predicted (for details of H.264/AVC see, for example, Non-Patent Document 2).
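As a rough illustration of conventional directional intra-frame prediction from adjacent pixels, the sketch below handles only the vertical and horizontal directions. The function name `intra_predict` and the list-of-lists frame layout are assumptions for illustration; a real H.264/AVC encoder supports more directions and boundary handling that is omitted here.

```python
def intra_predict(frame, top, left, size, direction):
    """Predict a size x size block from already-decoded adjacent pixels.

    frame is a list of rows; (top, left) is the block's upper-left corner.
    "vertical" copies the row just above the block downwards,
    "horizontal" copies the column just left of the block rightwards.
    """
    pred = [[0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            if direction == "vertical":
                pred[y][x] = frame[top - 1][left + x]
            elif direction == "horizontal":
                pred[y][x] = frame[top + y][left - 1]
            else:
                raise ValueError("unsupported direction: " + direction)
    return pred


# The 2x2 block at (1, 1) has not been decoded yet (zeros); its neighbours have.
frame = [
    [1, 2, 3],
    [4, 0, 0],
    [5, 0, 0],
]
vertical = intra_predict(frame, 1, 1, 2, "vertical")
horizontal = intra_predict(frame, 1, 1, 2, "horizontal")
```

The encoder would try each available direction, keep the one with the smallest residual, and signal that direction as the prediction information.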

Making the prediction direction of intra-frame prediction finely selectable enables prediction that follows texture continuity in a variety of orientations. However, finer prediction directions alone cannot yield correct intra-frame prediction when the block to be predicted and its adjacent pixels show different subjects, for example because of occlusion.

To address this problem, Patent Document 1 specifies, in addition to the prediction direction, a distance from the block to be predicted, and generates the predicted image from pixels other than the adjacent pixels, thereby achieving correct intra-frame prediction even when occlusion occurs. Because this method allows intra-frame prediction that refers to distant pixels rather than only adjacent ones, the video signal can be predicted with high accuracy when the block edge contains occlusion or noise, or when a spatially periodic signal is present.

Patent Document 1: Republished WO2008/102805

Non-Patent Document 1: Y. Mori, N. Fukushima, T. Fujii, and M. Tanimoto, “View Generation with 3D Warping Using Depth Information for FTV”, in Proceedings of 3DTV-CON2008, pp. 229-232, May 2008.

Non-Patent Document 2: Rec. ITU-T H.264, “Advanced video coding for generic audiovisual services”, March 2009.

However, the method of Patent Document 1 requires that, in addition to the prediction direction, information indicating which pixels serve as reference pixels be encoded for each block. This increases the code amount of the additional information in blocks that use intra-frame prediction, so efficient coding cannot be achieved.

In addition, the method of Patent Document 1 defines a single distance from the block for each block to be predicted and uses that same distance to designate the reference pixels for every pixel in the block. It therefore cannot predict with high accuracy when the subject's shape is not parallel to the block boundary, or when a plurality of subjects lie along the prediction direction. A distance indicating the reference pixel could instead be specified for each pixel in the block, but information indicating which pixel serves as the reference pixel would then have to be encoded for every pixel, which increases the code amount of the additional information still further and again prevents efficient coding.

The present invention has been made in view of these circumstances. Its object is to provide a video encoding method, a video decoding method, a video encoding device, a video decoding device, a video encoding program, and a video decoding program that can improve the coding efficiency of intra-frame prediction when encoding free viewpoint video data whose components are a video and a depth map.

The present invention is a video encoding method that, while using a depth map corresponding to a video, performs predictive encoding using intra-frame prediction on processing regions obtained by dividing each frame constituting the video into a predetermined size, the method comprising: an intra-frame prediction direction setting step of setting a prediction direction for the intra-frame prediction; a reference pixel candidate setting step of setting, for each pixel or each group of a predetermined number of pixels in the processing region and according to the set prediction direction, pixels within a predetermined distance range from the processing region as reference pixel candidates; a reference pixel dissimilarity calculating step of calculating, for each pixel or each group of a predetermined number of pixels in the processing region, a dissimilarity for each reference pixel candidate, using the depth map corresponding to the processing region and the depth map corresponding to the reference pixel candidates; a reference pixel determining step of determining, for each pixel or each group of a predetermined number of pixels in the processing region, a reference pixel from among the reference pixel candidates on the basis of the dissimilarity; and a predicted image generating step of generating a predicted image for the processing region from the reference pixels according to the set prediction direction.

According to the present invention, the improved prediction accuracy of the video signal reduces the code amount required to encode the prediction residual. Intra-frame prediction can draw not only on pixels adjacent to the region to be predicted but also on pixels located within a certain distance of it along the prediction direction. Pixel values can therefore be predicted from the same subject, and highly accurate prediction achieved, even when the edge of the region contains occlusion or noise or when a plurality of subjects lie along the prediction direction. The conventional method (for example, Patent Document 1) achieved intra-frame prediction from non-adjacent pixels by specifying a distance from the region to be predicted. Because the information specifying that distance had to be encoded, the distance could not be specified for each pixel, and not every pixel could be predicted with high accuracy. The present invention instead uses the information in the separately transmitted depth map to select, from among the reference pixel candidates, a pixel showing the same subject as the pixel to be predicted, and generates the predicted image from the selected reference pixel. This removes the need to encode information specifying the distance. As a result, the distance can be determined for each pixel, and every pixel can be predicted with high accuracy. Because depth map values depend strongly on the subject, judging the subject from the depth map makes it possible to determine with high accuracy whether the same subject is shown. Even when the distance is determined for each pixel group, the information specifying the distance that Patent Document 1 required no longer needs to be encoded, and the code amount is reduced accordingly.
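The following is a minimal sketch of the idea described above, not the patented implementation. It assumes per-pixel selection, an absolute depth difference as the dissimilarity, only vertical and horizontal prediction directions, and a block that has at least one decoded candidate along the direction. The name `depth_guided_intra_predict` and the parameter `max_dist` are hypothetical.

```python
def depth_guided_intra_predict(image, depth, top, left, size, direction, max_dist):
    """Predict a block by picking, per pixel, the candidate whose depth matches best.

    image and depth are lists of rows with the same shape. Candidates are the
    max_dist decoded pixels that lie along the prediction direction, starting
    with the pixel adjacent to the block.
    """
    # Unit step pointing from the block toward already-decoded pixels.
    dy, dx = {"vertical": (-1, 0), "horizontal": (0, -1)}[direction]
    pred = [[0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            # Nearest decoded pixel outside the block along the direction.
            if direction == "vertical":
                ry, rx = top - 1, left + x
            else:
                ry, rx = top + y, left - 1
            target = depth[top + y][left + x]
            best_diff, best_pos = None, None
            for k in range(max_dist):
                cy, cx = ry + k * dy, rx + k * dx
                if cy < 0 or cx < 0:
                    break  # ran off the frame
                diff = abs(depth[cy][cx] - target)
                # Strict "<" keeps the nearest candidate when depths tie.
                if best_diff is None or diff < best_diff:
                    best_diff, best_pos = diff, (cy, cx)
            pred[y][x] = image[best_pos[0]][best_pos[1]]
    return pred


# Rows 0-1 are decoded; rows 2-3 form the 2x2 block to predict.
# In column 0 the adjacent pixel (depth 50) belongs to a different subject
# than the block (depth 10), so the pixel one row further away is chosen.
image = [[100, 200], [7, 210], [0, 0], [0, 0]]
depth = [[10, 50], [50, 50], [10, 50], [10, 50]]
guided = depth_guided_intra_predict(image, depth, 2, 0, 2, "vertical", 2)
adjacent_only = depth_guided_intra_predict(image, depth, 2, 0, 2, "vertical", 1)
```

Because the selection depends only on the depth map and on already-decoded pixels, a decoder that holds the same depth map can repeat it without any per-pixel distance being transmitted.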

In the present invention, the reference pixel dissimilarity calculating step calculates, for each pixel or each group of a predetermined number of pixels in the processing region, the dissimilarity for each reference pixel candidate, using both the difference between the depth map value corresponding to that pixel or pixel group and the depth map value corresponding to the reference pixel candidate, and the distance between the processing region and the reference pixel.

According to the present invention, predicting from pixels that are more highly correlated with the pixel to be predicted improves prediction accuracy and thereby reduces the code amount required to encode the prediction residual. A depth map carries no texture information, so a reference pixel cannot be chosen with the texture inside a subject taken into account. In general, the closer two pixels are within an image, the more highly correlated their values are. Determining the reference pixel on the basis of the distance between the reference pixel and the pixel to be predicted, in addition to the depth map values, therefore allows prediction from the more highly correlated pixel when several reference pixel candidates show the same subject.
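One possible way to combine the depth difference and the spatial distance is an additive cost, sketched below. The additive form and the `dist_weight` parameter are assumptions; the text only states that both quantities are used.

```python
def dissimilarity(target_depth, cand_depth, distance, dist_weight=1.0):
    """Assumed cost: absolute depth difference plus a penalty on spatial distance."""
    return abs(target_depth - cand_depth) + dist_weight * distance


def pick_reference(target_depth, cand_depths, dist_weight=1.0):
    """Return the index of the cheapest candidate.

    cand_depths[k] is the depth of the candidate k + 1 pixels away from the
    pixel to be predicted along the prediction direction.
    """
    costs = [
        dissimilarity(target_depth, d, k + 1, dist_weight)
        for k, d in enumerate(cand_depths)
    ]
    return min(range(len(costs)), key=costs.__getitem__)


# Two candidates share the target's depth; the nearer one (index 1) wins.
same_subject_choice = pick_reference(10, [50, 10, 10])
# A large distance weight favours the adjacent pixel despite a small depth gap.
heavy_distance_choice = pick_reference(10, [12, 10], dist_weight=5.0)
```

The weight trades off subject consistency against spatial proximity; how it would be set in practice is not specified here.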

In the present invention, the reference pixel determining step determines, for each pixel or each group of a predetermined number of pixels in the processing region, a reference pixel from among the reference pixel candidates so as to minimize the weighted sum of the total of the dissimilarities for the reference pixels and a value representing the variation in the distances between the processing region and the reference pixels.

According to the present invention, suppressing unnatural high-frequency components in the predicted image improves prediction accuracy in the frequency domain and thereby reduces the code amount required to encode the prediction residual. When a reference pixel is set for each pixel of the region to be predicted, the predicted image is assembled from pixels that were not originally contiguous. As a result, continuity between pixels in the predicted image decreases, and high-frequency components that would not occur in a natural image may appear. Setting, as far as possible, pixels showing the same subject as reference pixels while limiting how much the distance to the reference pixel changes between neighboring pixels of the region to be predicted suppresses these unnatural high-frequency components. Because the prediction residual is generally encoded in the frequency domain, this avoids needless high-frequency components in the prediction residual and enables more efficient coding.
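The joint selection described above can be sketched for one row of pixels with dynamic programming. This is an illustrative formulation only: it assumes the "variation in distances" is measured as the sum of absolute differences between the distances chosen for neighboring pixels, and the names `choose_distances` and `smooth_weight` are hypothetical.

```python
def choose_distances(cost, smooth_weight):
    """Pick one candidate index per pixel along a row of the block.

    cost[x][k] is the dissimilarity of candidate k for pixel x. The result
    minimizes sum(cost[x][k_x]) + smooth_weight * sum(|k_x - k_(x-1)|).
    """
    n, m = len(cost), len(cost[0])
    total = list(cost[0])  # best cost of a path ending at (pixel 0, candidate k)
    back = []              # back[x - 1][k] = best predecessor of candidate k at pixel x
    for x in range(1, n):
        new_total, ptr = [], []
        for k in range(m):
            j = min(range(m), key=lambda j: total[j] + smooth_weight * abs(k - j))
            new_total.append(total[j] + smooth_weight * abs(k - j) + cost[x][k])
            ptr.append(j)
        total = new_total
        back.append(ptr)
    # Trace the cheapest path back from the last pixel.
    k = min(range(m), key=total.__getitem__)
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return path


cost = [[0, 5], [1, 0], [0, 5]]
independent = choose_distances(cost, smooth_weight=0)   # each pixel picks its own minimum
smoothed = choose_distances(cost, smooth_weight=10)     # distance changes are penalized
```

With a zero weight this reduces to independent per-pixel selection; a larger weight trades a slightly higher dissimilarity for reference pixels that stay contiguous.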

In the present invention, the reference pixel determining step determines, for each pixel or each group of a predetermined number of pixels in the processing region, a plurality of reference pixels from among the reference pixel candidates on the basis of the dissimilarity, and the predicted image generating step generates the predicted image using the plurality of reference pixels according to the prediction method.

According to the present invention, robustness against noise in the reference pixels and in the depth map can be improved. When the depth map contains noise, an error may occur in selecting a single reference pixel from among a plurality of reference pixel candidates. In that case, selecting a plurality of reference pixels and generating the predicted image from the average or median of their pixel values reduces the influence of any reference pixels that were mistakenly selected and show a different subject. Likewise, when the reference pixels themselves contain noise, averaging over a plurality of reference pixels reduces the noise component and enables more accurate prediction.
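A small sketch of predicting from several reference pixels, assuming the candidates are ranked by absolute depth difference and the best `n_refs` are combined. The function name and parameters are hypothetical.

```python
from statistics import mean, median


def predict_from_best(target_depth, cand_depths, cand_values, n_refs, use_median=False):
    """Combine the n_refs candidates whose depth is closest to the target's depth."""
    order = sorted(
        range(len(cand_depths)),
        key=lambda k: abs(cand_depths[k] - target_depth),
    )
    chosen = [cand_values[k] for k in order[:n_refs]]
    return median(chosen) if use_median else mean(chosen)


# The candidate with depth 11 is close in depth but has an outlying pixel
# value (180), as could happen when the depth map is noisy.
depths = [10, 11, 50, 10]
values = [100, 180, 200, 102]
by_mean = predict_from_best(10, depths, values, n_refs=3)
by_median = predict_from_best(10, depths, values, n_refs=3, use_median=True)
```

In this example the median ignores the outlier entirely, while the mean is pulled toward it; which statistic suits a given sequence is a design choice.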

In the present invention, the predicted image generating step includes a weighting coefficient setting step of setting a weighting coefficient, according to the dissimilarity, for each of the plurality of reference pixels determined for each pixel or each group of a predetermined number of pixels in the processing region, and generates the predicted image according to the prediction method using a weighted average or weighted median of the plurality of reference pixels computed with the weighting coefficients.

According to the present invention, the improved accuracy of prediction from a plurality of reference pixels reduces the code amount required to encode the prediction residual. When predicting from a plurality of reference pixels, using a weighted average or weighted median with weighting coefficients based on the dissimilarity between the pixel to be predicted and each reference pixel gives more weight to the pixels judged most similar to the pixel to be predicted, so prediction accuracy can be improved while robustness against noise is maintained.
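The weighted average can be sketched as below. The weight 1 / (1 + dissimilarity) is an assumed choice for illustration; the text only requires that weights be set according to the dissimilarity.

```python
def weighted_prediction(target_depth, ref_depths, ref_values):
    """Weighted average of reference pixel values.

    Each weight is 1 / (1 + |depth difference|), so a reference pixel whose
    depth matches the target's depth contributes the most.
    """
    weights = [1.0 / (1.0 + abs(d - target_depth)) for d in ref_depths]
    return sum(w * v for w, v in zip(weights, ref_values)) / sum(weights)


# Equal depths give a plain average.
equal = weighted_prediction(10, [10, 10], [100, 200])
# The second reference is 9 depth units away, so its weight drops to 0.1.
skewed = weighted_prediction(10, [10, 19], [100, 200])
```

A weighted median could be substituted where outliers are more of a concern than smooth blending.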

The present invention is a video decoding method that, while using a depth map corresponding to a video, decodes a video signal using intra-frame prediction for processing regions obtained by dividing each frame constituting the video into a predetermined size, the method comprising: an intra-frame prediction direction setting step of setting a prediction direction for the intra-frame prediction; a reference pixel candidate setting step of setting, for each pixel or each group of a predetermined number of pixels in the processing region and according to the set prediction direction, pixels within a predetermined distance range from the processing region as reference pixel candidates; a reference pixel dissimilarity calculating step of calculating, for each pixel or each group of a predetermined number of pixels in the processing region, a dissimilarity for each reference pixel candidate, using the depth map corresponding to the processing region and the depth map corresponding to the reference pixel candidates; a reference pixel determining step of determining, for each pixel or each group of a predetermined number of pixels in the processing region, a reference pixel from among the reference pixel candidates on the basis of the dissimilarity; and a predicted image generating step of generating a predicted image for the processing region from the reference pixels according to the set prediction direction.

In the present invention, the reference pixel determining step determines, for each pixel or each group of a predetermined number of pixels in the processing region, a plurality of reference pixels from among the reference pixel candidates on the basis of the dissimilarity, and the predicted image generating step generates the predicted image using the plurality of reference pixels according to the prediction direction.

In the present invention, the predicted image generating step includes a weighting coefficient setting step of setting a weighting coefficient, according to the dissimilarity, for each of the plurality of reference pixels determined for each pixel or each group of a predetermined number of pixels in the processing region, and generates the predicted image according to the prediction direction using a weighted average or weighted median of the plurality of reference pixels computed with the weighting coefficients.

The present invention is a video encoding device that, while using a depth map corresponding to a video, performs predictive encoding using intra-frame prediction on processing regions obtained by dividing each frame constituting the video into a predetermined size, the device comprising: intra-frame prediction direction setting means for setting a prediction direction for the intra-frame prediction; reference pixel candidate setting means for setting, for each pixel or each group of a predetermined number of pixels in the processing region and according to the set prediction direction, pixels within a predetermined distance range from the processing region as reference pixel candidates; reference pixel dissimilarity calculating means for calculating, for each pixel or each group of a predetermined number of pixels in the processing region, a dissimilarity for each reference pixel candidate, using the depth map corresponding to the processing region and the depth map corresponding to the reference pixel candidates; reference pixel determining means for determining, for each pixel or each group of a predetermined number of pixels in the processing region, a reference pixel from among the reference pixel candidates on the basis of the dissimilarity; and predicted image generating means for generating a predicted image for the processing region from the reference pixels according to the set prediction direction.

In the present invention, the reference pixel dissimilarity calculating means calculates, for each pixel or each group of a predetermined number of pixels in the processing region, the dissimilarity for each reference pixel candidate, using both the difference between the depth map value corresponding to that pixel or pixel group and the depth map value corresponding to the reference pixel candidate, and the distance between the processing region and the reference pixel.

In the present invention, the reference pixel determining means determines, for each pixel or each group of a predetermined number of pixels in the processing region, a reference pixel from among the reference pixel candidates so as to minimize the weighted sum of the total of the dissimilarities for the reference pixels and a value representing the variation in the distances between the processing region and the reference pixels.

In the present invention, the reference pixel determining means determines, for each pixel or each group of a predetermined number of pixels in the processing region, a plurality of reference pixels from among the reference pixel candidates on the basis of the dissimilarity, and the predicted image generating means generates the predicted image using the plurality of reference pixels according to the prediction method.

In the present invention, the predicted image generating means includes weighting coefficient setting means for setting a weighting coefficient, according to the dissimilarity, for each of the plurality of reference pixels determined for each pixel or each group of a predetermined number of pixels in the processing region, and generates the predicted image according to the prediction method using a weighted average or weighted median of the plurality of reference pixels computed with the weighting coefficients.

The present invention is a video decoding device that, while using a depth map corresponding to a video, decodes a video signal using intra-frame prediction for each processing region obtained by dividing each frame constituting the video into regions of a predetermined size, the device comprising: intra-frame prediction direction setting means for setting a prediction direction for the intra-frame prediction; reference pixel candidate setting means for setting, for each pixel or each group of a predetermined number of pixels in the processing region, pixels within a predetermined distance range from the processing region as reference pixel candidates in accordance with the set prediction direction; reference pixel dissimilarity calculating means for calculating, for each pixel or each group of a predetermined number of pixels in the processing region, a dissimilarity for each reference pixel candidate by using the depth map corresponding to the processing region and the depth map corresponding to the reference pixel candidates; reference pixel determining means for determining, based on the dissimilarity, a reference pixel from among the reference pixel candidates for each pixel or each group of a predetermined number of pixels in the processing region; and predicted image generating means for generating a predicted image for the processing region from the reference pixel in accordance with the set prediction direction.

In the present invention, the reference pixel determining means determines, based on the dissimilarity, a plurality of reference pixels from among the reference pixel candidates for each pixel or each group of a predetermined number of pixels in the processing region, and the predicted image generating means generates a predicted image using the plurality of reference pixels in accordance with the prediction direction.

In the present invention, the predicted image generating means includes weighting coefficient setting means for setting, in accordance with the dissimilarity, a weighting coefficient for each of the plurality of reference pixels determined for each pixel or each group of a predetermined number of pixels in the processing region, and generates a predicted image in accordance with the prediction direction by using a weighted average or a weighted median of the plurality of reference pixels computed with the weighting coefficients.

The present invention is a video encoding program for causing a computer to execute the video encoding method.

The present invention is a video decoding program for causing a computer to execute the video decoding method.

According to the present invention, when a video is transmitted together with data whose values depend strongly on the subject, such as a depth map for that video, efficient intra-frame prediction can be performed on blocks that contain occlusion or noise at the block edge and on blocks in which a plurality of subjects lie along the prediction direction, so that the code amount can be reduced.

FIG. 1 is a block diagram showing the configuration of a video encoding device according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the processing operation of the video encoding device shown in FIG. 1.
FIG. 3 is a block diagram showing the configuration of a video decoding device according to an embodiment of the present invention.
FIG. 4 is a flowchart showing the processing operation of the video decoding device shown in FIG. 3.
FIG. 5 is a diagram showing a hardware configuration example in which the video encoding device is implemented by a computer and a software program.
FIG. 6 is a diagram showing a hardware configuration example in which the video decoding device is implemented by a computer and a software program.
FIG. 7 is a diagram showing reference pixel candidates when vertically downward intra-frame prediction is used.
FIG. 8 is a diagram showing an example of representing reference pixels by reference lines.
FIG. 9 is a diagram showing an example in which reference pixel candidates are set with an angular width when vertically downward intra-frame prediction is used.
FIG. 10 is a diagram showing an example in which reference pixel candidates are set with a pixel width when vertically downward intra-frame prediction is used.

Hereinafter, a video encoding device and a video decoding device according to an embodiment of the present invention will be described with reference to the drawings.

<Video encoding device>
FIG. 1 is a block diagram showing the configuration of the video encoding device in this embodiment. The video encoding device 100 includes an encoding target video input unit 101, an input frame memory 102, a depth map input unit 103, a depth map memory 104, a prediction direction setting unit 105, a reference pixel determination unit 106, a predicted image generation unit 107, a prediction direction encoding unit 108, a video signal encoding unit 109, a video signal decoding unit 110, a reference frame memory 111, and a multiplexing unit 112.

The encoding target video input unit 101 receives the video to be encoded from the outside. In the following description, this video is referred to as the encoding target video, and the frame currently being processed is referred to as the encoding target frame or the encoding target image. The input frame memory 102 stores the input encoding target video. The depth map input unit 103 receives a depth map corresponding to the encoding target video. This depth map represents the depth of the subject captured at each pixel of each frame of the encoding target video. The depth map memory 104 stores the input depth map.

The prediction direction setting unit 105 sets the direction in which intra-frame prediction is performed when a predicted image is generated. The reference pixel determination unit 106 determines, for each pixel in the processing region, the reference pixel to be used in generating the predicted image, using the already-encoded pixels that lie in the set prediction direction and the depth map for the processing region. The predicted image generation unit 107 generates a predicted image based on the determined reference pixels.

The prediction direction encoding unit 108 encodes information indicating the prediction direction set by the prediction direction setting unit 105 (prediction information). The video signal encoding unit 109 predictively encodes the video signal of the encoding target frame using the generated predicted image. The video signal decoding unit 110 decodes the generated code data using the generated predicted image to produce a decoded frame. The reference frame memory 111 stores the generated decoded frame. The multiplexing unit 112 multiplexes the code data of the prediction information and the code data of the video signal and outputs the result to the outside.

Next, the operation of the video encoding device 100 shown in FIG. 1 will be described with reference to FIG. 2. FIG. 2 is a flowchart showing the operation of the video encoding device 100 shown in FIG. 1. The process of encoding one frame of the encoding target video is described here. Encoding of the whole video is achieved by repeating this process for each frame.

First, the encoding target video input unit 101 receives an encoding target frame and stores it in the input frame memory 102. Meanwhile, the depth map input unit 103 receives the depth map for the encoding target frame and stores it in the depth map memory 104 (step S101). Although the input frames are assumed here to be encoded in the order in which they arrive, the input order and the encoding order need not coincide. When they differ, the input frames and depth maps are held in the input frame memory 102 and the depth map memory 104 until the frame to be encoded next has been input. A stored encoding target frame and its depth map may be deleted from the respective memories once they have been encoded by the encoding process described below.

The depth map input in step S101 is a depth map that can also be obtained on the decoding side, such as one obtained by decoding an already-encoded depth map. Using exactly the same information as is available to the video decoding device suppresses coding noise such as drift. If such coding noise is acceptable, however, the original depth map before encoding may be input instead. Other examples of depth maps obtainable on the decoding side include a depth map synthesized from a decoded depth map of another viewpoint and a depth map estimated, for example by stereo matching, from decoded images of other viewpoints.

Once the encoding target frame and the depth map have been stored, the encoding target frame is divided into regions of a predetermined size, and the video signal of the encoding target frame is encoded region by region (steps S102 to S111). That is, letting blk denote the encoding target region index and numBlks the total number of encoding target regions in one frame, blk is initialized to 0 (step S102), and the following processing (steps S103 to S109) is repeated while 1 is added to blk (step S110) until blk reaches numBlks (step S111). Typical codecs divide the frame into processing unit blocks of 16 x 16 pixels called macroblocks, but blocks of any other size may be used as long as the division is the same as on the decoding side.
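The block loop of steps S102 to S111 can be sketched as an enumeration of block positions. The raster scan order and the function name are assumptions made for illustration; the text only requires that the division match the decoding side.

```python
def block_origins(width, height, block_size=16):
    """Top-left corners of the coding blocks, assuming raster scan order."""
    return [(x, y)
            for y in range(0, height, block_size)
            for x in range(0, width, block_size)]


# blk runs from 0 to numBlks - 1, as in steps S102 to S111.
origins = block_origins(32, 32)
num_blks = len(origins)
for blk in range(num_blks):
    x0, y0 = origins[blk]
    # Steps S103 to S109 would process the block at (x0, y0) here.
```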

In the processing repeated for each encoding target region, the prediction direction setting unit 105 sets the prediction direction used for intra-frame prediction (step S103). The prediction direction is the direction in which the already-encoded pixels referred to when predicting the video signal of the encoding target region blk are located. For each pixel in the encoding target region blk, a predicted image is generated using the decoded pixel values of the already-encoded pixels located at the source indicated by this prediction direction. The prediction direction may be determined in any manner. To maximize coding efficiency, however, it is preferable to evaluate the prediction efficiency of the predicted image generated for each candidate direction and to select the direction that maximizes it.

That is, letting v denote a prediction direction, the prediction direction V_blk given by equation (1) is selected as the prediction direction for the encoding target region blk. The prediction direction may be represented in any form. Here, the number of selectable prediction direction candidates is denoted by NumV, and a prediction direction is indicated by an integer index from 0 to NumV-1. Each index value corresponds to a different direction (a two-dimensional vector).
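Equation (1) itself is not reproduced in this text. A form consistent with the definitions of V_blk, E_blk(v), and argmin given in the surrounding paragraphs would be the following; the explicit index range under argmin is an assumption based on the stated index values 0 to NumV-1.

```latex
V_{blk} = \operatorname*{argmin}_{v \in \{0, \dots, NumV-1\}} E_{blk}(v) \tag{1}
```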

A smaller value of E_blk(v) indicates higher coding efficiency, and argmin denotes the function that returns the parameter minimizing the given function. The parameter over which the minimization is performed is written beneath argmin. Any set of prediction direction candidates may be used. For example, eight prediction directions may be used as candidates, as in intra prediction for 4x4 blocks in H.264/AVC, or a larger number of prediction directions may be used as in the document "K. McCann, W.-J. Han, and I. Kim, "Samsung's Response to the Call for Proposals on Video Compression Technology", Input document to Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, JCTVC-A124, April 2010."

The candidates also need not be fixed and may be changed according to the picture type, the block size, and so on. For example, H.264/AVC allows eight prediction directions for 4x4 blocks, as noted above, but only three prediction directions for 16x16 blocks. Any measure may be used as the prediction efficiency evaluation value E_blk(v). For example, the sum of absolute differences (SAD) or the sum of squared differences (SSD) between the encoding target image and the predicted image, expressed by equations (2) and (3), may be used.
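Equations (2) and (3) are not reproduced in this text. Using Org for the encoding target frame and Pred_v for the predicted image generated with prediction direction v, the standard SAD and SSD definitions named above would read as follows; the pixel index p and the bracket notation are assumptions.

```latex
E_{blk}(v) = \sum_{p \in blk} \bigl| Org[p] - Pred_{v}[p] \bigr| \tag{2}

E_{blk}(v) = \sum_{p \in blk} \bigl( Org[p] - Pred_{v}[p] \bigr)^{2} \tag{3}
```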

Here, Org denotes the encoding target frame, and Pred_v denotes the predicted image generated by the method described later when the prediction direction is v. Besides these, there are methods that use values obtained by transforming the difference between the encoding target image and the predicted image with the DCT, the Hadamard transform, or the like. When the transform is represented by a matrix A, the evaluation value can be expressed by equation (4). Note that ‖X‖ denotes the norm of X.
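Equation (4) is not reproduced in this text. Based on the description of the transform matrix A and the norm, a consistent form would be the following, where Org[blk] and Pred_v[blk] are assumed to denote the block signals arranged as vectors.

```latex
E_{blk}(v) = \bigl\| A \cdot \bigl( Org[blk] - Pred_{v}[blk] \bigr) \bigr\| \tag{4}
```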

Instead of evaluating only the divergence between the encoding target frame and the predicted image as above, an RD cost that takes into account both the generated code amount and the distortion may be used. The RD cost used here can be expressed by equation (5), using the code amount R_blk(v) and the distortion D_blk(v) obtained when Org is encoded with Pred_v as the predicted image. Here λ is a Lagrange multiplier, for which a predetermined value is used.
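Equation (5) is not reproduced in this text. The usual Lagrangian rate-distortion cost built from the quantities named above would be:

```latex
E_{blk}(v) = D_{blk}(v) + \lambda \, R_{blk}(v) \tag{5}
```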

Computing the exact RD cost requires encoding and decoding the prediction residual, which involves a very large amount of computation. The prediction direction v may therefore be determined using an RD cost calculated from the code amount R'_blk(v) of the prediction information and the distortion D'_blk(v) of the predicted image relative to the encoding target frame. In this case, a Lagrange multiplier different from λ is often used. Further speed-up is possible by not actually encoding to obtain R'_blk, and instead using an estimate of R'_blk obtained by a simple method.
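A minimal sketch of direction selection with the simplified cost is shown below. It assumes SAD as the distortion D' and a caller-supplied rate estimate as R'; the callbacks `predict` and `rate_estimate` and the parameter `lam` are hypothetical names, not taken from the text.

```python
def sad(org_block, pred_block):
    """Sum of absolute differences between two blocks given as lists of rows."""
    return sum(abs(o - p)
               for org_row, pred_row in zip(org_block, pred_block)
               for o, p in zip(org_row, pred_row))


def select_direction(org_block, predict, num_v, rate_estimate, lam):
    """Return the index v in [0, num_v) minimizing SAD + lam * estimated rate.

    predict(v) returns the predicted block for direction v;
    rate_estimate(v) returns an estimated code amount for signalling v.
    """
    best_v, best_cost = None, float("inf")
    for v in range(num_v):
        cost = sad(org_block, predict(v)) + lam * rate_estimate(v)
        if cost < best_cost:
            best_v, best_cost = v, cost
    return best_v
```

Setting `lam` to 0 reduces the selection to the pure SAD criterion.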

Once the prediction direction has been set, the reference pixel determination unit 106 determines, for each pixel in the encoding target region, the reference pixel to be used in generating the predicted image, using the depth map for the encoding target region blk and for the reference pixel candidates (step S104). The reference pixel candidates are set for each pixel in the encoding target region and form the set of already-encoded pixels found by tracing back from that pixel toward the source of the prediction direction.

For example, in vertical prediction, where prediction is performed from the block above, the source of the prediction direction is upward, and the candidates are the already-encoded pixels found by tracing upward from the encoding target pixel, as shown in FIG. 7. When determining the reference pixel, the distance from the encoding target region or from the encoding target pixel may be used as one of the selection criteria. In this embodiment, the candidate set of reference pixels for a pixel p when the prediction direction is v is denoted by {rp_{p,v,i} | i = 1, 2, ..., N_{p,v}}. N_{p,v} is the number of reference pixel candidates, and a smaller value of i indicates a pixel closer to the encoding target pixel.
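For the vertical case, the candidate set can be sketched as follows. The sketch assumes that every pixel above the block has already been encoded (for example, raster-order block coding) and that the candidate count is capped by a fixed maximum; the function and parameter names are illustrative only.

```python
def vertical_candidates(x, block_top, max_candidates):
    """Coordinates (x, y) of already-encoded pixels above the block, nearest first.

    The first entry corresponds to i = 1 in the candidate set, the second
    to i = 2, and so on. Rows outside the frame (y < 0) are dropped.
    """
    return [(x, block_top - i)
            for i in range(1, max_candidates + 1)
            if block_top - i >= 0]
```

For a pixel in column 5 of a block whose top row is y = 8, `vertical_candidates(5, 8, 3)` yields the three pixels directly above the block in that column.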

The number of reference pixel candidates may be fixed or variable. For example, the number of candidates may be specified in advance, the trace-back distance may be set in advance, or a combination of the two may be specified. The specified conditions and numbers may be determined per block, slice, frame, GOP (Group Of Pictures), or the like, and their values may be encoded in the block header, slice header, picture header, GOP header, or the like. The trace-back distance may be defined as the distance from the edge of the encoding target block or as the distance from the encoding target pixel. When fixed conditions are always used, this information need not be encoded.

The reference pixel may also be determined by determining refline_x and refline_y, shown in FIG. 8, for each pixel in the encoding target region. In the figure, Mx and My are values defined by the maximum distance from the encoding target region or the encoding target pixel to a reference pixel. In this case, the combinations of refline_x and refline_y are the reference pixel candidates. Depending on the prediction direction, however, it may be sufficient to determine only one of refline_x and refline_y. For example, refline_x is unnecessary for vertically downward prediction, and refline_y is unnecessary for horizontally rightward prediction.

As expressed by equation (6), the reference pixel is determined as the pixel, among the set reference pixel candidates, that has the smallest dissimilarity to the pixel in the encoding target region.
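Equation (6) is not reproduced in this text. With the candidate set {rp_{p,v,i} | i = 1, ..., N_{p,v}} defined earlier, a consistent form would be the following; the symbol ref_p for the selected reference pixel is an assumption.

```latex
ref_{p} = \operatorname*{argmin}_{rp_{p,v,i},\; 1 \le i \le N_{p,v}} Dissimilarity\bigl(p,\, rp_{p,v,i}\bigr) \tag{6}
```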

Dissimilarity denotes the degree of dissimilarity between two given pixels. For example, the absolute difference or the squared difference of the depth map values, expressed by equations (7) and (8), can be used.
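Equations (7) and (8) are not reproduced in this text. Writing Depth[·] for the depth map value at a pixel (an assumed symbol), the absolute and squared differences described above would be:

```latex
Dissimilarity(p, r) = \bigl| Depth[p] - Depth[r] \bigr| \tag{7}

Dissimilarity(p, r) = \bigl( Depth[p] - Depth[r] \bigr)^{2} \tag{8}
```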

It is also possible to use a measure that takes into account not only the difference in depth values but also the distance from the encoding target region, or from the pixel in the encoding target region, to the reference pixel. For example, equation (9) may be used.
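Equation (9) is not reproduced in this text. One form consistent with the description of a weighting coefficient α and a pixel distance is given below; applying α to the distance term (rather than to the depth term) and the symbol Depth[·] are assumptions.

```latex
Dissimilarity(p, r) = \bigl| Depth[p] - Depth[r] \bigr| + \alpha \cdot Distance(p, r) \tag{9}
```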

Here α is a separately determined weighting coefficient, and Distance denotes the distance between two given pixels. Under the definition above, when the reference pixel candidate is rp_{p,v,i}, the index i can be used as the distance. The sum of refline_x and refline_y can also be used as the distance. Taking the distance into account means that, when the dissimilarity based on the depth map values is equal, the pixel closer to the encoding target region is used as the reference pixel. Since correlation is generally considered to be higher at shorter spatial distances, using a measure that accounts for distance makes it possible to select a more appropriate reference pixel.
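The selection of a reference pixel by depth difference and distance can be sketched as follows. The sketch assumes the absolute depth difference as the base dissimilarity, the candidate index i as the distance, and α applied to the distance term; `depth` is a hypothetical list-of-rows depth map and all names are illustrative.

```python
def choose_reference(depth, p, candidates, alpha=0.0):
    """Pick the candidate minimizing |depth difference| + alpha * i.

    depth is indexed as depth[y][x]; p and the candidates are (x, y) tuples,
    with candidates ordered nearest first so that i = 1 is the closest.
    Returns (i, (x, y)) for the chosen candidate.
    """
    px, py = p
    best = None
    for i, (cx, cy) in enumerate(candidates, start=1):
        cost = abs(depth[py][px] - depth[cy][cx]) + alpha * i
        if best is None or cost < best[0]:
            best = (cost, i, (cx, cy))
    return best[1], best[2]
```

If the nearest pixel above the block belongs to a different subject (a large depth difference), the search skips it and takes a farther pixel with matching depth; a large `alpha` pulls the choice back toward the nearest candidate.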

Furthermore, when the distance from the encoding target region, or from a pixel in the encoding target region, to the reference pixel is taken into account, the variation of that distance from pixel to pixel within the encoding target region may also be considered. In that case, the reference pixels for the pixels in the encoding target region are given by equation (10).
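Equation (10) is not reproduced in this text. Based on the definitions that accompany it (the set X of reference pixels, its element x_p for pixel p, the adjacency set E, and the smoothing term h_pq), a consistent form would be the following; any additional weighting between the two sums is not recoverable from the text and is omitted here.

```latex
X = \operatorname*{argmin}_{X} \Bigl\{ \sum_{p \in blk} Dissimilarity(p, x_{p}) + \sum_{(p,q) \in E} h_{pq}(x_{p}, x_{q}) \Bigr\} \tag{10}
```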

Here, the set X is the set of reference pixels for the pixels in the encoding target region, and x_p denotes the p-th element of X, that is, the reference pixel for pixel p. As before, the reference pixel for pixel p is selected from the candidates {rp_{p,v,i} | i = 1, 2, ..., N_{p,v}}. E is a set representing the adjacency relation between pixels in the encoding target region: pixels p and q are regarded as adjacent when (p, q) ∈ E. The term h_pq is a smoothing term that suppresses variation among the reference pixels selected for the individual pixels; any function may be used as long as it is an increasing function of the distance between the reference pixels. The distance between reference pixels may be computed directly as the distance between the two pixels, or it may be obtained as the difference between the distances from each reference pixel to its corresponding encoding target pixel. With the latter approach, when x_p is rp_{p,v,i} and x_q is rp_{q,v,j}, the distance between the reference pixels is obtained as |i - j|.

The minimization in equation (10) can be solved with an algorithm for Markov random field minimization problems, such as graph cuts. Because it can be difficult to find a solution that minimizes the right-hand side in the strict sense, an approximate solution may be used. When an approximate solution is used, the decoding side must obtain the same solution, so the minimization must either be terminated under predetermined conditions or its termination condition must be encoded and transmitted.

Although equation (10) performs the minimization over all pixels in the encoding target region, the minimization may, in view of the amount of computation, be performed for each subset, such as each row or each column of the encoding target region, and the result used as an approximate solution. In particular, when the adjacency relation is restricted to one dimension, dynamic programming makes it possible to find the minimum solution for that subset quickly.
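For a single row or column, the one-dimensional minimization can be sketched with a Viterbi-style dynamic program. The sketch assumes the smoothing term is `beta * |i - j|` between neighbouring pixels, which is one possible increasing function of the distance between reference pixels; `unary`, `beta`, and the function name are hypothetical.

```python
def chain_minimize(unary, beta):
    """Minimize sum_k unary[k][x_k] + beta * sum_k |x_k - x_{k+1}| over a chain.

    unary[k][i] is the dissimilarity of candidate i (0-based) for the k-th
    pixel of the row or column. Returns the chosen candidate index per pixel.
    """
    cost = list(unary[0])
    back = []  # back[k-1][j]: best predecessor of candidate j at pixel k
    for k in range(1, len(unary)):
        new_cost, ptr = [], []
        for j, u in enumerate(unary[k]):
            i_best = min(range(len(cost)),
                         key=lambda i: cost[i] + beta * abs(i - j))
            new_cost.append(cost[i_best] + beta * abs(i_best - j) + u)
            ptr.append(i_best)
        cost = new_cost
        back.append(ptr)
    # Trace the optimal path back from the cheapest final candidate.
    j = min(range(len(cost)), key=lambda i: cost[i])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return path[::-1]
```

With `beta = 0` each pixel independently takes its lowest-dissimilarity candidate; a larger `beta` trades some dissimilarity for reference pixels at consistent distances along the row.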

Next, once the reference pixels have been determined, the predicted image generation unit 107 generates a predicted image (step S105). As in ordinary intra-frame prediction, the predicted image is generated by assigning to each pixel the pixel value of its reference pixel. That is, the predicted image is generated based on equation (11). Dec denotes the decoded image, stored in the reference frame memory 111, of the already-encoded region of the encoding target image.
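Equation (11) is not reproduced in this text. Since each pixel is assigned the decoded value of its reference pixel, a consistent form would be the following, where ref_p is an assumed symbol for the reference pixel determined for pixel p.

```latex
Pred_{v}[p] = Dec\bigl[\, ref_{p} \,\bigr] \quad \text{for all } p \in blk \tag{11}
```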

Next, once the predicted image has been generated, the prediction direction encoding unit 108 encodes the prediction direction (step S106). Any method may be used to encode the prediction direction, provided that it can be decoded correctly on the decoding side. For example, the index value of the prediction direction may be binarized by a predetermined method and encoded by binary arithmetic coding. Alternatively, a predicted value of the prediction direction may be generated from the intra-frame prediction directions used when encoding adjacent blocks; a flag indicating whether that prediction is correct is then encoded, and the prediction direction itself is encoded only when the prediction is incorrect.
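The flag-based signalling can be sketched at the symbol level as follows. How the predicted direction is derived from adjacent blocks and how the symbols are entropy-coded are not specified here, so `predicted_v` is simply passed in and the symbol layout is an assumption.

```python
def encode_direction(v, predicted_v):
    """Symbols to entropy-code: a hit flag, plus v only when the prediction misses."""
    return (1,) if v == predicted_v else (0, v)


def decode_direction(symbols, predicted_v):
    """Inverse of encode_direction, given the same predicted direction."""
    return predicted_v if symbols[0] == 1 else symbols[1]
```

Both sides must derive `predicted_v` in the same way from already-decoded information for the round trip to succeed.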

Next, using the generated predicted image, the video signal encoding unit 109 encodes the video signal in the encoding target region blk of the encoding target frame Org (step S107). Any encoding method may be used as long as it can be decoded correctly on the decoding side. In typical codecs such as MPEG-2 and H.264/AVC, encoding is performed by applying, in order, a frequency transform such as the DCT, quantization, binarization, and entropy coding to the difference signal between the video signal of block blk and the predicted image.

Next, the video signal decoding unit 110 decodes the video signal for block blk using the code data of the video signal and the predicted image, and stores the result, the decoded frame Dec[blk], in the reference frame memory 111 (step S108). A method corresponding to the one used for encoding is used here. For example, with typical codecs such as MPEG-2 and H.264/AVC, the video signal is decoded by applying, in order, entropy decoding, inverse binarization, inverse quantization, and an inverse frequency transform such as the IDCT to the code data, adding the predicted image to the resulting two-dimensional signal, and finally clipping to the valid range of pixel values.
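The final two steps of this reconstruction, adding the predicted image and clipping, can be sketched as follows. The sketch assumes the residual has already been obtained by inverse quantization and inverse transform, and that pixel values are unsigned with a given bit depth; the names are illustrative.

```python
def reconstruct(pred_block, residual_block, bit_depth=8):
    """Add the decoded residual to the prediction and clip to [0, 2^bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val)
             for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(pred_block, residual_block)]
```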

Alternatively, the decoding unit may receive the data from just before the point at which the encoder-side processing becomes lossless, together with the predicted image, and perform decoding by a simplified process. In the example above, it would receive the values obtained after quantization during encoding and the predicted image, apply inverse quantization and the inverse frequency transform in order to the quantized values, add the predicted image to the resulting two-dimensional signal, and clip to the valid range of pixel values to decode the video signal.

Next, the multiplexing unit 112 multiplexes the code data of the prediction direction and the code data of the video signal and outputs the result (step S109). Multiplexing is performed per block here, but it may also be performed per frame. In that case, however, the decoder must buffer one frame's worth of code data before decoding.
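As an illustration of per-block multiplexing, the sketch below packs the two code data streams of each block behind a small length header and recovers them again. The container layout (two big-endian 16-bit lengths followed by the payloads) and the names `mux_block` and `demux_blocks` are hypothetical; the text does not specify a bitstream format.

```python
import struct


def mux_block(direction_code: bytes, video_code: bytes) -> bytes:
    """Multiplex one block: [len(direction)][len(video)][direction][video]."""
    header = struct.pack(">HH", len(direction_code), len(video_code))
    return header + direction_code + video_code


def demux_blocks(stream: bytes):
    """Separate a concatenation of multiplexed blocks back into pairs."""
    pos = 0
    blocks = []
    while pos < len(stream):
        dlen, vlen = struct.unpack_from(">HH", stream, pos)
        pos += 4
        direction = stream[pos:pos + dlen]
        pos += dlen
        video = stream[pos:pos + vlen]
        pos += vlen
        blocks.append((direction, video))
    return blocks
```

With per-block records like these, a decoder can start on the first block as soon as its record arrives, which is the buffering advantage over per-frame multiplexing noted above.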

Note that the order of some of the processing shown in FIG. 2 may be changed. Specifically, the encoding of the prediction direction in step S106 may be performed at any time after the prediction direction is determined in step S103, and the multiplexing of the code data in step S109 may be performed at any time after both the encoding of the prediction direction in step S106 and the encoding of the video signal in step S107.

<Video decoding device>
Next, the video decoding device will be described. FIG. 3 is a block diagram showing the configuration of a video decoding device according to an embodiment of the present invention. The video decoding device 200 includes a code data input unit 201, a code data memory 202, a depth map input unit 203, a depth map memory 204, a separation unit 205, a prediction direction decoding unit 206, a reference pixel determination unit 207, a predicted image generation unit 208, a video signal decoding unit 209, and a reference frame memory 210.

The code data input unit 201 receives, from outside, the code data of the video to be decoded (for example, the code data output by the video encoding device 100 shown in FIG. 1). In the following, the video to be decoded is called the decoding target video, and the frame being processed is called the decoding target frame or decoding target image. The code data memory 202 stores the input code data. The depth map input unit 203 receives the depth map corresponding to the decoding target video. This depth map represents the depth of the subject appearing at each pixel of each frame of the decoding target video. The depth map memory 204 stores the input depth map.

The separation unit 205 separates the input code data into the code data of the prediction information and the code data of the video signal, which are multiplexed in it. The prediction direction decoding unit 206 decodes, from the code data, information indicating the direction of the intra-frame prediction used to generate the predicted image (the prediction direction). The reference pixel determination unit 207 uses the group of already processed pixels lying in the decoded prediction direction and the depth map for the processing region to determine, for each pixel of the processing region, the reference pixel to be used in generating the predicted image. The predicted image generation unit 208 generates the predicted image based on the determined reference pixels. The video signal decoding unit 209 decodes the code data using the generated predicted image to produce a decoded frame. The reference frame memory 210 stores the generated decoded frame.

Next, the operation of the video decoding device 200 shown in FIG. 3 will be described with reference to FIG. 4. FIG. 4 is a flowchart showing the operation of the video decoding device 200 shown in FIG. 3. The processing for decoding one frame of the decoding target video is described here; repeating this processing for every frame decodes the video.

First, the code data input unit 201 receives the code data of the decoding target video and stores it in the code data memory 202, while the depth map input unit 203 receives the depth map for the decoding target video and stores it in the depth map memory 204 (step S201). Frames are assumed here to be decoded from the input code data and output sequentially, but the input order and the output order need not match. When they differ, decoded frames are held in the reference frame memory 210 until the frame to be output next has been decoded. The stored depth map may be deleted from memory once the corresponding frame has been decoded by the process described below.

The depth map input in step S201 is assumed to be the same as the one used for encoding. Using exactly the same information as the video encoding device 100 suppresses coding noise such as drift. If such coding noise is acceptable, however, a depth map different from the one used for encoding may be input. Examples of depth maps that may be input include a separately decoded depth map, a depth map synthesized from depth maps decoded for other viewpoints, and a depth map estimated by stereo matching or a similar technique from a group of images decoded for other viewpoints.

Next, once the code data and the depth map have been stored, the decoding target frame is divided into regions of a predetermined size and its video signal is decoded region by region (steps S202 to S210). That is, with blk denoting the decoding target region index and numBlks the total number of decoding target regions in one frame, blk is initialized to 0 (step S202), and the following processing (steps S203 to S208) is repeated, adding 1 to blk each time (step S209), until blk reaches numBlks (step S210). The size of the processing region is the same as the size used on the encoding side. Common coding schemes use a processing unit block of 16×16 pixels called a macroblock, but blocks of other sizes may be used as long as the size matches the encoding side.
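The loop over decoding target regions described above can be sketched as follows. The callback `decode_block`, standing in for steps S203 to S208, and the raster-order helper `block_origins` are hypothetical names introduced for illustration; the text does not prescribe a scan order.

```python
def decode_frame(num_blks, decode_block):
    """Run the per-region loop of steps S202 to S210."""
    decoded = []
    blk = 0                                 # step S202: initialize the index
    while blk < num_blks:                   # step S210: stop when blk == numBlks
        decoded.append(decode_block(blk))   # steps S203 to S208
        blk += 1                            # step S209: advance to the next region
    return decoded


def block_origins(width, height, block_size=16):
    """Top-left corners of the regions, in raster order (an assumed ordering)."""
    return [
        (x, y)
        for y in range(0, height, block_size)
        for x in range(0, width, block_size)
    ]
```

For a 48×32 frame and 16×16 blocks, `block_origins` yields six regions, so `numBlks` would be 6.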

In the processing repeated for each decoding target region, the separation unit 205 first separates, from the code data, the code data of the prediction direction and the code data of the video signal for the decoding target region blk (step S203). Separation is performed per decoding target region here, but it may be performed in other units, such as per frame. When separating per frame, however, the separated code data must be stored rather than the input code data.

Next, the prediction direction decoding unit 206 decodes the separated code data of the prediction direction and sets the prediction direction of the intra-frame prediction for the decoding target region blk (step S204). Once the prediction direction has been decoded, the reference pixel determination unit 207 uses the depth map for the decoding target region blk and for the reference pixel candidates to determine, for each pixel of the decoding target region, the reference pixel to be used in generating the predicted image (step S205). This processing is the same as step S104 shown in FIG. 2. Any rule may be used to determine the reference pixel, but it must be the same rule as on the encoding side. If information about the rule used on the encoding side has been encoded, that information must be decoded at an appropriate point (such as the start of a sequence, frame, block group, or block), and the reference pixel must be determined according to the decoded rule.

Next, once the reference pixel has been determined for each pixel of the decoding target region, the predicted image generation unit 208 generates the predicted image (step S206). This processing is the same as step S105 shown in FIG. 2: as in ordinary intra-frame prediction, the predicted image is generated by assigning to each pixel the pixel value of its reference pixel.
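A minimal sketch of steps S205 and S206 for the special case of a vertical prediction direction is given below. The candidates for a pixel are the already decoded pixels in the same column above the region. The selection rule (smallest absolute depth difference, with ties resolved toward the nearest line), the limit `max_lines`, the fallback value 128, and the function names are assumptions for illustration; the text only requires that the encoder and decoder apply the same rule.

```python
def choose_reference_pixel(depth, x, y, block_top, max_lines=4):
    """Pick the candidate above the block whose depth best matches pixel (x, y)."""
    best, best_cost = None, None
    for k in range(1, max_lines + 1):      # k = distance of the reference line
        ry = block_top - k
        if ry < 0:                         # ran off the top of the frame
            break
        cost = abs(depth[y][x] - depth[ry][x])
        if best_cost is None or cost < best_cost:   # strict '<' keeps the nearest on ties
            best, best_cost = (x, ry), cost
    return best


def predict_block_vertical(decoded, depth, left, top, size, max_lines=4, default=128):
    """Build the predicted image by copying each pixel's chosen reference value."""
    pred = []
    for y in range(top, top + size):
        row = []
        for x in range(left, left + size):
            ref = choose_reference_pixel(depth, x, y, top, max_lines)
            row.append(default if ref is None else decoded[ref[1]][ref[0]])
        pred.append(row)
    return pred
```

Because the choice depends only on the depth map and already decoded pixels, the decoder can repeat it without any per-pixel side information.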

Next, once the predicted image has been generated, the video signal decoding unit 209 uses it to decode the video signal for the decoding target region blk from the code data of the video signal (step S207) and generates a decoded image (step S208). The decoded image is the output of the video decoding device 200 and is also stored in the reference frame memory 210. The decoding method must correspond to the method used for encoding. For example, when a common scheme such as MPEG-2 or H.264/AVC is used, the code data undergoes entropy decoding, inverse binarization, inverse quantization, and an inverse frequency transform such as the IDCT, in that order; the predicted image is then added to the resulting two-dimensional signal, and finally the result is clipped to the valid range of pixel values.

In the description above, a single prediction direction is set for the entire encoding target region or decoding target region (hereinafter, encoding and decoding are not distinguished and are both called processing), but the processing target region may be partitioned and a prediction direction set for each partition. In this case, the partitioning method also becomes part of the information indicating the prediction direction. That is, the information indicating the prediction direction consists of information P_blk indicating the partitioning method and the set of per-partition prediction directions {V_{blk,i} | i = 1, ..., M_blk}, where M_blk is the number of partitions obtained when the processing target region is partitioned by method P_blk. This allows a predicted image to be generated appropriately even when different subjects appear in different parts of the processing target region, that is, when the direction of spatial correlation of the video signal differs greatly from part to part. Even in this case, determining the reference pixels with the depth map makes it possible to generate a predicted image that reflects the subject appearing at each pixel without setting up a fine partitioning.

In addition, although a reference pixel is determined for each pixel in the processing region, when the reference pixel is determined by choosing a distance from the processing region, that is, a reference line (refline_x and refline_y), the reference pixel may be determined not for each pixel but for each pixel group in the processing region, such as a row (horizontal line), a column (vertical line), or a block of N×N pixels. That is, one reference line (or one pair of reference lines; the two cases are not distinguished below and both are called a reference line) may be determined for each pixel group defined in advance or separately. In this case, the processing that determines the reference pixels in steps S104 and S205 follows equation (12) rather than equation (6) described above.
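Equation (12) is not reproduced in this text. One form consistent with the definitions that follow, offered as an assumption rather than as the patent's exact expression, chooses for each pixel group the reference line that minimizes the summed depth difference between each pixel and its reference pixel. The depth map notation D(·) and the absolute-difference criterion are assumed here.

```latex
\mathrm{refline}_{subblk}
  = \operatorname*{arg\,min}_{rl}
    \sum_{p \in subblk}
    \bigl|\, D(p) - D\bigl(\mathrm{refp}(p, v, rl)\bigr) \bigr|
\tag{12}
```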

Here, subblk denotes the set of pixels for which one reference line is determined. refline denotes either refline_x or refline_y when only one of the reference lines (to the left of or above the processing region) is determined, and the pair (refline_x, refline_y) when a pair of reference lines is determined. refp(p, v, rl) denotes the reference pixel used when intra-frame prediction is performed for pixel p with prediction direction v and reference line rl. Note that for p1 ≠ p2, refp(p1, v, rl) and refp(p2, v, rl) are not necessarily equal; they coincide when the displacement between p1 and p2 is parallel to the prediction direction v. This is because the reference line is defined as a distance from the processing target region.

In addition, as in the reference pixel determination using equation (10) described above, the reference pixels may be determined by setting one reference line for each defined pixel group while taking into account how the reference line varies within the processing region. The expression for that case is given by equation (13).
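Equation (13) is not reproduced in this text. A form consistent with the definitions that follow, given as an assumption rather than as the patent's exact expression, adds a smoothing term over adjacent pixel sets to a per-set depth-difference cost. The depth map notation D(·), the prediction direction v, and the absolute-difference data term are assumed.

```latex
RL
  = \operatorname*{arg\,min}_{\{rl_{sb}\}}
    \Biggl[
      \sum_{sb} \sum_{p \in sb}
        \bigl|\, D(p) - D\bigl(\mathrm{refp}(p, v, rl_{sb})\bigr) \bigr|
      \;+\; \sum_{(s,t) \in E'} h_{st}\bigl(rl_{s}, rl_{t}\bigr)
    \Biggr]
\tag{13}
```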

Here, RL is the set of reference lines, and rl_sb is the sb-th element of RL, that is, the reference line for pixel set sb, the unit for which a reference line is set. E' is a set representing the adjacency relations among those pixel sets within the processing target region: when (s, t) ∈ E', pixel set s and pixel set t are regarded as adjacent. h_st is a smoothing term that suppresses variation among the reference lines determined for the individual pixel sets; any function may be used as long as it is an increasing function of the distance between reference lines. For example, the sum of absolute differences of the reference lines scaled by a predetermined coefficient may be used, where the sum of absolute differences of two reference lines is the absolute difference of their refline_x values plus the absolute difference of their refline_y values.

The description above covers only intra-frame prediction performed with a specified prediction direction, but the approach can also be applied to intra-frame prediction that generates the predicted value from the group of processed pixels adjacent to the processing target region without specifying a prediction direction. Specifically, the present invention can be applied by treating the determination of reference pixels as the determination of a reference line (refline_x and refline_y). In this case, the processing that determines the reference pixels in steps S104 and S205 may determine the reference line using equation (12) with subblk = blk.

As another method, the reference line may be determined so as to minimize the total divergence between a representative depth map value for the processing target region and the depth map values of the pixels in the reference pixel group that the reference line designates for generating the predicted value. The expression for that case is given by equation (14).
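Equation (14) is not reproduced in this text. Based on the description here and the definitions that follow (a representative value repD, a reference pixel group ref(rl), and the absolute difference as the divergence), it plausibly has the following form; the depth map notation D(·) is an assumption.

```latex
\mathrm{refline}
  = \operatorname*{arg\,min}_{rl}
    \sum_{q \in \mathrm{ref}(rl)}
    \bigl|\, \mathit{repD} - D(q) \bigr|
\tag{14}
```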

Here, repD denotes the representative depth map value for the processing target region; the mean, the median, or the most frequent value of the depth map values for that region can be used. ref(rl) denotes the reference pixel group when the reference line is rl. The absolute difference is used as the divergence here, but any increasing function of the difference between depth map values, such as the squared difference, may be used. Alternatively, instead of computing the divergence for each reference pixel and evaluating the total, a representative depth map value may be computed for the reference pixel group and evaluated by its divergence from the representative depth map value of the processing target region.
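The selection just described can be sketched as follows, using the median as the representative value and the absolute difference as the divergence. The function name `choose_reference_line` and the dictionary layout of `ref_groups` (reference line mapped to the depth values of its reference pixel group) are hypothetical.

```python
from statistics import median


def choose_reference_line(block_depths, ref_groups):
    """Return the reference line whose reference pixels best match the block's depth.

    block_depths: depth values of the pixels in the processing target region.
    ref_groups:   {reference line: [depth values of its reference pixel group]}.
    """
    rep_d = median(block_depths)           # representative value repD
    return min(
        ref_groups,
        key=lambda rl: sum(abs(rep_d - d) for d in ref_groups[rl]),
    )
```

Comparing summed divergences is only fair when every reference line yields the same number of reference pixels; otherwise a mean would be the safer criterion.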

In the description above, as illustrated in FIG. 7, the reference pixel candidates are the already processed pixels found by tracing back from a pixel of the processing region toward the origin of the prediction direction. However, by using a prediction direction with a certain angular width, as shown in FIG. 9, more pixels can be made reference pixel candidates. Alternatively, instead of an angular width, the candidates may be the already processed pixels found by tracing back toward the origin of the prediction direction together with their surrounding pixels, as shown in FIG. 10. Increasing the number of candidates in the direction orthogonal to the prediction direction in this way allows the reference pixels used for prediction to be chosen from a wider area, which can improve prediction performance. Moreover, since a single direction then covers many directions, the number of selectable prediction directions can be reduced, which also reduces the code amount needed to encode the information indicating the prediction direction.

In the description above, one reference pixel is selected from among multiple candidates, but multiple reference pixels may be set. Methods for setting multiple reference pixels include: taking a fixed number of candidates in ascending order of their dissimilarity to the pixel of the processing target region; taking all candidates whose dissimilarity is at or below a fixed value; and taking a fixed number of candidates in ascending order of dissimilarity from among those at or below a fixed value. Methods for generating the pixel value of the predicted image (the predicted value) from multiple reference pixels include using the mean or median of the reference pixels, and using a weighted mean or weighted median with a weight set for each reference pixel according to its dissimilarity. Any methods, including those above, may be used for setting multiple reference pixels and for generating the predicted value from them, but they must be the same for encoding and decoding.
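The third selection method combined with a weighted mean can be sketched as below. The dissimilarity is taken to be the absolute depth difference, and the weight 1/(1 + dissimilarity), the parameter defaults, and the function name are assumptions chosen for illustration.

```python
def predict_from_candidates(target_depth, candidates, max_refs=2, max_diff=8):
    """Predict a pixel from several reference pixels.

    candidates: list of (pixel_value, depth_value) pairs for the reference pixel candidates.
    Returns None when no candidate is similar enough.
    """
    # Dissimilarity = absolute depth difference (assumed), sorted ascending.
    scored = sorted(
        ((abs(target_depth - d), v) for v, d in candidates),
        key=lambda t: t[0],
    )
    # Keep at most max_refs candidates whose dissimilarity is within max_diff.
    chosen = [(diff, v) for diff, v in scored if diff <= max_diff][:max_refs]
    if not chosen:
        return None
    # Smaller dissimilarity gives a larger weight (assumed weighting).
    weights = [1.0 / (1 + diff) for diff, _ in chosen]
    total = sum(w * v for w, (_, v) in zip(weights, chosen))
    return round(total / sum(weights))
```

The same function must run with identical parameters in the encoder and the decoder, since none of these choices is signaled in this sketch.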

In the description above, the input depth map is used as is. However, when an encoded depth map is used, for example, the depth map contains noise, so a low-pass filter may be applied to it to reduce that noise. Also, since a predicted image that takes the subject into account can be generated as long as the bit depth is sufficient to tell subjects apart, a bit depth conversion may be applied to the input depth map to reduce its bit depth. A simple bit depth conversion may be used, or the number of subjects may be determined from the depth map, for example, and the depth map converted accordingly into information that merely distinguishes the subjects.
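Both preprocessing steps can be sketched in a few lines. The 3×3 box filter is only one possible low-pass filter, and the right-shift is the "simple" bit depth conversion; the function names and the 8-bit to 4-bit defaults are assumptions.

```python
def box_filter_3x3(depth):
    """Low-pass filter: integer mean over the 3x3 neighbourhood, cropped at borders."""
    h, w = len(depth), len(depth[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [
                depth[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(sum(vals) // len(vals))
        out.append(row)
    return out


def reduce_bit_depth(depth, src_bits=8, dst_bits=4):
    """Simple bit depth conversion: drop the least significant bits."""
    shift = src_bits - dst_bits
    return [[d >> shift for d in row] for row in depth]
```

A box filter also blurs depth edges between subjects, so an edge-preserving filter such as a median filter may be preferable in practice.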

The description above assumes that the entire frame is encoded or decoded by this processing, but the processing may also be applied to only part of an image. In this case, whether to apply the processing may be decided and a flag indicating the decision encoded and decoded, or the decision may be specified by some other means.

Alternatively, instead of specifying it through separate information, whether to perform the reference pixel determination described above may be decided from the depth map. For example, when the depth map values for the pixels of the processing target region are nearly equal (when the variance of the depth map values is small), the region contains only one subject, so the processed pixels adjacent to the processing target region can always be used as the reference pixels and the depth-based determination skipped. The check for nearly equal depth map values may also be made not only over the pixels in the processing target region but over a pixel set that additionally includes the pixels that would be set as reference pixels when predicting with a conventional method (such as H.264/AVC) according to the set prediction direction. The criterion then becomes whether the area including the reference pixel group contains only one subject, which allows more accurate switching.
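The variance-based switch can be sketched as follows. The threshold value and the function name are assumptions; the text only says that the variance should be "small".

```python
from statistics import pvariance


def use_depth_based_reference(depths, threshold=4.0):
    """Return True when the depth values vary enough to suggest several subjects.

    depths: depth map values of the processing target region, optionally
            extended with those of the conventional reference pixels.
    """
    return pvariance(depths) > threshold
```

When this returns False, a decoder following the scheme above would fall back to the adjacent processed pixels, exactly as in conventional intra-frame prediction.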

The description above uses a depth map for the video being encoded or decoded, but other image information whose values depend on the subject, such as a normal map or a thermal image, may be used instead. However, whatever is used on the encoding side must be equally available on the decoding side.

The video encoding and video decoding processing described above can also be implemented by a computer and a software program, and the program can be provided either recorded on a computer-readable recording medium or over a network.

FIG. 5 shows an example hardware configuration for the case where the video encoding device is implemented by a computer and a software program. In this system, the following are connected by a bus: a CPU 50 that executes the program; a memory 51, such as RAM, that stores the program and data accessed by the CPU 50; an encoding target video input unit 52 that receives the video signal to be encoded from a camera or the like (this may be a storage unit, such as a disk device, that stores the video signal); a depth map input unit 53 that receives the depth map for the encoding target video, for example over a network (this may be a storage unit, such as a disk device, that stores the depth map); a program storage device 54 that stores a video encoding program 541, a software program that causes the CPU 50 to execute the processing shown in FIG. 2; and a code data output unit 55 that outputs, for example over a network, the code data generated when the CPU 50 executes the video encoding program 541 loaded into the memory 51 (this may be a storage unit, such as a disk device, that stores the multiplexed code data).

Although not shown, other hardware such as a code data storage unit and a reference frame storage unit is also provided and used in carrying out this method. A video signal code data storage unit, a prediction direction code data storage unit, and the like may also be used.

FIG. 6 shows an example hardware configuration for the case where the video decoding device is implemented by a computer and a software program. In this system, the following are connected by a bus: a CPU 60 that executes the program; a memory 61, such as RAM, that stores the program and data accessed by the CPU 60; a code data input unit 62 that receives the code data encoded by the video encoding device using this method (this may be a storage unit, such as a disk device, that stores the multiplexed code data); a depth map input unit 63 that receives the depth map for the video to be decoded, for example over a network (this may be a storage unit, such as a disk device, that stores the depth map); a program storage device 64 that stores a video decoding program 641, a software program that causes the CPU 60 to execute the processing shown in FIG. 4; and a decoded video output unit 65 that outputs, to a playback device or the like, the decoded video obtained when the CPU 60 decodes the code data by executing the video decoding program 641 loaded into the memory 61.

Although not shown, other hardware such as a reference frame storage unit is also provided and used in carrying out this method. A video signal code data storage unit and a prediction direction code data storage unit may also be used.

As described above, the video encoding device and the video decoding device improve the prediction accuracy of the video signal and thereby reduce the amount of code needed to encode the prediction residual. Intra-frame prediction can draw not only on the pixels adjacent to the prediction target region but also on pixels that lie within a fixed distance of the region along the intra-frame prediction direction. As a result, pixel values can be predicted from the same subject even when the edge of the region contains occlusion or noise, or when several subjects lie along the prediction direction, so highly accurate prediction is achieved. The conventional method (for example, Patent Document 1) realizes intra-frame prediction from pixels other than those adjacent to the prediction target region by specifying a distance from the region. Because the information specifying that distance has to be encoded, the distance cannot be specified for each pixel, and accurate prediction is not possible for every pixel. The present invention instead uses the separately transmitted depth map to select, from the reference pixel candidates, a pixel that shows the same subject as the pixel to be predicted, and generates the predicted image from the selected reference pixel. This removes the need to encode information specifying the distance. Consequently, the distance can be determined for each pixel, and accurate prediction can be achieved for every pixel. Because depth map values depend strongly on the subject, a judgment based on the depth map can determine with high accuracy whether two pixels show the same subject. Even when the distance is set per pixel group, the distance information that Patent Document 1 required no longer has to be encoded, and the amount of code is reduced accordingly.
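The depth-guided selection described above can be illustrated with a minimal sketch. It assumes a vertical prediction direction, takes the already decoded rows directly above the block as the reference pixel candidates, and uses the absolute depth difference as the degree of difference. The function name `predict_block_vertical` and the data layout are hypothetical and are not taken from the patent.

```python
def predict_block_vertical(block_depth, ref_pixels, ref_depth):
    """Predict a block from decoded rows above it, guided by depth.

    block_depth: H x W depth values of the block being predicted.
    ref_pixels, ref_depth: K x W decoded pixel and depth rows above the
    block, with row 0 nearest to the block (assumed layout).
    """
    height, width = len(block_depth), len(block_depth[0])
    pred = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Pick the candidate whose depth is closest to the target depth.
            # min() returns the first minimum, so ties go to the nearest row.
            best_k = min(
                range(len(ref_depth)),
                key=lambda k: abs(ref_depth[k][x] - block_depth[y][x]),
            )
            pred[y][x] = ref_pixels[best_k][x]
    return pred


# Column 0: the nearest row shows a different subject (depth 10),
# so the second row (depth 80) is used instead.
ref_pixels = [[200, 50], [60, 52]]
ref_depth = [[10, 80], [80, 80]]
block_depth = [[80, 80], [80, 80]]
prediction = predict_block_vertical(block_depth, ref_pixels, ref_depth)
```

Because the decoder holds the same depth map and the same decoded pixels, it can repeat this selection without any per-pixel distance being signalled in the bitstream.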

In addition, prediction accuracy improves because prediction is performed from pixels that are more strongly correlated with the pixel to be predicted, which reduces the amount of code needed to encode the prediction residual. A depth map carries no texture information, so a reference pixel cannot be chosen with the texture inside a single subject taken into account. In general, pixel values are more strongly correlated the closer they are spatially within the image. Therefore, by determining the reference pixel on the basis of the distance between the reference pixel and the prediction target pixel in addition to the depth map values, prediction can use the more strongly correlated pixel when several reference pixel candidates show the same subject.
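One way to combine the two criteria is a cost that adds a distance penalty to the depth difference. The sketch below is an illustration only: the linear penalty, the weight `lam`, and the convention that candidate `k` lies `k + 1` pixels away are all assumptions, since the text does not fix a particular formula.

```python
def dissimilarity(target_depth, cand_depth, distance, lam):
    """Depth difference plus a spatial-distance penalty (assumed linear)."""
    return abs(target_depth - cand_depth) + lam * distance


def choose_reference(target_depth, cand_depths, lam=0.5):
    """Return the index of the candidate with the smallest dissimilarity.

    Candidate k is assumed to lie k + 1 pixels from the target pixel.
    """
    costs = [
        dissimilarity(target_depth, d, k + 1, lam)
        for k, d in enumerate(cand_depths)
    ]
    return min(range(len(costs)), key=costs.__getitem__)


# All three candidates show the same subject: the nearest one wins.
same_subject = choose_reference(80, [80, 80, 80])
# The nearest candidate shows another subject: the next nearest wins.
occluded = choose_reference(80, [10, 80, 80])
```

With `lam` set to zero the cost reduces to the pure depth comparison, so the distance term only breaks ties among candidates that the depth map cannot separate.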

In addition, suppressing unnatural high-frequency components in the predicted image improves prediction accuracy in the frequency domain, which reduces the amount of code needed to encode the prediction residual. When a reference pixel is set for each pixel of the prediction target region, the predicted image is assembled from pixels that were not originally contiguous. As a result, continuity between pixels in the predicted image drops, and high-frequency components that would not occur in a natural image may appear. Therefore, pixels showing the same subject are chosen as reference pixels as far as possible while changes in the distance to the reference pixel between adjacent pixels of the prediction target region are kept small, which suppresses unnatural high-frequency components in the predicted image. Because the prediction residual is encoded in the frequency domain in typical codecs, this prevents wasteful high-frequency components from arising in the prediction residual and enables more efficient encoding.
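The trade-off between picking the best-matching candidate and keeping the choice smooth across neighbouring pixels can be sketched as a small dynamic program over one row of pixels. This is an illustrative assumption, not the procedure stated in the text: the penalty `lam * |k_i - k_(i-1)|` stands in for the increasing function of the change in reference distance, and the function name is hypothetical.

```python
def smooth_reference_offsets(costs, lam):
    """Choose one candidate index per pixel along a row.

    costs[i][k] is the dissimilarity of candidate k for pixel i.
    Minimises sum(costs[i][k_i]) + lam * sum(|k_i - k_(i-1)|).
    """
    m = len(costs[0])
    total = [list(costs[0])]   # best accumulated cost ending at each k
    back = []                  # back-pointers for path recovery
    for i in range(1, len(costs)):
        row, ptr = [], []
        for k in range(m):
            prev = min(range(m), key=lambda j: total[-1][j] + lam * abs(k - j))
            row.append(total[-1][prev] + lam * abs(k - prev) + costs[i][k])
            ptr.append(prev)
        total.append(row)
        back.append(ptr)
    # Trace back from the cheapest final state.
    k = min(range(m), key=lambda j: total[-1][j])
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return path


costs = [[0, 5], [1, 0], [0, 5]]
independent = smooth_reference_offsets(costs, lam=0)  # per-pixel minima
smoothed = smooth_reference_offsets(costs, lam=3)     # switching is penalised
```

In this example the per-pixel minima alternate between the two candidates, whereas the smoothed solution accepts a slightly worse match for the middle pixel in order to keep the reference distance constant.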

In addition, robustness against noise in the reference pixels and the depth map can be improved. When the depth map contains noise, an error may occur in selecting a single reference pixel from the reference pixel candidates. In that case, several reference pixels are selected and the predicted image is generated from the mean or median of their pixel values, so that even if some of the selected reference pixels mistakenly show a different subject, their influence on the predicted image is reduced. Likewise, when the reference pixels themselves contain noise, averaging over several reference pixels reduces the noise component and enables more accurate prediction.
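A minimal sketch of prediction from several reference pixels follows. It assumes that the `n` candidates with the smallest depth difference are kept and then combined with the mean or the median; the selection rule, the value of `n`, and the function name are illustrative assumptions.

```python
from statistics import mean, median


def select_references(target_depth, cand_pixels, cand_depths, n):
    """Return the pixel values of the n candidates closest in depth."""
    order = sorted(
        range(len(cand_depths)),
        key=lambda k: abs(cand_depths[k] - target_depth),
    )
    return [cand_pixels[k] for k in order[:n]]


# Depth noise has given the second candidate (value 250, a different
# subject) the same depth as the target, so it is wrongly selected.
cand_pixels = [100, 250, 102, 40]
cand_depths = [80, 80, 81, 10]
refs = select_references(80, cand_pixels, cand_depths, n=3)
pred_mean = mean(refs)
pred_median = median(refs)
```

Here the median ignores the wrongly selected pixel entirely, while the mean only dilutes its influence, which matches the motivation for offering both options.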

In addition, prediction accuracy improves when prediction uses several reference pixels, which reduces the amount of code needed to encode the prediction residual. When several reference pixels are used, the prediction is formed as a weighted average or weighted median with weighting factors based on the degree of difference between the prediction target pixel and each reference pixel. The prediction then emphasizes the pixels judged to be most similar to the prediction target pixel, so prediction accuracy improves while robustness against noise is maintained.
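The weighted variants can be sketched as follows. The mapping from degree of difference to weight, `1 / (1 + d)`, is an assumption chosen only because it decreases as the difference grows; the text does not specify the weighting function, and the helper names are hypothetical.

```python
def weights_from_dissimilarity(dissims):
    """Smaller degree of difference gives a larger weight (assumed form)."""
    return [1.0 / (1.0 + d) for d in dissims]


def weighted_average(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    half = sum(weights) / 2.0
    acc = 0.0
    for v, w in sorted(zip(values, weights)):
        acc += w
        if acc >= half:
            return v


values = [100, 250, 102]
weights = weights_from_dissimilarity([0, 9, 1])
pred_avg = weighted_average(values, weights)
pred_med = weighted_median(values, weights)
```

The outlier value 250 receives the smallest weight, so the weighted average moves much closer to the two similar pixels than a plain average would.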

A program that implements the functions of the processing units in FIGS. 1 and 3 may be recorded on a computer-readable recording medium, and the video encoding process and the video decoding process may be performed by having a computer system read and execute the program recorded on that medium. The term "computer system" here includes an OS and hardware such as peripheral devices. It also includes a WWW system equipped with a homepage providing environment (or display environment). The term "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. It further includes media that hold the program for a certain period of time, such as the volatile memory (RAM) inside a computer system that acts as a server or a client when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line.

The program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium, or by transmission waves in a transmission medium. Here, the "transmission medium" that transmits the program refers to a medium having the function of transmitting information, such as a network (communication network) like the Internet or a communication line such as a telephone line. The program may implement only part of the functions described above. It may also be a so-called difference file (difference program), which implements the functions described above in combination with a program already recorded in the computer system.

Embodiments of the present invention have been described above with reference to the drawings, but these embodiments are merely examples of the present invention, and the present invention is clearly not limited to them. Accordingly, components may be added, omitted, replaced, or otherwise modified without departing from the spirit and scope of the present invention.

The present invention is applicable to uses in which improving the coding efficiency of intra-frame prediction is essential when encoding free viewpoint video data composed of video and depth maps.

Description of Symbols
100: video encoding device
101: encoding target video input unit
102: input frame memory
103: depth map input unit
104: depth map memory
105: prediction direction setting unit
106: reference pixel determination unit
107: predicted image generation unit
108: prediction direction encoding unit
109: video signal encoding unit
110: video signal decoding unit
111: reference frame memory
112: multiplexing unit
200: video decoding device
201: code data input unit
202: code data memory
203: depth map input unit
204: depth map memory
205: separation unit
206: prediction direction decoding unit
207: reference pixel determination unit
208: predicted image generation unit
209: video signal decoding unit
210: reference frame memory

Claims

A video encoding method that performs predictive encoding using intra-frame prediction on each processing region obtained by dividing each frame constituting a video into regions of a predetermined size, while using a depth map corresponding to the video, the method comprising:
an intra-frame prediction direction setting step of setting a prediction direction for the intra-frame prediction;
a reference pixel candidate setting step of setting, for each pixel or a predetermined number of pixel groups in the processing region, already encoded pixels that lie in the direction opposite to the set prediction direction and within a predetermined distance range from the processing region as reference pixel candidates;
a reference pixel dissimilarity calculating step of calculating, for each pixel or a predetermined number of pixel groups in the processing region, a degree of difference for each reference pixel candidate using the depth map corresponding to the processing region and the depth map corresponding to the reference pixel candidate;
a reference pixel determination step of determining, based on the degree of difference, a reference pixel from among the reference pixel candidates for each pixel or a predetermined number of pixel groups in the processing region; and
a predicted image generation step of generating a predicted image for the processing region from the reference pixel according to the set prediction direction.

The video encoding method according to claim 1, wherein, in the reference pixel candidate setting step, for each pixel or a predetermined number of pixel groups in the processing region, already encoded pixels that lie in a direction orthogonal to the set prediction direction with respect to a reference pixel candidate and within a predetermined distance range from that reference pixel candidate are also set as reference pixel candidates, in addition to the reference pixel candidates.

The video encoding method according to claim 1, wherein, in the reference pixel candidate setting step, for each pixel or a predetermined number of pixel groups in the processing region, already encoded pixels that lie within a predetermined angle range centered on the direction opposite to the set prediction direction and within a predetermined distance range from the processing region are also set as reference pixel candidates, in addition to the reference pixel candidates.

The video encoding method according to any one of claims 1 to 3, wherein the reference pixel dissimilarity calculating step calculates, for each pixel or a predetermined number of pixel groups in the processing region, the degree of difference for each reference pixel candidate using a difference value between the depth map value corresponding to the pixel or pixel group in the processing region and the depth map value corresponding to the reference pixel candidate, and the distance between the processing region and the reference pixel.

The video encoding method according to any one of claims 1 to 4, wherein the reference pixel determination step determines, for each pixel or a predetermined number of pixel groups in the processing region, the reference pixel from among the reference pixel candidates so as to minimize the sum of the degrees of difference plus the sum of the values of an increasing function of the distance between reference pixels, the function serving to smooth variation of the reference pixels within the processing region.

The video encoding method according to any one of claims 1 to 5, wherein the reference pixel determination step determines, based on the degree of difference, a plurality of reference pixels from among the reference pixel candidates for each pixel or a predetermined number of pixel groups in the processing region, and
the predicted image generation step generates the predicted image using the plurality of reference pixels according to the prediction direction.

The video encoding method according to claim 6, wherein the predicted image generation step includes a weighting factor setting step of setting a weighting factor according to the degree of difference for each of the plurality of reference pixels determined for each pixel or a predetermined number of pixel groups in the processing region, and
generates the predicted image, according to the prediction direction, using a weighted average or a weighted median of the plurality of reference pixels computed with the weighting factors.

A video decoding method that decodes a video signal using intra-frame prediction on each processing region obtained by dividing each frame constituting a video into regions of a predetermined size, while using a depth map corresponding to the video, the method comprising:
an intra-frame prediction direction setting step of setting a prediction direction for the intra-frame prediction;
a reference pixel candidate setting step of setting, for each pixel or a predetermined number of pixel groups in the processing region, already decoded pixels that lie in the direction opposite to the set prediction direction and within a predetermined distance range from the processing region as reference pixel candidates;
a reference pixel dissimilarity calculating step of calculating, for each pixel or a predetermined number of pixel groups in the processing region, a degree of difference for each reference pixel candidate using the depth map corresponding to the processing region and the depth map corresponding to the reference pixel candidate;
a reference pixel determination step of determining, based on the degree of difference, a reference pixel from among the reference pixel candidates for each pixel or a predetermined number of pixel groups in the processing region; and
a predicted image generation step of generating a predicted image for the processing region from the reference pixel according to the set prediction direction.

The video decoding method according to claim 8, wherein, in the reference pixel candidate setting step, for each pixel or a predetermined number of pixel groups in the processing region, already decoded pixels that lie in a direction orthogonal to the set prediction direction with respect to a reference pixel candidate and within a predetermined distance range from that reference pixel candidate are also set as reference pixel candidates, in addition to the reference pixel candidates.

The video decoding method according to claim 8, wherein, in the reference pixel candidate setting step, for each pixel or a predetermined number of pixel groups in the processing region, already decoded pixels that lie within a predetermined angle range centered on the direction opposite to the set prediction direction and within a predetermined distance range from the processing region are also set as reference pixel candidates, in addition to the reference pixel candidates.

The video decoding method according to any one of claims 8 to 10, wherein the reference pixel dissimilarity calculating step calculates, for each pixel or a predetermined number of pixel groups in the processing region, the degree of difference for each reference pixel candidate using a difference value between the depth map value corresponding to the pixel or pixel group in the processing region and the depth map value corresponding to the reference pixel candidate, and the distance between the processing region and the reference pixel.

The video decoding method according to any one of claims 8 to 11, wherein the reference pixel determination step determines, for each pixel or a predetermined number of pixel groups in the processing region, the reference pixel from among the reference pixel candidates so as to minimize the sum of the degrees of difference plus the sum of the values of an increasing function of the distance between reference pixels, the function serving to smooth variation of the reference pixels within the processing region.

The video decoding method according to any one of claims 8 to 12, wherein the reference pixel determination step determines, based on the degree of difference, a plurality of reference pixels from among the reference pixel candidates for each pixel or a predetermined number of pixel groups in the processing region, and
the predicted image generation step generates the predicted image using the plurality of reference pixels according to the prediction direction.

The video decoding method according to claim 13, wherein the predicted image generation step includes a weighting factor setting step of setting a weighting factor according to the degree of difference for each of the plurality of reference pixels determined for each pixel or a predetermined number of pixel groups in the processing region, and
generates the predicted image, according to the prediction direction, using a weighted average or a weighted median of the plurality of reference pixels computed with the weighting factors.

A video encoding device that performs predictive encoding using intra-frame prediction on each processing region obtained by dividing each frame constituting a video into regions of a predetermined size, while using a depth map corresponding to the video, the device comprising:
intra-frame prediction direction setting means for setting a prediction direction for the intra-frame prediction;
reference pixel candidate setting means for setting, for each pixel or a predetermined number of pixel groups in the processing region, already encoded pixels that lie in the direction opposite to the set prediction direction and within a predetermined distance range from the processing region as reference pixel candidates;
reference pixel dissimilarity calculating means for calculating, for each pixel or a predetermined number of pixel groups in the processing region, a degree of difference for each reference pixel candidate using the depth map corresponding to the processing region and the depth map corresponding to the reference pixel candidate;
reference pixel determining means for determining, based on the degree of difference, a reference pixel from among the reference pixel candidates for each pixel or a predetermined number of pixel groups in the processing region; and
predicted image generation means for generating a predicted image for the processing region from the reference pixel according to the set prediction direction.

The video encoding device according to claim 15, wherein the reference pixel candidate setting means also sets as reference pixel candidates, in addition to the reference pixel candidates and for each pixel or a predetermined number of pixel groups in the processing region, already encoded pixels that lie in a direction orthogonal to the set prediction direction with respect to a reference pixel candidate and within a predetermined distance range from that reference pixel candidate.

The video encoding device according to claim 15, wherein the reference pixel candidate setting means also sets as reference pixel candidates, in addition to the reference pixel candidates and for each pixel or a predetermined number of pixel groups in the processing region, already encoded pixels that lie within a predetermined angle range centered on the direction opposite to the set prediction direction and within a predetermined distance range from the processing region.

A video decoding device that decodes a video signal using intra-frame prediction on each processing region obtained by dividing each frame constituting a video into regions of a predetermined size, while using a depth map corresponding to the video, the device comprising:
intra-frame prediction direction setting means for setting a prediction direction for the intra-frame prediction;
reference pixel candidate setting means for setting, for each pixel or a predetermined number of pixel groups in the processing region, already decoded pixels that lie in the direction opposite to the set prediction direction and within a predetermined distance range from the processing region as reference pixel candidates;
reference pixel dissimilarity calculating means for calculating, for each pixel or a predetermined number of pixel groups in the processing region, a degree of difference for each reference pixel candidate using the depth map corresponding to the processing region and the depth map corresponding to the reference pixel candidate;
reference pixel determining means for determining, based on the degree of difference, a reference pixel from among the reference pixel candidates for each pixel or a predetermined number of pixel groups in the processing region; and
predicted image generation means for generating a predicted image for the processing region from the reference pixel according to the set prediction direction.

The video decoding device according to claim 18, wherein the reference pixel candidate setting means also sets as reference pixel candidates, in addition to the reference pixel candidates and for each pixel or a predetermined number of pixel groups in the processing region, already decoded pixels that lie in a direction orthogonal to the set prediction direction with respect to a reference pixel candidate and within a predetermined distance range from that reference pixel candidate.

The video decoding device according to claim 18, wherein the reference pixel candidate setting means also sets as reference pixel candidates, in addition to the reference pixel candidates and for each pixel or a predetermined number of pixel groups in the processing region, already decoded pixels that lie within a predetermined angle range centered on the direction opposite to the set prediction direction and within a predetermined distance range from the processing region.

A video encoding program for causing a computer to execute the video encoding method according to any one of claims 1 to 7.

A video decoding program for causing a computer to execute the video decoding method according to any one of claims 8 to 14.