JP2013157950A

JP2013157950A - Encoding method, decoding method, encoder, decoder, encoding program and decoding program

Info

Publication number: JP2013157950A
Application number: JP2012019176A
Authority: JP
Inventors: Shinya Shimizu; 信哉志水; Shiori Sugimoto; 志織杉本; Hideaki Kimata; 英明木全; Akira Kojima; 明小島
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2012-01-31
Filing date: 2012-01-31
Publication date: 2013-08-15
Anticipated expiration: 2032-01-31
Also published as: JP5809574B2

Abstract

PROBLEM TO BE SOLVED: To improve encoding quality by filtering a depth map, in cases where the video corresponding to the depth map to be encoded is also available on the decoding side, thereby reducing encoding noise and restoring edges, and to reduce the code amount required for encoding the prediction residual by improving the prediction efficiency of the depth map.

SOLUTION: A similarity calculation unit 1091 calculates, from the corresponding video signal, the similarity of each reference pixel to the pixel to be filtered. A reliability distribution estimation unit 1092 estimates the reliability distribution for the pixel value of the pixel to be filtered from the pairs of pixel value and similarity obtained for each reference pixel. A pixel value determination unit 1093 outputs the pixel value giving the maximum reliability as the pixel value of the pixel to be filtered.

Description

The present invention relates to an encoding method, a decoding method, an encoding device, a decoding device, an encoding program, and a decoding program for encoding or decoding video information and the like.

A free viewpoint video is a video in which the user can freely specify the position and orientation of the camera in the shooting space (hereinafter referred to as the viewpoint). Because the user may specify an arbitrary viewpoint, it is impossible to store video for every possible viewpoint. A free viewpoint video is therefore composed of the group of information needed to generate the video of a specified viewpoint. Free viewpoint video is also called free viewpoint television, arbitrary viewpoint video, or arbitrary viewpoint television.

Free viewpoint video can be represented in various data formats. The most common format uses a video together with a depth map (distance image) for each frame of that video (see, for example, Non-Patent Document 1). A depth map represents, for each pixel, the depth (distance) from the camera to the subject, and thus expresses the three-dimensional position of the subject. Because depth is proportional to the reciprocal of the parallax between two cameras, a depth map is sometimes called a disparity map (parallax image). In the field of computer graphics, depth is the information stored in the Z buffer, so a depth map is also called a Z image or Z map. Besides the distance from the camera to the subject, the coordinate value along the Z axis of a three-dimensional coordinate system defined over the space to be represented may also be used as the depth.

In general, the horizontal direction of a captured image is taken as the X axis and the vertical direction as the Y axis, so the Z axis coincides with the direction of the camera. However, the Z axis may not coincide with the camera direction, for example when a common coordinate system is used for a plurality of cameras. In the following, distance and Z value are both called depth without distinction, and an image whose pixel values represent depth is called a depth map. Strictly speaking, however, a disparity map requires a reference camera pair to be set.

When depth is expressed as a pixel value, there are three methods: using the value of the physical quantity directly as the pixel value; using a value obtained by quantizing the interval between a minimum value and a maximum value into a certain number of levels; and using a value obtained by quantizing the difference from the minimum value with a certain step width. When the range to be expressed is limited, using additional information such as the minimum value allows depth to be expressed with higher precision. When quantizing at equal intervals, either the physical quantity itself or its reciprocal can be quantized. Because the reciprocal of distance is proportional to parallax, the former is often used when distance must be expressed with high precision, and the latter when parallax must be expressed with high precision. In the following, any depth expressed as an image is called a depth map, regardless of how the depth is converted to pixel values or quantized.
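The difference between quantizing distance at equal intervals and quantizing its reciprocal can be illustrated with a minimal Python sketch. The function names, the 8-bit range, and the `z_near` / `z_far` parameters are assumptions made for illustration; the text above does not prescribe a particular mapping.

```python
def quantize_linear(z, z_near, z_far, levels=256):
    """Quantize the distance z itself at equal intervals over [z_near, z_far]."""
    t = (z - z_near) / (z_far - z_near)
    return round(t * (levels - 1))


def quantize_inverse(z, z_near, z_far, levels=256):
    """Quantize 1/z at equal intervals; the nearest point gets the largest value."""
    t = (1.0 / z - 1.0 / z_far) / (1.0 / z_near - 1.0 / z_far)
    return round(t * (levels - 1))


# Hypothetical scene spanning 1 m to 10 m.
near_levels_linear = quantize_linear(2.0, 1.0, 10.0)          # 28 levels cover 1 m..2 m
near_levels_inverse = 255 - quantize_inverse(2.0, 1.0, 10.0)  # 142 levels cover 1 m..2 m
```

In this example the reciprocal mapping spends far more of the available levels on the near range, which is where parallax changes fastest, while the linear mapping spreads the levels evenly over distance.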

Since a depth map is expressed as an image in which each pixel has a single value, it can be regarded as a grayscale image. Moreover, because subjects exist continuously in real space and cannot move instantaneously to a distant position, a depth map has spatial and temporal correlation just as an image signal does. Therefore, a depth map, or a video composed of consecutive depth maps, can be encoded efficiently by removing spatial and temporal redundancy with the image coding and video coding schemes used for ordinary image and video signals. In the following, a depth map and a video of depth maps are both called a depth map without distinction.

General video coding is described here. In recent video coding, typified by the H.264/AVC standard, each frame of the video is divided into processing-unit blocks called macroblocks in order to achieve efficient coding by exploiting the spatial and temporal continuity of subjects. For each macroblock, the video signal is predicted spatially or temporally, and the prediction information indicating the prediction method and the prediction residual are encoded. With spatial prediction, the prediction information is, for example, information indicating the direction of spatial prediction; with temporal prediction, it is, for example, information indicating the frame to be referenced and the position within that frame. In ordinary video coding, the prediction residual is lossy-coded with subjective quality taken into account, which drastically reduces the code amount (for details of the H.264/AVC standard, see, for example, Non-Patent Document 2).
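The idea of predicting a signal and lossy-coding only the residual can be sketched with a toy one-dimensional coder. This is not H.264/AVC: the previous-sample predictor, the uniform quantizer, and the name `encode_decode` are assumptions chosen to keep the example short.

```python
def encode_decode(samples, step):
    """Toy predictive coder: predict each sample from the previous
    reconstructed sample and quantize the residual with a uniform step."""
    codes = []   # quantized residuals (what would be entropy-coded)
    recon = []   # what the decoder reconstructs
    prev = 0     # both encoder and decoder start from the same state
    for s in samples:
        pred = prev                    # prediction
        residual = s - pred            # prediction residual
        q = round(residual / step)     # lossy step: information is discarded here
        codes.append(q)
        r = pred + q * step            # reconstruction, as the decoder would do it
        recon.append(r)
        prev = r                       # predict from reconstructed data, not the original
    return codes, recon


codes, recon = encode_decode([100, 100, 101, 140, 141, 141], step=4)
# codes -> [25, 0, 0, 10, 0, 0]
# recon -> [100, 100, 100, 140, 140, 140]
```

Most residuals quantize to zero, which is what makes the representation cheap to encode, while the reconstruction differs from the input by at most half a quantization step.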

Lossy coding of the prediction residual superimposes noise on the encoded frame. If a frame carrying such coding noise is used as a reference frame for temporal prediction, the noise is also superimposed on the prediction signal and prediction efficiency drops. Prediction efficiency is therefore improved by applying a noise-reducing filter to the encoded frame. The new video coding scheme called HEVC uses a deblocking filter and an adaptive in-loop filter (for details of HEVC, see, for example, Non-Patent Document 3).

However, these filters do not distinguish between the subjects shown, so they cannot remove noise properly from frames whose values depend strongly on the subject, such as depth maps. To address this problem for the case where the video frame corresponding to the depth map is encoded together with it, Non-Patent Document 4 uses the per-pixel similarity in the video frame to filter while distinguishing the subject shown at each pixel, and thereby achieves appropriate noise removal even in frames whose values depend strongly on the subject.

Non-Patent Document 1: Y. Mori, N. Fukushima, T. Fujii, and M. Tanimoto, "View Generation with 3D Warping Using Depth Information for FTV", in Proceedings of 3DTV-CON2008, pp. 229-232, May 2008.

Non-Patent Document 2: Rec. ITU-T H.264, "Advanced video coding for generic audiovisual services", March 2009.

Non-Patent Document 3: B. Bross, W.-J. Han, J.-R. Ohm, G. J. Sullivan, and T. Wiegand, "WD5: Working Draft 5 of High-Efficiency Video Coding", JCTVC-G1103_d0, pp. 129-150, 7th JCT-VC Meeting, November 2011.

Non-Patent Document 4: S. Liu, P. Lai, D. Tian, and C. W. Chen, "New Depth Coding Techniques With Utilization of Corresponding Video", IEEE Transactions on Broadcasting, vol. 57, no. 2, pp. 551-561, June 2011.

However, the method of Non-Patent Document 4 calculates, for each of a plurality of reference pixels around the pixel to be filtered, its similarity to that pixel, and generates the filter result as a sum of the reference pixel values weighted by those similarities. The processed frame is therefore smoothed, and the strong edges that are important in a depth map cannot be preserved. Nor can strong edges lost in the encoding process be restored.

The effect of smoothing can be weakened by narrowing down the reference pixels according to similarity. In particular, narrowing down to a single reference pixel can generate edges that follow the subject. With such a method, however, when a reference pixel with high similarity contains noise, the noise is not only left unremoved but also spread.
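The two behaviours described above, smoothing by a similarity-weighted sum and noise propagation when only the most similar reference pixel is kept, can be reproduced with a small Python sketch. The function names and the sample values are hypothetical and serve only to make the two failure modes visible.

```python
def weighted_average_filter(ref_values, sims):
    """Similarity-weighted sum of the reference pixel values."""
    return sum(v * s for v, s in zip(ref_values, sims)) / sum(sims)


def best_reference_filter(ref_values, sims):
    """Keep only the single reference pixel with the highest similarity."""
    best = max(range(len(ref_values)), key=lambda i: sims[i])
    return ref_values[best]


# Hypothetical neighbourhood: a foreground near depth 10, a background at 50,
# and one noisy pixel (31) that happens to have the highest similarity.
ref_values = [10, 10, 12, 50, 50, 31]
sims = [0.9, 0.8, 0.7, 0.3, 0.2, 0.95]

smoothed = weighted_average_filter(ref_values, sims)  # about 20.7, a blend of both surfaces
copied = best_reference_filter(ref_values, sims)      # 31, the noisy value is copied
```

Neither result matches the depth of the foreground surface that most of the similar reference pixels agree on: the weighted sum blends the two surfaces, and the single-pixel selection reproduces the noise.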

Furthermore, since the video frame corresponding to the depth map is also encoded, the similarity calculated from it is itself affected by coding noise. Narrowing down that uses the similarity directly therefore cannot select appropriate reference pixels. In particular, if a wrong reference pixel is selected when narrowing down to a single reference pixel, an edge that did not originally exist may be generated, increasing the noise.

The present invention has been made in view of these circumstances, and its object is to provide an encoding method, a decoding method, an encoding device, a decoding device, an encoding program, and a decoding program capable of reducing the code amount by improving the quality of the encoding result and the prediction efficiency.

The present invention is an encoding method for encoding a depth map corresponding to video information, comprising: a prediction step of generating prediction information for encoding target frame information of the depth map; a prediction residual encoding step of encoding prediction residual information generated from the encoding target frame information and the prediction information, and outputting the encoded prediction residual information and decoded prediction residual information obtained by decoding the encoded prediction residual information; an encoding step of encoding and outputting information indicating the prediction method used in the prediction step and the encoded prediction residual information; a temporary decoding step of outputting temporary decoding information generated from the prediction information and the decoded prediction residual information; a reference pixel setting step of setting reference pixels for each pixel of the encoding target frame information; a similarity calculation step of calculating, using the video information corresponding to the encoding target frame information, the similarity between each reference pixel and the pixel for which that reference pixel is set; a reliability distribution estimation step of estimating, from the similarity and the temporary decoding information, reliability distribution information for the decoding information of the pixel for which the reference pixels are set; and a decoding step of outputting, for each pixel of the encoding target frame information, the pixel value that gives the maximum reliability in the reliability distribution information as the decoding information, wherein the prediction step outputs the prediction information based on the decoding information output by the decoding step.
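The core of the filtering in this method, estimating a reliability distribution over candidate pixel values and outputting the value with the maximum reliability, can be sketched as follows. How the distribution is built is an assumption here: this minimal version simply adds each reference pixel's similarity to the reliability of that pixel's own temporarily decoded value. The name `filter_pixel` and the sample data are hypothetical.

```python
def filter_pixel(ref_values, sims, num_levels=256):
    """Return the pixel value with the highest accumulated reliability.

    ref_values: temporarily decoded depth values of the reference pixels
    sims:       similarity of each reference pixel to the pixel being filtered,
                computed from the corresponding video frame
    """
    reliability = [0.0] * num_levels      # reliability distribution over pixel values
    for v, s in zip(ref_values, sims):
        reliability[v] += s               # each reference pixel supports its own value
    return max(range(num_levels), key=lambda d: reliability[d])


# Hypothetical neighbourhood with one noisy but highly similar pixel (31).
ref_values = [10, 10, 12, 50, 50, 31]
sims = [0.9, 0.8, 0.7, 0.3, 0.2, 0.95]
result = filter_pixel(ref_values, sims)   # 10
```

Because the output is always one of the candidate values rather than a blend, edges are not smoothed, and because support is accumulated over all reference pixels, a single noisy pixel with high similarity does not determine the result.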

In the present invention, the reliability distribution estimation step includes: a reliability distribution initialization step of initializing the reliability distribution information for each pixel of the encoding target frame information; an update range setting step of setting the range over which the reliability distribution information is updated; a weighting coefficient setting step of setting a weighting coefficient of the reference pixel for each pixel value included in the range to be updated; an update reliability calculation step of calculating, using the similarity and the weighting coefficient, the update reliability of each pixel value included in the range to be updated; and a reliability information update step of updating the reliability information based on the update reliability.
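The five sub-steps listed above map naturally onto a short loop. The sketch below is one possible reading, not the method fixed by the text: the symmetric update range of `radius` levels around each reference value, the triangular weighting coefficient, and the additive update are all assumptions, as is the name `estimate_reliability`.

```python
def estimate_reliability(ref_values, sims, num_levels=256, radius=2):
    """Estimate the reliability distribution for one pixel."""
    reliability = [0.0] * num_levels                  # initialization
    for v, s in zip(ref_values, sims):
        lo = max(0, v - radius)                       # update range for this
        hi = min(num_levels - 1, v + radius)          # reference pixel
        for d in range(lo, hi + 1):
            weight = 1.0 - abs(d - v) / (radius + 1)  # weighting coefficient
            update = s * weight                       # update reliability
            reliability[d] += update                  # update the distribution
    return reliability


# Hypothetical reference pixels: three noisy samples of one surface and one outlier.
reliability = estimate_reliability([10, 12, 11, 50], [0.5, 0.5, 0.4, 0.6])
best = max(range(len(reliability)), key=lambda d: reliability[d])  # 11
```

With an update range wider than a single value, nearby reference values reinforce one another, so three slightly different samples of the same surface outweigh a single outlier even though the outlier has the highest individual similarity.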

The present invention further includes a domain setting step of setting, for each pixel of the encoding target frame information, the domain of the reliability from the maximum and minimum pixel values of the reference pixels, wherein the reliability distribution estimation step estimates the reliability distribution information only over the domain set for each pixel of the encoding target frame information.
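Restricting the domain to the span between the smallest and largest reference pixel values shrinks the distribution that has to be stored and searched for each pixel. A minimal sketch, assuming the same hypothetical triangular weighting and update range of `radius` levels as one possible choice:

```python
def filter_pixel_restricted(ref_values, sims, radius=2):
    """Estimate reliabilities only over [min(ref_values), max(ref_values)].

    Returns the selected pixel value and the number of candidate values examined.
    """
    lo, hi = min(ref_values), max(ref_values)     # domain of the reliability
    reliability = [0.0] * (hi - lo + 1)
    for v, s in zip(ref_values, sims):
        # Clip each reference pixel's update range to the domain.
        for d in range(max(lo, v - radius), min(hi, v + radius) + 1):
            reliability[d - lo] += s * (1.0 - abs(d - v) / (radius + 1))
    best = max(range(len(reliability)), key=lambda i: reliability[i])
    return lo + best, len(reliability)


value, candidates = filter_pixel_restricted([10, 12, 11, 50], [0.5, 0.5, 0.4, 0.6])
# value -> 11, candidates -> 41 (instead of 256 for a full 8-bit range)
```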

The present invention is a decoding method for decoding code data in which depth map information corresponding to video information is encoded, comprising: a decoding step of decoding, from the code data, information indicating a prediction method and decoded prediction residual information; a prediction step of outputting prediction information for decoding target frame information in accordance with the decoded information indicating the prediction method; a temporary decoding step of generating temporary decoding information from the prediction information and the decoded prediction residual information; a reference pixel setting step of setting reference pixels for each pixel of the decoding target frame information; a similarity calculation step of calculating, using the video information corresponding to the decoding target frame information, the similarity between each reference pixel and the pixel for which that reference pixel is set; a reliability distribution estimation step of estimating, from the similarity and the temporary decoding information, reliability distribution information for the decoding information of the pixel for which the reference pixels are set; and a decoding step of outputting, for each pixel of the decoding target frame information, the pixel value that gives the maximum reliability in the reliability distribution information as the decoding information, wherein the prediction step outputs the prediction information based on the decoding information output by the decoding step.
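The role of the filter inside the prediction loop can be sketched as below. The example is deliberately simplified and rests on several assumptions: frames are flat lists of depth values, the prediction simply copies the previous filtered frame, and `in_loop_filter` stands in for the reliability-based filter. All names are hypothetical.

```python
def decode_sequence(residual_frames, in_loop_filter, first_prediction):
    """Decode frames whose prediction depends on previously filtered output."""
    reference = first_prediction
    output = []
    for residual in residual_frames:
        prediction = reference                                     # prediction step
        temporary = [p + r for p, r in zip(prediction, residual)]  # temporary decoding
        decoded = in_loop_filter(temporary)                        # filtering produces the final output
        output.append(decoded)
        reference = decoded                                        # filtered frame feeds the next prediction
    return output


# Usage with an identity filter as a placeholder.
frames = decode_sequence([[1, 2], [0, 1]], lambda frame: list(frame), [10, 10])
# frames -> [[11, 12], [11, 13]]
```

Because the filtered frame is what the next prediction is based on, an encoder following this scheme has to apply exactly the same filter to its own temporarily decoded frames so that both sides predict from identical data.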

In the present invention, the reliability distribution estimation step includes: a reliability distribution initialization step of initializing the reliability distribution information for each pixel of the decoding target frame information; an update range setting step of setting the range over which the reliability distribution information is updated; a weighting coefficient setting step of setting a weighting coefficient of the reference pixel for each pixel value included in the range to be updated; an update reliability calculation step of calculating, using the similarity and the weighting coefficient, the update reliability of each pixel value included in the range to be updated; and a reliability information update step of updating the reliability information based on the update reliability.

The present invention further includes a domain setting step of setting, for each pixel of the decoding target frame information, the domain of the reliability from the maximum and minimum pixel values of the reference pixels, wherein the reliability distribution estimation step estimates the reliability distribution information only over the domain set for each pixel of the decoding target frame information.

The present invention is an encoding device for encoding a depth map corresponding to video information, comprising: prediction means for generating prediction information for encoding target frame information of the depth map; prediction residual encoding means for encoding prediction residual information generated from the encoding target frame information and the prediction information, and outputting the encoded prediction residual information and decoded prediction residual information obtained by decoding the encoded prediction residual information; encoding means for encoding and outputting information indicating the prediction method used by the prediction means and the encoded prediction residual information; temporary decoding means for outputting temporary decoding information generated from the prediction information and the decoded prediction residual information; reference pixel setting means for setting reference pixels for each pixel of the encoding target frame information; similarity calculation means for calculating, using the video information corresponding to the encoding target frame information, the similarity between each reference pixel and the pixel for which that reference pixel is set; reliability distribution estimation means for estimating, from the similarity and the temporary decoding information, reliability distribution information for the decoding information of the pixel for which the reference pixels are set; and decoding means for outputting, for each pixel of the encoding target frame information, the pixel value that gives the maximum reliability in the reliability distribution information as the decoding information, wherein the prediction means outputs the prediction information based on the decoding information output by the decoding means.

In the present invention, the reliability distribution estimation means includes: reliability distribution initialization means for initializing the reliability distribution information for each pixel of the encoding target frame information; update range setting means for setting the range over which the reliability distribution information is updated; weighting coefficient setting means for setting a weighting coefficient of the reference pixel for each pixel value included in the range to be updated; update reliability calculation means for calculating, using the similarity and the weighting coefficient, the update reliability of each pixel value included in the range to be updated; and reliability information update means for updating the reliability information based on the update reliability.

The present invention further comprises domain setting means for setting, for each pixel of the encoding target frame information, the domain of the reliability from the maximum and minimum pixel values of the reference pixels, wherein the reliability distribution estimation means estimates the reliability distribution information only over the domain set for each pixel of the encoding target frame information.

The present invention is a decoding device for decoding code data in which depth map information corresponding to video information is encoded, comprising: decoding means for decoding, from the code data, information indicating a prediction method and decoded prediction residual information; prediction means for outputting prediction information for decoding target frame information in accordance with the decoded information indicating the prediction method; temporary decoding means for generating temporary decoding information from the prediction information and the decoded prediction residual information; reference pixel setting means for setting reference pixels for each pixel of the decoding target frame information; similarity calculation means for calculating, using the video information corresponding to the decoding target frame information, the similarity between each reference pixel and the pixel for which that reference pixel is set; reliability distribution estimation means for estimating, from the similarity and the temporary decoding information, reliability distribution information for the decoding information of the pixel for which the reference pixels are set; and decoding means for outputting, for each pixel of the decoding target frame information, the pixel value that gives the maximum reliability in the reliability distribution information as the decoding information, wherein the prediction means outputs the prediction information based on the decoding information output by the decoding means.

In the present invention, the reliability distribution estimation means includes: reliability distribution initialization means for initializing the reliability distribution information for each pixel of the decoding target frame information; update range setting means for setting the range over which the reliability distribution information is updated; weighting coefficient setting means for setting a weighting coefficient of the reference pixel for each pixel value included in the range to be updated; update reliability calculation means for calculating, using the similarity and the weighting coefficient, the update reliability of each pixel value included in the range to be updated; and reliability information update means for updating the reliability information based on the update reliability.

The present invention further comprises domain setting means for setting, for each pixel of the decoding target frame information, the domain of the reliability from the maximum and minimum pixel values of the reference pixels, wherein the reliability distribution estimation means estimates the reliability distribution information only over the domain set for each pixel of the decoding target frame information.

The present invention is an encoding program for causing a computer to function as the encoding device.

The present invention is a decoding program for causing a computer to function as the decoding device.

According to the present invention, when data whose values depend strongly on the subject, such as a depth map, is transmitted together with the corresponding video, the noise generated by encoding can be reduced and edges that were smoothed by encoding can be restored. As a result, the quality of the encoding result is improved and the code amount is reduced through improved prediction efficiency.

FIG. 1 is a block diagram showing the configuration of a depth map encoding device according to a first embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of the decoded frame filter unit.

FIG. 3 is a processing flowchart of the depth map encoding device shown in FIG. 1.

FIG. 4 is a flowchart showing the details of the filter processing.

FIG. 5 is a flowchart showing the details of filter processing in which the amount of computation is reduced by restricting the domain of the reliability.

FIG. 6 is a flowchart of filter processing in which an update range of the reliability distribution is set for each reference pixel and the reliability distribution is estimated by updating only part of the reliabilities.

FIG. 7 is a flowchart giving a more specific example of the filter processing in which an update range of the reliability distribution is set for each reference pixel and the reliability distribution is estimated by updating only part of the reliabilities.

FIG. 8 is a block diagram showing the configuration of a depth map decoding device according to a second embodiment of the present invention.

FIG. 9 is a processing flowchart of the depth map decoding device shown in FIG. 8.

FIG. 10 is a diagram showing a hardware configuration example for the case where the depth map encoding device is implemented by a computer and a software program.

FIG. 11 is a diagram showing a hardware configuration example for the case where the depth map decoding device is implemented by a computer and a software program.

Hereinafter, a depth map encoding apparatus according to an embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the depth map encoding apparatus according to the embodiment. The depth map encoding apparatus 100 includes an encoding target depth map input unit 101, a depth map memory 102, a video input unit 103, a video memory 104, a depth prediction unit 105, a prediction residual encoding unit 106, a variable length encoding unit 107, a bitstream output unit 108, a decoded frame filter unit 109, and a reference frame memory 110.

The encoding target depth map input unit 101 inputs the depth map information to be encoded. In the following description, this depth map information is referred to as the encoding target depth map, and the frame currently being processed is referred to as the encoding target frame. The depth map memory 102 stores the input encoding target depth map. The video input unit 103 inputs the video information corresponding to the encoding target depth map. The encoding target depth map represents the depth of the subject captured at each pixel of each frame of this video. In the following description, this video information is referred to as the auxiliary video, and the frame of the auxiliary video corresponding to the encoding target frame is referred to as the auxiliary video frame. The video memory 104 stores the input video.

The depth prediction unit 105 generates a prediction signal for the encoding target frame. The prediction residual encoding unit 106 encodes the prediction residual signal, which is the difference between the encoding target frame (original signal) and the prediction signal, and outputs the encoding target prediction residual value together with the decoded prediction residual signal corresponding to its decoded value. The variable length encoding unit 107 performs variable length encoding on the information indicating the prediction method and on the encoding target prediction residual value to generate a bitstream. The bitstream output unit 108 outputs the bitstream that is the encoding result. The decoded frame filter unit 109 filters the temporary decoded signal, obtained as the sum of the prediction signal and the decoded prediction residual signal, while using the auxiliary video frame (auxiliary video signal). The decoded signal resulting from the filter is stored in the reference frame memory 110 and used for prediction.

Next, the detailed configuration of the decoded frame filter unit 109 shown in FIG. 1 will be described with reference to FIG. 2. FIG. 2 is a block diagram showing the configuration of the decoded frame filter unit 109 shown in FIG. 1. The decoded frame filter unit 109 includes a similarity calculation unit 1091, a reliability distribution estimation unit 1092, and a pixel value determination unit 1093.

The similarity calculation unit 1091 uses the temporary decoded signal and the auxiliary video signal to calculate, for each reference pixel, its similarity to the filter target pixel. The reliability distribution estimation unit 1092 estimates, from the similarities and pixel values of the reference pixels, the distribution over pixel values of the likelihood that each value is the true value. In the following description, this likelihood of being the true value is referred to as the reliability. The pixel value determination unit 1093 determines the filtered pixel value for the filter target pixel based on the reliability distribution and outputs it as the decoded signal.

Next, the operation of the depth map encoding apparatus 100 shown in FIG. 1 will be described with reference to FIG. 3. FIG. 3 is a flowchart showing the operation of the depth map encoding apparatus 100 shown in FIG. 1. The process of encoding one frame of the encoding target depth map is described here. Repeating this process for each frame encodes a depth map consisting of a plurality of frames.

First, the encoding target depth map input unit 101 inputs an encoding target frame and stores it in the depth map memory 102, while the video input unit 103 inputs an auxiliary video frame and stores it in the video memory 104 (step S101). Although the input encoding target frames are assumed here to be encoded sequentially, the input order and the encoding order need not match. When they differ, the input depth map and auxiliary video frames are held in the depth map memory 102 and the video memory 104 until the frame to be encoded next has been input. The stored encoding target frame and auxiliary video frame may be deleted from each memory once encoding by the process described below is complete.

The auxiliary video frame input in step S101 is a frame of auxiliary video that can be obtained on the decoding side, such as one obtained by decoding already encoded auxiliary video. Using exactly the same information as is available to the decoding apparatus suppresses coding noise such as drift. If such coding noise is acceptable, however, the original frame before encoding may be input instead. As another example of auxiliary video obtainable on the decoding side, auxiliary video synthesized from decoded auxiliary video of a different viewpoint may be used.

Once the encoding target frame and the auxiliary video frame have been stored, the depth prediction unit 105 generates a prediction signal for the encoding target frame (step S102). Any method may be used to generate the prediction signal. In typical encoding, spatial or temporal correlation is exploited, and the prediction signal is generated from the decoded signal of the already encoded part of the encoding target frame or of previously encoded frames stored in the reference frame memory 110. Representative methods are intra prediction, which exploits the spatial continuity of the subject, and motion compensated prediction, which exploits the temporal continuity of the subject's depth while compensating for the subject's motion. Details of these prediction methods are described in, for example, Non-Patent Document 2 and Non-Patent Document 3; because they are well known, a detailed description is omitted.

Once the prediction signal has been generated, the prediction method and the prediction residual are encoded (step S103). The prediction method is the information that specifies how the prediction signal was generated. When both intra prediction and motion compensated prediction are available, it indicates which of the two was used, which intra prediction mode was used in the case of intra prediction, and which reference frame and what motion were used in the case of motion compensated prediction. Any method may be used to encode the prediction method. For example, in Non-Patent Document 2 and Non-Patent Document 3, the binary sequence obtained by binarizing the prediction method is encoded by context adaptive entropy coding. Here, the variable length encoding unit 107 performs variable length encoding.

The prediction residual is the per-pixel difference signal between the encoding target frame and the prediction signal. The prediction residual may also be encoded in any way. In typical coding schemes such as MPEG and JPEG, the prediction residual signal is first converted into coefficient information in the frequency domain by an orthogonal transform or the like.

Next, each coefficient is quantized according to the target bit rate and quality. The resulting quantized representative values are then encoded by variable length encoding such as entropy coding. Here, the prediction residual encoding unit 106 transforms and quantizes the prediction residual signal, and the variable length encoding unit 107 performs variable length encoding with the resulting quantized representative values as the encoding target prediction residual values.

The bitstream output unit 108 outputs the bitstream produced by the variable length encoding unit 107 as the output of the depth map encoding apparatus 100. When the encoding order and the output order do not match, a bitstream memory may additionally be provided so that the output of the variable length encoding unit 107 is stored temporarily and the bitstream output unit 108 outputs from the bitstream memory.

Once the prediction method and the prediction residual have been encoded, the encoded prediction residual is decoded (step S104). This decoding may be performed by decoding the bitstream, or by decoding the information as it was before lossless coding such as entropy coding. Here, the decoded prediction residual signal is obtained by applying inverse quantization and an inverse transform to the quantized representative values obtained by the transform and quantization in the prediction residual encoding unit 106.

Once the prediction residual has been decoded, the temporary decoded signal is generated as the sum of the decoded prediction residual signal and the prediction signal (step S105). The decoded frame filter unit 109 then filters the temporary decoded signal to generate the decoded signal (step S106), and the decoded signal is stored in the reference frame memory 110 so that it can be used to generate prediction signals for subsequent depth maps (step S107). This process (steps S101 to S107) is performed for all frames (step S108).
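The per-frame flow of steps S102 to S107 can be sketched as follows. This is only an illustration under simplifying assumptions, not the apparatus itself: the residual coding is replaced by a plain uniform scalar quantizer with a hypothetical step size `qstep`, the orthogonal transform and variable length encoding are omitted, and `predict` and `frame_filter` are placeholder callables introduced for this example.

```python
from typing import Callable, List, Tuple

Frame = List[List[int]]

def encode_frame(target: Frame, aux: Frame, reference: Frame,
                 predict: Callable[[Frame], Frame],
                 frame_filter: Callable[[Frame, Frame], Frame],
                 qstep: int = 4) -> Tuple[Frame, Frame]:
    """Return (quantized residual levels, filtered decoded frame) for one frame."""
    pred = predict(reference)                                   # S102
    h, w = len(target), len(target[0])
    # S103: residual -> quantized representative values (transform omitted).
    levels = [[round((target[y][x] - pred[y][x]) / qstep) for x in range(w)]
              for y in range(h)]
    # S104: decode the residual exactly as a decoder would.
    dec_res = [[levels[y][x] * qstep for x in range(w)] for y in range(h)]
    # S105: temporary decoded signal = prediction + decoded residual.
    temp = [[pred[y][x] + dec_res[y][x] for x in range(w)] for y in range(h)]
    # S106: filter with the auxiliary video; S107: the caller stores the result.
    decoded = frame_filter(temp, aux)
    return levels, decoded
```

Because the filter runs on the temporary decoded signal rather than on the original frame, a decoder that applies the same `frame_filter` to the same inputs obtains the same reference frame.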

Next, the filter processing (step S106 in FIG. 3) in which the decoded frame filter unit 109 shown in FIG. 1 filters the temporary decoded signal will be described in detail with reference to FIG. 4. FIG. 4 is a flowchart showing the detailed operation of the filter processing. The case where filtering is performed after the temporary decoded signal for one whole frame has been obtained is described here, but filtering may also be performed part by part. In that case, the temporary decoded signal is not available for regions that have already been filtered or that have not yet been filtered. Such regions may be excluded from the reference pixel group described later; alternatively, for already filtered regions, the corresponding decoded signal stored in the reference frame memory 110 may be used instead, or the temporary decoded signal may be stored separately and used.

First, the temporary decoded signal of the frame to be processed and the corresponding auxiliary video signal are input (step S1611). Next, one pixel is selected and set as the filter target pixel (step S1612). The pixels lying within a certain distance of the filter target pixel are then set as the reference pixel group (step S1613). The filter target pixel itself may be included among the reference pixels. The distance may be defined in any way, for example as the Manhattan distance or the Euclidean distance in image space, but the same definition must be used on the decoding side unless coding noise such as drift is acceptable.
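The selection of the reference pixel group in step S1613 can be sketched as follows, assuming a rectangular frame and a fixed `radius`; the function name and parameters are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

def reference_pixels(p: Tuple[int, int], width: int, height: int,
                     radius: float, metric: str = "manhattan",
                     include_self: bool = True) -> List[Tuple[int, int]]:
    """List pixel positions (x, y) within `radius` of p, clipped to the frame."""
    px, py = p
    r = int(radius)
    refs = []
    for y in range(max(0, py - r), min(height, py + r + 1)):
        for x in range(max(0, px - r), min(width, px + r + 1)):
            dx, dy = abs(x - px), abs(y - py)
            if metric == "manhattan":
                dist = dx + dy
            else:  # any other value is treated as "euclidean"
                dist = (dx * dx + dy * dy) ** 0.5
            if dist <= radius and (include_self or (x, y) != p):
                refs.append((x, y))
    return refs
```

An encoder and a decoder that call this with the same `radius` and `metric` obtain the same reference pixel group, which is the condition stated above for avoiding drift.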

Once the reference pixel group has been set, the similarity calculation unit 1091 uses the auxiliary video signal to obtain, for each reference pixel, its similarity to the processing target pixel (step S1614). Any definition of similarity may be used, provided that the similarity is larger the closer the reference pixel is to the processing target pixel and the closer their auxiliary video signals are. In addition to these conditions, the similarity may also be defined so that it is larger the closer the temporary decoded signals are.

For example, the similarity S_p(q) of a reference pixel q to the processing target pixel p is given by equation (1).

Here, f, g, and h are non-negative functions that take their maximum value when their arguments match, and I and D denote the pixel value of the auxiliary video signal and of the temporary decoded signal, respectively, at the pixel position given as the argument. For example, Gaussian functions of the norm of the difference between the arguments may be used as f, g, and h.
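Equation (1) itself is not reproduced in this text. A minimal sketch consistent with the description is shown below, assuming that the three terms are combined as a product, S_p(q) = f(p, q) g(I(p), I(q)) h(D(p), D(q)), and that each term is a Gaussian of the difference. The product form, the function names, and the sigma parameters are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

def gauss(diff: float, sigma: float) -> float:
    """Non-negative, maximal (1.0) when the two arguments match (diff == 0)."""
    return math.exp(-(diff * diff) / (2.0 * sigma * sigma))

def similarity(p: Tuple[int, int], q: Tuple[int, int],
               I: List[List[float]], D: List[List[float]],
               sigma_f: float = 2.0, sigma_g: float = 10.0,
               sigma_h: float = 10.0) -> float:
    """S_p(q) as a product f * g * h (the product form is an assumption)."""
    (px, py), (qx, qy) = p, q
    f = gauss(math.hypot(px - qx, py - qy), sigma_f)   # spatial closeness
    g = gauss(I[py][px] - I[qy][qx], sigma_g)          # auxiliary video closeness
    h = gauss(D[py][px] - D[qy][qx], sigma_h)          # temporary decoded closeness
    return f * g * h
```

With Gaussian terms the similarity lies in the range 0 to 1, which also makes it usable in the probability-style reliability updates described later.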

Furthermore, because the depth map is information indicating the distance from the camera to the subject, three-dimensional coordinates may be constructed by combining the depth with the position of the reference pixel, as in equations (2) to (4), and the similarity may be defined using those three-dimensional coordinates.

Once the similarity has been calculated for each reference pixel, the reliability distribution estimation unit 1092 estimates the reliability distribution over pixel values from the pairs of temporary decoded pixel value and similarity obtained for the reference pixels (step S1615). Any form of reliability distribution may be assumed, and the distribution may be estimated from the given information in any way.

For example, the reliability distribution may be assumed to be approximable by a polynomial, and the polynomial may be obtained by the least squares method. Alternatively, the reliability distribution may be assumed to be a Gaussian distribution or a Gaussian mixture, and maximum likelihood estimation may be performed with the EM algorithm or the like.

Once the reliability distribution has been estimated, the pixel value determination unit 1093 finds the pixel value that gives the maximum reliability in the distribution and outputs it as the filter result for the filter target pixel (step S1616). Steps S1612 to S1616 are repeated for each pixel until all pixels have been processed (step S1617).

For example, the similarity may be used directly as the reliability. In that case, because a pair of pixel value and similarity is obtained for each reference pixel, several samples with different reliabilities may be obtained for a single pixel value when the reliability distribution is estimated. One of those reliabilities (similarities) may then be selected.

Alternatively, because the presence of several samples for a pixel value suggests that this value is more likely to be the true one, the reliability may be redefined as the sum of the similarities. When the similarity is obtained as a value from 0 to 1, like a probability, the value R obtained from equation (5) may be used as the new reliability.
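The three ways of merging several samples for the same pixel value can be sketched as follows. Equation (5) is not reproduced in this text; the form R = 1 - prod(1 - S_i) used in the `"prob"` branch is an assumption, chosen because it treats the similarities as independent probabilities. Selecting the largest sample in the `"max"` branch is likewise only one possible way of choosing a single sample.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def combine(samples: Iterable[Tuple[int, float]],
            mode: str = "sum") -> Dict[int, float]:
    """Merge (pixel value, similarity) samples into one reliability per value."""
    grouped = defaultdict(list)
    for value, sim in samples:
        grouped[value].append(sim)
    result = {}
    for value, sims in grouped.items():
        if mode == "max":          # keep one of the samples
            result[value] = max(sims)
        elif mode == "sum":        # more samples -> more reliable
            result[value] = sum(sims)
        else:                      # "prob": assumed form 1 - prod(1 - S_i)
            miss = 1.0
            for s in sims:
                miss *= 1.0 - s
            result[value] = 1.0 - miss
    return result
```

Unlike the plain sum, the probability-style combination stays within 0 to 1 however many samples share a pixel value.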

When the reliability is redefined using a sum, not only the similarities for the same pixel value but also the similarities for nearby pixel values may be added with weights. In this case, the reliability is expressed by equation (6).

Here, REF(p) is the set of reference pixels for the pixel p, and w is a non-negative function that takes its maximum value when its argument is 0.
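Equation (6) is not reproduced in this text. The following form is a reconstruction, offered as an assumption, that is consistent with the definitions of REF(p) and w given here and with the update of step S1658 described later, in which w(j) S_p(q_i) is added to the reliability of the pixel value D(q_i) + j. The symbol d, denoting a candidate pixel value, is introduced for this sketch.

```latex
R_p(d) = \sum_{q \in \mathrm{REF}(p)} w\bigl(d - D(q)\bigr)\, S_p(q)
```

When w is nonzero only at 0, this reduces to the plain sum of the similarities of the reference pixels whose value equals d.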

The reliability distribution may be obtained over the entire range of pixel values, but because the pixel value space is discrete, the processing can be accelerated by restricting the domain over which it is obtained. Filter processing with a restricted domain is described next. FIG. 5 is a flowchart showing the operation of the filter processing when the domain is restricted. In FIG. 5, parts identical to the operation shown in FIG. 4 are given the same reference symbols, and their description is omitted. The operation shown in FIG. 5 differs from that shown in FIG. 4 in that a step of setting the domain (step S1625) is added and the step of obtaining the reliability distribution (step S1626) obtains the distribution only over that domain.

That is, after the similarity has been calculated for each reference pixel, the maximum and minimum pixel values in the reference pixel group are obtained and set as the domain of the reliability (step S1625). The reliability may increase or decrease monotonically with pixel value. Even when it is not monotonic overall, it may be increasing near the maximum reference pixel value or decreasing near the minimum reference pixel value. Consequently, restricting the domain to the minimum and maximum pixel values of the reference pixel group, as in this method, may prevent the correct reliability distribution from being obtained.

In general, however, a depth map is highly correlated with its neighboring pixels, and a pixel rarely has a value very different from those of its neighbors. If the reliability distribution is increasing near the maximum reference pixel value or decreasing near the minimum, the value selected as the filtered value in the later step S1616 would differ greatly from every reference pixel. The filtering would then produce a depth map that does not normally occur and would introduce new noise. Restricting the domain by the maximum and minimum values as described above therefore not only reduces the amount of computation but also suppresses the generation of noise.

A true value slightly above the maximum or slightly below the minimum reference pixel value is entirely plausible. To allow for such cases, the domain given by the maximum and minimum values may be widened slightly.
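Setting the domain in step S1625, including the optional slight widening, can be sketched as follows. The `margin` parameter and the 8-bit range 0 to 255 are assumptions made for illustration.

```python
from typing import List, Tuple

def reliability_domain(ref_values: List[int], margin: int = 0,
                       lo: int = 0, hi: int = 255) -> Tuple[int, int]:
    """Inclusive [min, max] of the reference pixel values, widened by
    `margin` on each side and clipped to the valid pixel range [lo, hi]."""
    return max(lo, min(ref_values) - margin), min(hi, max(ref_values) + margin)
```

The reliability then needs to be evaluated only for the pixel values inside the returned interval, which in a smooth depth region is far narrower than the full range.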

When the reliability distribution is estimated under an assumed model, the accuracy of the result depends heavily on how well the assumption holds. Using a complex model also increases the amount of computation needed to estimate the distribution. Because the pixel value space is discrete, however, obtaining the reliability for each pixel value can be regarded as estimating the distribution. In this way, a reliability distribution of arbitrary shape can be obtained without assuming a complex model. A specific method is to calculate the reliability for every pixel value according to equation (6) above.

When the reliability is obtained for each pixel value, the reliability distribution need not be obtained by storing the similarity of every reference pixel and then repeating the reliability calculation for each pixel value. Instead, it may be obtained by repeating, for each reference pixel, an update of the reliabilities of only some pixel values. This operation is described with reference to FIG. 6. FIG. 6 is a flowchart showing the operation of obtaining the reliability distribution by repeatedly updating the reliabilities of some pixel values for each reference pixel. In FIG. 6, parts identical to the operation shown in FIG. 4 are given the same reference symbols, and their description is omitted.

The operation shown in FIG. 6 differs from that shown in FIG. 4 in that, after the reliability distribution is initialized (step S1634), the reliability distribution is generated by repeating, for each reference pixel (steps S1635 and S1638), the calculation of the similarity (step S1636) and the update, according to that similarity, of the reliability distribution around the pixel value of the reference pixel (step S1637). In this operation, the pixel values whose reliability is updated are those within ±L of the pixel value of the reference pixel of interest.

This removes the need for memory to store the similarity obtained for each reference pixel. In addition, because only a limited number of pixel value reliabilities are updated per reference pixel, the total amount of computation is reduced. The estimation accuracy of the reliability distribution may be lower than when the reliabilities of all pixel values are updated. However, the influence of a reference pixel on the reliability is considered to decay with distance from its pixel value, so truncating the update at a fixed range does not lower the accuracy with which the pixel value of maximum reliability is identified. In other words, this method does not weaken the noise reduction and edge restoration effects of the filter.

Next, the update operation will be described with reference to FIG. 7. FIG. 7 is a flowchart showing an example of the update operation. First, the reliability distribution R_p is initialized to 0 (step S1654), and the update of the reliability distribution is then repeated for each reference pixel. That is, with i denoting the index of the reference pixel and numRefs the number of reference pixels, i is initialized to 0 (step S1655), and then, with i incremented by 1 each time (step S1661) and for as long as i is smaller than numRefs (step S1662), the calculation of the similarity S_p(q_i) of the reference pixel q_i to the filter target pixel p (step S1656) and the update of the reliability distribution (steps S1657 to S1660) are repeated alternately.

The reliability distribution is updated as follows. The pixel value offset j is initialized to -L, the minimum of the update range (step S1657). Then, with j incremented by 1 each time (step S1659) until j exceeds L, the maximum of the update range (step S1660), the following process is repeated (step S1658): S_p(q_i) multiplied by a weighting factor w(j) that depends on the magnitude of the offset is added to the reliability of the pixel value D(q_i) + j, that is, the pixel value of the reference pixel q_i plus the offset. w(j) is a non-negative function that is maximal at j = 0; a Gaussian function, for example, can be used.
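The loop of FIG. 7 for a single filter target pixel can be sketched as follows. For brevity the similarities are passed in precomputed as `sims` rather than calculated inside the loop as in step S1656; the Gaussian weight with parameter `sigma`, the 8-bit range given by `max_value`, and the skipping of out-of-range pixel values are assumptions made for illustration.

```python
import math
from typing import List, Sequence

def filter_pixel(ref_values: Sequence[int], sims: Sequence[float],
                 L: int = 2, sigma: float = 1.0, max_value: int = 255) -> int:
    """Return the pixel value with maximum reliability for one target pixel."""
    R: List[float] = [0.0] * (max_value + 1)           # S1654: R_p initialized to 0
    for d_q, s in zip(ref_values, sims):               # S1655, S1661, S1662
        for j in range(-L, L + 1):                     # S1657, S1659, S1660
            d = d_q + j
            if 0 <= d <= max_value:                    # ignore values out of range
                w = math.exp(-(j * j) / (2.0 * sigma * sigma))
                R[d] += w * s                          # S1658
    return max(range(max_value + 1), key=R.__getitem__)   # S1616
```

For reference values 10, 10, 200, 200, 200 with similarities 0.1, 0.1, 0.9, 0.9, 0.9 the function returns 200 rather than a blend of the two depths, which is the edge-preserving behavior that distinguishes choosing the maximum reliability from averaging.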

When the similarity S_p(q_i) represents the probability that the pixel value of the filter target pixel, as estimated from the reference pixel q_i, is D(q_i), the reliability distribution in step S1658 may instead be updated according to equation (7).

In this case, to reduce the number of operations, a dissimilarity distribution R_p' may be initialized to 1 in step S1654 and updated according to equation (8) in step S1658, and the pixel value with the minimum dissimilarity may be set in step S1616 as the decoded signal at the position of the filter target pixel.
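Equations (7) and (8) are not reproduced in this text. The sketch below assumes forms consistent with the description: for (7), R_p(D(q_i) + j) is replaced by 1 - (1 - R_p(D(q_i) + j))(1 - w(j) S_p(q_i)), and for (8), with R_p' = 1 - R_p, R_p'(D(q_i) + j) is multiplied by (1 - w(j) S_p(q_i)). Both forms, the Gaussian weight, and the 8-bit range are assumptions; similarities and weights are taken to lie in the range 0 to 1.

```python
import math
from typing import List, Sequence

def filter_pixel_dissimilarity(ref_values: Sequence[int], sims: Sequence[float],
                               L: int = 2, sigma: float = 1.0,
                               max_value: int = 255) -> int:
    """Track R' = 1 - R: one multiplication per update, argmin at the end."""
    Rd: List[float] = [1.0] * (max_value + 1)          # S1654: R'_p initialized to 1
    for d_q, s in zip(ref_values, sims):
        for j in range(-L, L + 1):
            d = d_q + j
            if 0 <= d <= max_value:
                w = math.exp(-(j * j) / (2.0 * sigma * sigma))
                Rd[d] *= 1.0 - w * s                   # assumed form of (8)
    return min(range(max_value + 1), key=Rd.__getitem__)   # S1616: least dissimilar
```

Under these assumed forms the pixel value minimizing R_p' is the one maximizing R_p, so the result matches an update by (7) while avoiding the two subtractions from 1 in every update.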

When a reference pixel has a high similarity to the filter target pixel, the true pixel value of the filter target pixel is considered to be very close to the pixel value of that reference pixel. The correct reliability distribution is therefore obtained by updating the reliability, over a range centered on the pixel value of the reference pixel, with a weighting factor that decays with distance from that pixel value, as described above. Moreover, because the true pixel value is unlikely to differ greatly from the pixel value of such a reference pixel, the weighting factor should decay very quickly. Conversely, when the similarity is low, the true value may well differ to some extent from the pixel value of the reference pixel, so the weighting factor should decay relatively slowly.

To reflect this, the weighting function w may be varied according to the similarity of the reference pixel. That is, a rapidly decaying weighting function is used when the similarity is large, and a slowly decaying weighting function is used when the similarity is small. When a Gaussian function is used as the weighting function, this corresponds to using a Gaussian with a smaller variance as the similarity increases.
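As a non-limiting illustration of the similarity-dependent weighting described above, the following Python sketch accumulates a reliability distribution over candidate pixel values and returns the value of maximum reliability. The function name `estimate_pixel_value`, the assumption that similarities lie in [0, 1], the linear mapping from similarity to Gaussian standard deviation, the parameters `sigma_min` and `sigma_max`, and the additive update `similarity * weight` are all hypothetical choices made for illustration; the update actually used by the embodiment is the one given by its numbered equations.

```python
import math


def estimate_pixel_value(ref_values, similarities, max_value=255,
                         update_range=4, sigma_min=0.5, sigma_max=3.0):
    """Return the candidate pixel value with the highest accumulated reliability.

    ref_values   -- pixel values D(q_i) of the reference pixels (integers)
    similarities -- similarities S_p(q_i), assumed here to lie in [0, 1]
    """
    reliability = [0.0] * (max_value + 1)
    for d_ref, s in zip(ref_values, similarities):
        # Assumed mapping: higher similarity -> smaller sigma -> faster decay.
        sigma = sigma_max - (sigma_max - sigma_min) * s
        lo = max(0, d_ref - update_range)
        hi = min(max_value, d_ref + update_range)
        for d in range(lo, hi + 1):
            weight = math.exp(-((d - d_ref) ** 2) / (2.0 * sigma ** 2))
            # Assumed additive update of the reliability distribution.
            reliability[d] += s * weight
    # Output the pixel value that gives the maximum reliability.
    return max(range(max_value + 1), key=reliability.__getitem__)
```

With reference values 100, 100, 101, and 180 and similarities 0.9, 0.8, 0.7, and 0.1, this sketch selects 100, whereas a similarity-weighted average of the same values would be pulled toward the dissimilar value 180.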

Even when a common weighting function is used, the range of influence may be controlled by varying the update range L according to the similarity of the reference pixel. That is, the larger the similarity of the reference pixel, the smaller the value set for the update range L. Compared with varying the weighting function itself according to the similarity of the reference pixel, accuracy is lower, but the table of weighting coefficients can be held in a smaller amount of memory.
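The memory trade-off described above can be sketched as follows: a single precomputed table of weighting coefficients is shared by all reference pixels, and only the update range L changes with the similarity. The names `WEIGHT_TABLE`, `update_range_for`, and `accumulate`, the constants `L_MAX` and `SIGMA`, the linear similarity-to-range mapping, and the additive update are assumptions for illustration and are not taken from the embodiment.

```python
import math

L_MAX = 8      # largest update range considered (assumed value)
SIGMA = 2.0    # standard deviation of the single shared Gaussian (assumed value)

# One shared table indexed by |d - D(q_i)|; no per-similarity tables are needed.
WEIGHT_TABLE = [math.exp(-(k * k) / (2.0 * SIGMA ** 2)) for k in range(L_MAX + 1)]


def update_range_for(similarity):
    """Assumed mapping: the higher the similarity, the smaller the update range L."""
    return max(1, round(L_MAX * (1.0 - similarity)))


def accumulate(reliability, d_ref, similarity):
    """Update the reliability distribution in place for one reference pixel."""
    update_range = update_range_for(similarity)
    lo = max(0, d_ref - update_range)
    hi = min(len(reliability) - 1, d_ref + update_range)
    for d in range(lo, hi + 1):
        reliability[d] += similarity * WEIGHT_TABLE[abs(d - d_ref)]
```

Because the table has only `L_MAX + 1` entries regardless of how many similarity levels occur, its memory footprint stays fixed, at the cost of truncating rather than reshaping the weighting function.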

Next, a depth map decoding apparatus according to an embodiment of the present invention will be described. FIG. 8 is a block diagram showing the configuration of the depth map decoding apparatus according to the embodiment. The depth map decoding apparatus 200 includes a bitstream input unit 201, a bitstream memory 202, a video input unit 203, a video memory 204, a variable length decoding unit 205, a depth map prediction unit 206, a prediction residual decoding unit 207, a decoded frame filter unit 208, a reference frame memory 209, and a depth map output unit 210.

The bitstream input unit 201 receives the bitstream of the depth map to be decoded. In the following description, the depth map to be decoded is called the decoding target depth map, and in particular the frame being processed is called the decoding target frame. The bitstream memory 202 stores the input bitstream.

The video input unit 203 receives the video corresponding to the decoding target depth map. The decoding target depth map represents the depth of the subject appearing at each pixel of each frame of this video. In the following description, this video is called the auxiliary video, and the frame of the auxiliary video corresponding to the decoding target frame is called the auxiliary video frame. The video memory 204 stores the input auxiliary video.

The variable length decoding unit 205 decodes the information indicating the prediction method and the decoding target prediction residual values, both of which are variable-length encoded in the input bitstream. The depth map prediction unit 206 generates a prediction signal for the decoding target frame according to the decoded prediction method. The prediction residual decoding unit 207 decodes, from the decoding target prediction residual values, the prediction residual signal, which is the difference between the decoding target frame and the prediction signal, and outputs it.

The decoded frame filter unit 208 filters the temporary decoded signal, obtained as the sum of the prediction signal and the decoded prediction residual signal, using the auxiliary video frame (auxiliary video signal). The decoded signal resulting from the filtering is stored in the reference frame memory 209, where it is used for prediction, and is also output from the depth map output unit 210. As in the encoding apparatus, the decoded frame filter unit 208 has the configuration shown in FIG. 2, so a detailed description is omitted here.

Next, the operation of the depth map decoding apparatus 200 shown in FIG. 8 will be described with reference to FIG. 9. FIG. 9 is a flowchart showing the operation of the depth map decoding apparatus 200 shown in FIG. 8. The process of decoding one frame of the decoding target depth map is described here. A depth map consisting of a plurality of frames can be decoded by repeating this process for each frame.

First, the bitstream input unit 201 receives the bitstream of the decoding target depth map and stores it in the bitstream memory 202, while the video input unit 203 receives the auxiliary video frame and stores it in the video memory 204 (step S201). Here, decoding target frames are assumed to be decoded sequentially from the input bitstream and output, but the input order and the output order need not coincide. When they differ, decoded frames are held in the reference frame memory 209 until the next frame to be output has been decoded, and the depth map output unit 210 reads the depth maps from the reference frame memory 209 and outputs them in output order. A stored auxiliary video frame may be deleted from memory once the corresponding frame has been decoded by the decoding process described below.

The auxiliary video frame input in step S201 is a frame of the auxiliary video that was used at encoding time, for example one obtained by decoding the encoded auxiliary video. Using exactly the same information as the encoding apparatus suppresses coding noise such as drift. If such coding noise is acceptable, however, a frame different from the one used at encoding time may be input. Besides a directly decoded frame, a frame synthesized from auxiliary video decoded for another viewpoint may also be used.

Once the code data and the depth map have been stored, the variable length decoding unit 205 decodes, from the bitstream, the information indicating the method for generating the prediction signal for the decoding target frame, and the prediction signal for the decoding target frame is generated on the basis of that information (step S202). Any method may be used to generate the prediction signal. In typical coding, spatial or temporal correlation is exploited to generate the prediction signal from the already decoded part of the decoding target frame, or from the decoded signals of previously decoded frames, stored in the reference frame memory 209. In every case, however, the prediction signal is generated, on the basis of the decoded information, by the same method that was used at encoding time.

Next, the prediction residual signal is decoded from the bitstream (step S203). The decoding process uses a method corresponding to the encoding process. Here, the variable length decoding unit 205 decodes from the bitstream the decoding target prediction residual values, which are the quantized representative values of the prediction residual signal in the frequency domain, and the prediction residual decoding unit 207 applies inverse quantization and an inverse transform to the decoded values to decode the prediction residual signal.

Once the prediction signal and the prediction residual signal have been obtained, the temporary decoded signal given by their sum is generated (step S204). The decoded frame filter unit 208 then filters the temporary decoded signal to generate the decoded signal (step S205). This processing is the same as step S106 shown in FIG. 3 and as described above with reference to FIGS. 2 to 7.

Next, the depth map output unit 210 outputs the resulting decoded signal as the output of the depth map decoding apparatus 200, and the decoded signal is stored in the reference frame memory 209 for use in generating prediction signals for subsequent depth maps (step S206). This processing (steps S201 to S206) is performed for all frames (step S207).
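The flow of steps S201 to S207 can be summarized, purely as an illustrative sketch, by the following Python loop. The function `decode_depth_sequence` and its three callable parameters (`predict`, `decode_residual`, `filter_frame`) are hypothetical stand-ins for the depth map prediction unit 206, the residual decoding path through units 205 and 207, and the decoded frame filter unit 208; frames are represented as flat lists of pixel values for brevity, and output order is assumed to equal input order.

```python
def decode_depth_sequence(frames, predict, decode_residual, filter_frame):
    """Decode a sequence of depth map frames.

    frames          -- iterable of (coded_unit, auxiliary_video_frame) pairs
    predict         -- callable(coded_unit, reference_frames) -> prediction signal
    decode_residual -- callable(coded_unit) -> prediction residual signal
    filter_frame    -- callable(temporary_signal, auxiliary_frame) -> decoded signal
    """
    reference_frames = []   # plays the role of the reference frame memory 209
    outputs = []
    for coded_unit, aux_frame in frames:                      # S201: input and store
        prediction = predict(coded_unit, reference_frames)    # S202: prediction signal
        residual = decode_residual(coded_unit)                # S203: residual signal
        temporary = [p + r for p, r in zip(prediction, residual)]  # S204: sum
        decoded = filter_frame(temporary, aux_frame)          # S205: filtering
        reference_frames.append(decoded)                      # S206: store for prediction
        outputs.append(decoded)                               # S206: output
    return outputs                                            # S207: all frames done
```

Note that the filtered signal, not the temporary decoded signal, is what enters the reference frame list, so later predictions are formed from filtered frames as the description states.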

In the description above, prediction signal generation, filtering, and so on are performed at once on the entire encoding target frame or decoding target frame, but the frame being processed may instead be divided and processed part by part. Alternatively, prediction signal generation, encoding and decoding of the information indicating the prediction method, and encoding and decoding of the prediction residual may be performed part by part, with filtering performed after the temporary decoded signal for the whole frame is available. In that case, the temporary decoded signal must be stored temporarily. The temporary decoded signal may also be made available for reference when generating the prediction signal.

Also, in the description above, only spatially neighboring pixels are set as reference pixels, but pixels in already processed frames may be set as reference pixels by also taking distance in the time direction into account. In that case, if a motion vector is available for the pixel being processed, the temporal distance may be defined with that motion vector taken into account. When pixels in a processed frame are used as reference pixels, the auxiliary video frame for that frame must be retained. In addition, to set the similarity for those reference pixels, either the temporary decoded signal must be retained or the decoded signal must be used in its place.

The video encoding and video decoding processes described above can also be realized by a computer and a software program. The program can be provided recorded on a computer-readable recording medium, or provided over a network.

Next, the hardware configuration used when the depth map encoding apparatus is implemented by a computer and a software program will be described with reference to FIG. 10. FIG. 10 is a block diagram showing that hardware configuration. The system shown in FIG. 10 includes a CPU 50 that executes the program; a memory 51, such as a RAM, that stores the program and data accessed by the CPU 50; an encoding target depth map input unit 52 that receives the signal of the depth map to be encoded from a camera or the like (this may be a storage unit, such as a disk device, that stores the depth map signal); and a video input unit 53 that receives the video corresponding to the encoding target depth map, for example over a network (this may be a storage unit, such as a disk device, that stores the video signal). The system further includes a program storage device 54 that stores a depth map encoding program 541, which is a software program that causes the CPU 50 to execute the processing shown in FIG. 3, and a bitstream output unit 55 that outputs, for example over a network, the bitstream generated when the CPU 50 executes the depth map encoding program 541 loaded into the memory 51 (this may be a storage unit, such as a disk device, that stores the bitstream). These components are connected by a bus.

Although not shown, other hardware such as a reference frame storage unit is also provided and used to carry out this method. A temporary decoded signal storage unit, a bitstream storage unit, and the like may also be used.

Next, the hardware configuration used when the depth map decoding apparatus is implemented by a computer and a software program will be described with reference to FIG. 11. FIG. 11 is a block diagram showing that hardware configuration. The system shown in FIG. 11 includes a CPU 60 that executes the program; a memory 61, such as a RAM, that stores the program and data accessed by the CPU 60; a bitstream input unit 62 that receives the bitstream encoded by the depth map encoding apparatus using this method (this may be a storage unit, such as a disk device, that stores the bitstream); and a video input unit 63 that receives the video corresponding to the depth map to be decoded, for example over a network (this may be a storage unit, such as a disk device, that stores the video signal). The system further includes a program storage device 64 that stores a depth map decoding program 641, which is a software program that causes the CPU 60 to execute the processing shown in FIG. 9, and a decoded depth map output unit 65 that outputs, to a playback device or the like, the decoded depth map obtained when the CPU 60 executes the depth map decoding program 641 loaded into the memory 61 and decodes the bitstream. These components are connected by a bus.

Although not shown, other hardware such as a reference frame storage unit is also provided and used to carry out this method. A temporary decoded signal storage unit, a bitstream storage unit, and the like may also be used.

It is also clear that this embodiment is not limited to the encoding and decoding of depth maps. It can evidently be applied to the encoding and decoding of any data that depends strongly on the subject (for example, temperature information) and is encoded and decoded together with the corresponding video information.

As described above, in the encoding of free viewpoint video data composed of video and depth maps, the distribution of the likelihood that each pixel value is the true value is estimated from the similarity calculated for each reference pixel, and filtering is performed on the basis of the estimated distribution. This makes it possible to provide a new video coding technique that both suppresses coding noise in the depth map and restores subject-dependent edges, thereby improving the efficiency of inter-frame prediction.

The depth map encoding process and the depth map decoding process may also be performed by recording a program for realizing the functions of the processing units in FIGS. 1 and 8 on a computer-readable recording medium, and having a computer system read and execute the program recorded on that medium. The term "computer system" here includes an OS and hardware such as peripheral devices. The "computer system" also includes a WWW system provided with a homepage providing environment (or display environment). The term "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or to a storage device such as a hard disk built into a computer system. The "computer-readable recording medium" further includes media that hold a program for a certain period of time, such as the volatile memory (RAM) inside a computer system that acts as a server or a client when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line.

The program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium, or by transmission waves in a transmission medium. Here, the "transmission medium" that transmits the program refers to a medium having the function of transmitting information, such as a network (communication network) like the Internet or a communication line such as a telephone line. The program may realize only part of the functions described above. It may also be a so-called difference file (difference program), which realizes those functions in combination with a program already recorded in the computer system.

Embodiments of the present invention have been described above with reference to the drawings, but these embodiments are merely examples of the present invention, and it is clear that the present invention is not limited to them. Accordingly, components may be added, omitted, replaced, or otherwise changed without departing from the technical idea and scope of the present invention.

The present invention is applicable to uses in which it is essential to encode and decode data that depends strongly on the subject (for example, temperature information) together with the corresponding video information.

DESCRIPTION OF SYMBOLS 100 ... depth map encoding apparatus, 101 ... encoding target depth map input unit, 102 ... depth map memory, 103 ... video input unit, 104 ... video memory, 105 ... depth prediction unit, 106 ... prediction residual encoding unit, 107 ... variable length encoding unit, 108 ... bitstream output unit, 109 ... decoded frame filter unit, 110 ... reference frame memory, 200 ... depth map decoding apparatus, 201 ... bitstream input unit, 202 ... bitstream memory, 203 ... video input unit, 204 ... video memory, 205 ... variable length decoding unit, 206 ... depth map prediction unit, 207 ... prediction residual decoding unit, 208 ... decoded frame filter unit, 209 ... reference frame memory, 210 ... depth map output unit, 1091 ... similarity calculation unit, 1092 ... reliability distribution estimation unit, 1093 ... pixel value determination unit

Claims

An encoding method for encoding a depth map corresponding to video information,
A prediction step of generating prediction information of encoding target frame information of the depth map;
A prediction that encodes prediction residual information generated from the encoding target frame information and the prediction information, and outputs the encoded prediction residual information and decoded prediction residual information obtained by decoding the encoded prediction residual information A residual encoding step;
An encoding step for encoding and outputting information indicating a prediction method used in the prediction step and the encoded prediction residual information;
A temporary decoding step of outputting temporary decoding information generated from the prediction information and the decoded prediction residual information;
A reference pixel setting step of setting a reference pixel for each pixel of the encoding target frame information;
A similarity calculation step of calculating a similarity between the reference pixel and a pixel in which the reference pixel is set, using the video information corresponding to the encoding target frame information;
A reliability distribution estimation step of estimating reliability distribution information for the decoding information of the pixel in which the reference pixel is set, from the similarity and the temporary decoding information;
A decoding step of outputting, as the decoding information, a pixel value that gives the maximum reliability in the reliability distribution information for each pixel of the encoding target frame information, and
In the prediction step, the prediction information is output based on the decoding information output in the decoding step.

The reliability distribution estimation step includes:
A reliability distribution initialization step of initializing the reliability distribution information for each pixel of the encoding target frame information;
An update range setting step for setting a range for updating the reliability distribution information;
A weighting factor setting step for setting a weighting factor of the reference pixel for each pixel value included in the range to be updated;
An update reliability calculation step of calculating an update reliability of a pixel value included in the range to be updated using the similarity and the weighting factor;
The encoding method according to claim 1, further comprising: a reliability information update step of updating the reliability information based on the update reliability.

A definition area setting step of setting a definition area of the reliability from the maximum value and the minimum value of the pixel value of the reference pixel for each pixel of the encoding target frame information;
The reliability distribution estimation step estimates the reliability distribution information only for the definition area set for each pixel of the encoding target frame information. The encoding method described.

A decoding method for decoding code data in which depth map information corresponding to video information is encoded,
A decoding step of decoding information indicating a prediction method and decoded prediction residual information from the code data;
A prediction step of outputting prediction information of decoding target frame information according to the decoded information indicating the prediction method;
A temporary decoding step of generating temporary decoding information from the prediction information and the decoded prediction residual information;
A reference pixel setting step of setting a reference pixel for each pixel of the decoding target frame information;
A similarity calculation step of calculating a similarity between the reference pixel and a pixel in which the reference pixel is set, using the video information corresponding to the decoding target frame information;
A reliability distribution estimation step of estimating reliability distribution information for the decoding information of the pixel in which the reference pixel is set, from the similarity and the temporary decoding information;
A decoding step of outputting, as the decoding information, a pixel value giving the maximum reliability in the reliability distribution information for each pixel of the decoding target frame information;
In the prediction step, the prediction information is output based on the decoding information output in the decoding step.

The reliability distribution estimation step includes:
A reliability distribution initialization step of initializing the reliability distribution information for each pixel of the decoding target frame information;
An update range setting step for setting a range for updating the reliability distribution information;
A weighting factor setting step for setting a weighting factor of the reference pixel for each pixel value included in the range to be updated;
An update reliability calculation step for calculating an update reliability of a pixel value included in the update range using the similarity and the weighting factor, and a reliability for updating the reliability information based on the update reliability The decoding method according to claim 4, further comprising: an information updating step.

A definition area setting step of setting a definition area of the reliability from the maximum value and the minimum value of the pixel value of the reference pixel for each pixel of the decoding target frame information;
The reliability distribution estimation step estimates the reliability distribution information only for the definition area set for each pixel of the decoding target frame information. The decoding method described.

An encoding device that encodes a depth map corresponding to video information,
Prediction means for generating prediction information of encoding target frame information of the depth map;
A prediction that encodes prediction residual information generated from the encoding target frame information and the prediction information, and outputs the encoded prediction residual information and decoded prediction residual information obtained by decoding the encoded prediction residual information Residual encoding means;
Encoding means for encoding and outputting information indicating a prediction method used in the prediction means and the encoded prediction residual information;
Temporary decoding means for outputting temporary decoding information generated from the prediction information and the decoded prediction residual information;
Reference pixel setting means for setting a reference pixel for each pixel of the encoding target frame information;
Similarity calculation means for calculating a similarity between the reference pixel and a pixel in which the reference pixel is set, using the video information corresponding to the encoding target frame information;
Reliability distribution estimation means for estimating reliability distribution information for the decoding information of the pixel in which the reference pixel is set from the similarity and the temporary decoding information;
Decoding means for outputting, as the decoding information, a pixel value giving the maximum reliability in the reliability distribution information for each pixel of the encoding target frame information,
The encoding device, wherein the prediction means outputs the prediction information based on the decoded information output by the decoding means.

The reliability distribution estimation means includes
Reliability distribution initialization means for initializing the reliability distribution information for each pixel of the encoding target frame information;
Update range setting means for setting a range for updating the reliability distribution information;
Weight coefficient setting means for setting a weight coefficient of the reference pixel for each pixel value included in the range to be updated;
Update reliability calculation means for calculating update reliability of pixel values included in the range to be updated using the similarity and the weighting factor;
The encoding apparatus according to claim 7, further comprising: reliability information updating means for updating the reliability information based on the update reliability.

Definition area setting means for setting a definition area of the reliability from the maximum value and the minimum value of the pixel value of the reference pixel for each pixel of the encoding target frame information,
The reliability distribution estimation means estimates the reliability distribution information only for the definition area set for each pixel of the encoding target frame information. The encoding device described.

A decoding device that decodes code data in which depth map information corresponding to video information is encoded,
Decoding means for decoding information indicating a prediction method and decoded prediction residual information from the code data;
Prediction means for outputting prediction information of decoding target frame information according to the decoded information indicating the prediction method;
Temporary decoding means for generating temporary decoding information from the prediction information and the decoded prediction residual information;
Reference pixel setting means for setting a reference pixel for each pixel of the decoding target frame information;
Similarity calculation means for calculating a similarity between the reference pixel and a pixel in which the reference pixel is set, using the video information corresponding to the decoding target frame information;
Reliability distribution estimation means for estimating reliability distribution information for the decoding information of the pixel in which the reference pixel is set from the similarity and the temporary decoding information;
Decoding means for outputting, as the decoding information, a pixel value giving the maximum reliability in the reliability distribution information for each pixel of the decoding target frame information;
The decoding device, wherein the prediction means outputs the prediction information based on the decoding information output by the decoding means.
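
The decoding flow of this claim can be illustrated on a one-dimensional toy signal. Everything named here is an assumption for the sketch: the functions `temporary_decode`, `similarity`, and `filter_depth`, the exponential similarity on video intensity differences, the `window` that selects reference pixels, and the simplification that each reference pixel votes only for its own depth value.

```python
import math


def temporary_decode(prediction, residual):
    """Temporary decoding information = prediction + decoded prediction residual."""
    return [p + r for p, r in zip(prediction, residual)]


def similarity(video, p, q, sigma=10.0):
    """Similarity of pixels p and q computed from the corresponding video signal."""
    return math.exp(-abs(video[p] - video[q]) / sigma)


def filter_depth(temp_depth, video, window=2):
    """For each pixel, output the depth value with maximum accumulated reliability."""
    n = len(temp_depth)
    decoded = []
    for p in range(n):
        # Reference pixel setting (assumption): neighbours within `window` of p.
        reliability = {}
        for q in range(max(0, p - window), min(n, p + window + 1)):
            depth = temp_depth[q]
            reliability[depth] = reliability.get(depth, 0.0) + similarity(video, p, q)
        decoded.append(max(reliability, key=reliability.get))
    return decoded


video = [10, 10, 10, 200, 200, 200]          # video intensities with one sharp edge
prediction = [50, 50, 50, 120, 120, 120]     # predicted depth
residual = [0, 0, 4, -3, 0, 0]               # noisy decoded residual near the edge
temp = temporary_decode(prediction, residual)
print(temp)
print(filter_depth(temp, video))
```

In this toy case the noisy values next to the edge are replaced by the dominant depth on the same side of the video edge. Per the claim, the filtered output would then serve as the reference from which the prediction means forms later predictions.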

The decoding device according to claim 10, wherein the reliability distribution estimation means includes:
Reliability distribution initialization means for initializing the reliability distribution information for each pixel of the decoding target frame information;
Update range setting means for setting a range for updating the reliability distribution information;
Weight coefficient setting means for setting a weight coefficient of the reference pixel for each pixel value included in the range to be updated;
Update reliability calculation means for calculating an update reliability of each pixel value included in the range to be updated, using the similarity and the weight coefficient; and
Reliability information updating means for updating the reliability information based on the update reliability.

The decoding device described above, further comprising:
Definition area setting means for setting, for each pixel of the decoding target frame information, a definition area of the reliability from the maximum value and the minimum value of the pixel values of the reference pixels,
wherein the reliability distribution estimation means estimates the reliability distribution information only for the definition area set for each pixel of the decoding target frame information.

An encoding program for causing a computer to function as the encoding device according to any one of claims 7 to 9.

A decoding program for causing a computer to function as the decoding device according to any one of claims 10 to 12.