JP4815004B2

JP4815004B2 - Multi-view image encoding device

Info

Publication number: JP4815004B2
Application number: JP2010094574A
Authority: JP
Inventors: 敦稔〆野; 端内海
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2010-04-16
Filing date: 2010-04-16
Publication date: 2011-11-16
Anticipated expiration: 2030-04-16
Also published as: JP2011228821A; WO2011129164A1

Description

The present invention relates to multi-viewpoint image processing technology, and more particularly to reducing the processing load of encoding viewpoint images.

Using images that correspond to a plurality of viewpoints makes possible a more realistic video presentation than a conventional single-viewpoint image alone can provide. Typical applications of multi-viewpoint images are stereoscopic image display and arbitrary viewpoint image display. In stereoscopic image display, the displayed image itself is a planar image, that is, two-dimensional information. However, as shown in FIG. 5, presenting images 501 and 502, which have parallax between them, to the observer's left and right eyes causes the image 503 perceived in the brain to give a pseudo three-dimensional impression similar to that of observing an actual three-dimensional object or space.

Arbitrary viewpoint image display, as shown in FIG. 6, uses image data 601v to 603v captured from a plurality of viewpoints together with distance information 601d to 603d between the camera and the subject to generate images 604v, 605v, and so on from viewpoints that were not captured. This lets the observer view an image from a preferred position, that is, from an arbitrary viewpoint.

As described above, arbitrary viewpoint image generation and display lets the observer view an image from a preferred position, including viewpoints that were not captured. Generating an arbitrary viewpoint image requires position information (depth information) for each subject in the video. For example, a region A that is visible from one viewpoint may be hidden behind an object B when seen from another viewpoint. Without depth information, reproducing such relationships for every possible viewpoint would require an infinite number of viewpoint images. With depth information, an image from an arbitrary viewpoint can be reproduced from a small number of viewpoint images.

For example, Non-Patent Document 1 (Mori et al.) discloses a method that builds on 3D warping, the basic technique for arbitrary viewpoint image generation, to generate higher-quality arbitrary viewpoint images. The method uses the depth images associated with the viewpoint images. Taking images from two viewpoints and their corresponding depth images as input, it generates the image from a virtual viewpoint, that is, the viewpoint to be observed, in roughly the following steps:
(1) Place a virtual camera and project the depth image onto the virtual viewpoint.
(2) Smooth the projected depth image.
(3) Map the pixel values of the real image onto the smoothed depth image.
(4) Repair the remaining pixels using the surrounding pixels.
In this way, by using images from two viewpoints and their depth images, an image from an arbitrary viewpoint in the vicinity of those viewpoints can be generated.
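The four steps above can be illustrated with a heavily simplified sketch for a single scanline. It is not the method of the cited paper: it assumes rectified cameras so that projection reduces to a horizontal shift proportional to the depth value, it warps one input view rather than two, and the `gain` parameter, the 3-tap median filter, and the nearest-neighbour hole fill are all illustrative choices.

```python
from statistics import median


def synthesize_scanline(colors, depth, gain):
    """Warp one scanline to a virtual view (larger depth value = nearer)."""
    n = len(colors)

    # (1) Project the depth values into the virtual view.
    #     When two pixels land on the same position, the nearer one wins.
    vdepth = [None] * n
    for x in range(n):
        xv = x + round(gain * depth[x])
        if 0 <= xv < n and (vdepth[xv] is None or depth[x] > vdepth[xv]):
            vdepth[xv] = depth[x]

    # (2) Smooth the projected depth with a 3-tap median over known samples.
    #     This also closes one-pixel cracks between projected samples.
    smoothed = []
    for x in range(n):
        window = [d for d in vdepth[max(0, x - 1):x + 2] if d is not None]
        smoothed.append(median(window) if len(window) >= 2 else vdepth[x])

    # (3) Map real pixel values onto the smoothed depth (backward lookup).
    out = [None] * n
    for xv in range(n):
        if smoothed[xv] is not None:
            x = xv - round(gain * smoothed[xv])
            if 0 <= x < n:
                out[xv] = colors[x]

    # (4) Repair remaining holes from the nearest filled neighbour.
    for xv in range(n):
        if out[xv] is None:
            left = next((out[i] for i in range(xv - 1, -1, -1)
                         if out[i] is not None), None)
            right = next((out[i] for i in range(xv + 1, n)
                          if out[i] is not None), None)
            out[xv] = left if left is not None else right
    return out
```

With a uniform depth of 1 and `gain=1`, every pixel shifts one position to the right and the uncovered leftmost pixel is filled from its neighbour.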

This arbitrary viewpoint image generation technique can also be applied to improve the sense of presence of the stereoscopic image display described above. For example, suppose that, as shown in FIG. 7, image data 701v and 702v for stereoscopic display have been obtained by photographing subjects 704 and 705 with two cameras 701 and 702. Suppose further that the camera spacing 706 is much larger than the distance between a person's left and right eyes (said to be around 65 mm). The result then looks unnatural when viewed as a stereoscopic image, or cannot be perceived as a stereoscopic image at all.

In such a case, the arbitrary viewpoint image generation technique described above can be applied to generate an image 703v at a virtual viewpoint position 703 that is separated from camera position 701 by a distance 707 corresponding to the distance between a person's left and right eyes. Images 701v and 703v can then be viewed as a proper stereoscopic image.

Conversely, if the spacing between the two cameras 701 and 702 is too narrow, the captured images 701v and 702v form a stereoscopic image with almost no stereoscopic effect. In this case as well, setting a virtual viewpoint at a distance from one of the viewpoints that corresponds to the distance between the left and right eyes, and generating the image at that virtual viewpoint, yields a stereoscopic image with a sufficient stereoscopic effect. By applying the same principle, it is further possible to view a stereoscopic image from an arbitrary viewpoint and to adjust the stereoscopic effect when viewing a stereoscopic image from an arbitrary viewpoint.

As described above, using a plurality of viewpoint images and their corresponding depth images can improve the expressive capability of an image display system. On the other hand, because depth image data is required, the amount of code needed for recording and transmission increases accordingly. Various measures have been devised in multi-viewpoint image encoding and decoding devices to address this problem.

For example, Patent Document 1 discloses a technique that, when encoding multi-viewpoint images, controls the motion vector search range according to perspective information, that is, the distance of the image content from the viewpoint. The perspective information is obtained from the right-eye image and the left-eye image. Using this information, the motion vector search range is narrowed in regions far from the viewpoint and widened in regions close to the viewpoint, which makes it possible to encode at the desired data amount without degrading the image quality of regions close to the viewpoint.

Patent Document 1: JP 2001-285895 A

Non-Patent Document 1: Mori et al., "Free viewpoint image generation by 3D warping using depth images," Proceedings of the IEICE General Conference, Information and Systems 2, D-11-7, 2008.

However, with the method of Patent Document 1, when regions close to the viewpoint occupy a larger share of the whole image than regions far from the viewpoint, the share of regions with a wide motion vector search range grows, and the amount of motion vector search for the whole frame increases. This increase in processing load may cause problems such as the encoding process failing to keep up when real-time processing is required (for example, in live television broadcasting). The increase becomes more pronounced as the number of pixels in the image or the input frame rate grows, which may hinder real-time processing. The load grows further as the number of viewpoints of the viewpoint images used for stereoscopic video increases.
In view of the above problem, an object of the present invention is to reduce the processing load of encoding multi-viewpoint images.

To solve the above problem, a multi-view image encoding device according to the present invention includes the following means.
(1) The device includes encoding mode selection means that, based on the depth information and the viewpoint image, selects either an encoding mode in which motion compensation is performed by motion vector search between two image frames or an encoding mode in which motion compensation by motion vector search is not performed, and outputs encoding mode selection information indicating the selected encoding mode; viewpoint image encoding means that encodes the viewpoint image according to the encoding mode selection information; and depth information encoding means that encodes the depth information. The encoding mode selection means determines a threshold for the depth information based on the number of pixels of the viewpoint image, or the number of pixels and the frame rate, or the number of pixels, the frame rate, and the number of viewpoints, and selects an encoding mode for each small region in the viewpoint image according to the result of comparing the threshold with the depth information of that small region.

(2) For a small region closer to the viewpoint than the threshold, the encoding mode selection means outputs encoding mode selection information that selects the encoding mode in which motion compensation by motion vector search is not performed.
(3) For a small region farther from the viewpoint than the threshold, the encoding mode selection means outputs encoding mode selection information that selects the encoding mode in which motion compensation is performed by motion vector search between the two image frames.
(4) The depth information encoding means encodes the depth information according to the encoding mode selection information.

With the above configuration, the multi-view image encoding device of the present invention has the following effect. When multi-viewpoint images are compression-encoded, selecting the encoding mode according to the magnitude of the depth information reduces the processing load of the encoding process. Because the encoding mode is selected by numerical comparison alone, the selection is easy to implement.

FIG. 1 is a block diagram showing the internal configuration of a multi-view image encoding device according to a first embodiment of the present invention.
FIG. 2 is a diagram showing a viewpoint image, the corresponding depth image, and the image obtained by binarizing the depth image according to the encoding mode.
FIG. 3 is a diagram showing the processing flow in the encoding mode selection unit.
FIG. 4 is a block diagram showing the internal configuration of a multi-view image encoding device according to a second embodiment of the present invention.
FIG. 5 is a conceptual diagram of stereoscopic image display, which is background art.
FIG. 6 is a conceptual diagram of arbitrary viewpoint image generation, which is background art.
FIG. 7 is a diagram explaining correction of stereoscopic image display using the arbitrary viewpoint image generation technique.
FIG. 8 is a diagram showing the processing flow in the encoding mode selection unit.

Embodiments of the present invention are described below with reference to the drawings, beginning with the multi-view image encoding device according to the first embodiment. FIG. 1 shows an internal block diagram of the multi-view image encoding device of this embodiment. The multi-view image encoding device 100 in FIG. 1 takes a plurality of viewpoint images and the corresponding depth information as input and applies encoding to reduce the amount of information. The operation of each functional block in the figure is described below.

The encoding mode selection unit 101 outputs encoding mode selection information based on the depth information and the number of pixels of the viewpoint image. Here, an encoding mode is the encoding method applied to each small region obtained by dividing an image frame when moving images are encoded. The modes include intra-frame (intra) prediction and inter-frame (inter) prediction. In inter prediction, motion compensation is performed by motion vector search between two image frames.

The viewpoint image encoding unit 102 compression-encodes the viewpoint image. In doing so, it switches motion vector search ON or OFF according to the encoding mode selection information from the encoding mode selection unit 101. The depth information encoding unit 103 performs compression encoding in the same way as the viewpoint image encoding unit 102. The multiplexing unit 104 multiplexes the encoded viewpoint image and the encoded depth information and outputs the result to the outside of the device as encoded data.

The processing of the encoding mode selection unit 101 is now described in detail. FIG. 3 shows a flowchart of this processing. The encoding mode selection unit 101 first divides depth information such as that shown in FIG. 2(B) into small regions bx (x = 1, ..., n) (step S1), and then calculates Zx (x = 1, ..., n), the average of the depth values in each small region bx (step S2).
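Steps S1 and S2 can be sketched as follows. This is a minimal illustration, assuming the depth information is a 2D list of numbers and the small regions are square blocks; the function name `block_means` and the `block` size parameter are hypothetical and do not appear in the patent.

```python
def block_means(depth, block):
    """Split a 2D depth map into block x block regions (S1) and average each (S2)."""
    height, width = len(depth), len(depth[0])
    means = []
    for by in range(0, height, block):
        row = []
        for bx in range(0, width, block):
            # Collect the depth values of one small region; edge regions
            # are clipped to the image boundary.
            values = [depth[y][x]
                      for y in range(by, min(by + block, height))
                      for x in range(bx, min(bx + block, width))]
            row.append(sum(values) / len(values))
        means.append(row)
    return means
```

For a 4x4 depth map and `block=2`, the function returns a 2x2 grid holding one average depth value Zx per small region.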

The depth information assigns to each pixel of a viewpoint image such as that in FIG. 2(A) a numerical value corresponding to the distance from the viewpoint to the subject at that pixel, and is represented as a luminance image such as that in FIG. 2(B). Here, the depth value Zx is larger the closer the subject is to the viewpoint and smaller the farther it is, so the luminance image in FIG. 2(B) is brighter in regions closer to the viewpoint.

In general, depth information indicates how far an object in the viewpoint image is from the camera position (in practice it is converted so that nearer objects have larger values). For each camera, a depth value for each pixel, a maximum depth value (nearest), and a minimum depth value (farthest) are defined.
Depth information can be generated, for example, with a distance measuring device that uses infrared light or with software. The infrared method measures distance from the time taken for an emitted light beam to return. Software methods include calculating the distance from the pixel displacement found when matching the pixels of the left and right viewpoint images. The present invention does not limit the method of generating depth information, and any conventionally known method can be applied as appropriate.
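The two measurement principles mentioned above correspond to standard relations, sketched here for illustration. The patent gives no formulas; the sketch assumes a round-trip time-of-flight measurement and, for the software method, a rectified pinhole stereo pair. The function and parameter names are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def tof_distance(round_trip_seconds):
    """Distance in metres from the time an emitted beam takes to return."""
    # The beam travels to the subject and back, so halve the path length.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


def stereo_distance(focal_length_px, baseline_m, disparity_px):
    """Distance in metres from the pixel displacement between two rectified views."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    # Nearer subjects produce a larger displacement between the two views.
    return focal_length_px * baseline_m / disparity_px
```

For instance, a focal length of 1000 pixels, a baseline of 0.065 m, and a displacement of 13 pixels give a distance of about 5 m.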

For the viewpoint image, the number of pixels p is obtained from the viewpoint image, and the threshold θz is determined according to p (step S3). The threshold θz is determined using a function f(p) whose value decreases as the number of pixels p increases. Finally, motion vector search is set ON or OFF using the depth value Zx and the threshold θz (step S4). Specifically, when Zx is greater than or equal to θz, motion vector search in the small region bx is set to OFF (0); when Zx is less than θz, motion vector search is set to ON (1). As a result, a binary image such as that in FIG. 2(C) is output as the encoding mode selection information.
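Steps S3 and S4 can be sketched as below. The patent only requires that f(p) decrease as p increases; the particular form used here (inverse proportionality, capped at the maximum depth value), the reference resolution `ref_pixels`, and the 8-bit depth range `max_depth` are assumptions chosen purely for illustration.

```python
def depth_threshold(num_pixels, ref_pixels=640 * 480, max_depth=255.0):
    """Hypothetical f(p): the threshold shrinks as the pixel count grows (S3)."""
    return max_depth * min(1.0, ref_pixels / num_pixels)


def select_modes(block_depths, theta):
    """Per-region flag (S4): 0 = motion vector search OFF, 1 = search ON."""
    # Larger depth values mean nearer regions; those at or above the
    # threshold are encoded without motion vector search.
    return [[0 if z >= theta else 1 for z in row] for row in block_depths]
```

With these assumed defaults, a 1280x720 image yields a threshold of about 85, so small regions with an average depth of 85 or more are flagged 0 and all others are flagged 1.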

The viewpoint image encoding unit 102 controls the encoding mode using the viewpoint image and the encoding mode selection information. Specifically, when the encoding mode selection information for small region bx is 1, it performs motion vector search. When the information is 0, it does not perform motion vector search and instead encodes the region with another encoding process (for example, intra prediction), which reduces the processing load.
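The control described above can be sketched with a toy one-dimensional encoder. This is only an illustration of how the flag gates the expensive search: the full search over a sum of absolute differences (SAD), the 1D signal, and all names are assumptions, and a real encoder would go on to produce residuals and a bitstream.

```python
def sad(a, b):
    """Sum of absolute differences between two equally long blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))


def motion_search(cur, ref, pos, size, search_range):
    """Full search over 1D displacements; returns (best_mv, best_cost)."""
    block = cur[pos:pos + size]
    best = None
    for mv in range(-search_range, search_range + 1):
        start = pos + mv
        if start < 0 or start + size > len(ref):
            continue  # candidate falls outside the reference frame
        cost = sad(block, ref[start:start + size])
        if best is None or cost < best[1]:
            best = (mv, cost)
    return best


def choose_block_modes(cur, ref, size, mode_flags, search_range=2):
    """Run motion search only for regions whose flag is 1."""
    decisions = []
    for index, flag in enumerate(mode_flags):
        if flag == 1:
            mv, _ = motion_search(cur, ref, index * size, size, search_range)
            decisions.append(("inter", mv))
        else:
            # Flag 0: skip the search entirely and fall back to intra coding.
            decisions.append(("intra", None))
    return decisions
```

Regions flagged 0 never enter `motion_search`, which is where the saving in processing load comes from in this sketch.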

The processing of the encoding mode selection unit 101 described above compares the average depth value Zx of each small region with the threshold, but Zx may instead be the maximum or the minimum depth value of each small region. Using the maximum increases the number of small regions for which motion vector search is set OFF compared with using the average, so the processing load is reduced further. Using the minimum increases the number of small regions for which motion vector search is set ON, which is useful when improving encoding efficiency matters more than reducing the processing load.
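The effect of the three choices for Zx can be seen in a small sketch. The function name `select_mode` and the `stat` parameter are hypothetical; the decision rule (OFF when Zx is at or above the threshold) follows the description above.

```python
def select_mode(region_depths, theta, stat="mean"):
    """Return 0 (search OFF) or 1 (search ON) for one small region."""
    if stat == "mean":
        zx = sum(region_depths) / len(region_depths)
    elif stat == "max":
        zx = max(region_depths)  # more regions reach the threshold: more OFF
    elif stat == "min":
        zx = min(region_depths)  # fewer regions reach the threshold: more ON
    else:
        raise ValueError("stat must be 'mean', 'max', or 'min'")
    return 0 if zx >= theta else 1
```

For a region with depth values 10, 20, and 90 and a threshold of 50, the mean (40) and the minimum (10) keep the search ON, while the maximum (90) turns it OFF.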

Next, a multi-view image encoding device according to a second embodiment of the present invention is described. FIG. 4 shows an internal block diagram of the multi-view image encoding device of this embodiment. The processing in the encoding mode selection unit 101, the viewpoint image encoding unit 102, and the multiplexing unit 104 is the same as in the first embodiment, so its description is omitted. In this embodiment, the depth information encoding unit 103 also encodes according to the encoding mode selection information. That is, the depth information is likewise controlled so that motion vector search is not performed for small regions close to the viewpoint, which reduces the processing load.

The processing of the encoding mode selection unit may also follow the flowchart shown in FIG. 8. In this case, the encoding mode is selected by considering the encoding frame rate in addition to the depth information and the number of pixels of the viewpoint image. Specifically, the threshold θz is determined according to two values, the number of pixels p and the frame rate r (step S3'). The other steps (S1, S2, and S4) are the same as in the first embodiment, so their description is omitted.
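Step S3' can be sketched as a threshold that depends on the pixel rate p x r. The patent does not specify the function; the inverse-proportional form, the reference pixel rate, and the 8-bit depth range below are assumptions for illustration only.

```python
def threshold_from_pixels_and_rate(num_pixels, frame_rate,
                                   ref_pixel_rate=640 * 480 * 30,
                                   max_depth=255.0):
    """Hypothetical threshold for S3': lower when more pixels per second arrive."""
    pixel_rate = num_pixels * frame_rate
    return max_depth * min(1.0, ref_pixel_rate / pixel_rate)
```

Under these assumptions, doubling the frame rate at a fixed resolution halves the threshold, so more small regions are encoded without motion vector search.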

When selecting the encoding mode, the number of viewpoints of the viewpoint images to be encoded may be considered in addition to the above. One example in which the number of viewpoints differs is stereoscopic television that uses three or more viewpoint images. This is mainly a technology for autostereoscopic (glasses-free) television. With only two viewpoints, the region in which the image appears stereoscopic is very narrow, and the stereoscopic effect is lost if the viewer moves sideways even slightly, so images from a plurality of viewpoints are used to widen the region in which stereoscopic viewing is possible. These viewpoint images can be generated by viewpoint image synthesis, but when higher-definition images are required, a scheme that transmits multiple images captured by real cameras is also likely to be needed. Another example is support for arbitrary viewpoint images. Arbitrary viewpoint image technology synthesizes an image from an arbitrary viewpoint using a group of images captured from various positions and angles, and it naturally requires images from a considerable number of viewpoints.

When such multi-viewpoint images are used, the encoding load grows as the number of viewpoints to be encoded increases. Configuring the processing of the encoding mode selection unit as described below suppresses this increase in processing load.
For example, when the number of viewpoints v exceeds a predetermined value, the depth threshold θz is reset to a lower value. Alternatively, θz is determined by considering the number of pixels, the frame rate, and the number of viewpoints together. As a further option, the number of viewpoints v may determine whether the encoding mode selection process itself is ON or OFF. In other words, the encoding mode selection unit can determine the threshold for the depth information based on the number of pixels of the viewpoint image, or the number of pixels and the frame rate, or the number of pixels, the frame rate, and the number of viewpoints. Whichever method is used, the processing load of encoding can be reduced further.
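The three options above can be sketched as follows. None of the constants comes from the patent: the view limit, the scaling factor, the reference load, and the minimum view count are hypothetical values chosen for illustration, as are the function names.

```python
def lower_threshold_for_views(theta, num_views, view_limit=2, factor=0.5):
    """Option 1: reset the depth threshold lower once v exceeds a limit."""
    return theta * factor if num_views > view_limit else theta


def combined_threshold(num_pixels, frame_rate, num_views,
                       ref_load=640 * 480 * 30 * 2, max_depth=255.0):
    """Option 2: one threshold from pixels, frame rate, and views together."""
    load = num_pixels * frame_rate * num_views
    return max_depth * min(1.0, ref_load / load)


def selection_enabled(num_views, min_views=3):
    """Option 3: switch the mode selection process itself ON or OFF by v."""
    return num_views >= min_views
```

In each case a larger number of viewpoints leads to a lower threshold (or to the selection process being enabled), so more small regions skip motion vector search.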

The embodiments of the present invention have been described in detail above with reference to the drawings. The specific configuration is not limited to these embodiments, however, and designs and the like that do not depart from the gist of the invention are also included in the scope of the claims.

Reference signs: 100 ... multi-view image encoding device, 101 ... encoding mode selection unit, 102 ... viewpoint image encoding unit, 103 ... depth information encoding unit, 104 ... multiplexing unit.

Claims

1. A multi-view image encoding device that encodes a plurality of viewpoint images and corresponding depth information, comprising:
encoding mode selection means for selecting, based on the depth information and the viewpoint image, either an encoding mode in which motion compensation is performed by motion vector search between two image frames or an encoding mode in which motion compensation by motion vector search is not performed, and for outputting encoding mode selection information indicating the selected encoding mode;
viewpoint image encoding means for encoding the viewpoint image according to the encoding mode selection information; and
depth information encoding means for encoding the depth information,
wherein the encoding mode selection means determines a threshold for the depth information based on the number of pixels of the viewpoint image, or the number of pixels and the frame rate, or the number of pixels, the frame rate, and the number of viewpoints, and selects an encoding mode for each small region in the viewpoint image according to a result of comparing the threshold with the depth information of that small region.

2. The multi-view image encoding device according to claim 1, wherein, for a small region closer to the viewpoint than the threshold, the encoding mode selection means outputs the encoding mode selection information that selects the encoding mode in which motion compensation by motion vector search is not performed.

3. The multi-view image encoding device according to claim 1, wherein, for a small region farther from the viewpoint than the threshold, the encoding mode selection means outputs the encoding mode selection information that selects the encoding mode in which motion compensation is performed by motion vector search between the two image frames.

4. The multi-view image encoding device according to any one of claims 1 to 3, wherein the depth information encoding means encodes the depth information according to the encoding mode selection information.