JP2017103564A

JP2017103564A - Control apparatus, control method, and program

Info

Publication number: JP2017103564A
Application number: JP2015234267A
Authority: JP
Inventors: 智一佐藤; Tomokazu Sato; 正輝北郷; Masateru Kitago
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2015-11-30
Filing date: 2015-11-30
Publication date: 2017-06-08
Anticipated expiration: 2035-11-30
Also published as: JP6622575B2

Abstract

PROBLEM TO BE SOLVED: To appropriately control, according to the subject, the frame rate of a moving image acquired for generating a three-dimensional shape. SOLUTION: A control apparatus that acquires frames from an imaging device capturing a moving image for generating a distance image of a subject, and that controls output of the moving image, includes: acquisition means for acquiring a distance image corresponding to a frame of the moving image captured by the imaging device; calculation means for calculating, on the basis of the distance image, the complexity of the shape of the subject in the distance image; determination means for determining, according to the complexity, the frame rate of the moving image to be output; and generation means for acquiring frames from the moving image captured by the imaging device in accordance with the frame rate determined by the determination means, and generating a moving image at that frame rate. SELECTED DRAWING: Figure 1

Description

The present invention relates to a technique for controlling image capture performed to generate a three-dimensional shape of a subject from a plurality of images captured by a camera.

Conventionally, a technique (structure from motion) is known that acquires a plurality of images of a subject from different viewpoints and restores the three-dimensional shape of the subject from the captured images. To generate the three-dimensional shape of a subject as a moving image, a plurality of images of the subject with different viewpoints must be captured continuously. Accordingly, there are techniques that generate the three-dimensional shape of a subject from the result of shooting a moving image of the subject. However, the frame rate of a moving image shot for this purpose involves a trade-off with the accuracy of the generated three-dimensional shape. The higher the frame rate (the more images acquired), the higher the accuracy of the generated shape, but the data amount and processing load also increase. If the frame rate is lowered (fewer images acquired) to prioritize data amount and processing load, information needed to generate the three-dimensional shape may be lost.

Patent Document 1 discloses a method for dynamically controlling the frame rate of a moving image in a method that acquires, from a moving image shot by an in-vehicle camera, three-dimensional information on subjects visible from the vehicle. Specifically, when it is determined from the vehicle speed, the amount of change in the field of view, and the like that there is a risk of collision, the moving image is shot at a high frame rate; otherwise it is shot at a low frame rate. This reduces the amount of data to be stored while securing the information needed to grasp the situation.

Patent Document 1: JP 2010-273178 A

However, with the method disclosed in Patent Document 1, the frame rate may not be controlled appropriately. A subject whose three-dimensional shape could be generated even from a low-frame-rate moving image may be shot at a high frame rate, or a subject whose three-dimensional shape cannot be generated unless it is shot at a high frame rate may be shot at a low frame rate.

Accordingly, an object of the present invention is to appropriately control, according to the subject, the frame rate of a moving image acquired for generating a three-dimensional shape.

To solve the above problem, the present invention provides a control device that acquires frames from an imaging device capturing a moving image for generating a distance image of a subject and that controls output of the moving image, the control device comprising: acquisition means for acquiring a distance image corresponding to a frame of the moving image captured by the imaging device; calculation means for calculating, on the basis of the distance image, the complexity of the shape of the subject in the distance image; determination means for determining, according to the complexity, the frame rate of the moving image to be output; and generation means for acquiring frames from the moving image captured by the imaging device in accordance with the frame rate determined by the determination means and generating a moving image at that frame rate.

According to the present invention, the frame rate of a moving image acquired for generating a three-dimensional shape can be appropriately controlled according to the subject.

Block diagram of the imaging control apparatus that performs three-dimensional shape generation according to the first embodiment
Flowchart of the subject shooting process according to the first embodiment
Flowchart of shape generation according to the first embodiment
Flowchart of the complexity calculation process and the frame rate calculation process according to the first embodiment
Diagram of a camera shooting a plane while translating
Diagram showing the shooting range of the camera
Diagram of a camera shooting an uneven plane and the resulting occlusion
Detailed diagram of a camera shooting an uneven plane and the resulting occlusion
Diagram of the GOP structure used in encoding
Block diagram of a computer
Block diagram of the imaging control apparatus that performs three-dimensional shape generation according to the second embodiment
Flowchart of the imaging control apparatus that performs three-dimensional shape generation according to the second embodiment
Diagram showing an example of a filter for detecting a recessed region

Preferred embodiments of the present invention will be described below with reference to the accompanying drawings. Each configuration shown in the following embodiments is merely an example, and the present invention is not limited to the illustrated configurations.

[First Embodiment]
In the first embodiment, a moving image is shot while the camera is translated at a constant speed relative to the subject, and at least part of the shot moving image is recorded in storage. A distance image of the shot scene is generated from the moving image, and the three-dimensional shape of a subject in the scene is generated from the distance image. In particular, in the present embodiment, the frame rate of the acquired moving image is controlled dynamically on the basis of a distance image of the scene that is computed in a simplified manner during shooting. In the present embodiment, a moving image means an image sequence whose frames are a plurality of images captured at successive times. Specific use cases include mounting the camera on a gondola to shoot the wall of a building, or mounting the camera on a car to shoot a road.

FIG. 1 is a block diagram of an imaging device 109 applicable to the first embodiment and an image processing device 112 that generates a three-dimensional shape using color images obtained from the imaging device 109. In the present embodiment, the imaging device 109 is a camera capable of multi-viewpoint shooting. Specifically, a plenoptic camera is used. A plenoptic camera separates incident light with a microlens array placed in front of the imaging sensor and can thereby capture multi-viewpoint images simultaneously. A multi-lens camera in which a plurality of imaging units are arranged may also be used as the camera capable of multi-viewpoint shooting. The imaging unit 111 comprises the optical system, such as the imaging sensor and the microlens array. The imaging device 109 also incorporates a control device 110 that performs various kinds of processing on the images captured by the imaging unit 111 and controls the frame rate. The image processing device 112 is a device separate from the imaging device 109; it performs image processing on data acquired from the imaging device 109 and generates a three-dimensional shape. The image processing device 112 is realized by, for example, a personal computer.

First, the imaging device 109 will be described. On receiving an instruction to start moving image shooting, the imaging unit 111 shoots a moving image of the subject at a predetermined frame rate. Here, the moving image consists of color digital images composed of red (R), green (G), and blue (B). Because images can be acquired from multiple viewpoints, a plurality of moving images from mutually different viewpoints are in effect acquired. The multi-viewpoint image acquisition unit 101 sequentially acquires the image of each frame of the moving image from the imaging unit 111. In addition, according to the frame rate input from the frame rate determination unit 103, the multi-viewpoint image acquisition unit 101 acquires a subset of the frames of the moving image captured by the imaging unit 111 and outputs them to the distance image generation unit 102 and the single-viewpoint image generation unit 104.

The distance image generation unit 102 generates a distance image from the acquired multi-viewpoint images. A distance image is an image that stores, for each pixel, distance information representing the distance from the camera position to the subject. The distance information may be the distance itself expressed in metric units, or a value that can be converted into a distance, such as parallax. From the acquired moving images, the distance image generation unit 102 generates one distance image from a plurality of images (frames) of mutually different viewpoints captured at the same time. Here, for example, parallax is estimated by stereo matching, and a parallax image (disparity map) is generated as the distance image. The distance image calculated here is referred to as the first distance image. The distance image generation unit 102 outputs the distance image to the frame rate determination unit 103.
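As an illustration of a "value that can be converted into a distance", the following minimal sketch applies the standard rectified-stereo relation Z = f * B / d to a single disparity value. The function name and the parameters focal_px and baseline_m are assumptions introduced for this example; the document does not state how the disparity of the plenoptic camera is converted.

```python
def disparity_to_distance(disparity_px, focal_px, baseline_m):
    """Convert a disparity in pixels to a distance in meters (Z = f * B / d).

    Returns None for a missing or non-positive disparity, which this sketch
    treats as a pixel whose distance could not be measured (an assumption).
    """
    if disparity_px is None or disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px
```

Because distance is inversely proportional to disparity, the ordering of near and far surfaces is preserved either way, which is why a disparity map can stand in for the distance image when only discontinuities and missing pixels are counted.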

The frame rate determination unit 103 calculates the complexity of the subject from the distance image. The complexity of the subject is an index representing how complex the shape of the subject is within the subject region that the camera can capture. According to the complexity of the subject, the frame rate determination unit 103 determines the frame rate of the moving image that the multi-viewpoint image acquisition unit 101 should acquire. The complexity and frame rate calculation processes are described later.

The single-viewpoint image generation unit 104 sequentially generates single-viewpoint images from the plurality of images with mutually different viewpoints that form each corresponding frame of the acquired moving image, and outputs them to the image encoding unit 105. Here, the single-viewpoint image generation unit 104 generates a single-viewpoint image by averaging, for each corresponding pixel, the pixel values of the plurality of images with different viewpoints captured at the same time. Like the multi-viewpoint images, the single-viewpoint image is an RGB color image. Because the single-viewpoint image generation unit 104 generates and outputs single-viewpoint images in order from the frames of the moving image, its output can also be regarded as an image (frame) sequence forming a single-viewpoint moving image. Furthermore, the moving image output from the single-viewpoint image generation unit 104 is one whose frame rate changes dynamically according to the shape of the subject. The single-viewpoint images generated here are used in the subsequent three-dimensional shape generation process. The single-viewpoint image may instead be generated by selecting the image of one viewpoint from the multi-viewpoint images, or by using the sum, rather than the average, of the pixel values of the images as the pixel value of the single-viewpoint image.
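The per-pixel averaging described above can be sketched as follows. This is a minimal illustration, assuming that every view has the same size and is already aligned pixel for pixel; the function name average_views and the nested-list image layout are choices made for this example only.

```python
def average_views(views):
    """Average corresponding pixels of several same-size RGB views.

    views: list of images; each image is a list of rows, and each row is a
    list of (r, g, b) tuples. Returns one image in the same layout.
    """
    count = len(views)
    height, width = len(views[0]), len(views[0][0])
    merged = []
    for y in range(height):
        row = []
        for x in range(width):
            # Mean of each color channel over all viewpoints at this pixel.
            row.append(tuple(sum(view[y][x][c] for view in views) / count
                             for c in range(3)))
        merged.append(row)
    return merged
```

Replacing the division by count with a plain sum gives the summation variant mentioned in the text.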

The image encoding unit 105 sequentially encodes the single-viewpoint images output from the single-viewpoint image generation unit 104 using a predetermined encoding method and sends them to the encoded image storage 106. Because, as described above, the single-viewpoint image generation unit 104 outputs single-viewpoint images in order as frames, the images are encoded using H.264, which exploits the temporal correlation between frames. Any encoding method that can efficiently encode the single-viewpoint color images may be used; for example, each single-viewpoint image may be encoded independently using JPEG.

Next, the image processing device 112 will be described. The image decoding unit 107 acquires the encoded data from the encoded image storage 106, decodes it, and outputs a single-viewpoint moving image. The three-dimensional shape generation unit 108 generates a distance image from the plurality of input single-viewpoint images. As described above, because the imaging device 109 translates at a constant speed, the single-viewpoint images, although captured at different times, can be regarded as images captured from mutually different viewpoints. The three-dimensional shape generation unit 108 therefore generates a distance image using the multi-viewpoint images obtained through the movement of the camera. The distance image calculated here is referred to as the second distance image. The three-dimensional shape of the subject is then generated on the basis of the second distance image.

Next, the flow of processing in the imaging device 109 will be described. FIG. 2 shows the processing flow of the imaging device 109. In step S201, the multi-viewpoint image acquisition unit 101 acquires, from the imaging unit 111, the image of the frame of interest in the moving image of the subject captured by the imaging unit 111. The image acquired here is a multi-viewpoint image captured from a plurality of different viewpoints at the same time. In step S202, the distance image generation unit 102 generates a distance image from the plurality of images with mutually different viewpoints captured at the same time.

In step S203, the frame rate determination unit 103 calculates the complexity of the subject from the distance image. In step S204, the frame rate determination unit 103 calculates, according to the complexity of the subject, the number of frames M of multi-viewpoint images to be acquired before the next distance image is generated. This means that the frame rate of the multi-viewpoint images is M times the frame rate of the distance images. Steps S203 and S204 are described in detail later. In step S205, the single-viewpoint image generation unit 104 generates one single-viewpoint image from the plurality of images with mutually different viewpoints.

In step S206, the image encoding unit 105 encodes the single-viewpoint image and outputs it to the encoded image storage 106. In the present embodiment, because the frame rate of the moving image output by the imaging device 109 changes dynamically, encoding is performed as follows. As described above, a distance image is not calculated for every frame but is generated at a constant frame period. Distance images are generated from the acquired color multi-viewpoint images, whereas a single-viewpoint image is generated from the multi-viewpoint image of every acquired frame. Consequently, a color single-viewpoint image always exists for the same time as each distance image, and the number of frames between the times of consecutive distance images varies. Therefore, the GOP (Group of Pictures), which is the coding unit of the moving image, is delimited at the timing at which each distance image is generated, and the single-viewpoint image corresponding to the same time as the distance image is made the I frame (first frame) of the GOP. The number of frames constituting the GOP, including the I frame, is then M. FIG. 9 shows the GOP structure. In FIG. 9, I, P, and B of the single-viewpoint images denote the I frames, P frames, and B frames of H.264 coding, respectively. Normally, the frame rate of a moving image is described in the sequence header of the sequence layer, but because the frame rate differs for each GOP here, the number of constituent frames M is described in the GOP header of the GOP layer. In addition, as the frame rate decreases, the temporal correlation between frames decreases. Therefore, when the number of frames M falls below a predetermined threshold, all frames in that GOP are intra-coded.
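The GOP rule above can be sketched as a small planning function. This is only an illustration under simplifying assumptions: the name plan_gop and the parameter intra_threshold are hypothetical, and B frames are omitted, so every non-intra frame is shown as a P frame.

```python
def plan_gop(m, intra_threshold):
    """Return the frame types of one GOP that holds m frames.

    The first frame, which coincides with a distance image, is always an
    I frame. When m is below intra_threshold, every frame is intra-coded
    because the temporal correlation between frames is low.
    """
    if m < intra_threshold:
        return ["I"] * m
    # Simplification of this sketch: predicted frames are all P frames.
    return ["I"] + ["P"] * (m - 1)
```

For example, with a threshold of 3, a GOP of four frames is planned as I, P, P, P, while a GOP of two frames is planned as I, I.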

Steps S207, S208, and S209 are the same processes as steps S201, S205, and S206, respectively. The processing of steps S207 to S209 is repeated M-1 times, in step with the frame rate at which the multi-viewpoint images are acquired. In step S210, the imaging device 109 checks for a shooting end signal input from outside. If shooting has ended, the processing flow ends; otherwise, the process returns to step S201.
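The loop of steps S201 to S210 can be pictured as choosing which sensor frames to keep in each distance-image period. The sketch below is a hedged illustration, not the control logic of the document: it assumes the sensor runs at a fixed rate, that period_len sensor frames elapse between consecutive distance images, and that the M kept frames are spread evenly over the period. All names are hypothetical.

```python
def select_frame_indices(period_len, num_periods, frames_for_period):
    """Return the sensor-frame indices that are kept for output.

    period_len: sensor frames between two consecutive distance images.
    frames_for_period: callable mapping a period index to its frame count M
        (in the document, M follows from the complexity of the distance image).
    """
    selected = []
    for k in range(num_periods):
        start = k * period_len
        # Clamp M so that at least one and at most period_len frames are kept.
        m = max(1, min(frames_for_period(k), period_len))
        # i = 0 is the frame paired with the distance image (S201 to S206);
        # i = 1 .. M-1 are the additional frames (S207 to S209).
        selected.extend(start + (i * period_len) // m for i in range(m))
    return selected
```

With period_len = 6, a simple first period (M = 1) keeps only frame 0, while a complex second period (M = 3) keeps frames 6, 8, and 10.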

Next, the flow of processing in steps S203 and S204 will be described in detail. FIG. 3 shows the processing flow of the frame rate determination unit 103. Step S203 corresponds to steps S302 to S304, and step S204 corresponds to steps S305 and S306.

First, in step S301, the complexity n is initialized to 0. In step S302, the number of unmeasurable pixels in the distance image is added to n. An unmeasurable pixel is a pixel for which distance information could not be calculated. When a distance image is generated, a region that appears in the image of only one viewpoint among the multi-viewpoint images, such as an occlusion region, yields no corresponding points between the multi-viewpoint images, so its distance cannot be measured. A subject with many occlusion regions is likely to have a correspondingly complex shape, so the number of unmeasurable pixels correlates with the complexity of the subject shape. The number of unmeasurable pixels is therefore counted and added to the complexity n.

In step S303, edge pixels in the distance image are detected. Edge pixels can be detected by applying, for example, a Sobel filter to the distance image. An edge in a distance image represents a boundary at which the distance is discontinuous. Many edge pixels in the distance image indicate that subjects at various different positions are probably present, or that the subject itself probably has many steep surface irregularities. In such cases, occlusion regions occur easily, and the subject can be regarded as having a complex shape. Therefore, in step S304, the detected edge pixels are counted and their number is added to the complexity n.
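Steps S301 to S304 can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: unmeasurable pixels are represented by None, border pixels are not tested for edges, windows that contain an unmeasurable pixel are skipped, and edge_threshold is a hypothetical parameter, since the document does not say how the Sobel response is thresholded.

```python
SOBEL_X = (-1, 0, 1, -2, 0, 2, -1, 0, 1)
SOBEL_Y = (-1, -2, -1, 0, 0, 0, 1, 2, 1)


def shape_complexity(depth, edge_threshold):
    """Count unmeasurable pixels plus edge pixels of a distance image.

    depth: list of rows of distance values, with None where no distance
    could be measured.
    """
    height, width = len(depth), len(depth[0])
    n = 0  # S301: initialize the complexity
    # S302: add the number of unmeasurable pixels
    n += sum(1 for row in depth for value in row if value is None)
    # S303 and S304: detect edge pixels with a 3x3 Sobel filter and count them
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [depth[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            if any(value is None for value in window):
                continue  # no gradient where a neighbor is unmeasurable
            gx = sum(k * v for k, v in zip(SOBEL_X, window))
            gy = sum(k * v for k, v in zip(SOBEL_Y, window))
            if (gx * gx + gy * gy) ** 0.5 > edge_threshold:
                n += 1
    return n
```

Because an unmeasurable pixel is never also tested as an edge, each pixel contributes at most once, so n cannot exceed the total pixel count N.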

In step S305, the frame rate F is calculated from the calculated complexity n. First, the minimum frame rate Fmin required to generate a three-dimensional shape is explained with reference to FIG. 5. FIG. 5 shows a camera shooting a plane as the subject while moving at a constant speed and keeping a constant distance from the plane. In FIG. 5, each camera position indicates a viewpoint at which an image was captured, and the broken lines extending from the camera indicate the angle of view at each viewpoint position. When a three-dimensional shape is generated using motion parallax, every position on the surface of the subject must be captured in at least two frames. FIG. 5(a) shows shooting such that every part of the plane always falls within the angle of view of two frames. In principle, a three-dimensional shape can be generated at the imaging interval shown in FIG. 5(a). However, because of the optical properties of the camera, spatial resolution decreases near the image edges, and corresponding points may not be detected at the boundary of the angle of view. Therefore, as shown in FIG. 5(b), the frame rate at which every part of the plane always falls within the angle of view of three frames is taken as the minimum frame rate required for stable three-dimensional shape generation.

Next, the minimum required frame rate Fmin is calculated using specific parameters. Let the distance between the camera and the plane be L [m], the moving speed of the camera be V [m/s], and the angle of view of the camera be θ [°]. FIG. 6 shows that one image must be captured for every (2/3) L tan(θ/2) [m] of travel, so the imaging period is (2/(3V)) L tan(θ/2) [s], and the frame rate Fmin [fps] can be calculated according to Equation 1.
Fmin = 3V / (2 L tan(θ/2)) ... (Equation 1)
The parameters L, V, and θ of Equation 1 are obtained as follows. An approximate value of the distance L between the camera and the subject is obtained from the distance image. The moving speed V of the camera is estimated by combining the distance L with the motion vectors used in moving image encoding. The angle of view θ of the camera is obtained from the focal length of the imaging device and the size of the imaging element. In this way, the frame rate Fmin is calculated from the imaging conditions of the imaging device 109 and the acquired information. If shooting is performed in an environment where the parameters V, L, and θ are kept constant, the parameters may be input in advance.
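Equation 1 can be evaluated directly. The sketch below is a minimal illustration; the function and parameter names are chosen for this example, and the angle of view is taken in degrees as in the text.

```python
import math


def min_frame_rate(distance_m, speed_mps, view_angle_deg):
    """Minimum frame rate Fmin [fps] from Equation 1.

    Fmin = 3V / (2 L tan(theta / 2)), which keeps every point of a plane at
    distance L inside the angle of view of three consecutive frames.
    """
    half_angle = math.radians(view_angle_deg) / 2
    return 3 * speed_mps / (2 * distance_m * math.tan(half_angle))
```

For instance, with L = 1.5 m, V = 1 m/s, and θ = 90°, the footprint on the plane is 2 L tan(45°) = 3 m wide, one image is needed per meter of travel, and Fmin is 1 fps.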

Furthermore, when the shape of the subject is complex, the information needed to generate the shape is obtained by generating an image sequence with a high frame rate. For example, when the plane that is the subject has irregularities such as those shown in FIG. 7(a), shooting under the conditions shown in FIG. 5(b) leaves regions on surfaces B, C, E, and F that are captured from only one viewpoint. The second distance image therefore cannot be obtained in those regions of surfaces B, C, E, and F. However, as shown in FIG. 7(b), outputting a moving image with a uniformly high frame rate over the entire subject is wasteful.

Therefore, in step S305, the frame rate F is adjusted dynamically according to the complexity of the subject calculated from the first distance image obtained by the plenoptic camera. The minimum value of the frame rate is the minimum required frame rate Fmin given by Equation 1. The maximum value is the maximum frame rate Fmax at which the camera used for shooting can capture images. The complexity n calculated in steps S302 to S304 is the number of pixels, such as unmeasurable pixels and edge pixels, that represent parts where the structure of the subject is complex. The maximum value of the complexity is therefore the number of pixels N of the entire image. In the present embodiment, the frame rate F is set as shown in Equation 2.

Here, P (P ≥ 1) is a parameter, and if the F obtained here exceeds Fmax, F is set to Fmax. In step S306, the number of frames M is calculated from the frame rate of the distance image and the frame rate F of the moving image being shot. Letting Fd be the frame rate of the distance image, the number of frames M can be calculated according to Equation 3.
M = F / Fd ... (Equation 3)

In the present embodiment, Fd = Fmin. Equation 3 therefore becomes Equation 4.
M = F / Fmin ... (Equation 4)
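The following sketch illustrates steps S305 and S306. Equation 2 is not reproduced in this text, so the mapping from complexity to frame rate used here is purely an assumption: F rises linearly from Fmin with P * n / N and is clamped at Fmax, which matches the stated limits and the role of P but may differ from the actual equation. The frame count follows the stated relation that the multi-viewpoint frame rate is M times the distance-image frame rate; rounding to an integer of at least 1 is also an assumption.

```python
def frame_rate(n, total_pixels, f_min, f_max, p=1.0):
    """Assumed stand-in for Equation 2: linear in n / N, clamped at f_max."""
    f = f_min + p * (n / total_pixels) * (f_max - f_min)
    return min(f, f_max)


def frames_per_distance_image(f, f_d):
    """Number of frames M per distance-image period (M = F / Fd, rounded)."""
    return max(1, round(f / f_d))
```

With Fmin = 2 fps and Fmax = 30 fps, a completely simple scene (n = 0) stays at 2 fps with M = 1 when Fd = Fmin, while a scene in which every pixel is complex reaches 30 fps with M = 15.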

This concludes the detailed description of steps S203 and S204.

Next, the three-dimensional shape generation process executed by the image processing device 112 will be described with reference to FIG. 4. In step S401, the image decoding unit 107 decodes the encoded data. Decoding yields the images (frames) that make up the moving image, that is, a moving image captured by a single viewpoint translating at a constant speed.

In step S402, the three-dimensional shape generation unit 108 extracts local features from the images acquired in step S401 and searches for corresponding features across the images. Based on the positions of the corresponding features, it then estimates the external parameters of the camera for each frame. The internal parameters are assumed to have been obtained in advance and to be common to all frames. The internal parameters generally represent the focal length of the camera, the position of the image origin, the lens distortion characteristics, and the like. The external parameters describe the camera pose in the global coordinate system and are generally expressed by the global coordinates T of the camera and its optical axis direction R. Here, SIFT (Scale-Invariant Feature Transform) is used as the local feature. Any feature that allows local correspondences between images to be found may be used instead, such as SURF (Speeded Up Robust Features). In addition to computing local features and searching for corresponding points, it is desirable to remove erroneous correspondences using RANSAC (random sample consensus) or a similar method.

In step S403, based on the estimated internal and external parameters, the three-dimensional shape generation unit 108 estimates parallax from sets of viewpoint images whose local features have been matched, and generates a second distance image. The distance values of the second distance image are projected into the global coordinate system to generate a point cloud representing the surface of the subject. In step S404, a three-dimensional shape model of the subject is generated from the point cloud. The model may be generated by an explicit modeling method such as Delaunay triangulation or by an implicit modeling method such as the Poisson method. This completes the process of generating the three-dimensional shape of the subject.
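
The projection of distance values into the global coordinate system can be sketched as follows. This is an illustration under stated assumptions, not the method of the embodiment: it assumes a distortion-free pinhole model with intrinsics `fx`, `fy`, `cx`, `cy`, distance values measured along the optical axis, a camera-to-world rotation matrix `rotation`, and a camera position `position` in global coordinates. All of these names and conventions are hypothetical.

```python
def depth_to_point_cloud(depth, fx, fy, cx, cy, rotation, position):
    """Back-project a distance image into a list of global 3D points.

    ASSUMPTIONS (not from the source): pinhole camera without distortion,
    depth measured along the optical axis, `rotation` is a 3x3
    camera-to-world matrix, `position` is the camera centre in global
    coordinates. Pixels with None or non-positive depth are skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None or z <= 0:
                continue  # treat as an unmeasurable pixel
            # Point in camera coordinates.
            xc = ((u - cx) * z / fx, (v - cy) * z / fy, z)
            # Rotate into the global frame and add the camera position.
            world = tuple(
                sum(rotation[i][k] * xc[k] for k in range(3)) + position[i]
                for i in range(3)
            )
            points.append(world)
    return points
```

The resulting point list would then be the input to a surface reconstruction step such as the Delaunay or Poisson methods named above.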

As described above, according to the first embodiment, the frame rate of the moving image used to generate the distance image is controlled based on the shape of the subject. When the shape of the subject is predicted to be complex, the moving image is created at a high frame rate; otherwise it is created at a low frame rate. This makes it possible to acquire the information needed to calculate the distance image while appropriately limiting the amount of data to be stored.

In the first embodiment, the complexity of the subject is calculated using the first distance image, which is generated from images captured at the same time from a plurality of viewpoints. The baseline length of a plenoptic camera depends on the aperture width of the camera and is relatively short, so the accuracy of the calculated distance information is low. On the other hand, because the camera motion does not need to be estimated, the distance information can be obtained relatively easily. The first distance image, generated by this simple method, is therefore used to control the frame rate, while the second distance image, calculated from motion parallax with higher accuracy, is used to generate the three-dimensional shape. As a result, the image processing apparatus 110 only needs to acquire a moving image consisting of single-viewpoint images rather than multi-viewpoint images, which reduces both the amount of data output from the imaging device 109 and the amount of input data acquired by the image processing apparatus 110.

Some imaging devices 109 capable of outputting a distance image also output a separate reliability value indicating the accuracy of the measured distance at each pixel of the distance image. In that case, pixels whose reliability is low may be treated as unmeasurable pixels. The complexity may also be calculated from only one of the two counts, the number of unmeasurable pixels or the number of edge pixels.

The complexity of the subject may also be calculated using indices other than the number of unmeasurable pixels and the number of edge pixels. Three examples are described below. First, the complexity may be calculated from the number of pixels in edge regions having a specific direction, rather than from all edge pixels. FIG. 8 shows the positional relationship between the subject and the camera. FIGS. 8(a), 8(b), and 8(c) show, as bold lines, the regions within the angle of view of viewpoints 706, 707, and 708 of FIG. 7, respectively. The joint between surface E and surface F is within the angle of view only at viewpoint 707, shown in FIG. 8(b). Because the joint is seen from only one viewpoint, no parallax is available for it, and the second distance information cannot be obtained there. It is therefore desirable to capture at a high frame rate near viewpoint 707, from which the recess containing surfaces E and F can be captured.

One feature of the joint between surface E and surface F is that the edge direction is perpendicular to the direction of camera motion. Few viewpoint positions can capture such an edge, so the complexity at the joint should be set to a relatively large value. The number of pixels constituting edges in a specific direction is therefore detected as the complexity, based on the direction of camera motion. Specifically, the subsequent direction of camera motion is estimated from the motion vectors obtained by encoding the single-viewpoint image immediately preceding the acquisition of the distance image. The processing is described here for the case where H.264 is used as the encoding method. In H.264, motion vectors are set per macroblock and sub-block, and some blocks may have no motion vector. The motion of the entire frame can be obtained by taking the average of the motion vectors weighted by block size. That is, letting Ci and Si denote the motion vector and the block size of each block for which a motion vector is set, the average motion vector MC of the entire frame is calculated according to Equation 5.
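
Equation 5 is not reproduced in this text. Read literally, a block-size-weighted average of the motion vectors would take the form MC = (Σ Si·Ci) / (Σ Si), summed over the blocks that carry a motion vector. The sketch below computes that quantity; the function name, the input layout, and the choice of block area in pixels as the weight Si are assumptions for illustration.

```python
def average_motion_vector(blocks):
    """Block-size-weighted average of per-block motion vectors.

    `blocks` is an iterable of ((cx, cy), size) pairs, one per block that
    has a motion vector; blocks without a motion vector are simply omitted.
    ASSUMPTION: `size` is the block area in pixels (e.g. 256 for 16x16).
    Returns (0.0, 0.0) when no block carries a motion vector.
    """
    total_size = 0.0
    sum_x = 0.0
    sum_y = 0.0
    for (cx, cy), size in blocks:
        sum_x += size * cx
        sum_y += size * cy
        total_size += size
    if total_size == 0:
        return (0.0, 0.0)
    return (sum_x / total_size, sum_y / total_size)
```

Weighting by size keeps a few small sub-blocks with unusual vectors from dominating the estimate of the overall camera motion.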

Using the direction of the average motion vector obtained by Equation 5, only edge pixels whose direction is close to opposite that of the average motion vector are detected, and the number of detected pixels is added to the complexity. This allows the complexity to be derived more appropriately for shapes in which occlusion is likely to occur depending on the direction of camera motion.

Alternatively, only the number of edge pixels in the specific direction may be added to the complexity, without counting all edge pixels as described in the embodiment above. Detecting all edge pixels requires detecting edges in every direction, so filters must be applied in at least two directions, vertical and horizontal. When the edge direction to be detected is limited, a single filtering pass matched to that direction suffices, which reduces the computation required for detection.
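
The single-pass directional detection can be sketched as follows. The source does not specify the filter, so this example makes several assumptions: the camera motion is reduced to its dominant image axis, one central-difference filter is applied along that axis (which responds to edges running perpendicular to the motion), and a pixel counts as an edge pixel when the absolute difference exceeds a hypothetical `threshold`. Every depth value is assumed to be a valid number.

```python
def count_directional_edge_pixels(depth, motion, threshold):
    """Count edge pixels found by one difference filter along the motion axis.

    ASSUMPTIONS (not from the source): `motion` is the average motion
    vector (mx, my); only its dominant axis is used; a central difference
    along that axis is the single directional filter; `depth` holds only
    valid numeric values. Border pixels without both neighbours are skipped.
    """
    mx, my = motion
    horizontal = abs(mx) >= abs(my)
    height = len(depth)
    width = len(depth[0]) if height else 0
    count = 0
    for y in range(height):
        for x in range(width):
            if horizontal:
                if x == 0 or x == width - 1:
                    continue
                diff = depth[y][x + 1] - depth[y][x - 1]
            else:
                if y == 0 or y == height - 1:
                    continue
                diff = depth[y + 1][x] - depth[y - 1][x]
            if abs(diff) > threshold:
                count += 1
    return count
```

A depth step running vertically through the image is counted when the motion is horizontal and ignored when the motion is vertical, which is the behavior the paragraph above describes.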

Next, an example in which concave regions are detected and used to calculate the complexity is described. Occlusion regions arise from the unevenness of the subject itself, and occlusion is particularly likely to occur in recesses. A filter that detects recesses is therefore applied to the distance image, and the number of pixels in the detected recesses is defined as the complexity. Concave regions are detected with a filter whose coefficients decrease with distance from the filter center. FIG. 13 shows examples of such filters: FIG. 13(a) is a 3 × 3 pixel filter and FIG. 13(b) is a 5 × 5 pixel filter. The size of the concave regions that can be detected depends on the filter size. For example, a large concave region of 100 × 100 pixels in the image cannot be detected as concave with a 3 × 3 filter. Conversely, a concave region of only a few pixels cannot be detected with a large filter. This property can be used to select the size of the dents to be detected. How deep a dent must be to count as a concave region can be set by applying a threshold to the absolute value of the filter output; the larger the threshold, the deeper a dent must be to be detected.
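
A minimal sketch of concave-region counting follows. The coefficients of the filters in FIG. 13 are not given in this text, so the 3 × 3 kernel below (8 at the center, -1 elsewhere) is a hypothetical example of a filter whose coefficients are largest at the center. The sketch also assumes that larger distance values mean farther from the camera, so a recess gives a positive response, and it thresholds the signed response rather than its absolute value in order to exclude protrusions.

```python
# Hypothetical 3x3 kernel: largest coefficient at the centre, sum of zero.
KERNEL_3X3 = [[-1, -1, -1],
              [-1, 8, -1],
              [-1, -1, -1]]


def count_concave_pixels(depth, threshold, kernel=KERNEL_3X3):
    """Count pixels whose filter response marks them as part of a recess.

    ASSUMPTIONS (not from the source): the kernel values, the convention
    that larger depth means farther away, and the use of the signed
    response. Border pixels not fully covered by the kernel are skipped.
    """
    if not depth:
        return 0
    k = len(kernel) // 2
    height, width = len(depth), len(depth[0])
    count = 0
    for y in range(k, height - k):
        for x in range(k, width - k):
            response = sum(
                kernel[j + k][i + k] * depth[y + j][x + i]
                for j in range(-k, k + 1)
                for i in range(-k, k + 1)
            )
            if response > threshold:
                count += 1
    return count
```

Passing a larger kernel with the same structure would shift the detector toward wider dents, in line with the dependence on filter size noted above.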

The variance of the distance image may also be calculated and used as the basis for the complexity. When the variance of the distance image is large, the distance between the subject and the camera is not uniform, and the shape of the subject is predicted to be complex. The variance and the complexity may therefore be associated in advance so that a larger variance gives a larger complexity, and the complexity may be calculated from the variance. When the complexity is calculated from the variance alone, no filtering is needed, so a simpler configuration suffices. When the variance is combined with the number of edge pixels, the number of unmeasurable pixels, or similar counts, the calculated variance should preferably be normalized.
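
A variance-based complexity can be sketched as below. The source does not specify how the variance is mapped to a complexity or how it is normalized, so the scheme here is an assumption: the variance is divided by a hypothetical reference value `max_variance`, clamped to 1, and scaled to the pixel count so that it is comparable with pixel-count-based complexities.

```python
def variance_complexity(depth, total_pixels, max_variance):
    """Complexity derived from the variance of the measured distances.

    ASSUMPTIONS (not from the source): population variance over pixels
    whose depth is not None; normalisation by a caller-chosen
    `max_variance`, clamped to 1.0; scaling to the range 0..total_pixels.
    """
    values = [z for row in depth for z in row if z is not None]
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    variance = sum((z - mean) ** 2 for z in values) / len(values)
    normalized = min(variance / max_variance, 1.0)
    return normalized * total_pixels
```

Because no spatial filter is applied, the cost is a single pass over the distance image, which matches the simpler configuration mentioned above.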

[Modification of First Embodiment]
Each unit of the control device 111 shown in FIG. 1 may be implemented as hardware built into the imaging device 109, or as software (a computer program) running on a device external to the imaging device 109. In the latter case, the software is installed in the memory of a general-purpose computer such as a PC (personal computer). When the CPU of the computer executes the installed software, the computer realizes the functions of the control device of the imaging unit 111 described above. An example hardware configuration of a computer applicable to the control device of the imaging apparatus according to the first embodiment is described with reference to the block diagram of FIG. 10.

ＣＰＵ１００１は、ＲＡＭ１００２やＲＯＭ１００３に格納されているコンピュータプログラムやデータを用いて、コンピュータ全体の制御を行うと共に、撮影制御装置が行うものとして説明した上述の各処理を実行する。 The CPU 1001 controls the entire computer using computer programs and data stored in the RAM 1002 and the ROM 1003, and executes the above-described processes described as being performed by the imaging control apparatus.

ＲＡＭ１００２は、コンピュータ読み取り可能な記憶媒体の一例である。ＲＡＭ１００２は、外部記憶装置１００７や記憶媒体ドライブ１００８、更にはネットワークインタフェース１０１０からロードされたコンピュータプログラムやデータを一時的に記憶するためのエリアを、有する。更に、ＲＡＭ１００２は、ＣＰＵ１００１が各種の処理を実行する際に用いるワークエリアを有する。即ち、ＲＡＭ１００２は、各種のエリアを適宜提供することができる。ＲＯＭ１００３は、コンピュータ読み取り可能な記憶媒体の一例であり、コンピュータの設定データや、ブートプログラムなどが格納されている。 The RAM 1002 is an example of a computer-readable storage medium. The RAM 1002 has an area for temporarily storing computer programs and data loaded from the external storage device 1007, the storage medium drive 1008, and the network interface 1010. Furthermore, the RAM 1002 has a work area used when the CPU 1001 executes various processes. That is, the RAM 1002 can provide various areas as appropriate. The ROM 1003 is an example of a computer-readable storage medium, and stores computer setting data, a boot program, and the like.

キーボード１００４、マウス１００５は、コンピュータの操作者が操作することで、各種の指示をＣＰＵ１００１に対して入力することができる。表示装置１００６は、ＣＲＴや液晶画面などにより構成されており、ＣＰＵ１００１による処理結果を画像や文字などでもって表示することができる。 The keyboard 1004 and the mouse 1005 can be operated by a computer operator to input various instructions to the CPU 1001. The display device 1006 is configured by a CRT, a liquid crystal screen, or the like, and can display a processing result by the CPU 1001 using an image, text, or the like.

外部記憶装置１００７は、コンピュータ読み取り記憶媒体の一例であり、ハードディスクドライブ装置に代表される大容量情報記憶装置である。外部記憶装置１００７には、ＯＳ（オペレーティングシステム）や、図１に示した各処理をＣＰＵ１００１に実現させるためのコンピュータプログラムやデータ、上記の各種テーブル、データベース等が保存されている。外部記憶装置１００７に保存されているコンピュータプログラムやデータは、ＣＰＵ１００１による制御に従って適宜ＲＡＭ１００２にロードされ、ＣＰＵ１００１による処理対象となる。 The external storage device 1007 is an example of a computer-readable storage medium, and is a large-capacity information storage device represented by a hard disk drive device. The external storage device 1007 stores an OS (Operating System), computer programs and data for causing the CPU 1001 to perform the processes shown in FIG. 1, the above-described various tables, databases, and the like. Computer programs and data stored in the external storage device 1007 are appropriately loaded into the RAM 1002 under the control of the CPU 1001 and are processed by the CPU 1001.

記憶媒体ドライブ１００８は、ＣＤ−ＲＯＭやＤＶＤ−ＲＯＭなどの記憶媒体に記録されているコンピュータプログラムやデータを読み出し、読み出したコンピュータプログラムやデータを外部記憶装置１００７やＲＡＭ１００２に出力する。なお、外部記憶装置１００７に保存されているものとして説明した情報の一部若しくは全部をこの記憶媒体に記録させておき、この記憶媒体ドライブ１００８に読み取らせても良い。 The storage medium drive 1008 reads a computer program and data recorded on a storage medium such as a CD-ROM or DVD-ROM, and outputs the read computer program or data to the external storage device 1007 or the RAM 1002. Note that part or all of the information described as being stored in the external storage device 1007 may be recorded on this storage medium and read by this storage medium drive 1008.

The I/F 1009 is an interface for inputting color images, distance images, and the like from outside and for outputting the frame rate; one example is USB (Universal Serial Bus). The imaging device 109 is connected as an external device and outputs color images and distance images through the I/F 1009. Reference numeral 1010 denotes a bus connecting the units described above.

In the above configuration, when the computer is powered on, the CPU 1001 loads the OS from the external storage device 1007 into the RAM 1002 according to the boot program stored in the ROM 1003. Information can then be input via the keyboard 1004 and the mouse 1005, and a GUI can be displayed on the display device 1006. When the user operates the keyboard 1004 or the mouse 1005 and enters an instruction to start the encoding application stored in the external storage device 1007, the CPU 1001 loads the program into the RAM 1002 and executes it. The computer thereby functions as the imaging control device.

なお、ＣＰＵ１００１が実行する符号化アプリケーションプログラムは、図２に示すフローチャートを実現するプログラムである。出力される符号化データは、外部記憶装置１００７に保存することになる。なお、このコンピュータは、以降の各実施形態に係る撮影制御装置にも同様に適用可能である。 Note that the encoding application program executed by the CPU 1001 is a program for realizing the flowchart shown in FIG. The output encoded data is stored in the external storage device 1007. Note that this computer can be similarly applied to imaging control apparatuses according to the following embodiments.

[Second Embodiment]
The first embodiment described the case where multi-viewpoint images are acquired using a plenoptic camera. The second embodiment describes the case where a single-viewpoint camera is held in the hand while capturing. Possible use cases include capturing a subject with a very complex shape from a variety of camera poses, and generating the three-dimensional shape of a building interior while moving through it. In the second embodiment, an infrared irradiation camera is used to acquire the distance image, separately from the single-viewpoint camera that acquires the color image. In the first embodiment, the distance image was acquired at a fixed time interval and used to control the frame rate of the output single-viewpoint images, which was equal to or higher than the frame rate of the distance image. In the second embodiment, the single-viewpoint image and the distance image are acquired synchronously at the same frame rate, and the frame rate determination unit controls the frame rate of both.

図１１は、第２の実施形態に適用可能な撮影装置１１０９と画像処理装置１１１２の構成を示すブロック図である。以降では、図１と異なる部分を説明する。 FIG. 11 is a block diagram illustrating a configuration of an imaging device 1109 and an image processing device 1112 that can be applied to the second embodiment. Hereinafter, parts different from FIG. 1 will be described.

The single-viewpoint image acquisition unit 1101 sequentially acquires single-viewpoint images from the imaging unit 1111 according to the frame rate obtained from the frame rate determination unit 1105. The single-viewpoint image acquired here is RGB image data. The single-viewpoint image acquisition unit 1101 outputs the single-viewpoint image to the image encoding unit 1104. The distance image acquisition unit 1102 acquires an infrared image from the imaging unit 1111 according to the frame rate obtained from the frame rate determination unit 1105, and generates a distance image from the infrared image. Methods for generating a distance image from an infrared image include the structured light method and the time-of-flight (TOF) method. The single-viewpoint image acquired by the single-viewpoint image acquisition unit 1101 and the infrared image acquired by the distance image acquisition unit 1102 are captured at substantially the same timing, and their frame rates correspond. The distance image acquisition unit 1102 outputs the distance image to the frame rate determination unit 1103 and the image encoding unit 1104. The image encoding unit 1104 encodes both the single-viewpoint image and the distance image. The infrared irradiation type distance image is more accurate than the multi-viewpoint type distance image of the first embodiment, so it is encoded to make it available for the later generation of the three-dimensional shape. The three-dimensional shape generation unit 1107 uses the distance image acquired simultaneously with the multi-viewpoint image to generate the three-dimensional shape more simply.

Next, the flow of processing executed by the imaging device 1109 in this embodiment is described. In step S1201, the single-viewpoint image acquisition unit 1101 and the distance image acquisition unit 1102 acquire a single-viewpoint image and a distance image, respectively, captured at the same timing from the imaging unit 1111. In step S1202, the image encoding unit 1104 encodes the single-viewpoint image. In step S1203, the frame rate determination unit 1103 calculates, from the distance image, the complexity of the subject appearing in it. In step S1204, the frame rate determination unit 1103 calculates the delay time until the next color image and distance image are acquired. This means that the frame rate changes on a frame-by-frame basis. The delay time does not have to be updated every frame; it may be updated once per fixed time interval or once every fixed number of frames. In step S1205, the image encoding unit 1104 encodes the distance image. In step S1206, the imaging device 1109 checks for a capture end signal; if capture has ended, the processing ends, and otherwise the processing returns to step S1201.
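
The relationship between the per-frame frame rate and the delay time of step S1204 can be sketched as follows. The source does not give the formula for the delay, so this example assumes the delay is the reciprocal of the frame rate determined for the current frame. The function name and the `update_every` parameter, which models updating the delay only once every fixed number of frames, are hypothetical.

```python
def capture_times(frame_rates, update_every=1):
    """Return capture timestamps (seconds) for a variable frame rate.

    `frame_rates[i]` is the frame rate determined after capturing frame i.
    ASSUMPTION: the delay to the next capture is 1 / frame rate. The delay
    is refreshed only on frames whose index is a multiple of `update_every`;
    in between, the previous delay is reused.
    """
    t = 0.0
    times = [t]
    delay = None
    for i, rate in enumerate(frame_rates):
        if delay is None or i % update_every == 0:
            delay = 1.0 / rate
        t += delay
        times.append(t)
    return times
```

For example, frame rates of 2, 4, and 4 frames per second give captures at 0, 0.5, 0.75, and 1.0 seconds when the delay is updated every frame.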

Next, the image processing device 1112 in the second embodiment is described. The image processing device 1112 in the second embodiment acquires distance images from the imaging device 1109. In the external parameter estimation S302, motion information can be estimated by matching local features between color images as in the first embodiment, but it can also be estimated by aligning the point clouds obtained from the distance images.

In the generation S303 of the dense point cloud representing the surface of the subject, the point cloud can be generated simply by projecting the first distance information into the global coordinate system using the external parameters. To fill in the point cloud in regions where the first distance image contains low-reliability points, or where distance information could not be obtained because of the subject's material, the second distance information may be generated and used only for the parts that need it.

This concludes the description of this embodiment. With the above configuration, as in the first embodiment, the subject is captured while the frame rate is controlled dynamically according to the complexity of the subject's shape. This makes it possible to acquire the information needed to calculate the distance image while appropriately limiting the amount of data to be stored. In addition, this embodiment greatly reduces the processing for generating the point cloud representing the subject surface that was required in the first embodiment.

DESCRIPTION OF SYMBOLS
101 Multi-viewpoint image acquisition unit
102 Distance image generation unit
103 Frame rate determination unit
104 Single-viewpoint image generation unit
105 Image encoding unit
110 Control device

Claims

A control device that acquires frames from an imaging device that captures a moving image for generating a distance image of a subject, and that controls output of the moving image, the control device comprising:
acquisition means for acquiring a distance image corresponding to a frame constituting the moving image captured by the imaging device;
calculation means for calculating, based on the distance image, the complexity of the shape of the subject in the distance image;
determination means for determining, according to the complexity, a frame rate of the moving image to be output; and
generation means for acquiring frames from the moving image captured by the imaging device according to the frame rate determined by the determination means, and generating a moving image at that frame rate.

The control device according to claim 1, wherein the determination means determines a higher frame rate for the moving image as the complexity increases.

The control device according to claim 1, wherein the determination means determines, based on the complexity, the number of frames to be acquired from the imaging device after the frame corresponding to the distance image acquired by the acquisition means, and
the generation means acquires that number of frames starting from the frame corresponding to the distance image acquired by the acquisition means.

The control device, wherein the acquisition means acquires, at predetermined times, distance images from among the distance images corresponding to the frames constituting the moving image captured by the imaging device,
the determination means determines the number of frames each time the acquisition means acquires a distance image, and
the generation means acquires that number of frames during the interval between the times at which the acquisition means acquires distance images, and generates a moving image whose number of frames changes dynamically.

The control device according to claim 1, wherein the calculation means calculates the complexity based on the number of unmeasurable pixels in the distance image.

The control device according to claim 1, wherein the calculation unit detects an edge pixel in the distance image and calculates the complexity based on the number of the edge pixels.

The control device according to claim 1, wherein the calculation unit estimates a movement direction of the imaging device and detects edge pixels in a specific direction according to the movement direction.

The control device according to claim 1, wherein the generation means encodes the distance image acquired by the acquisition means and the frames from the moving image captured by the imaging device, and outputs them as a moving image.

The control device according to claim 8, wherein the generation means sets, as one GOP, the frames from a frame captured at the same time as a distance image acquired by the acquisition means up to the frame immediately before the frame captured at the same time as the next distance image acquired by the acquisition means, and encodes the image acquired simultaneously with the distance image as an I frame.

The control device according to claim 9, wherein the generation unit stores information indicating the frame rate calculated by the calculation unit in a header of the GOP.

The control device according to any one of claims 1 to 10, wherein the acquisition means acquires, from the imaging device, a plurality of images captured from mutually different viewpoints, and generates the distance image based on the plurality of images.

A computer program for causing a computer to function as the control device according to any one of claims 1 to 11 by being read and executed by a computer.

A control method for acquiring frames from an imaging device that captures a moving image for generating a distance image of a subject and controlling output of the moving image, the method comprising:
acquiring a distance image corresponding to a frame constituting the moving image captured by the imaging device;
calculating, based on the distance image, the complexity of the shape of the subject in the distance image;
determining, according to the complexity, a frame rate of the moving image to be output; and
acquiring frames from the moving image captured by the imaging device according to the frame rate, and generating a moving image at the frame rate.