JP2022173069A

JP2022173069A - Image processing apparatus and method, and imaging apparatus and method for controlling the same, program, and storage medium

Info

Publication number: JP2022173069A
Application number: JP2022052478A
Authority: JP
Inventors: 秀敏椿; Hidetoshi Tsubaki; 太省森; Taisho Mori
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2021-05-07
Filing date: 2022-03-28
Publication date: 2022-11-17

Abstract

To accurately acquire a distance image of an entire photographic scene including a moving subject. SOLUTION: An image processing apparatus has: acquisition means that acquires a plurality of different-viewpoint images obtained by photographing the same scene from different positions, and at least one pair of pupil-divided parallax images having parallax; first creation means that creates a first distance image from the pair of parallax images; second creation means that creates a second distance image from the plurality of different-viewpoint images; and integration means that integrates the first distance image and the second distance image to create an integrated distance image. SELECTED DRAWING: Figure 1

Description

The present invention relates to an image processing apparatus and method, an imaging apparatus and a control method therefor, a program, and a storage medium, and more particularly to a technique for generating a distance image from images obtained by an imaging apparatus.

Conventionally, a technique is known that uses a pair of pupil-divided images to obtain a distance image indicating the distance to each subject included in the images (Patent Document 1). This technique has the advantage that a distance image of a scene containing not only subjects at rest (hereinafter, "stationary subjects") but also subjects in motion (hereinafter, "moving subjects"), which are the main subject in many shooting scenes, can be obtained in a single shot.

Compound-eye stereo cameras have long been used as distance-measuring cameras. A compound-eye stereo camera uses a plurality of cameras whose positional relationship is known, and calculates the distance to a subject from the parallax of the subject captured in common and the baseline length between the cameras. However, with a compound-eye camera having a short baseline, the parallax of the image of a distant subject is small, so the distance measurement accuracy is low.
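The dependence of accuracy on baseline length can be illustrated with the standard rectified pinhole-stereo relation Z = f·B/d. This is a textbook model, not a formula given in this document, and the function names and the parameters `focal_px`, `baseline_m`, and `disparity_px` below are illustrative assumptions.

```python
def stereo_distance(focal_px, baseline_m, disparity_px):
    """Distance [m] for a rectified pinhole stereo pair: Z = f * B / d.

    Assumed textbook model; focal length and disparity are in pixels,
    baseline in metres.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def distance_error_per_pixel(focal_px, baseline_m, distance_m):
    """Approximate distance change [m] caused by a 1-pixel disparity error.

    Obtained by differentiating Z = f * B / d: |dZ/dd| = Z^2 / (f * B).
    """
    return distance_m ** 2 / (focal_px * baseline_m)


if __name__ == "__main__":
    # Same subject distance, two baselines: the shorter baseline gives a
    # larger distance error for the same 1-pixel matching error.
    for baseline in (0.01, 0.10):
        err = distance_error_per_pixel(1000.0, baseline, 10.0)
        print(f"baseline {baseline:.2f} m -> about {err:.1f} m per pixel")
```

Under this model the error grows with the square of the distance and shrinks in proportion to the baseline, which is consistent with the statement that a short baseline limits accuracy for distant subjects.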

In contrast, a monocular stereo camera captures a plurality of images with a single camera while changing its position and orientation, and calculates the relative distance to a subject from the parallax of the same subject appearing in those images. Because a monocular stereo camera can use a long baseline, it can calculate relative distance accurately even for distant subjects, and the distance measurement accuracy can be increased by using a large number of images, from several tens to several hundreds.

However, because a monocular stereo camera estimates the distance to the subject and the position and orientation of the camera simultaneously, the measured distance is not uniquely determined, and the value obtained is a relative distance value rather than an absolute distance value.

Here, a relative distance value is a dimensionless quantity, defined for example as the ratio of the distance to a subject of interest to the distance to a specific subject taken as 1. An absolute distance value, in contrast, has the dimension of length and is a value such as 3 [m].

For this reason, a technique is known in which a distance measuring device equipped with both a monocular stereo camera and a compound-eye stereo camera uses information obtained by the compound-eye stereo camera to convert the relative distance values acquired by the monocular stereo camera into absolute distance values. For example, in Patent Document 2, the absolute distance value of a certain subject is acquired by the compound-eye stereo camera, the absolute position and orientation of the camera are calculated from it, and these are used to acquire the absolute distance values of all subjects.
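As a simplified illustration of turning relative distances into absolute ones, the sketch below applies a single scale factor estimated from subjects whose absolute distance is known. This is not the method of Patent Document 2, which works through the camera's absolute position and orientation. The function name, the dictionary layout, and the use of a median are assumptions made for illustration only.

```python
from statistics import median


def scale_to_absolute(relative, anchors):
    """Convert relative distances to absolute distances with one scale factor.

    relative: dict mapping a subject label to its dimensionless relative distance.
    anchors:  dict mapping some of those labels to a known absolute distance [m].
    Hypothetical helper: the scale is the median of absolute/relative ratios.
    """
    ratios = [
        anchors[label] / relative[label]
        for label in anchors
        if relative.get(label, 0.0) > 0.0
    ]
    if not ratios:
        raise ValueError("no usable anchor with a positive relative distance")
    scale = median(ratios)
    return {label: value * scale for label, value in relative.items()}


if __name__ == "__main__":
    relative = {"tree": 1.0, "house": 2.5}
    # One subject measured at 3 m fixes the scale for all the others.
    print(scale_to_absolute(relative, {"tree": 3.0}))
```

The sketch also shows why an inaccurate anchor is harmful: every converted value inherits the error of the estimated scale.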

Patent Document 1: Japanese Patent No. 5192096
Patent Document 2: Japanese Patent Laid-Open No. 2011-112507

Furukawa, Y. and Ponce, J., 2010. "Accurate, dense, and robust multiview stereopsis", IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(8), pp. 1362-1376.
Abdullah Abuolaim and Michael S. Brown, "Defocus Deblurring Using Dual-Pixel Data", ECCV 2020.
Liyuan Pan, Shah Chowdhury, Richard Hartley, Miaomiao Liu, Hongguang Zhang, Hongdong Li, "Dual Pixel Exploration: Simultaneous Depth Estimation and Image Restoration", arXiv (CVPR 2021).
Abhijith Punnappurath, Abdullah Abuolaim, Mahmoud Afifi and Michael S. Brown, "Modeling Defocus-Disparity in Dual-Pixel Sensors", ICCP 2020.

However, the conventional technique disclosed in Patent Document 1 has the following problems. Because the baseline length between the main pixel and the sub-pixel used to acquire the pair of pupil-divided images is short, the distance cannot be calculated accurately for subjects in the image that are not close to the imaging apparatus and that lie outside a predetermined distance range from the in-focus distance. Conversely, if the pupil is widened to lengthen the baseline in order to measure subject distance even slightly more accurately with the configuration of Patent Document 1, the depth of field becomes shallower, which in turn narrows the range over which distance can be calculated. Thus, when the depth range of the shooting scene is wide, it is difficult to obtain an accurate distance image.

Furthermore, with the method described in Patent Document 2, if the absolute distance of the subject acquired by the compound-eye stereo camera is inaccurate, the absolute position and orientation of the camera cannot be calculated accurately. As a result, the absolute distance values of all subjects also become inaccurate.

The present invention has been made in view of the above problems, and an object thereof is to accurately acquire a distance image of an entire scene including a moving subject.

Another object is to accurately convert relative distance values acquired by a monocular camera into absolute distance values.

To achieve the above object, an image processing apparatus of the present invention comprises: acquisition means for acquiring a plurality of different-viewpoint images obtained by shooting the same scene from different positions, and at least one parallax image pair having parallax due to pupil division; first generation means for generating a first distance image from the parallax image pair; second generation means for generating a second distance image from the plurality of different-viewpoint images; and integration means for integrating the first distance image and the second distance image to generate an integrated distance image.

According to the present invention, a distance image of an entire shooting scene including a moving subject can be acquired accurately.

In addition, relative distance values acquired by a monocular camera can be accurately converted into absolute distance values.

A block diagram showing the schematic configuration of an imaging apparatus according to a first embodiment of the present invention.
A diagram for explaining the imaging unit in the first embodiment.
A flowchart showing the procedure of shooting and integrated distance image generation in the first embodiment.
A flowchart of single-shot distance image generation processing in the first embodiment.
A diagram explaining a method of converting a defocus amount into a distance value.
A diagram explaining a method of generating a multi-view distance image.
A diagram showing an example of integrating a single-shot distance image and a multi-view distance image in the first embodiment.
A diagram showing another example of integrating a single-shot distance image and a multi-view distance image in the first embodiment.
A diagram explaining example variations of shooting timing in the first embodiment.
A diagram explaining an example of changing shooting conditions in the first embodiment.
A block diagram showing the schematic configuration of an image processing apparatus according to a second embodiment.
A diagram explaining the selection of shots for single-shot distance image generation during post-processing in the second embodiment.
A diagram outlining the procedure for generating an integrated distance image when a plurality of single-shot distance images are integrated in a modification.
A block diagram showing the schematic configuration of an image processing apparatus according to a third embodiment.
A flowchart showing the procedure of shooting and integrated distance image generation in the third embodiment.
A diagram showing the input/output relationship of a network used for defocus blur recovery in the third embodiment.
A diagram outlining the procedure from shooting to integrated distance image generation in the third embodiment.
A diagram outlining the procedure from multi-frame acquisition to integrated distance image generation in moving image shooting in the third embodiment.
A block diagram showing the configuration of a fourth embodiment of the imaging apparatus of the present invention.
An external front view of the imaging apparatus of the fourth embodiment.
An external rear view of the imaging apparatus of the fourth embodiment.
A flowchart showing the operation of distance measurement processing in the fourth embodiment.
A flowchart showing detailed processing in S401 and S402 of FIG. 20.
A diagram showing the mode selection screen of the display unit.
A diagram showing the display of the display unit during distance measurement.
A diagram showing feature point tracking with a monocular stereo camera.
A flowchart showing detailed processing in S403 and S404 of FIG. 20.
A diagram showing window matching with a compound-eye stereo camera.
A diagram showing the structure of a pupil-division image sensor and the principle of distance calculation.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. The following embodiments do not limit the invention according to the claims. Although a plurality of features are described in the embodiments, not all of them are essential to the invention, and the features may be combined in any way. In the accompanying drawings, identical or similar components are given the same reference numerals, and redundant description is omitted.

<First Embodiment>
FIG. 1 is a block diagram showing the schematic configuration of an imaging apparatus 100 according to the first embodiment of the present invention, and shows only the components necessary for explaining the invention. The imaging apparatus 100 includes an imaging unit 101 comprising an optical system 1011 and an image sensor 1012, a memory 102, a viewing image generation unit 103, a single-shot distance image generation unit 104, a multi-view distance image generation unit 105, and a distance image integration unit 106.

FIG. 2 is a diagram for explaining the imaging unit 101.
In FIG. 2(a), the optical system 1011 is composed of a plurality of lenses, mirrors, and the like, and forms an image of light reflected from the subject 10 on the light-receiving surface of the image sensor 1012. In FIG. 2(a), the optical system 1011 is represented by a single lens. Reference numeral 202 denotes the optical axis of the optical system 1011. The image sensor 1012 receives the optical image of the subject 10 formed by the optical system 1011, converts it into an electrical signal, and outputs the signal.

As shown in FIG. 2(b), the image sensor 1012 has a large number of pixels 210 arranged side by side on the xy plane, each covered by a color filter 222 (described later) in a so-called Bayer array of red (R), green (G), and blue (B). Each pixel of the image sensor 1012 thereby has spectral characteristics corresponding to the wavelength band transmitted by its color filter 222, and outputs a signal mainly for red, green, or blue light.

FIG. 2(c) is a cross-sectional view showing the configuration of each pixel, which includes a microlens 211, a color filter 222, photoelectric conversion units 210a and 210b, and a waveguide 213. The substrate 224 is formed of a material that absorbs light in the wavelength band transmitted by the color filter 222, for example silicon (Si), and the photoelectric conversion units 210a and 210b are formed in at least part of its interior by ion implantation or the like. Each pixel also has wiring (not shown).

A light flux 232a that has passed through a first pupil region 231a and a light flux 232b that has passed through a second pupil region 231b, which are different pupil regions of the exit pupil 230 of the optical system 1011, are incident on the photoelectric conversion units 210a and 210b, respectively. A pupil-divided first signal and second signal can thereby be obtained from each pixel 210. The first signal and the second signal, which are the signals of the photoelectric conversion units 210a and 210b respectively, may be read out from each pixel 210 independently. Alternatively, after the first signal is read out, a signal obtained by adding the first signal and the second signal may be read out, and the second signal may be obtained by subtracting the first signal from the added signal.
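The second readout option, in which the second signal is recovered by subtraction, can be sketched as follows. The function name and the flat list representation of one row of pixels are assumptions for illustration.

```python
def recover_second_signal(first, summed):
    """Recover the second signal per pixel as (first + second) - first.

    first:  first-signal values read out from a row of pixels.
    summed: added-signal values (first + second) read out from the same pixels.
    """
    if len(first) != len(summed):
        raise ValueError("both readouts must cover the same pixels")
    return [s - a for a, s in zip(first, summed)]


if __name__ == "__main__":
    first_row = [3, 5, 7]
    summed_row = [10, 9, 7]
    print(recover_second_signal(first_row, summed_row))
```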

FIG. 2(d) shows the arrangement of the photoelectric conversion units 210a and 210b corresponding to each microlens 211 on the image sensor 1012, viewed from the direction of incidence along the optical axis, and is an example of division in the horizontal or vertical direction. The present invention is not limited to this, and pixels divided in the horizontal direction and pixels divided in the vertical direction may be arranged in a mixed manner. With such an arrangement, the defocus amount described later can be obtained accurately not only for subjects whose luminance distribution varies in the horizontal direction but also for subjects whose luminance distribution varies in the vertical direction. Three or more photoelectric conversion units may also be provided for each microlens 211; as an example, FIG. 2(e) shows a configuration in which one pixel has four photoelectric conversion units 210c to 210f divided in the horizontal and vertical directions.

The first signal and the second signal obtained from the photoelectric conversion units 210a and 210b are sent to an arithmetic processing unit 204 included in the imaging unit 101 and converted into electronic information. When the signal acquired by photoelectric conversion is analog, the arithmetic processing unit 204 performs basic processing such as noise removal by correlated double sampling (CDS), exposure control by raising the gain with automatic gain control (AGC), black level correction, and A/D conversion, and obtains an image signal converted into a digital signal. Because this is mainly preprocessing of analog signals, these operations are generally also called AFE (analog front end) processing. When used together with a digital-output sensor, the processing is sometimes called DFE (digital front end) processing.

The arithmetic processing unit 204 then generates an A image, which collects the first signals output from the plurality of pixels 210 of the image sensor 1012, and a B image, which collects the second signals output from the plurality of pixels 210. Because the A image and the B image have parallax relative to each other, each is hereinafter called a "parallax image", and the A image and the B image together are called a "parallax image pair".

When the image sensor 1012 is a color sensor, the arithmetic processing unit 204 also performs Bayer array interpolation and the like. To improve the quality of the parallax images and of the viewing image (described later) that is output together with the parallax image pair, filtering such as low-pass or high-pass filtering and sharpening may be performed. In addition, various other processing may be performed, such as tone correction including dynamic range expansion such as HDR (high dynamic range) processing, and color correction such as WB (white balance) correction. Because the processing of the arithmetic processing unit 204 tends to be integrated with the image sensor 1012 at the chip level or unit level, the unit is not shown in FIG. 1.

In this way, by forming a plurality of photoelectric conversion units under each of the plurality of microlenses 211 formed on the light-receiving surface of the image sensor 1012, the photoelectric conversion units each receive subject light flux that has passed through a different pupil region of the optical system 1011. As a result, a parallax image pair can be obtained in a single shot even though the optical system 1011 has only one aperture.
The generated parallax image pair is temporarily stored in the memory 102.

Returning to FIG. 1, the parallax image pair obtained in one shot and stored in the memory 102 is transferred to the viewing image generation unit 103, where the first signal and the second signal are added for each pixel to generate a single image. That is, an image for viewing (hereinafter, "viewing image") is generated, corresponding to the image formed by the light flux that has passed through the entire pupil region of the optical system 1011.
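The per-pixel addition that produces the viewing image can be sketched as below. Representing the A and B images as nested lists of pixel values, and the function name `viewing_image`, are assumptions for illustration.

```python
def viewing_image(a_image, b_image):
    """Add the A image and B image pixel by pixel to form one viewing image.

    Both inputs are lists of rows with identical shape.
    """
    same_rows = len(a_image) == len(b_image)
    if not same_rows or any(len(ra) != len(rb) for ra, rb in zip(a_image, b_image)):
        raise ValueError("A and B images must have the same shape")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(a_image, b_image)]


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[10, 20], [30, 40]]
    print(viewing_image(a, b))
```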

When neither focus state detection using the parallax images nor distance image generation is performed, the viewing image may be generated by the arithmetic processing unit 204 integrated with the image sensor 1012 at the chip level, or by adding the first signal and the second signal within each pixel before readout. This helps save transmission bandwidth and shorten readout time. When a viewing image paired with the final distance image is not needed, the viewing image generation unit 103 may not exist explicitly and may instead be included in the multi-view distance image generation unit 105.

The parallax image pair obtained in one shot and stored in the memory 102 is transferred to the single-shot distance image generation unit 104. At that time, the images may be converted into luminance images. The unit then establishes correspondences between the parallax images of the input pair and generates a distance image based on imaging information, which includes camera parameters such as the focal length determined by the zoom state of the optical system 1011 and the aperture value, and image sensor information such as the pixel pitch of the image sensor 1012. Hereinafter, a distance image generated by the single-shot distance image generation unit 104 is called a "single-shot distance image".

The multi-view distance image generation unit 105 takes as input the viewing images obtained when the viewing image generation unit 103 converts each of a plurality of parallax image pairs into one image per shot, the pairs having been acquired by shooting the same scene several times in succession from different positions (that is, by multi-shot). It generates a distance image using the plurality of viewing images (different-viewpoint images). Hereinafter, a distance image generated by the multi-view distance image generation unit 105 is called a "multi-view distance image". When the imaging apparatus is a moving camera, parallax arises between the viewing images obtained by multi-shot in time series. Therefore, if the movement and change in orientation of the imaging apparatus 100 are known, the multi-view distance image can be calculated from the parallax between the viewing images (between the different-viewpoint images).

The distance image integration unit 106 integrates the single-shot distance image and the multi-view distance image to generate an integrated distance image.

Next, the procedure of shooting and integrated distance image generation in this embodiment will be described with reference to the flowchart of FIG. 3.
In S101, shooting for single-shot distance image generation (single shooting) and shooting for multi-view distance image generation (continuous shooting) are performed. The shooting order will be described later with reference to FIG. 9.

In S102, distance measurement processing is performed using the parallax image pair obtained by single shooting, and a single-shot distance image is calculated. FIG. 4 is a flowchart of the single-shot distance image generation processing performed in S102. In S201 of FIG. 4, the image displacement amount, which is the relative positional displacement between the parallax images, is calculated. A known method can be used for this calculation. For example, a correlation value is calculated from the signal data A(i) and B(i) of the A image and the B image using the following Equation (1).

In Equation (1), S(r) is a correlation value indicating the degree of correlation between the two images at image shift amount r, i is the pixel number, and r is the relative image shift amount between the two images. p and q indicate the range of target pixels used to calculate the correlation value S(r). The image displacement amount can be calculated by finding the image shift amount r that gives the minimum of the correlation value S(r).
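Equation (1) itself is not reproduced in this text. A form consistent with the symbol definitions above, and with the statement that the displacement is found at the minimum of S(r), is the sum of absolute differences S(r) = Σ_{i=p}^{q} |A(i + r) − B(i)|. The sketch below assumes that form; the function names and the integer search range `max_shift` are illustrative and not taken from the document.

```python
def correlation_value(a, b, r, p, q):
    """Assumed form of S(r): sum over i = p..q of |A(i + r) - B(i)|."""
    return sum(abs(a[i + r] - b[i]) for i in range(p, q + 1))


def image_displacement(a, b, p, q, max_shift):
    """Return the integer shift r in [-max_shift, max_shift] that minimises S(r).

    Shifts that would read outside the A signal are skipped.
    """
    best_r, best_s = None, None
    for r in range(-max_shift, max_shift + 1):
        if p + r < 0 or q + r >= len(a):
            continue  # A(i + r) would fall outside the available data
        s = correlation_value(a, b, r, p, q)
        if best_s is None or s < best_s:
            best_r, best_s = r, s
    if best_r is None:
        raise ValueError("no valid shift inside the search range")
    return best_r


if __name__ == "__main__":
    b_signal = [0, 0, 0, 1, 5, 9, 5, 1, 0, 0, 0, 0]
    a_signal = [0, 0] + b_signal[:-2]  # B displaced by two pixels
    print(image_displacement(a_signal, b_signal, p=2, q=8, max_shift=3))
```

This search returns whole-pixel shifts only. Practical implementations usually interpolate around the minimum to obtain sub-pixel displacement, which the sketch omits.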

The method of calculating the image displacement amount is not limited to the method described above, and other known methods may be used.

Next, in S202, the defocus amount is calculated from the image displacement amount calculated in S201. The image of the subject 10 is formed on the image sensor 1012 via the optical system 1011. The example shown in FIG. 2(a) above illustrates a defocused state in which the light flux that has passed through the exit pupil 230 comes to a focus on the image plane 207. The defocused state is a state in which the image plane 207 and the imaging surface (light-receiving surface) of the image sensor 1012 do not coincide and are displaced along the optical axis 202; the defocus amount indicates the distance between the imaging surface of the image sensor 1012 and the image plane 207.

An example of a method of converting the defocus amount into a distance value will now be described using the simplified optical layout of the imaging apparatus shown in FIG. 5.

FIG. 5 shows light rays 232 in a state where the image of the subject 10 is defocused with respect to the image sensor 1012; 202 denotes the optical axis, 208 the aperture stop, 205 the front principal point, 206 the rear principal point, and 207 the image plane. Further, d is the image displacement amount, W is the baseline length, D is the distance between the image sensor 1012 and the exit pupil 230, Z is the distance between the front principal point 205 of the optical system 1011 and the subject 10, L is the distance between the imaging surface of the image sensor 1012 and the rear principal point 206, and ΔL is the defocus amount.

In the imaging apparatus of this embodiment, the distance to the subject 10 is detected based on the defocus amount ΔL. The image displacement amount d, which indicates the relative positional displacement between the A image based on the first signal acquired from the photoelectric conversion unit 210a of each pixel 210 and the B image based on the second signal acquired from the photoelectric conversion unit 210b, and the defocus amount ΔL have the relationship shown in Equation (2).

Equation (2) can be simplified using a proportionality factor K to be written as Equation (3).

ΔL ≈ K·d …(3)

A coefficient that converts an image shift amount into a defocus amount is hereinafter called a "conversion coefficient". The conversion coefficient refers to, for example, the proportionality factor K in Equation (3) or the baseline length W. In what follows, correcting the baseline length W is equivalent to correcting the conversion coefficient. The method of calculating the defocus amount is not limited to that of this embodiment, and other known methods may be used.

Furthermore, the defocus amount may be converted into the subject distance using Equation (4) below, which expresses the image-forming relationship between the optical system 1011 and the image sensor 1012. Alternatively, the image shift amount may be converted directly into the subject distance using a conversion coefficient. In Equation (4), f is the focal length.

A defocus map can be calculated by obtaining the defocus amount between the input parallax images, for example between the A image and the B image, for all pixels, for example. Converting the defocus map using the relationship in Equation (4) yields the corresponding single-shot distance image.
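The two conversions above (image shift to defocus amount, then defocus amount to subject distance) can be sketched as follows. This is only an illustration under stated assumptions: Equation (4) is not reproduced in this text, so the sketch substitutes the Gaussian lens formula 1/Z + 1/(L + ΔL) = 1/f with a signed defocus amount, and its sign convention may differ from the one in Equation (4). The function and parameter names are hypothetical.

```python
import math


def shift_to_defocus(shift, k):
    """Equation (3): the defocus amount is approximately K times the image shift."""
    return k * shift


def defocus_to_distance(defocus, focal_length, sensor_distance):
    """Convert a signed defocus amount to a subject distance Z.

    Assumption (not taken from the source): the Gaussian lens formula
    1/Z + 1/(L + dL) = 1/f, with the image plane located at L + dL behind
    the rear principal point. Equation (4) may use another sign convention.
    """
    image_distance = sensor_distance + defocus
    if image_distance <= focal_length:
        return math.inf  # no finite subject distance under this model
    return 1.0 / (1.0 / focal_length - 1.0 / image_distance)


def shift_map_to_distance_map(shift_map, k, focal_length, sensor_distance):
    """Apply both conversions to every pixel of an image-shift map."""
    return [
        [defocus_to_distance(shift_to_defocus(d, k), focal_length, sensor_distance)
         for d in row]
        for row in shift_map
    ]
```

All lengths must share one unit (for example millimetres). Pixels whose assumed image plane falls at or inside the focal length are returned as infinity rather than as an error.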

With the distance calculation procedure described above, a pupil-division imaging system can calculate a single-shot distance image from the parallax image pair obtained in one shot.

Returning to FIG. 3, in the next step S103, the viewing images generated by the viewing image generation unit 103 from the parallax image pair obtained for each shot of continuous shooting, each corresponding to light that has passed through the entire pupil region, are taken as input, and a multi-view distance image is generated from the plurality of viewing images. Shooting is assumed to be performed while the apparatus moves, for example handheld. The three-dimensional structure of the scene is reconstructed from a plurality of viewing images with overlapping fields of view, and a distance image for each shot is generated in the process.

The relative position and orientation of the imaging apparatus 100 between shots during continuous shooting can be obtained from orientation sensors such as a gyro sensor, an acceleration sensor, and a tilt sensor, which have become standard on imaging apparatuses in recent years, or by known camera pose estimation that also uses shake detection from the image stabilization function (likewise a standard function in recent years), image vectors, and the like. Since methods for obtaining the position and orientation are known, their description is omitted here.

When the change in the position and orientation of the imaging apparatus 100 between shots in continuous shooting is known, the epipolar geometric constraint reduces the estimation of the distance to a subject in the image to a simple one-dimensional search problem, as shown in FIG. 6(a). Furthermore, when a point on a subject at the same position in space can be tracked across a plurality of viewing images over a plurality of frames, this epipolar search can be replaced by a multi-baseline search that seeks the same distance across several different pairs of viewing images. This makes it possible to obtain the distance accurately. For example, as shown in FIG. 6(b), adding just one shot allows the distance image for the reference viewpoint C1 of the first shot to be estimated not only against the shot at viewpoint C2 but also against the shot at viewpoint C3.

Increasing the number of related shots in this way allows multiple searches for the depth value of each point in the image of the shot set as the reference image, which enables robust and accurate estimation of distance values. For a moving subject, on the other hand, epipolar geometry no longer holds for the correspondences between shots, so its distance is left uncalculated in the multi-view distance image.
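The multi-baseline idea can be illustrated with a deliberately simplified sketch. It assumes that the shots are related by pure sideways translation, so that every epipolar line is an image row and the shift along it is proportional to inverse depth. The general case requires the full relative pose, and this is not presented as the method of the embodiment. The function name, the `baselines` units, and the out-of-image penalty are hypothetical.

```python
def multi_baseline_search(ref, views, baselines, candidates, x, half_window=2):
    """Pick the inverse-depth candidate with the lowest summed SSD over all views.

    ref         reference scanline (list of floats)
    views       scanlines from the other shots
    baselines   pixel shift per unit inverse depth for each view (hypothetical units)
    candidates  inverse-depth hypotheses to test
    x           pixel position on the reference scanline
    """
    best, best_cost = None, float("inf")
    for w in candidates:
        cost = 0.0
        for view, b in zip(views, baselines):
            shift = int(round(b * w))  # expected shift along the epipolar line
            for dx in range(-half_window, half_window + 1):
                xr, xv = x + dx, x + dx + shift
                if 0 <= xr < len(ref) and 0 <= xv < len(view):
                    cost += (ref[xr] - view[xv]) ** 2
                else:
                    cost += 1e6  # penalise hypotheses that leave the image
        if cost < best_cost:
            best, best_cost = w, cost
    return best
```

Because the matching cost is summed over all views before the minimum is taken, a hypothesis has to be consistent with every baseline at once, which is what makes the added shots improve robustness.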

Various methods are known for establishing correspondences between shots. Examples include a patch-based method such as PMVS, which calculates distance values while also taking the normal direction into account (Non-Patent Document 1), and a method called plane sweep, which sets virtual depth planes and establishes correspondences by back projection from each shot. The generation of a multi-view distance image is also called the multi-view stereo method. With the method described above, a multi-view distance image corresponding to the viewing image of the shot selected as the reference image can be obtained from the group of continuously shot viewing images, each of which aggregates the light from the entire pupil region.

In S104, the single-shot distance image and the multi-view distance image are integrated to generate an integrated distance image that is accurate over the entire depth range of the scene.

As described above, the single-shot distance image contains distance information for both the still subject region and the moving subject region, whereas the multi-view distance image contains distance information only for the still subject region. This is because, when the multi-view distance image is generated, the correspondences in the moving subject region do not satisfy epipolar geometry. Therefore, in the integration of distance images in this embodiment, the single-shot distance image is used only for the distance values of the moving subject region, and the distance information of the overlapping still subject region is taken from the multi-view distance image.

When the single-shot distance image and the multi-view distance image are superimposed, a region where both contain distance information can be regarded as a still subject region. A region where only the single-shot distance image contains distance information, and where the multi-view distance image is either uncalculated or has distance information of low reliability, can be regarded as a moving subject region. Accordingly, an integrated distance image with accurate distance information is obtained by taking the distance information of the still subject region from the multi-view distance image and that of the moving subject region from the single-shot distance image.
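The per-pixel selection rule described above can be sketched as follows. The sketch assumes that uncalculated multi-view pixels are marked with NaN and that an optional per-pixel reliability map is available. The NaN marker, the reliability map, and the threshold value are illustrative assumptions, not details given in the source.

```python
import math


def integrate_distance_images(single, multi, multi_conf=None, conf_threshold=0.5):
    """Take multi-view values where they exist and are reliable, else single-shot values.

    single      single-shot distance image (list of rows)
    multi       multi-view distance image, NaN where uncalculated (assumption)
    multi_conf  optional reliability map in [0, 1] (hypothetical)
    """
    out = []
    for y, row in enumerate(single):
        out_row = []
        for x, d_single in enumerate(row):
            d_multi = multi[y][x]
            reliable = not math.isnan(d_multi) and (
                multi_conf is None or multi_conf[y][x] >= conf_threshold
            )
            # Reliable multi-view value: treated as still subject region.
            # Otherwise: treated as moving subject region, use the single-shot value.
            out_row.append(d_multi if reliable else d_single)
        out.append(out_row)
    return out
```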

FIG. 7 shows an example of integrating a single-shot distance image and a multi-view distance image. FIG. 7(a) is the viewing image corresponding to the single-shot distance image, FIG. 7(b) is the single-shot distance image, FIG. 7(c) is the corresponding multi-view distance image, and FIG. 7(d) is the integrated distance image. The single-shot distance image of FIG. 7(b) contains distance information for the person region, which is a moving subject, whereas in the multi-view distance image of FIG. 7(c) the person region has no distance information and is therefore shown in white. Accordingly, the distance information of the moving subject region is taken from the single-shot distance image of FIG. 7(b) and that of the overlapping still subject region from the multi-view distance image of FIG. 7(c) to generate the integrated distance image of FIG. 7(d).

Also, as described above, a characteristic of pupil-division imaging systems is that the distance to a subject that is neither close to the imaging apparatus nor within a predetermined range of the in-focus distance cannot be calculated accurately, because the baseline length of the pupil-division optical system is short. To address this, the integrated distance image may be generated as follows, based on the distance information of the single-shot distance image. Distance information is taken from the single-shot distance image for the moving subject region and, within the still subject region, for the near region and the region within the predetermined range of the in-focus distance. Distance information for the remaining still subject region is taken from the multi-view distance image.

FIG. 8 shows an example of deciding, based on the distance values of the single-shot distance image, from which image each distance value of the integrated distance image is taken. u0 is the in-focus distance, and v1 and v2 define the predetermined distance range around the in-focus distance u0, with v1 on the front-focus side and v2 on the rear-focus side. u1 is the distance (threshold) that defines the near region from the imaging apparatus 100. Letting u be the distance value at each position in the distance image,
u ≤ u1 …(5)
−v1 ≤ u − u0 ≤ v2 …(6)
distance values (distance information) are taken from the single-shot distance image for regions that satisfy at least one of Equations (5) and (6), and from the multi-view distance image for all other regions. This yields a distance image that is more accurate over the entire depth range of the scene than the single-shot distance image alone.
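The selection by Equations (5) and (6) can be written as a small predicate. In this sketch, the fallback to the single-shot value where the multi-view value is missing (marked NaN here) is an added assumption that stands in for the moving subject region. The function names are hypothetical.

```python
import math


def prefers_single_shot(u, u0, u1, v1, v2):
    """True when the single-shot distance u satisfies Equation (5) or Equation (6)."""
    near = u <= u1                     # Equation (5): near region
    near_focus = -v1 <= u - u0 <= v2   # Equation (6): around the in-focus distance
    return near or near_focus


def integrate_by_distance(single, multi, u0, u1, v1, v2):
    """Choose each pixel's source from the single-shot distance value."""
    out = []
    for row_s, row_m in zip(single, multi):
        out.append([
            # NaN in the multi-view image is assumed to mean "uncalculated".
            s if prefers_single_shot(s, u0, u1, v1, v2) or math.isnan(m) else m
            for s, m in zip(row_s, row_m)
        ])
    return out
```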

In the multi-view distance image, the distance information of the moving subject region is missing as an uncalculated region, while the distance information of the remaining still subject region overlaps the same region of the single-shot distance image. In distance images, however, the boundary between the moving subject region and the still subject region tends to be blurred. Therefore, moving subject region detection that takes the viewing image as input may also be used to determine the region boundaries used for extracting distance information from each distance image. For example, a face or human body may be detected and its region cut out, or a CNN that has been trained to use prior knowledge to cut out moving subject regions more accurately may be used.

Note that the distance information (pixels) constituting a distance image is not necessarily limited to the subject distance (distance value). It may be the image shift amount or the defocus amount before conversion to the subject distance. The reciprocal of the subject distance (inverse distance value) may also be used as the distance information (pixels) constituting the distance image.

The single-shot and continuous shooting in S101 can be performed in various ways. That is, the shot for obtaining the single-shot distance image may be taken as one shot of the continuous shooting or as a separate single shot. The continuous shooting may be performed before or after the single shot for obtaining the single-shot distance image. Further, when the imaging apparatus 100 moves only slightly during continuous shooting and a sufficient baseline length is not obtained, additional shots may be taken later, after some time has passed.

FIG. 9 shows examples of variations in shooting timing. FIG. 9(a) shows an example in which continuous shooting is first performed as pre-shooting to acquire a plurality of images, and a single shot is then taken to obtain the single-shot distance image. In FIG. 9, SW1 denotes the state in which the shutter button of the imaging apparatus 100 is half-pressed, and SW2 denotes the state in which it is fully pressed. Pre-shooting is a shooting method that uses a circular buffer. It has become a common function provided as standard not only in dedicated imaging apparatuses such as cameras but also in the camera functions of smartphones.

In this shooting method, images captured before the full shutter press SW2 are temporarily stored in advance, so that at SW2 the user can be asked to choose the true best shot, or a fixed duration of continuous shots can be kept for post-processing. These pre-shot images are acquired as input for generating the multi-view distance image. Because the pre-shot images are used only for generating the multi-view distance image, the signals of the photoelectric conversion units 210a and 210b may be added for each pixel at the time of shooting and stored as viewing images to reduce the image data size. The image obtained at the full shutter press SW2 is stored as a parallax image pair and used to generate the single-shot distance image. It may also be added pixel by pixel to generate a viewing image that is used for generating the multi-view distance image. In that case, the multi-view distance image is generated with this image as the reference.
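The pre-shooting scheme can be sketched with a fixed-capacity buffer. This is a minimal illustration in which images are plain nested lists. The class and method names are hypothetical, and a real camera pipeline would operate on sensor buffers rather than Python lists.

```python
from collections import deque


def to_viewing_image(a_img, b_img):
    """Add the A and B signals pixel by pixel to form a viewing image."""
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(a_img, b_img)]


class PreShotBuffer:
    """Circular buffer that keeps only the most recent pre-shot viewing images."""

    def __init__(self, capacity):
        self._frames = deque(maxlen=capacity)  # the oldest frame is dropped automatically

    def push(self, a_img, b_img):
        # Pre-shots feed only the multi-view distance image, so store the sum.
        self._frames.append(to_viewing_image(a_img, b_img))

    def on_full_press(self, a_img, b_img):
        """Return (parallax pair for the single-shot image, viewing images for multi-view)."""
        reference = to_viewing_image(a_img, b_img)
        return (a_img, b_img), list(self._frames) + [reference]
```

Storing the sum instead of the pair halves the data kept per pre-shot, while the SW2 frame is kept both ways: as a pair for the single-shot distance image and as the reference viewing image, which is placed last in the returned list.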

FIG. 9(b) shows an example in which continuous shooting is performed after the shot for obtaining the single-shot distance image. In this shooting method, continuous shooting continues for a fixed time after the full shutter press SW2 of the imaging apparatus 100, and the images are stored. The best shot of the moving subject is taken at the full shutter press SW2. Shooting then continues after the shutter button is released to obtain images for generating the multi-view distance image. Because the post-shot images captured after the shutter release are used only for generating the multi-view distance image, they may, as in pre-shooting, be added pixel by pixel at the time of shooting and stored as viewing images to reduce storage. The image obtained at the full shutter press SW2, however, is stored as a parallax image pair and used to generate the single-shot distance image. This shot may also be added pixel by pixel and used for generating the multi-view distance image, in which case the multi-view distance image is generated with this image as the reference.

FIG. 9(c) shows a case in which pre-shooting and post-shooting are performed before and after the full shutter press SW2. Images obtained at times other than the full shutter press SW2 are acquired from the circular buffer. Because these images are used only for generating the multi-view distance image, they may be added pixel by pixel at the time of shooting and stored as viewing images to reduce storage. The image obtained at the full shutter press SW2 is stored as a parallax image pair and used to generate the single-shot distance image. It may also be added pixel by pixel to generate a viewing image that is used for generating the multi-view distance image. In that case, the multi-view distance image is generated with this image as the reference.

When the photographer deliberately selects in advance, before distance image generation, the best shot for which the viewing image and the distance image are wanted as a set, for example by operating the shutter of the imaging apparatus, the imaging conditions may be explicitly switched between single-shot distance image generation and multi-view distance image generation. For example, as shown in FIG. 10, continuous shooting for multi-view distance image generation uses a large Av (aperture) value. The aperture is stopped down for pan-focus shooting so that feature points are easy to obtain over the entire depth range of the scene. Only at the full shutter press SW2 is the Av value reduced, opening the pupil so that ranging in the single shot is accurate near the in-focus distance. The Tv and ISO values are set appropriately to follow the Av value, for example according to a program table.

Furthermore, because only the distance values are ultimately integrated, the Ev (exposure) value, which is the sum of the Av, Tv, and ISO values, does not need to be matched between shots. Accordingly, as shown in FIG. 10(b), the Ev value of the shooting conditions may differ greatly between imaging for single-shot distance image generation and imaging for multi-view distance image generation. FIG. 10(b) is an example in which different Ev values are used for the two. Adopting such shooting conditions, for example when shooting in a dark place, broadens the situations in which the technique of the present invention can be used.

For example, continuous shooting should be pan-focus, but motion blur should also be suppressed. The shooting conditions are therefore set with a lowered Ev value even if the light is insufficient, for example with the sum of the Av, Tv, and ISO values set to 11. In single-shot shooting, on the other hand, the Ev value can be raised by the amount that the Av value is increased. Forcing the Ev value to match that of continuous shooting, however, would reduce the amount of light and increase noise, so the values are deliberately left unmatched. Shooting in this way is unlikely to cause problems, because the captured images themselves are not integrated. When an image obtained by single-shot shooting is also used to generate the multi-view distance image, the tone scale of the image's pixel values is corrected by the Ev difference. Here too, the effect is small, because the image relates only indirectly to the final integrated distance image.

As described above, according to the first embodiment, integrating the single-shot distance image and the multi-view distance image so that each compensates for the other's shortcomings makes it possible to accurately acquire a distance image of the entire shooting scene, including a moving subject.

Although the output is described as an integrated distance "image", the output information need not be in image form. For example, it may be a 2.5-dimensional stereoscopic image obtained by projecting the distance image into three-dimensional space according to the distance values, a point cloud or volume data in a different storage format, or three-dimensional data converted into mesh information.
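As an illustration of one such alternative output, the sketch below back-projects a distance image into a point cloud. It assumes a pinhole camera model with intrinsics `fx`, `fy`, `cx`, and `cy` (in pixels) and treats each distance value as depth along the optical axis. Neither assumption is stated in the source, and the function name is hypothetical.

```python
import math


def distance_image_to_points(depth, fx, fy, cx, cy):
    """Back-project each valid pixel (u, v, z) to a 3-D point (X, Y, Z)."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if not math.isfinite(z) or z <= 0:
                continue  # skip uncalculated or invalid pixels
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points
```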

<Second Embodiment>
Next, a second embodiment of the present invention will be described. FIG. 11 is a block diagram showing an image processing apparatus 200 according to the second embodiment.

The image input unit 201 receives parallax image pairs captured by an external imaging apparatus (not shown). When viewing images obtained by adding a parallax image pair pixel by pixel within the imaging apparatus are available, it receives those viewing images as well. It stores the input parallax image pairs and viewing images in the memory 102. The input parallax image pairs and viewing images are then processed as described in the first embodiment to obtain an integrated distance image. Since the configuration other than the image input unit 201 is the same as that shown in FIG. 1, the same reference numerals are used and the description is omitted here.

As shown in FIG. 12(a), when parallax image pairs are input for all shots, the parallax image pair used for generating the single-shot distance image and the viewing image that serves as the reference for generating the multi-view distance image can be selected. Accordingly, from the plurality of parallax image pairs obtained by continuous shooting, the parallax image pair used for generating the single-shot distance image and the reference viewing image are selected either after the fact or by a predetermined algorithm. For example, the apparatus may be set in advance to select the parallax image pair obtained in the last of all the shots. Because this case is expected particularly in movie shooting, during movie shooting the image of the shot corresponding to every frame, or to every few frames, is always acquired as a parallax image pair.

Alternatively, as shown in FIG. 12(b), the selected parallax image pair may be shifted sequentially while single-shot and multi-view distance images are generated. In this way, for every parallax image pair, an integrated distance image of the entire scene that integrates the single-shot and multi-view distance images can be obtained with high accuracy, even for a scene with a wide depth range.

As described above, according to the second embodiment, the image processing apparatus can obtain an accurate distance image using parallax image pairs obtained from an imaging apparatus.

<Modification>
In the descriptions of the first and second embodiments above, the parallax image pair for generating the single-shot distance image was assumed to be captured in a typical single shot. However, when the subject moves slowly, for example, a plurality of parallax image pairs may be captured. The single-shot distance image generation unit 104 then generates a single-shot distance image from each pair, and these are integrated to improve the quality of the single-shot distance image before it is integrated with the multi-view distance image.

With reference to FIG. 13, the flow of generating a single-shot distance image from each of a plurality of parallax image pairs, integrating them to improve the quality of the single-shot distance image, and then integrating the result with the multi-view distance image is described. For example, when the position or orientation of the imaging apparatus changes between shots, each single-shot distance image is three-dimensionally converted in viewpoint into a single-shot distance image that matches the coordinates of the reference shot, and the converted images are then integrated.

For example, in FIG. 13, the central pair of the three parallax image pairs is set as the reference, and the single-shot distance images obtained from the remaining parallax image pairs are converted in viewpoint to match the shooting position and orientation at which the reference parallax image pair was captured. The two viewpoint-converted single-shot distance images and the reference single-shot distance image are then integrated, and the integrated single-shot distance image is integrated with the multi-view distance image to obtain the final integrated distance image.
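The viewpoint conversion and merging of single-shot distance images can be sketched as a forward warp followed by per-pixel averaging. Several choices here are assumptions made for illustration and are not specified in the source: a pinhole model with intrinsics `fx`, `fy`, `cx`, `cy`, a pose (`rot`, `trans`) that maps points from the source camera frame to the reference camera frame, nearest-pixel splatting with a z-buffer, NaN for empty pixels, and averaging as the integration rule.

```python
import math


def warp_distance_image(depth, rot, trans, fx, fy, cx, cy):
    """Re-render a distance image from the reference viewpoint (forward warp + z-buffer)."""
    h, w = len(depth), len(depth[0])
    out = [[math.nan] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            z = depth[v][u]
            if not math.isfinite(z) or z <= 0:
                continue
            p = ((u - cx) * z / fx, (v - cy) * z / fy, z)  # back-project the pixel
            q = [sum(rot[i][j] * p[j] for j in range(3)) + trans[i] for i in range(3)]
            if q[2] <= 0:
                continue  # point lies behind the reference camera
            u2 = int(round(fx * q[0] / q[2] + cx))
            v2 = int(round(fy * q[1] / q[2] + cy))
            if 0 <= u2 < w and 0 <= v2 < h:
                if math.isnan(out[v2][u2]) or q[2] < out[v2][u2]:
                    out[v2][u2] = q[2]  # keep the nearest surface
    return out


def merge_distance_images(images):
    """Average the valid (non-NaN) values of aligned distance images per pixel."""
    h, w = len(images[0]), len(images[0][0])
    merged = [[math.nan] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            vals = [img[v][u] for img in images if not math.isnan(img[v][u])]
            if vals:
                merged[v][u] = sum(vals) / len(vals)
    return merged
```

Forward warping leaves holes where no source pixel lands. The merge step fills them from whichever aligned image has a value at that pixel, and a pixel stays NaN only if every image lacks it.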

As described above, according to the modification, a more accurate distance image can be obtained.

<Third Embodiment>
Next, a third embodiment of the present invention will be described.
The first and second embodiments described above generate an integrated distance image by integrating a single-shot distance image, generated from the parallax image pair obtained in one shot by the pupil-division imaging system, with a multi-view distance image, generated from a plurality of viewing images obtained by multi-shot shooting in time series. However, to obtain an accurate single-shot distance image from a parallax image pair captured by a pupil-division imaging system, the pupil must be widened to lengthen the baseline. Widening the pupil, in turn, makes the depth of field shallower, so subjects outside a predetermined distance range from the in-focus position become blurred and correspondences between viewing images become difficult to establish. As a result, the distance to a subject outside the predetermined distance range from the in-focus position can no longer be calculated accurately from the viewing images.

Thus, in principle, the shooting scenes and shooting conditions under which both an accurate single-shot distance image and an accurate multi-view distance image can be obtained are limited. In environments such as dark places, the selectable shooting conditions become stricter still, and obtaining both accurately becomes harder. In a dark place the shutter speed must be kept short to prevent motion blur, but the resulting shortage of light makes it necessary to open the pupil of the optical system and reduce the F-number to gather light. As described above, opening the pupil of the optical system makes the depth of field shallower, which narrows the depth range over which the multi-view distance image can be generated accurately.

Therefore, in the third embodiment, shooting is performed so that parallax image pairs are acquired not only in the main shot, for which the pupil-division imaging system already acquired a parallax image pair, but also in the other shots of multi-shot shooting. Using the parallax image pair of each shot, the defocus blur of image regions outside the predetermined distance range from the in-focus position is recovered to generate viewing images with an extended depth of field, and correspondences between the viewing images obtained by multi-shot shooting are established free of defocus blur. The single-shot distance image and the multi-view distance image are then integrated to generate an integrated distance image. The result is more accurate over the entire scene, including the distance to subjects outside the predetermined distance range from the in-focus position.

FIG. 14 is a block diagram showing an image processing apparatus 300 according to the third embodiment. In FIG. 14, components identical to those of the image processing apparatus 200 described in the second embodiment with reference to FIG. 11 are given the same reference numerals, and their description is omitted.

The viewing image generation unit 303 in the third embodiment can perform two kinds of processing: generating a viewing image directly from an input parallax image pair, as the viewing image generation unit 103 does, and generating a viewing image after applying defocus blur recovery processing to the input parallax image pair. That is, defocus blur recovery may be applied to all of the viewing images input to the multi-view distance image generation unit 105, or to only some of them. Accordingly, the images input to the viewing image generation unit 303 may all be parallax image pairs, or may be a mixture of parallax image pairs and viewing images that were output after the first signal and the second signal were added for each pixel.

Next, the procedure for shooting and generating the integrated distance image in the third embodiment will be described with reference to the flowchart of FIG. 15.

In S301, parallax image pairs used for generating the single-shot distance image and for defocus blur recovery are shot a plurality of times in succession (multi-shot). When single-shot distance image generation and defocus blur recovery are to be performed for all images obtained by the multi-shot, every shot is acquired as a set of parallax images (a parallax image pair) consisting of as many images as there are viewpoints under the microlens 211. When they are not to be performed for some of the images, already generated viewing images may be mixed in. When a plurality of images is acquired by moving image shooting instead of multi-shot, control is simpler if a parallax image pair is acquired in every frame.

In S302, the shot that serves as the reference for generating the integrated distance image is selected from the images obtained by the multi-shot. For continuous shooting of still images, the shot may be selected in advance on the GUI of the imaging apparatus at the time of shooting, or it may be selected on a GUI (not shown) of the image processing apparatus 300. For moving image shooting, the reference frame for generating the integrated distance image is moved sequentially in the time direction.

In S303, the parallax image pair of each shot, consisting of as many images as there are viewpoints under the microlens 211 or of a subset of them, is taken as input, and a single viewing image with defocus blur recovered is generated.

Defocus blur recovery for the parallax image pair of each shot may be realized by blind deconvolution, by deconvolution that estimates the defocus blur kernel from a calculated distance image, or by MAP estimation. It may also be realized by deep learning processing that replaces these. When deep learning is used, the first option to consider is end-to-end processing with an encoder-decoder structure. Alternatively, a network may be constructed that corresponds to the conventional non-blind deconvolution performed by estimating a distance image or a defocus blur kernel. Each is described below.

First, the implementation by end-to-end processing with an encoder-decoder structure will be described. FIG. 16 shows the input/output relationships of the networks used for defocus blur recovery. FIG. 16(a) is an input/output example of a network that performs defocus blur recovery end to end. Several network implementations of this approach have been proposed, such as that of Non-Patent Document 2. Any approach may be used as long as it takes the parallax image pair (A image, B image) of each shot as input and yields a defocus-blur-recovered image. The encoder-decoder network is trained with all-in-focus images, shot with the aperture of the imaging apparatus stopped down, as training data, using the difference between the output image and the ground-truth all-in-focus image as the loss. Once training is complete, the network takes the parallax image pair of each shot as input and yields a single viewing image with defocus blur recovered.

Next, an example of an implementation that constructs a network corresponding to deconvolution performed by estimating a distance image or a defocus blur kernel will be described. FIG. 16(b) shows an example of such a network. It consists of a network that takes the parallax image pair of each shot as input and calculates a provisional distance image, or a provisional inverse depth image in which each distance value d is expressed as 1/d, and a network that takes the parallax image pair and this provisional distance image or provisional inverse depth image as input and outputs a defocus-blur-recovered image and a refined distance image or inverse depth image. The provisional distance image or provisional inverse depth image may be a map of parameters representing the defocus blur kernel, or an image of such a map. FIG. 16(b) illustrates the case where each shot has two parallax images (A image, B image). The distance calculation network and the defocus blur recovery network can be trained independently, using as training data a ground-truth image of the distance values corresponding to the parallax image pair and an all-in-focus image shot with the aperture of the imaging apparatus stopped down. Training is performed by backpropagation with an error function based on distance value error, pixel value error, edge sharpness, and the like. A network of this kind is disclosed in Non-Patent Document 3, which can be consulted for the details of the network, the training data, and examples of error functions.

Although the distance calculation network and the defocus blur recovery network have been illustrated with reference to the network form of Non-Patent Document 3, the present invention is not limited to these specific forms. Each can be replaced by another deep learning network or by a classical method. For example, the distance calculation network may be a deep learning network that estimates the model parameters of the defocus blur kernel, as disclosed in Non-Patent Document 4 and shown in FIG. 16(c). Likewise, the defocus blur recovery network can be replaced by classical shift-variant deconvolution processing.

Non-Patent Document 4 describes a network that assumes a model of the defocus blur kernel, obtains the parameters of the defocus blur kernel at each position in the angle of view for the input parallax image pair of each shot, creates a parallax map, and outputs it as a substitute for the distance image.

In S304, a single-shot distance image is generated. This processing is the same as that in S102 of FIG. 3 in the first embodiment, so its description is omitted.

In S305, a multi-view distance image is generated from the plurality of viewing images acquired by the multi-shot, which consist of the viewing images whose defocus blur was recovered in S303, possibly mixed with viewing images to which defocus blur recovery was not applied. The method of generating the multi-view distance image is the same as the processing in S103 described in the first embodiment, so its description is omitted.

In S306, the single-shot distance image and the multi-view distance image are integrated to generate an integrated distance image that is accurate over the entire depth range of the scene. The method of generating the integrated distance image is the same as the processing in S104 described in the first embodiment, so its description is omitted.

Next, the procedure from shooting to generation of the integrated distance image described above is outlined with reference to FIG. 17, for the case where all images of the multi-shot are acquired as parallax image pairs. Assuming, for example, that a still camera is used, S301 corresponds to the photographer fully pressing the shutter button to shoot intentionally. Then, in S302, a reference parallax image pair (reference shot) is selected from the parallax image pairs acquired by the multi-shot. The selected parallax image pair may be the same as that of the shot used to generate the single-shot distance image. In S303, defocus blur recovery processing is applied to the parallax image pairs from which the viewing images used for generating the multi-view distance image are generated.

In S304, a single-shot distance image is generated from the selected parallax image pair. In S305, a multi-view distance image is generated from the viewing images obtained by the defocus blur recovery processing. In S306, the single-shot distance image and the multi-view distance image are integrated to obtain an integrated distance image. The integrated distance image has higher distance accuracy than the single-shot distance image over the entire scene, including regions outside the predetermined distance range from the in-focus position. In addition, unlike the multi-view distance image, it provides distance values even when the scene contains a moving object.

FIG. 18 illustrates the case where an integrated distance image is to be generated for every shot obtained by the multi-shot, or for every frame in moving image shooting. In this case, integrated distance images corresponding to all shots are generated by sequentially changing the reference shot selected in S302.
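The per-shot flow of S302 to S306 can be sketched as a loop over reference shots. This is a structural sketch only: the four processing steps are passed in as hypothetical callables (`deblur`, `single_shot`, `multi_view`, `integrate`) whose names and signatures are assumptions made for this example, since the actual processing is the one described for the respective steps.

```python
def integrated_distance_images(parallax_pairs, deblur, single_shot, multi_view, integrate):
    """Generate one integrated distance image per shot (or per video frame).

    All four callables are placeholders for the processing of S303 to S306.
    """
    # S303: recover defocus blur once per shot to obtain the viewing images.
    viewing_images = [deblur(pair) for pair in parallax_pairs]

    results = []
    for ref in range(len(parallax_pairs)):  # S302: move the reference shot in turn
        single = single_shot(parallax_pairs[ref])   # S304: single-shot distance image
        multi = multi_view(viewing_images, ref)     # S305: multi-view distance image
        results.append(integrate(single, multi))    # S306: integration
    return results
```

Recovering the defocus blur once, outside the loop, avoids repeating S303 for every choice of reference shot.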

As described above, according to the third embodiment, highly accurate distance images can be obtained in a variety of shooting scenes.

<Fourth Embodiment>
Next, a fourth embodiment of the present invention will be described.
FIG. 19A is a block diagram showing the configuration of the imaging apparatus according to the fourth embodiment of the present invention. In FIG. 19A, the imaging apparatus 400 includes an imaging unit 401, an imaging unit 402, an arithmetic unit 403, a storage unit 404, a shutter button 405, an operation unit 406, a display unit 407, and a control unit 408.

FIG. 19B is an external front view of the imaging apparatus 400 of this embodiment. The imaging apparatus 400 has two imaging units, the imaging unit 401 and the imaging unit 402. The imaging unit 401 includes a lens 401a and an image sensor 401b, and the imaging unit 402 includes a lens 402a and an image sensor 402b. The lenses 401a and 402a collect light reflected by the subject and form optical images on the image sensors 401b and 402b, which convert the optical images into electrical signals and output video data.

The imaging unit 401 and the imaging unit 402 are arranged so that they can shoot a common subject and acquire images having parallax with respect to each other. When the user of the imaging apparatus 400 presses the shutter button 405, the imaging unit 401 and the imaging unit 402 perform compound-eye stereo shooting. The distance D between the optical axis of the imaging unit 401 and that of the imaging unit 402 is assumed to be known.

FIG. 19C is an external rear view of the imaging apparatus 400 of this embodiment. The operation unit 406 is pressed to set shooting conditions and to give instructions such as starting or ending the distance measurement mode. The display unit 407 is, for example, a liquid crystal display, and displays the composition during shooting and various setting items. When the display unit 407 is a touch panel, it can also serve as the shutter button 405 and the operation unit 406, and the user can shoot and make settings by touching the display unit 407. In this case, hardware parts such as the shutter button 405 and the operation unit 406 need not be provided, which makes operation easier for the user and allows a larger, easier-to-view display unit 407 to be installed.

The control unit 408 controls the shooting conditions used when the imaging unit 401 and the imaging unit 402 shoot, for example the aperture diameter of the diaphragm of the optical system, the shutter speed, and the ISO sensitivity. The arithmetic unit 403 performs development processing on the images shot by the imaging unit 401 and the imaging unit 402 and calculates subject distances.

FIG. 20 is a flowchart showing the distance measurement operation of the imaging apparatus 400 of this embodiment. The operation of this flowchart starts when the user instructs the start of the distance measurement mode in S401.

FIG. 21(a) is a flowchart showing the detailed processing in S401, which consists of S4011 and S4012.

In S4011, the user selects the distance measurement mode. FIG. 22 shows the state of the display unit 407 when the distance measurement mode is being selected. By operating the operation unit 406, the user causes the display unit 407 to show a display 501. The display 501 lists the shooting mode items, and the user operates the operation unit 406 to select the distance measurement mode. In S4012, the control unit 408 of the imaging apparatus 400 starts continuous shooting.

Returning to FIG. 20, in S402 relative distances are acquired with the monocular stereo camera. Either the imaging unit 401 or the imaging unit 402 may be used for monocular stereo shooting; in this embodiment the imaging unit 401 is used. The continuous shooting may have dense frame intervals, as in moving image shooting, or sparse frame intervals, as in burst shooting of still images.

FIG. 21(b) is a flowchart showing the detailed processing in S402, which consists of S4021 to S4027.

S4021 is a loop over the frames shot by the monocular stereo camera. To acquire relative distances with the monocular stereo camera, the three-dimensional relative positions of the subject are acquired first and are then converted into relative distance values from the position for which the user wants to acquire an absolute distance value. Techniques such as SfM (Structure from Motion) and SLAM (Simultaneous Localization and Mapping) can be used to acquire the three-dimensional relative positions of the subject. Alternatively, MVS (Multi-View Stereo) can be used, in which the position and orientation of the imaging apparatus 400 are first calculated by SfM or SLAM and are then used to acquire dense three-dimensional relative positions. The following description assumes a technique such as SfM or SLAM that acquires the position and orientation of the imaging apparatus 400 and the three-dimensional relative positions of the subject simultaneously.

FIG. 23 shows the state of the display unit 407 during relative distance measurement. A display 601 shows, in real time, the subject whose relative distance is being measured. By shooting while watching this image, the user can make a relative distance measurement that includes the subject whose distance is to be measured. A notification 602 informs the user that relative distance measurement by monocular stereo shooting is in progress and is superimposed on the display 601. The notification 602 may instead be given by an alarm sound, voice guidance, or the like, in which case the display 601 can be shown without any part being hidden.

In S4022, the control unit 408 extracts feature points from the image of the current frame. Typical feature point extraction methods include SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features), and FAST (Features from Accelerated Segment Test), but other methods may be used.

In S4023, the control unit 408 determines whether the current frame is the first frame. If it is, the subsequent processing is skipped and the process moves on to the second frame; otherwise the process proceeds to S4024.

In S4024, the control unit 408 associates the feature points extracted in S4022 for the immediately preceding frame with those extracted in S4022 for the current frame. When the frame intervals are sparse, the images of different frames must share a sufficiently large common portion of the subject to be measured. If the common portion is small, the feature points cannot be associated and the relative distance calculation stops.
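One common way to associate feature points between two frames is nearest-neighbour matching of descriptors with a ratio test. The text does not specify the matching method, so the following is only an illustrative sketch: the function name, the representation of descriptors as tuples of floats, and the `ratio` value of 0.8 are assumptions.

```python
import math


def match_features(prev_desc, curr_desc, ratio=0.8):
    """Match each descriptor of the previous frame to the current frame.

    Returns a list of (prev_index, curr_index) pairs. A match is kept only if
    the best candidate is clearly closer than the second best (ratio test).
    """
    matches = []
    for i, d in enumerate(prev_desc):
        # Distances to every descriptor of the current frame, closest first.
        dists = sorted((math.dist(d, c), j) for j, c in enumerate(curr_desc))
        if not dists:
            continue
        if len(dists) == 1 or dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches
```

If the two frames share too little of the scene, few or no pairs survive the ratio test, which corresponds to the situation in which the relative distance calculation cannot continue.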

FIG. 24 shows how feature points are associated during relative distance measurement with the monocular stereo camera. Positions 701 and 702 are the positions of the imaging apparatus 400 at different times. Images 703 and 704 are the images shot by the imaging unit 401 at positions 701 and 702, respectively. A feature point 705 extracted in the image 703 is correctly associated with a feature point 706 in the image 704. By contrast, a feature point 707 is not extracted at the correct position in the image 704 and is therefore incorrectly associated with a feature point 708 in the image 704.

If the reliability of a relative distance value obtained with the monocular stereo camera is defined as the matching accuracy of the feature point, the feature point 707 has poor matching accuracy and the reliability of its relative distance value is relatively low. Causes of such low reliability include a lack of texture on the subject and blurring of the image at the position in the image 704 corresponding to the feature point 707.

In S4025, the control unit 408 calculates the position and orientation of the imaging apparatus 400 and the three-dimensional relative positions of the subject.

In S4026, the control unit 408 calculates the reliability of the relative distance values. The matching accuracy of the feature points associated in S4024 is calculated, and a point with higher matching accuracy is given higher reliability. An algorithm such as RANSAC (Random Sample Consensus) can be used to calculate the matching accuracy. With RANSAC, the matching accuracy of a feature point of interest is determined by calculating how far its motion deviates from the position and orientation of the imaging apparatus 400 and the average motion of many feature points in the image.
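The idea of scoring each match by its deviation from the consensus motion can be sketched with a deliberately simplified RANSAC. The assumptions here are substantial and are made only for illustration: the inter-frame motion is modelled as a pure 2-D image translation (a real SfM or SLAM pipeline would fit an epipolar or pose model), and the function names, the threshold, and the mapping from residual to reliability are hypothetical.

```python
import math
import random


def ransac_translation(motions, threshold=2.0, iterations=50, seed=0):
    """Estimate the dominant (dx, dy) displacement among feature motions.

    Each iteration samples one motion as a hypothesis (one match is the minimal
    sample for a translation model) and counts how many motions agree with it.
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, -1
    for _ in range(iterations):
        model = rng.choice(motions)
        inliers = sum(math.dist(m, model) <= threshold for m in motions)
        if inliers > best_inliers:
            best_model, best_inliers = model, inliers
    return best_model


def match_reliabilities(motions, model):
    """Map each motion's deviation from the consensus model to a value in (0, 1]."""
    return [1.0 / (1.0 + math.dist(m, model)) for m in motions]


# Four consistent matches and one mismatch such as feature point 707 in FIG. 24.
motions = [(5.0, 0.0), (5.1, 0.0), (4.9, 0.1), (5.0, -0.1), (-20.0, 7.0)]
model = ransac_translation(motions)
reliability = match_reliabilities(motions, model)
```

In this example the last motion disagrees with the consensus and therefore receives a much lower reliability than the other four.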

In S4027, the control unit 408 determines whether the user has instructed compound-eye stereo camera shooting. If not, the process continues with the next frame; if so, the processing of this flow ends and the process proceeds to S403 in FIG. 20. Here, an instruction for compound-eye stereo camera shooting is an instruction, given for example by the user half-pressing the shutter button 405, to start the shooting operation for calculating absolute distance values.

In S403, the control unit 408 shoots with the compound-eye stereo camera, that is, with both the imaging unit 401 and the imaging unit 402, and acquires absolute distance values.

FIG. 25(a) is a flowchart showing the detailed processing in S403, which consists of S4031 to S4039.

In S4031, the control unit 408 focuses on a specific subject, for example one whose distance the user particularly wants to measure. One way to focus is the autofocus function that operates when the user half-presses the shutter button 405; manual focusing may also be used. Both the imaging unit 401 and the imaging unit 402 are focused on the same subject.

In S4032, absolute distances and their reliability are calculated with the compound-eye stereo camera. The absolute distance calculation applies a technique such as stereo matching to the images shot by the imaging unit 401 and the imaging unit 402.

FIG. 26 shows stereo matching during absolute distance measurement with the compound-eye stereo camera. The images shot by the imaging unit 401 and the imaging unit 402 when the user focuses in S4031 at a position 801 are denoted image 802 and image 803, respectively. With the image 802 as the base image and the image 803 as the reference image, a window (region) 804 and a window 805 are set in the image 802 and the image 803, respectively. The window 804 is fixed and the window 805 is moved while a correlation operation is performed on the images within the windows; the position of the window 805 giving the highest correlation value is taken as the position corresponding to the window 804. The shift between the corresponding positions on the images is the parallax, and the absolute distance to the subject in the window 804 is calculated from the parallax value and the baseline length based on the distance D between the optical axes of the imaging unit 401 and the imaging unit 402. The reliability of the calculated absolute distance is also calculated at this time. For example, the window 804 may be set at various positions, and positions whose highest correlation value in the window matching calculation is relatively high may be regarded as having high reliability. As described later, the reliability may also be expressed by the defocus amount for the subject. Although window matching has been described here, the method is not limited to this; feature points may be extracted and associated, as in the relative distance calculation with the monocular stereo camera.
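The window search and the conversion from parallax to distance can be sketched in one dimension as follows. Several simplifications are assumed for illustration: the two images are rectified so that the search runs along a single row, the sum of absolute differences (SAD, where lower is better) is used in place of a correlation value, and the distance follows the standard rectified-stereo relation Z = f · B / d with the focal length f in pixels and the baseline B. All function and parameter names are hypothetical.

```python
def best_disparity(base_row, ref_row, x, half, max_disp):
    """Find the shift of the window centred at x in base_row that best matches ref_row.

    Returns (disparity, cost). The window in base_row is fixed (like window 804),
    while the window in ref_row (like window 805) is moved one pixel at a time.
    """
    window = base_row[x - half : x + half + 1]
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        start = x - d - half
        if start < 0:
            break  # the shifted window would leave the image
        candidate = ref_row[start : start + len(window)]
        cost = sum(abs(a - b) for a, b in zip(window, candidate))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost


def absolute_distance(disparity_px, focal_px, baseline):
    """Rectified stereo: distance = focal length (px) * baseline / disparity (px)."""
    return focal_px * baseline / disparity_px


base_row = [0, 0, 0, 0, 0, 9, 7, 5, 0, 0, 0, 0]
ref_row = [0, 0, 9, 7, 5, 0, 0, 0, 0, 0, 0, 0]  # same pattern shifted by 3 pixels
d, cost = best_disparity(base_row, ref_row, x=6, half=1, max_disp=4)
z = absolute_distance(d, focal_px=1500, baseline=0.05)  # baseline in metres
```

The residual cost of the best match (or, with correlation, the peak value) can serve as the basis for the reliability mentioned above.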

In S4033, the control unit 408 determines whether there is substantially the same subject for which both the relative distance value acquired with the monocular stereo camera and the absolute distance value acquired with the compound-eye stereo camera have high reliability. If there is, the process proceeds to S4037; if not, it proceeds to S4034.

In S4034, the relative distances acquired with the monocular stereo camera are converted into absolute distances using a feature point near the focus position. The reliabilities calculated in the relative distance calculation and in the absolute distance calculation will not necessarily both be high for a subject near the focus position, but here it is sufficient for the reliability to be high enough to obtain an approximate value of the depth of field.
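Because monocular relative distances are determined only up to a common scale, one absolute measurement at a single feature point is enough to convert all of them. The following minimal sketch assumes exactly that (a single global scale factor); the function and parameter names are hypothetical.

```python
def to_absolute(relative_distances, anchor_index, anchor_absolute):
    """Scale all relative distances so that the anchor point gets its measured distance.

    anchor_absolute is the absolute distance of the anchor feature point,
    for example as measured with the compound-eye stereo camera.
    """
    scale = anchor_absolute / relative_distances[anchor_index]
    return [scale * r for r in relative_distances]


# The anchor (index 0) has relative distance 1.0 and a measured distance of 2.5 m.
absolute = to_absolute([1.0, 2.0, 0.5], anchor_index=0, anchor_absolute=2.5)
```

Any error in the anchor's absolute distance propagates to every converted value, which is why the choice of a reliable anchor point matters in the subsequent steps.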

Next, the reliability of the absolute distance acquired with the compound-eye stereo camera is defined using the defocus amount. For example, the reliability is set to the reciprocal of the magnitude of the defocus amount or the like, so that a smaller defocus amount gives higher reliability. The window 804 can then be set at various positions to determine positions with high reliability. If the window 804 is set so as to contain a candidate for the feature point X among the feature points acquired with the monocular stereo camera, its reliability is likewise known from the defocus amount. If, in shooting with the compound-eye stereo camera, the absolute distance value of a subject whose defocus amount falls within the depth of field is regarded as highly reliable, a subject contained in the window 804 becomes a candidate for the feature point X when it is within the depth of field.
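The reciprocal-of-defocus reliability and the depth-of-field test for candidate points can be sketched as follows. The tuple layout of each point, the small `eps` that avoids division by zero, and the function names are assumptions made for this example.

```python
def defocus_reliability(defocus_amount, eps=1e-6):
    """Reliability as the reciprocal of the defocus magnitude (eps avoids division by zero)."""
    return 1.0 / (abs(defocus_amount) + eps)


def feature_x_candidates(points, near_limit, far_limit):
    """Keep points inside the depth of field, most reliable first.

    points: list of (label, absolute_distance, defocus_amount) tuples.
    """
    inside = [p for p in points if near_limit <= p[1] <= far_limit]
    return sorted(inside, key=lambda p: defocus_reliability(p[2]), reverse=True)


points = [("p", 2.0, 0.5), ("q", 2.1, -0.1), ("r", 6.0, 3.0)]
candidates = feature_x_candidates(points, near_limit=1.8, far_limit=2.5)
```

Here the point labelled "r" lies outside the depth of field and is dropped, and "q" precedes "p" because its defocus amount is smaller in magnitude.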

In S4035, the control unit 408 stops down the aperture so that the candidate points for feature point X fall within the depth of field. This allows the candidate points to be captured without blur, so that the absolute distance value can be acquired with high accuracy. Because the conversion from relative to absolute distance values in S4034 does not necessarily use highly reliable feature points, it is advisable to set an aperture somewhat smaller than the one that just brings the candidates within the depth of field. The quantity that is adjusted need not be the aperture: the aperture may be left unchanged and the focus readjusted to a position at which the candidate points for feature point X fall within the depth of field. In this way, the candidate points can be captured without blur even when the scene is too dark to stop down the aperture.
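As a rough sketch of how an aperture could be chosen in S4035, the following uses the standard thin-lens depth-of-field approximation based on the hyperfocal distance. The source does not state which formula or circle of confusion the control unit 408 uses; the formula, the 0.03 mm circle of confusion, the function names, and the list of available f-numbers are assumptions made for illustration.

```python
def depth_of_field(focal_mm, f_number, focus_mm, coc_mm=0.03):
    """Near and far limits of the depth of field (thin-lens approximation)."""
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    near = focus_mm * (hyperfocal - focal_mm) / (hyperfocal + focus_mm - 2 * focal_mm)
    if focus_mm >= hyperfocal:
        far = float("inf")
    else:
        far = focus_mm * (hyperfocal - focal_mm) / (hyperfocal - focus_mm)
    return near, far


def widest_aperture_covering(focal_mm, focus_mm, target_mm, f_numbers):
    """Smallest f-number (widest aperture) whose depth of field contains target_mm."""
    for n in sorted(f_numbers):
        near, far = depth_of_field(focal_mm, n, focus_mm)
        if near <= target_mm <= far:
            return n
    return None  # no available aperture covers the target; refocusing is the alternative


# 50 mm lens focused at 3.0 m, candidate point for feature point X at 3.6 m.
print(widest_aperture_covering(50.0, 3000.0, 3600.0, [2.0, 4.0, 5.6, 8.0]))
```

Following the text, a practical choice would be one step smaller than the value returned here, since the distance estimate from S4034 is only approximate.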

In S4036, the control unit 408 calculates the reliability of the absolute distance value for each candidate point for feature point X.

In S4037, feature point X is determined from the candidate points. For example, the position with the highest combined ranking of the reliability calculated with the monocular stereo camera and the reliability obtained with the compound-eye stereo camera is selected.
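One way to read the "combined ranking" in S4037 is to rank the candidates separately by each reliability and pick the candidate with the best rank sum. The sketch below assumes that interpretation; the rank-sum rule, the function names, and the candidate labels are hypothetical and are not specified in the source.

```python
def rank_positions(scores):
    """Map each candidate to its rank, where rank 0 is the most reliable."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {cand: rank for rank, cand in enumerate(ordered)}


def select_feature_point(rel_reliability, abs_reliability):
    """Pick the candidate with the best (lowest) sum of the two ranks."""
    common = sorted(set(rel_reliability) & set(abs_reliability))
    rel_rank = rank_positions({c: rel_reliability[c] for c in common})
    abs_rank = rank_positions({c: abs_reliability[c] for c in common})
    return min(common, key=lambda c: rel_rank[c] + abs_rank[c])


# Hypothetical reliabilities for three candidate points.
rel = {"A": 0.9, "B": 0.7, "C": 0.4}   # monocular stereo camera (relative distance)
ab = {"A": 0.2, "B": 0.8, "C": 0.6}    # compound-eye stereo camera (absolute distance)
print(select_feature_point(rel, ab))
```

Candidate A is the most reliable for relative distance but the least reliable for absolute distance, so the rank sum favors B.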

In S4038, the user presses the shutter button 405 to capture an image.

In S4039, the absolute distance value of feature point X is acquired from the two images captured by the compound-eye stereo camera. The absolute distance value is acquired in the same way as described above. When it is calculated by window matching, a window that contains feature point X and is as small as possible should be set. When it is calculated by feature point matching, feature point X can simply be matched between the two images. From S4039, the process returns to S404 in FIG. 20.
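The window matching mentioned in S4039 can be illustrated with a one-dimensional sum-of-absolute-differences search along a single image row, followed by triangulation. This is a simplified sketch under stated assumptions: the two images are taken to be rectified so that matches lie on the same row, a pinhole model Z = f·B/d is used, and the function names and sample values are introduced here rather than taken from the source.

```python
def match_disparity(left_row, right_row, x, half, max_disp):
    """Disparity minimizing the SAD between a window at x (left) and at x - d (right)."""
    window = left_row[x - half: x + half + 1]
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        start = x - d - half
        if start < 0:
            break  # window would leave the image
        candidate = right_row[start: start + 2 * half + 1]
        cost = sum(abs(a - b) for a, b in zip(window, candidate))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def absolute_distance(disparity_px, focal_px, baseline_m):
    """Triangulated distance in meters for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# A small intensity pattern around feature point X, shifted by 3 pixels between the views.
left_row = [0] * 10 + [5, 9, 5] + [0] * 7
right_row = [0] * 7 + [5, 9, 5] + [0] * 10
d = match_disparity(left_row, right_row, x=11, half=1, max_disp=6)
print(d, absolute_distance(d, focal_px=1200.0, baseline_m=0.05))
```

The small window (`half=1`) mirrors the advice in the text to keep the window as small as possible while still containing feature point X.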

In S404, the relative distance values acquired with the monocular stereo camera are converted into absolute distance values. FIG. 25(b) is a flowchart showing the detailed processing of S404, which consists of steps S4041 and S4042.

In S4041, a conversion formula (conversion relationship) that converts relative distance values into absolute distance values for all subjects is calculated from the relative distance value and the absolute distance value of feature point X. For example, when the relative distance value of feature point X is zr and its absolute distance value is Za [m], the conversion is Z = (Za / zr) × z, where z is the relative distance value of a given subject and Z is the corresponding absolute distance value.

In S4042, the relative distance values acquired with the monocular stereo camera are converted into absolute distance values using the conversion formula. If the range to be converted is limited, for example, to the range of the composition captured in S4038, highly accurate absolute distance values are obtained for the composition the user wants.
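Steps S4041 and S4042 amount to deriving a single scale factor from feature point X and applying it to the relative distance map. A minimal sketch follows, assuming the map is a list of rows and that the optional region restricting the conversion is given as a hypothetical `(top, left, bottom, right)` tuple; these representations and the function names are assumptions, not part of the source.

```python
def make_converter(z_r, Z_a):
    """Return Z = (Z_a / z_r) * z, built from feature point X (S4041)."""
    if z_r == 0:
        raise ValueError("relative distance of feature point X must be non-zero")
    scale = Z_a / z_r
    return lambda z: scale * z


def convert_depth_map(relative_map, convert, region=None):
    """Apply the conversion (S4042); pixels outside the region are left as None."""
    rows, cols = len(relative_map), len(relative_map[0])
    top, left, bottom, right = region if region else (0, 0, rows, cols)
    return [[convert(relative_map[r][c])
             if top <= r < bottom and left <= c < right else None
             for c in range(cols)]
            for r in range(rows)]


# Feature point X: relative distance 0.5, absolute distance 2.0 m, so the scale is 4.0.
convert = make_converter(0.5, 2.0)
relative = [[0.5, 1.0], [0.25, 2.0]]
print(convert_depth_map(relative, convert))
print(convert_depth_map(relative, convert, region=(0, 0, 1, 2)))
```

Because the whole map shares one scale factor, the accuracy of every converted value depends on the reliability of the single pair (zr, Za), which is why the preceding steps select feature point X so carefully.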

With the method of this embodiment, the conversion formula from relative distance values to absolute distance values can be determined at an image position where both the relative distance value and the absolute distance value are highly reliable. Applying that conversion formula to the other image positions then yields highly accurate absolute distance values for all subjects.

<Fifth Embodiment>
Next, a fifth embodiment of the present invention will be described.
In the fourth embodiment described above, the imaging apparatus was configured as a twin-lens stereo camera. If the image sensor is instead of the pupil-division type and only one of the imaging unit 401 and the imaging unit 402 is provided, a single imaging unit suffices and the configuration is simplified.

FIG. 27 is a diagram showing the structure of a pupil-division image sensor and the principle of distance calculation. FIG. 27(a) shows a state in which the subject is in focus, and FIG. 27(b) shows a state in which the subject is located in front of the focus position.

The image sensor 901 has a pupil-division structure, and each pixel 903 is divided internally into two sub-pixels, 904 and 905. Of the light reflected by the subject, one light flux passes through one edge of the imaging optical system 902 and is received by the sub-pixel 904, and the other light flux passes through the opposite edge of the imaging optical system 902 and is received by the sub-pixel 905. Images 906 and 908 are generated from the light received by the sub-pixels 904, and images 907 and 909 are generated from the light received by the sub-pixels 905.

As shown in FIG. 27(a), light from a subject at the focus position is received by the sub-pixel 904 and the sub-pixel 905 of the same pixel, so the subject shows no parallax between image 906 and image 907. In contrast, as shown in FIG. 27(b), light from a subject displaced from the focus position is received by the sub-pixel 904 and the sub-pixel 905 of different pixels, so the subject shows parallax between image 908 and image 909. The absolute distance can be calculated from this parallax.

When a pupil-division image sensor is used, the two viewpoints share one lens, so the size of the aperture determines the baseline length of the compound-eye stereo camera. Consequently, if the aperture is stopped down to bring the candidate points for feature point X within the depth of field, the baseline length becomes shorter and a highly accurate absolute distance value can no longer be acquired. In this case, the reliability of the absolute distance value acquired with the pupil-division image sensor should take both the magnitude of the defocus and the baseline length into account. Alternatively, as described earlier, the aperture may be kept fixed and the focus reset to a position at which the reliability of the candidate points for feature point X becomes high.

The above description covered a method of calculating highly accurate absolute distance values by fitting the relative distance values acquired with the monocular stereo camera to the absolute distance values acquired with the compound-eye stereo camera. The two sets of distance values may additionally be combined.

For example, when absolute distance values are wanted for a scene that includes a dynamic subject, the monocular stereo camera cannot acquire relative distance values for the dynamic subject. In this case, the absolute distance values acquired with the compound-eye stereo camera should be used as they are for the dynamic subject.

Because the monocular stereo camera calculates corresponding points from frames captured at different times, corresponding points may not be found in images that include a dynamic subject. It may also be impossible to distinguish the motion of the dynamic subject from changes in the position and orientation of the camera. With the compound-eye stereo camera, in contrast, the capture times can be made to coincide, so a dynamic subject can be treated in the same way as a stationary subject between the two images and its absolute distance value can be calculated. When an absolute distance image expressed in grayscale is created on this basis, the distance image used for stationary subjects is the one obtained by converting the relative distance values acquired with the monocular stereo camera into absolute distance values by means of a feature point X extracted from a stationary subject, and the absolute distance values acquired with the compound-eye stereo camera are used for dynamic subjects. Whether a subject is dynamic may be determined by machine learning or the like; alternatively, a subject that is absent from the relative distance image acquired with the monocular stereo camera but present in the absolute distance image acquired with the compound-eye stereo camera may be treated as dynamic. The two approaches may also be combined.
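The combination described above can be sketched as a per-pixel merge. The sketch assumes that both distance images are lists of rows, that a missing measurement is represented by `None`, and that a pixel is treated as dynamic when it is missing from the relative distance image but present in the absolute distance image, which is the second detection rule named in the text. The function names and sample values are hypothetical.

```python
def find_dynamic_mask(relative_map, absolute_map):
    """True where the relative distance is missing but the absolute distance exists."""
    return [[rel is None and ab is not None for rel, ab in zip(rel_row, abs_row)]
            for rel_row, abs_row in zip(relative_map, absolute_map)]


def integrate_distance_maps(relative_map, absolute_map, scale):
    """Static pixels: scaled relative distance. Dynamic pixels: absolute distance as is."""
    mask = find_dynamic_mask(relative_map, absolute_map)
    merged = []
    for rel_row, abs_row, mask_row in zip(relative_map, absolute_map, mask):
        row = []
        for rel, ab, dynamic in zip(rel_row, abs_row, mask_row):
            if dynamic:
                row.append(ab)
            elif rel is not None:
                row.append(scale * rel)
            else:
                row.append(None)  # no measurement from either camera
        merged.append(row)
    return merged


relative = [[1.0, None], [2.0, 0.5]]   # monocular stereo camera, relative units
absolute = [[4.1, 3.0], [7.9, None]]   # compound-eye stereo camera, meters
print(integrate_distance_maps(relative, absolute, scale=4.0))
```

The `scale` argument stands for the factor Za / zr obtained from a feature point X on a stationary subject.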

<Other Embodiments>
The present invention may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device.

The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

The invention is not limited to the embodiments described above, and various changes and modifications are possible without departing from the spirit and scope of the invention. Accordingly, the claims are appended to make the scope of the invention public.

100: imaging device, 101: imaging unit, 102: memory, 103, 303: viewing image generation unit, 104: single-shot distance image generation unit, 105: multi-view distance image generation unit, 106: distance image integration unit, 1011: optical system, 1012: image sensor, 204: arithmetic processing unit, 400: imaging device, 401, 402: imaging units, 403: arithmetic unit, 404: storage unit, 405: shutter button, 406: operation unit, 407: display unit, 408: control unit

Claims

1. An image processing apparatus comprising:
acquisition means for acquiring a plurality of different-viewpoint images obtained by photographing the same scene from different positions, and at least one pair of parallax images having parallax due to pupil division;
first generation means for generating a first distance image from the pair of parallax images;
second generation means for generating a second distance image from the plurality of different-viewpoint images; and
integration means for integrating the first distance image and the second distance image to generate an integrated distance image.

2. The image processing apparatus according to claim 1, wherein the integration means integrates the first distance image and the second distance image by supplementing, among the distance information forming the second distance image, the distance information of an area in which the subject is moving with the distance information of the same area of the first distance image.

3. The image processing apparatus according to claim 1, wherein the integration means integrates the first distance image and the second distance image by supplementing, among the distance information forming the first distance image, the distance information of the area other than an area in which the distance information indicates a distance shorter than a predetermined distance and an area indicating a distance within a predetermined range from the in-focus distance, with the distance information of the same area of the second distance image.

4. The image processing apparatus according to any one of claims 1 to 3, wherein said second generating means generates said second distance image based on parallax between said different viewpoint images.

5. The image processing apparatus according to claim 4, wherein said second generating means generates said second range image using epipolar geometry from said different viewpoint images.

6. The image processing apparatus according to any one of claims 1 to 5, wherein the acquisition means inputs a plurality of parallax image pairs obtained by photographing the same scene from different positions, the image processing apparatus further comprising third generating means for generating said plurality of different-viewpoint images from said plurality of parallax image pairs.

7. The image processing apparatus according to claim 6, wherein the third generating means uses one of the plurality of parallax image pairs as a reference parallax image pair, and adds, pixel by pixel, the plurality of parallax image pairs excluding the reference parallax image pair to generate the plurality of different-viewpoint images.

8. The image processing apparatus according to claim 6, wherein the third generating means keeps one of the plurality of parallax image pairs as a reference parallax image pair, and adds, pixel by pixel, the plurality of parallax image pairs including the reference parallax image pair to generate the plurality of different-viewpoint images.

9. The image processing apparatus according to claim 7 or 8, wherein the third generating means generates the plurality of different-viewpoint images while sequentially changing the reference parallax image pair among the plurality of parallax image pairs, and
the integration means integrates the first distance image generated from each sequentially changed reference parallax image pair with the second distance image generated from the corresponding plurality of different-viewpoint images, thereby generating a plurality of integrated distance images.

10. The image processing apparatus according to any one of claims 6 to 9, wherein the third generating means performs defocus recovery processing on the plurality of parallax image pairs, and generates the plurality of different-viewpoint images from the plurality of parallax image pairs subjected to the defocus recovery processing.

11. The image processing apparatus according to claim 10, wherein the defocus recovery processing includes deconvolution processing or MAP estimation processing performed by estimating a defocus blur kernel, and deep learning processing by end-to-end processing using an encoder-decoder structure.

12. The image processing apparatus according to claim 1, wherein the acquisition means inputs a plurality of parallax image pairs obtained by photographing the same scene from different positions,
the first generating means generates the first distance image for each of a plurality of predetermined parallax image pairs among the plurality of parallax image pairs and, using one of the plurality of generated first distance images as a reference, integrates the other first distance images by viewpoint conversion, and
the integration means integrates the integrated first distance image and the second distance image.

13. An imaging apparatus comprising:
the image processing apparatus according to any one of claims 1 to 12; and
imaging means for capturing the at least one pair of parallax images, the imaging means serving as at least a part of the acquisition means.

14. The imaging apparatus according to claim 13, wherein said imaging means captures said at least one pair of parallax images after capturing said plurality of images of different viewpoints.

15. The imaging apparatus according to claim 13, wherein said imaging means captures said plurality of different-viewpoint images after capturing said at least one pair of parallax images.

16. The imaging apparatus according to claim 13, wherein said imaging means captures said plurality of different-viewpoint images before and after capturing said at least one pair of parallax images.

17. The imaging apparatus according to any one of claims 13 to 16, wherein the imaging means stops down the aperture further when capturing the plurality of different-viewpoint images than when capturing the at least one pair of parallax images.

18. The imaging apparatus according to …, wherein the imaging means sets imaging conditions separately for capturing the at least one pair of parallax images and for capturing the plurality of different-viewpoint images.

19. An image processing method comprising:
an acquisition step in which acquisition means acquires a plurality of different-viewpoint images obtained by photographing the same scene from different positions, and at least one pair of parallax images having parallax due to pupil division;
a first generating step in which first generating means generates a first distance image from the pair of parallax images;
a second generating step in which second generating means generates a second distance image from the plurality of different-viewpoint images; and
an integrating step of integrating the first distance image and the second distance image to generate an integrated distance image.

20. A program for causing a computer to function as each means of the image processing apparatus according to any one of claims 1 to 12.

21. A computer-readable storage medium storing the program according to claim 20.

22. An imaging apparatus comprising:
at least two imaging units with a known distance between their optical axes; and
calculation means for calculating a plurality of absolute distance values to the subject and their reliability based on a plurality of first images having parallax with each other captured at approximately the same time using the imaging units, and for calculating a plurality of relative distance values to the subject and their reliability based on a plurality of second images having parallax with each other captured at different times using the imaging units,
wherein the calculation means obtains a conversion relationship between absolute distance values and relative distance values using, among the plurality of absolute distance values and the plurality of relative distance values, an absolute distance value and a relative distance value that have relatively high reliability and correspond to substantially the same region of the subject.

23. The imaging apparatus according to claim 22, wherein said calculating means calculates said plurality of absolute distance values using window matching.

24. The imaging apparatus according to claim 23, wherein said calculation means determines the reliability of said absolute distance value or the reliability of said relative distance value based on accuracy of window matching.

25. The imaging apparatus according to claim 22, wherein said calculation means determines the reliability of said absolute distance value or the reliability of said relative distance value based on accuracy of matching of feature points.

26. The imaging apparatus according to any one of claims 22 to 25, wherein said at least two imaging units are composed of pupil-divided pixels of an imaging device.

27. The imaging apparatus according to any one of claims 22 to 26, wherein said calculation means calculates the reliability of said absolute distance value based on the magnitude of the defocus amount of said imaging unit.

28. The imaging apparatus according to claim 27, wherein said calculation means performs calculation so that the reliability of absolute distance values within the depth of field of said imaging unit is relatively high.

29. The imaging apparatus according to any one of claims 22 to 28, wherein the calculation means calculates the relative distance values before the plurality of first images are captured, and uses the reliability of the relative distance values to determine imaging conditions for the first images such that the reliability of the absolute distance values becomes high.

30. An imaging apparatus according to claim 29, wherein said imaging condition is an aperture value of said imaging unit.

31. An imaging apparatus according to claim 29, wherein said imaging condition is a focus position of said imaging unit.

32. The imaging apparatus according to any one of claims 22 to 31, wherein the calculation means generates a first absolute distance image using the absolute distance values for the subject, generates a second absolute distance image by converting the relative distance values for another subject into absolute distance values using the conversion relationship, and integrates the first absolute distance image and the second absolute distance image.

33. The imaging apparatus according to any one of claims 22 to 32, wherein said calculation means converts a relative distance value of another object into an absolute distance value using said conversion relationship.

34. The imaging apparatus according to any one of claims 22 to 33, wherein said calculation means calculates said absolute distance value based on the plurality of first images and the distance between said optical axes.

35. A method of controlling an imaging apparatus comprising at least two imaging units with a known distance between their optical axes, the method comprising:
a calculating step of calculating a plurality of absolute distance values to the subject and their reliability based on a plurality of first images having parallax with each other captured at approximately the same time using the imaging units, and calculating a plurality of relative distance values to the subject and their reliability based on a plurality of second images having parallax with each other captured at different times using the imaging units,
wherein, in the calculating step, a conversion relationship between absolute distance values and relative distance values is obtained using, among the plurality of absolute distance values and the plurality of relative distance values, an absolute distance value and a relative distance value that have relatively high reliability and correspond to substantially the same region of the subject.

36. A program for causing a computer to execute the control method according to claim 35.

37. A computer-readable storage medium storing a program for causing a computer to execute the control method according to claim 35.