JP7200002B2

JP7200002B2 - Image processing device, imaging device, image processing method, program, and storage medium

Info

Publication number: JP7200002B2
Application number: JP2019030595A
Authority: JP
Inventors: 隆弘高橋
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-02-22
Filing date: 2019-02-22
Publication date: 2023-01-06
Anticipated expiration: 2039-02-22
Also published as: JP2020134399A

Description

The present invention relates to an image processing device that acquires the velocity of a subject, an imaging device including the image processing device, an image processing method, a program, and a storage medium.

An electronic camera has been proposed that detects the distance to a photographed subject and measures the actual moving speed of the subject corresponding to the subject image in the captured image (Patent Document 1). An imaging apparatus has also been proposed that obtains the moving speed of a subject from the distance to the subject, detected by a distance sensor, and the panning angle, detected by a gyro sensor (Patent Document 2).

Patent Document 1: JP 2005-159674 A
Patent Document 2: JP 2007-225550 A

Many kinds of objects can be subjects, and the environments in which they are imaged vary just as widely. Depending on how the subject moves, it can therefore be difficult to acquire its speed accurately. For example, a subject moving slowly toward the camera travels only a short distance between frames, so the acquired subject speed may be inaccurate.

In view of the above circumstances, an object of the present invention is to provide an image processing device that can acquire, with high accuracy, the velocity of a subject corresponding to a subject image in a captured image. Further objects are to provide an imaging device including the image processing device, an image processing method, a program, and a storage medium.

To achieve the above object, the image processing device of the present invention includes the following units:
- A tracking unit. From a plurality of captured images, it selects a plurality of corresponding images. Each corresponding image contains a subject image that corresponds to a subject image in a reference image included in the captured images.
- A distance acquisition unit. For the reference image and for each corresponding image, it acquires distance information on the distance between the subject corresponding to the subject image and the acquisition device that acquired that image.
- A position acquisition unit. For the reference image and for each corresponding image, it acquires position information on the position of the subject. It then uses the distance information and the position information to acquire three-dimensional position information of the subject relative to the reference image.
- A speed acquisition unit. It acquires the speed of the subject corresponding to the subject image. To do so, it uses the three-dimensional position information of a plurality of corresponding images selected according to the reliability of each corresponding image.

According to the present invention, the velocity of a subject corresponding to a subject image in a captured image can be acquired with high accuracy.

FIG. 1 is a block diagram of an imaging device including an image processing device according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram of an image sensor according to the embodiment of the present invention.
FIG. 3 is a flowchart of the speed acquisition process according to the embodiment of the present invention.
FIG. 4 is an explanatory diagram of images displayed on the display unit according to the embodiment of the present invention.
FIG. 5 is an explanatory diagram of three-dimensional position reconstruction according to the embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings. The embodiments described below are only examples of configurations that can implement the present invention. They may be modified or changed as appropriate to suit the configuration of the device to which the invention is applied and various other conditions. The scope of the present invention is therefore not limited to the configurations described in the following embodiments.

The imaging device 1 of this embodiment includes an image processing device according to the present invention. The imaging device 1 can be applied to various electronic camera devices, such as digital still cameras, digital video cameras, surveillance cameras, industrial cameras, and medical cameras. It can also be applied to various information processing devices that have an imaging function, such as smartphones and tablet terminals.

<Configuration of Imaging Device>
FIG. 1 is a block diagram schematically showing the configuration of the imaging device 1 according to the embodiment of the present invention. The imaging device 1 includes an imaging optical system 10, an image sensor 11, a control unit 12, an image processing device 13, a storage unit 14, an input unit 15, a display unit 16, and a communication unit 17.

The imaging optical system 10 is a photographic lens with a plurality of lens groups (not shown), and it forms an image of the subject on the image sensor 11. The imaging optical system 10 may be detachable from the imaging device 1 or integrated with it. If it is detachable, it may be regarded as not being a component of the imaging device 1. The exit pupil 101 of the imaging optical system 10 is located a predetermined distance from the image sensor 11. In this specification, the z-axis is parallel to the optical axis 102 of the imaging optical system 10. The x-axis and y-axis are orthogonal to each other and to the z-axis.

The image sensor 11 is a CMOS (complementary metal-oxide semiconductor), CCD (charge-coupled device), or similar image sensor. It has a ranging function based on imaging-plane phase-difference ranging (the image-plane phase-difference method). The image sensor 11 photoelectrically converts the subject image formed on it through the imaging optical system 10, and generates and outputs an image signal corresponding to that subject image.

The control unit 12 is a control device that includes a CPU (Central Processing Unit), ROM (Read Only Memory), RAM (Random Access Memory), and similar elements, and it controls each unit of the imaging device 1. For example, it handles autofocus (AF), focus position changes, F-number (aperture) changes, and image capture. It also controls the storage unit 14, the input unit 15, the display unit 16, and the communication unit 17.

The image processing device 13 is a logic circuit that performs various image processing operations. It includes an image generation unit 130, a memory 131, a tracking unit 132, a distance acquisition unit 133, a position acquisition unit 134, and a speed acquisition unit 135.

The image generation unit 130 generates captured images (video data) from the image signal output by the image sensor 11. To do so, it applies signal processing such as noise removal, demosaicing, luminance signal conversion, aberration correction, white balance adjustment, and color correction. The memory 131 stores the captured images output by the image generation unit 130. The tracking unit 132 selects a plurality of captured images that contain the subject image whose velocity is to be acquired, and tracks the position of that subject image within the images. The distance acquisition unit 133 acquires (calculates) distance information from the parallax amount described later. This information concerns the distance between the imaging device 1, which has a ranging function, and the subject corresponding to the subject image in a captured image.

The position acquisition unit 134 acquires (calculates) two kinds of information:
- position information on the position of the subject image in a captured image;
- motion information indicating the position and orientation of the imaging device 1 when that captured image was acquired.

The motion information is preferably supplied to the position acquisition unit 134 from a gyro sensor and an acceleration sensor (not shown) in the imaging device 1. The distance information, position information, and motion information are preferably stored in the memory 131 in association with the corresponding captured image. From these three kinds of information, the position acquisition unit 134 also acquires (calculates) the three-dimensional position information of the subject image at each time. The speed acquisition unit 135 acquires (calculates) the velocity of the subject corresponding to the subject image. It uses the three-dimensional position information of the captured images selected according to a reliability described later. The image processing device 13 may instead consist of a memory storing a program that implements the above processing and a CPU that executes the program.

The storage unit 14 is a nonvolatile storage medium such as flash memory. It stores captured images (video data) acquired by the imaging device 1, imaging information, sensor data, intermediate data, parameters used by the imaging device 1, and the like. Imaging information is associated with a captured image and indicates the imaging conditions under which that image was captured. The storage unit 14 is preferably a high-capacity storage medium that supports high-speed reading and writing.

The input unit 15 is an operation interface consisting of, for example, dials, buttons, and switches. The user uses it to enter information and change settings. The display unit 16 is a display such as a liquid crystal display or an organic EL display. It presents information to the user, such as captured images stored in the memory 131, the composition during shooting, various setting screens, and messages. A touch panel serving as both the input unit 15 and the display unit 16 may also be used.

The communication unit 17 is a communication interface that connects the imaging device 1 to other devices by wire or wirelessly. Through it, the imaging device 1 can exchange information such as captured images, subject positions, and subject speeds with those devices.

<Configuration of Image Sensor>
FIG. 2 is an explanatory diagram of the image sensor 11 according to the embodiment of the present invention, which supports imaging-plane phase-difference ranging. The image sensor 11 has a plurality of pixels arranged along the x-axis and y-axis directions, that is, in the xy plane. As shown in the pixel cross-section of FIG. 2(a), each pixel has a microlens 201, a color filter 202, and a pair of photoelectric conversion units 203A and 203B. Each pixel has RGB (Red, Green, Blue) spectral characteristics set by the wavelength band of its color filter 202. The color filters 202 follow a color pattern (not shown) such as a Bayer array. The photoelectric conversion units 203A and 203B are formed in a substrate 204 and are sensitive to the wavelength band to be detected. Wiring (not shown) is connected to each pixel. Each pixel's output signal goes to a pixel signal processing unit (not shown) for processing, and the image sensor 11 then outputs it as an image signal. The pixels need not be arranged in the xy plane.

FIG. 2(b) shows the exit pupil 101 of the imaging optical system 10 as viewed from the intersection of the optical axis 102 and the image sensor 11 (the center image height). The exit pupil 101 is divided into two different pupil regions: a first pupil region 210 and a second pupil region 220. Each microlens 201 makes the exit pupil 101 and the photoelectric conversion units 203 optically conjugate. A first light beam, which passes mainly through the first pupil region 210, enters the photoelectric conversion unit 203A. A second light beam, which passes mainly through the second pupil region 220, enters the photoelectric conversion unit 203B. The photoelectric conversion units 203A and 203B convert these beams into an A image (A image signal) and a B image (B image signal), respectively, and output them to the image processing device 13. The image processing device 13 performs various operations on the A and B images, such as image generation and ranging calculations, and stores the results in the storage unit 14 as needed. It can also add the A image and the B image to obtain summed image information (an imaging signal).
The distance acquisition unit 133 of the image processing device 13 can also acquire (calculate) the defocus amount, and from it the distance to the subject. It does this by performing a correlation calculation between the A image and the B image, as described below.

FIG. 2(b) also shows the centroid of the first pupil region 210 (first centroid position 211) and the centroid of the second pupil region 220 (second centroid position 221). In this embodiment, the first centroid position 211 is offset from the center of the exit pupil 101 along a first axis 200 (the x-axis). The second centroid position 221 is offset from the center along the same first axis 200 in the opposite direction. In FIG. 2(b), the direction connecting the first centroid position 211 and the second centroid position 221 is the pupil division direction. The distance between the two centroids is the baseline length 222.

Defocus shifts the positions of the A image and the B image along the pupil division direction (the x-axis direction in this example). The relative shift between the two images is the parallax (phase difference) between the A image and the B image, and its value corresponds to the defocus amount (the amount of focus shift). The parallax is obtained as follows:
1. Perform a correlation calculation between local regions of the A and B images within a predetermined range.
2. Take the positional difference at which the similarity is highest.

Commonly used methods for calculating the correlation value include SAD (Sum of Absolute Differences) and SSD (Sum of Squared Differences). The geometric relationship involving the baseline length (for example, the principle of triangulation) converts the parallax into a defocus amount. The imaging relationship of the imaging optical system 10 then converts the defocus amount into a subject distance. Alternatively, the defocus amount or subject distance may be obtained by multiplying the parallax by a predetermined conversion coefficient.
In this way, the image sensor 11, with its imaging-plane phase-difference ranging function, can acquire the distance from the imaging device 1 (the pixel of interest) to the subject.
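The ranging chain above (correlation search, then parallax to defocus, then defocus to distance) can be sketched for a single image row as follows. This is a minimal illustration and not Canon's implementation.
- The names `sad_parallax`, `defocus_from_parallax`, and `subject_distance` are hypothetical.
- The coefficient `k` stands in for the "predetermined conversion coefficient" mentioned in the text.
- The final step assumes a simple thin-lens model in which the subject's image forms at `sensor_distance + defocus` behind the lens.

```python
import numpy as np


def sad_parallax(a_row, b_row, center, half_window, max_shift):
    """Integer shift of B relative to A that minimizes the SAD cost.

    Hypothetical helper: compares a window of the A signal centered on
    `center` with windows of the B signal shifted by -max_shift..max_shift
    pixels. The lowest SAD marks the position of highest similarity.
    """
    a = np.asarray(a_row, dtype=float)
    b = np.asarray(b_row, dtype=float)
    a_win = a[center - half_window:center + half_window + 1]
    best_shift, best_cost = 0, np.inf
    for s in range(-max_shift, max_shift + 1):
        lo, hi = center - half_window + s, center + half_window + 1 + s
        if lo < 0 or hi > len(b):
            continue  # shifted window would fall outside the B signal
        cost = np.abs(a_win - b[lo:hi]).sum()
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift


def defocus_from_parallax(parallax_px, k):
    # Multiply the parallax by a conversion coefficient k (defocus per pixel
    # of parallax); k is an assumed input that would come from the baseline
    # length geometry.
    return k * parallax_px


def subject_distance(defocus, focal_length, sensor_distance):
    # Assumed thin-lens model: the subject's image forms at
    # sensor_distance + defocus behind the lens, and 1/f = 1/z_obj + 1/z_img.
    z_img = sensor_distance + defocus
    return 1.0 / (1.0 / focal_length - 1.0 / z_img)
```

Replacing `np.abs(...)` with a squared difference turns the SAD search into the SSD variant. A real implementation would also interpolate to sub-pixel parallax.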

<Speed Acquisition Process Flow>
FIG. 3 is a flowchart of the speed acquisition process in this embodiment. The image processing device 13 of this embodiment acquires (calculates) the velocity of a subject in a video captured and recorded by the imaging device 1, as follows.

In step S301, the user of the imaging device 1 operates the input unit 15 to select a video 401 for velocity acquisition. As shown in FIG. 4(a), the selectable videos 401 stored in the storage unit 14 appear in the display area of the display unit 16, arranged in three rows by three columns. A video 401 is, for example, a moving-image file (video data) acquired with the image sensor 11 of the imaging device 1, which has an imaging-plane ranging function. It contains a plurality of captured images (frames), each associated with the imaging information used for ranging. The video 401 need not be a moving image. It may also be, for example, a series of still images acquired in time order whose relative acquisition times are known.

In step S302, a reference frame (reference image) Fb is selected from the video 401 chosen in step S301. The reference frame contains the subject image whose velocity is to be calculated (the tracking target). Specifically, as shown in FIG. 4(b), the user views the video 401 on the display unit 16 and operates the input unit 15 to select the reference frame Fb. By operating a scroll bar 402, the user can bring up the frame F for any time on the display unit 16. When the subject is moving toward the imaging device 1, the reference frame Fb should preferably be a frame F in which the subject is closer to the imaging device 1. There are two reasons: the subject then appears relatively large in the captured image, and ranging accuracy depends on the distance to the subject. Ranging accuracy varies with the ranging method, but a measured distance generally contains an error of a few percent of the subject distance.

In step S302, the tracking unit 132 may instead select the reference frame Fb without user input. In that case, it uses the ranging accuracy determined by the imaging conditions recorded in the imaging information associated with each captured image. Imaging conditions that affect ranging accuracy include the focal length of the imaging optical system 10, the aperture value (F-number), and the distance to the subject. Because imaging conditions are objective parameters, this configuration lets the device select a reference frame Fb suitable for velocity acquisition.
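The automatic selection described above could look like the following sketch. It assumes a hypothetical ranging-error proxy, because the text names the relevant conditions (focal length, F-number, subject distance) but gives no formula.
- The proxy borrows the stereo rule that depth error grows with distance squared and falls with focal length times baseline.
- It also assumes that the pupil-division baseline scales with the aperture diameter f/N.
- `FrameConditions`, `ranging_error_proxy`, and `select_reference_frame` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class FrameConditions:
    index: int                 # frame number in the video
    focal_length_mm: float     # focal length of the imaging optical system
    f_number: float            # aperture value (F-number)
    subject_distance_m: float  # measured subject distance


def ranging_error_proxy(fc: FrameConditions) -> float:
    # Hypothetical proxy borrowed from stereo geometry: depth error grows with
    # distance squared and shrinks with focal length times baseline. Taking
    # the pupil-division baseline as proportional to the aperture diameter
    # f / N gives error ~ z^2 * N / f^2 (up to a constant factor).
    return fc.subject_distance_m ** 2 * fc.f_number / fc.focal_length_mm ** 2


def select_reference_frame(frames: list[FrameConditions]) -> int:
    """Return the index of the frame with the smallest expected ranging error."""
    return min(frames, key=ranging_error_proxy).index
```

When focal length and F-number stay fixed over the clip, this proxy reduces to picking the frame with the closest subject. That matches the manual guidance in step S302.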

Next, the user operates the input unit 15 to specify the subject image in the reference frame Fb (the reference image) whose velocity is to be acquired. The user can specify the subject image (subject region) in either of two ways:
- directly designating a rectangular region 403 that surrounds the subject image;
- designating a point on or near the subject image, from which the rectangular region 403 is set automatically.

Alternatively, the tracking unit 132 may specify the subject image (subject region) without user selection, by means of general object recognition processing.

In step S303, the tracking unit 132 tracks the subject image specified in step S302 across a plurality of frames F before and after the reference frame Fb. Specifically, as shown in FIG. 4(c), the tracking unit 132 performs template matching, using the subject image (subject region) in the reference frame Fb as the template. It selects each frame (captured image) F that contains a region whose similarity to the template exceeds a predetermined threshold. In other words, the tracking unit 132 selects frames F by the similarity of their subject images, which ensures that each selected frame F contains an image region similar to the specified subject image. Each frame F selected this way contains a subject image corresponding to the specified subject image in the reference frame Fb. Such a frame is therefore also called a "corresponding frame (corresponding image) F" below. A corresponding frame F from a time before the reference frame Fb is called a "corresponding frame Fs", and one from a time after it is called a "corresponding frame Fe". The tracking unit 132 ends the tracking process once corresponding frames F are no longer detected before and after the reference frame Fb.
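The tracking in step S303 can be illustrated with brute-force template matching. The similarity measure here is zero-mean normalized cross-correlation (NCC), an assumed choice because the text only says "similarity". The sketch works as follows:
- The template is always the subject region of the reference frame Fb.
- The search runs backward and then forward from Fb.
- In each direction, tracking stops at the first frame whose best score fails to exceed `threshold`. This matches the rule that tracking ends when no corresponding frame is detected.
- Boxes are `(top, left, height, width)` tuples, a hypothetical layout, and all function names are illustrative.

```python
import numpy as np


def ncc(patch, template):
    """Zero-mean normalized cross-correlation; 0.0 when either side is flat."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    return float((p * t).sum() / denom) if denom > 0 else 0.0


def best_match(frame, template):
    """Exhaustively search `frame` for the window most similar to `template`."""
    th, tw = template.shape
    best = (-1.0, (0, 0))
    for y in range(frame.shape[0] - th + 1):
        for x in range(frame.shape[1] - tw + 1):
            score = ncc(frame[y:y + th, x:x + tw], template)
            if score > best[0]:
                best = (score, (y, x))
    return best


def track(frames, ref_idx, box, threshold=0.8):
    """Return {frame index: (top, left)} for the reference and corresponding frames."""
    top, left, h, w = box
    template = frames[ref_idx][top:top + h, left:left + w]
    matches = {ref_idx: (top, left)}
    for step in (-1, 1):  # -1: earlier frames (Fs side), +1: later frames (Fe side)
        i = ref_idx + step
        while 0 <= i < len(frames):
            score, pos = best_match(frames[i], template)
            if score <= threshold:
                break  # no corresponding frame found; stop in this direction
            matches[i] = pos
            i += step
    return matches
```

In practice, a library routine such as OpenCV's normalized template matching would replace the nested loops. The exhaustive search is kept here only so the sketch is self-contained.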

In step S304, the position acquisition unit 134 reconstructs the three-dimensional position of the subject over the corresponding frames Fs to Fe. The reference for this reconstruction is the imaging position of the imaging device 1 at the reference frame Fb (the reference imaging position). The procedure is as follows.

For the reference frame Fb and for each of the corresponding frames Fs to Fe, the distance acquisition unit 133 acquires (calculates) distance information. This information concerns the distance between the imaging device 1 and the subject corresponding to the specified subject image (the subject distance). Preferably, the distance acquisition unit 133 proceeds in two steps:
1. Calculate a distance for each pixel in the subject region (the rectangular region 403 in FIG. 4(b)), using the method described above with reference to FIG. 2.
2. Take a statistic of these distances, such as the median or mean, as the representative distance of the subject.
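The representative-distance step can be sketched as follows, assuming a per-pixel depth map in which invalid pixels (for example, low-contrast areas where correlation failed) are marked with NaN. That masking convention and the `(top, left, height, width)` box layout are assumptions. The median is the default because a single outlier pixel barely moves it.

```python
import numpy as np


def representative_distance(depth_map, box, stat="median"):
    """Median (default) or mean of the finite depths inside a rectangular region.

    box is (top, left, height, width); NaN/inf pixels are treated as invalid.
    """
    top, left, h, w = box
    region = np.asarray(depth_map, dtype=float)[top:top + h, left:left + w]
    valid = region[np.isfinite(region)]
    if valid.size == 0:
        return float("nan")  # no usable ranging result in the subject region
    return float(np.median(valid) if stat == "median" else np.mean(valid))
```

With one 50 m outlier among 5 m pixels, the median stays at 5 m while the mean is pulled upward. This difference is one reason to prefer the median for the subject distance.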

For the reference frame Fb and for each of the corresponding frames Fs to Fe, the position acquisition unit 134 acquires (calculates) the center of the rectangular region 403 (the subject region). This center serves as the representative position, that is, the position information for the specified subject. The position acquisition unit 134 then obtains the representative three-dimensional position (Xi, Yi, Zi) (i = Fs to Fe) of the subject for each of the frames Fs to Fe.
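The text does not state how the center of the rectangular region 403 and the representative distance are combined into (Xi, Yi, Zi). One common option, assumed here, is pinhole back-projection. This approach treats the representative distance as depth along the optical axis (z) and requires intrinsics in pixels: focal lengths `fx`, `fy` and principal point `cx`, `cy`. The helper names and the box layout are hypothetical.

```python
def box_center(box):
    """Center (u, v) in pixels of a rectangle given as (top, left, height, width)."""
    top, left, height, width = box
    return (left + width / 2.0, top + height / 2.0)


def back_project(u, v, depth, fx, fy, cx, cy):
    """Hypothetical pinhole back-projection of pixel (u, v) at depth Z.

    fx, fy are focal lengths in pixels and (cx, cy) is the principal point;
    depth is treated as the Z coordinate along the optical axis.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)
```

If the ranging output is a distance along the viewing ray rather than a z-depth, it would first need converting. Dividing by the length of the normalized ray vector gives the z-depth.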

Next, the position acquisition unit 134 acquires motion information for each of the frames Fs to Fe. This motion is measured relative to the imaging position of the imaging device 1 when the reference frame Fb was captured (the reference imaging position). The motion information describes the change in the position and orientation of the imaging device 1 relative to the reference imaging position. The memory 131 stores gyro sensor and acceleration sensor outputs associated with each of the frames Fs to Fe. From these outputs, the position acquisition unit 134 obtains a rotation R (for example, a rotation matrix) and a translation T (for example, a translation vector) that represent the motion information. The position acquisition unit 134 then applies R⁻¹ and T⁻¹, which invert the change in the position and orientation of the imaging device 1 since the reference frame Fb. This transforms the representative three-dimensional position of the subject in each of the frames Fs to Fe into a three-dimensional coordinate system referenced to the reference frame Fb.
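The inverse transform can be written compactly once a convention is fixed. This sketch assumes that (R, T) maps reference-frame camera coordinates to frame-i camera coordinates, as p_cam = R p_ref + T. Undoing it gives p_ref = R⁻¹ (p_cam - T), where R⁻¹ = Rᵀ for a rotation matrix. Integrating the gyro and accelerometer outputs into R and T is outside the scope of the sketch, and `to_reference_frame` is an illustrative name.

```python
import numpy as np


def to_reference_frame(p_cam, R, T):
    """Map a 3D point from frame-i camera coordinates to the reference frame Fb.

    Assumed convention: p_cam = R @ p_ref + T, so p_ref = R^T @ (p_cam - T),
    using R^-1 = R^T for a proper rotation matrix.
    """
    R = np.asarray(R, dtype=float)
    p_cam = np.asarray(p_cam, dtype=float)
    T = np.asarray(T, dtype=float)
    return R.T @ (p_cam - T)
```

If the sensor pipeline instead reports the camera pose in reference coordinates (p_ref = R p_cam + T), the forward transform applies directly and no inversion is needed. Which convention holds depends on how R and T are defined.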

FIG. 5 is an explanatory diagram of the three-dimensional position reconstruction in step S304. The left column of FIG. 5(a) shows the imaging device 1 capturing the subject at times t1 to t4, and the right column shows the images captured at those times. In this example, the frame F at time t3, when the imaging device 1 and the subject are closest, is selected as the reference frame Fb. The corresponding frames Fs to Fe (times t1, t2, and t4) are the frames F in which the tracking unit 132 was able to track the subject. FIG. 5(b) shows the three-dimensional positions of the subject reconstructed by the position acquisition unit 134 relative to the reference frame Fb. The reconstruction uses the representative distances acquired for the corresponding frames Fs to Fe and the transformed representative three-dimensional positions. FIG. 5(b) is a top-down view of the reconstructed positions. It shows the subject's movement trajectory, traced by its three-dimensional positions in the reference frame Fb and the corresponding frames Fs to Fe (times t1 to t4) together with its positions in corresponding frames F at other times.

In steps S305 and S306, the velocity acquisition unit 135 selects the corresponding frames F to be used for acquiring the velocity of the subject based on a reliability C. It then acquires (calculates) the velocity of the subject from the subject's three-dimensional position information in the selected corresponding frames F. The reliability C is a parameter related to the subject distance and the imaging conditions; examples are described below.

A first example of the reliability C is a difference value Df (Df = |Da - Db|). Da is the subject distance (that is, the distance between the imaging device 1 and the subject) in the corresponding frame F being evaluated, hereinafter called the target frame Fa. Db is the subject distance in the reference frame Fb. The velocity acquisition unit 135 selects a target frame Fa whose difference value Df exceeds a threshold δ1 (first threshold), that is, Df > δ1, as a corresponding frame F to be used for acquiring the velocity of the subject. The threshold δ1 is the distance resolution, a value determined by the imaging conditions and the subject distance. This configuration prevents an inaccurate subject velocity from being derived from a corresponding frame F whose subject image lies within the distance resolution of the reference frame Fb. The difference value Df may be the difference in subject distance D along a specific direction (for example, the X, Y, or Z direction), or the difference in a three-dimensionally calculated subject distance D.
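
The first reliability criterion reduces to a simple filter, sketched below. The sketch assumes that the representative subject distance of each corresponding frame has already been obtained and that δ1 is supplied by the caller. The `FrameDistance` record is a hypothetical container, and the source gives no formula for the distance resolution.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameDistance:
    index: int       # frame number (illustrative field)
    distance: float  # representative subject distance D in this frame


def select_by_distance_difference(
    frames: List[FrameDistance], reference_distance: float, delta1: float
) -> List[FrameDistance]:
    """Keep target frames Fa whose Df = |Da - Db| exceeds the distance resolution delta1."""
    return [f for f in frames if abs(f.distance - reference_distance) > delta1]
```

For example, take Db = 5.0, δ1 = 0.1, and frames at distances 6.0, 5.05, and 4.5. Only the first and last frames are kept, because |5.05 - 5.0| lies within the distance resolution.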

A second example of the reliability C is the change amount V indicated by the motion information of the target frame Fa. The larger the change in the position and orientation of the imaging device 1, the larger the change amount V. The change amount V may be represented by the rotation R, by the translation T, or by an overall change value derived from both. The velocity acquisition unit 135 selects a target frame Fa whose change amount V is below a threshold δ2 (second threshold), that is, V < δ2, as a corresponding frame F to be used for acquiring the velocity of the subject. This configuration prevents an inaccurate subject velocity from being derived from a corresponding frame F captured while the imaging device 1 was moving discontinuously or abruptly (that is, while the subject was moving relatively fast with respect to the device).
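
For the second criterion, the source leaves open how R and T are combined into V. The sketch below shows one hypothetical choice. V is the rotation angle of R, obtained from the trace identity cos θ = (tr R - 1)/2, plus a weighted norm of T. The `weight` parameter and the function names are assumptions; the weight only serves to make the angle (radians) and the translation (length units) comparable.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def rotation_angle(R: Matrix) -> float:
    """Rotation angle (radians) of a 3x3 rotation matrix: cos(theta) = (trace(R) - 1) / 2."""
    c = (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0
    return math.acos(max(-1.0, min(1.0, c)))  # clamp to guard against rounding error


def change_amount(R: Matrix, T: Sequence[float], weight: float = 1.0) -> float:
    """Hypothetical combined change value V: rotation angle plus weight * |T|."""
    return rotation_angle(R) + weight * math.sqrt(sum(t * t for t in T))


def select_by_motion(
    motion: Dict[int, Tuple[Matrix, Sequence[float]]], delta2: float, weight: float = 1.0
) -> List[int]:
    """Indices of target frames whose change amount V is below delta2."""
    return [i for i, (R, T) in motion.items() if change_amount(R, T, weight) < delta2]
```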

The velocity acquisition unit 135 performs the selection and velocity acquisition processing for each of the corresponding frames Fs to Fe by repeating the loop of steps S305 and S306. While unprocessed frames remain among the corresponding frames Fs to Fe (step S306: No), step S305 is executed again to acquire the velocity for the next corresponding frame. When step S305 has been completed for all of the corresponding frames Fs to Fe (step S306: Yes), step S307 is executed.
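
The source does not specify how the velocity is computed from the selected three-dimensional positions. The sketch below follows one plausible reading, which is an assumption: it divides the displacement between the reference frame Fb and each selected target frame Fa by their time difference. Positions are assumed to already be in reference-frame coordinates and timestamps in seconds. The function names are hypothetical.

```python
import math
from typing import Dict, Iterable, Sequence


def speed_from_reference(p_ref: Sequence[float], t_ref: float,
                         p_a: Sequence[float], t_a: float) -> float:
    """Average speed between reference frame Fb and target frame Fa.

    Assumes both positions are in reference-frame coordinates and times are in seconds.
    """
    dt = abs(t_a - t_ref)
    if dt == 0:
        raise ValueError("target frame must be captured at a different time than the reference")
    return math.dist(p_a, p_ref) / dt


def speeds_for_selected(positions: Dict[int, Sequence[float]], times: Dict[int, float],
                        ref_index: int, selected: Iterable[int]) -> Dict[int, float]:
    """Speed for every frame index that passed the reliability checks."""
    return {i: speed_from_reference(positions[ref_index], times[ref_index], positions[i], times[i])
            for i in selected}
```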

In step S307, the velocity acquisition unit 135 obtains the subject velocity for the corresponding frames F that were not selected for velocity acquisition in steps S305 and S306. It does so by complementary processing (interpolation) based on the corresponding frames F for which the subject velocity was acquired.
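
The complementary processing of step S307 could, for instance, be linear interpolation over frame indices, as sketched below. Two parts of the sketch are assumptions: the use of linear interpolation, and holding the nearest measured value outside the measured range. The source states only that the missing values are complemented from frames whose velocity was acquired.

```python
from typing import Dict, Sequence


def fill_missing_speeds(frame_indices: Sequence[int], known: Dict[int, float]) -> Dict[int, float]:
    """Linearly interpolate speeds for frames that have no measured value.

    frame_indices: all corresponding-frame indices in temporal order.
    known: {frame_index: speed} for frames selected by the reliability checks.
    Frames before the first / after the last measured frame take the nearest value.
    """
    keys = sorted(known)
    if not keys:
        raise ValueError("at least one frame must have a measured speed")
    result = {}
    for i in frame_indices:
        if i in known:
            result[i] = known[i]
        elif i < keys[0]:
            result[i] = known[keys[0]]
        elif i > keys[-1]:
            result[i] = known[keys[-1]]
        else:
            # Nearest measured frames on either side of frame i
            lo = max(k for k in keys if k < i)
            hi = min(k for k in keys if k > i)
            w = (i - lo) / (hi - lo)
            result[i] = known[lo] + w * (known[hi] - known[lo])
    return result
```

For instance, `fill_missing_speeds([1, 2, 3, 4, 5], {2: 10.0, 4: 14.0})` assigns 12.0 to frame 3 and copies the nearest measured value to frames 1 and 5.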

The velocity acquisition unit 135 preferably records the subject velocity obtained for each corresponding frame F in steps S305 to S307 in the video data as metadata. In addition, when the display unit 16 displays the video 401, the subject velocity in each corresponding frame F is preferably displayed near the subject image in that frame.

As described above, the subject velocity can be acquired (calculated) accurately from the subject distance and the three-dimensional position of the subject. Both are obtained from video data (a plurality of frames F) containing the subject image, captured by the imaging device 1, which has a distance-measuring function. Moreover, because frames (captured images) with low reliability C are not used to acquire the subject velocity, the acquired subject velocity is more accurate.

Furthermore, in the above configuration, the three-dimensional position of the subject is acquired (calculated) based on motion information indicating the position and orientation of the imaging device 1. The subject velocity can therefore be acquired accurately even from video data captured while the imaging device 1 is moving, for example viewing video in which the subject image is kept approximately at the center of the frame.

<Modifications>
The embodiments described above can be modified in various ways. Specific modifications are exemplified below. Two or more aspects freely selected from the above embodiments and the following examples may be combined as appropriate, provided they do not contradict each other.

The embodiment described above addresses the problem of accurately acquiring the subject velocity even from viewing video captured while moving the imaging device 1. It does so by acquiring the three-dimensional position of the subject based on motion information indicating the position and orientation of the imaging device 1. This problem does not arise, however, with an imaging device 1 whose position and orientation change little (for example, a surveillance camera). In that case the three-dimensional information of the subject, and hence the subject velocity, may be acquired without using motion information. With this modification the position acquisition unit 134 does not need to acquire motion information, so the configuration of the image processing device 13 can be simplified.

In the embodiment described above, the image processing device 13 included in the imaging device 1 acquires the subject velocity by processing video (captured images) acquired by the imaging device 1. However, the image processing device 13 may instead be a separate device independent of the imaging device 1. That is, a standalone image processing device 13 may acquire the subject velocity by processing video (captured images) acquired by another acquisition device in the manner described above.

The image processing device 13 can execute the velocity acquisition processing described above on any video (captured images). For example, the processing may be executed on video already stored in the imaging device 1, or on video before it is stored in the imaging device 1.

Each functional block of the image processing device 13 may calculate each value (distance, position, motion information, three-dimensional position, velocity, and so on) arithmetically. Alternatively, it may obtain each value by referring to a data set, such as a table, in which inputs and outputs are associated with each other and stored in advance.

In the embodiment described above, the velocity acquisition unit 135 acquires the subject velocity from the three-dimensional position of the subject that the position acquisition unit 134 acquired for each corresponding frame F. In this modification, as shown in FIG. 5(c), the velocity acquisition unit 135 smooths the movement trajectory of the subject indicated by its three-dimensional positions and acquires the velocity of the subject from the smoothed trajectory. The smoothing can be realized by, for example, moving-average processing or spline processing. Because this reduces errors and fluctuations in the three-dimensional position of the subject, the subject velocity can be acquired more accurately.
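
Of the smoothing methods named above, the moving average is the simplest to sketch. The code below applies a centered moving average to the reconstructed 3D trajectory and then computes speeds between consecutive smoothed samples. The window size, the truncated window at the ends, and the consecutive-sample speed are illustrative choices, not details from the source.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def moving_average(points: Sequence[Point], window: int = 3) -> List[Point]:
    """Centered moving average of a trajectory; the window is truncated at the ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(points)):
        chunk = points[max(0, i - half): i + half + 1]
        # Average each coordinate (x, y, z) over the samples in the window
        smoothed.append(tuple(sum(c) / len(chunk) for c in zip(*chunk)))
    return smoothed


def speeds_along(points: Sequence[Point], times: Sequence[float]) -> List[float]:
    """Speed between consecutive samples of a (smoothed) trajectory."""
    return [math.dist(points[k + 1], points[k]) / (times[k + 1] - times[k])
            for k in range(len(points) - 1)]
```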

In the embodiment described above, the subject velocity is calculated from the three-dimensional position of the subject. For video in which the subject moves within a plane relative to the imaging device 1, as in FIG. 5(a), the velocity may instead be calculated in an overhead coordinate system such as that of FIGS. 5(b) and 5(c), that is, a coordinate system that omits height above the ground. In other words, the velocity acquisition unit 135 may map the three-dimensional positions indicated by the subject's three-dimensional position information onto a two-dimensional plane and acquire the velocity of the subject from the mapped two-dimensional positions. The mapping may be realized simply by not using the data of the dimension to be omitted. Alternatively, it may be realized by determining the two-dimensional plane onto which the three-dimensional positions should be mapped and projecting those positions onto that plane. Because the subject velocity is then acquired in a coordinate system that omits the dimension that is not a main direction of movement, errors in the subject position are suppressed and the accuracy of velocity acquisition improves. The amount of three-dimensional computation is also reduced, which lowers the processing load on the image processing device 13.
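
Both mapping options described above can be sketched briefly. `drop_axis` discards the coordinate to be omitted. `project_to_plane` orthogonally projects a position onto a plane defined by an origin and two orthonormal in-plane vectors. The source does not specify how that plane is determined (it could, for example, be fitted to the trajectory), so here it is simply passed in. All names are hypothetical.

```python
from typing import Sequence, Tuple


def drop_axis(p: Sequence[float], axis: int) -> Tuple[float, ...]:
    """Map a 3D position to 2D by discarding one coordinate (e.g. the height axis)."""
    return tuple(c for k, c in enumerate(p) if k != axis)


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def project_to_plane(p: Sequence[float], origin: Sequence[float],
                     u: Sequence[float], v: Sequence[float]) -> Tuple[float, float]:
    """2D coordinates of the orthogonal projection of p onto the plane through
    `origin` spanned by orthonormal vectors u and v (the normal component is dropped)."""
    d = [pi - oi for pi, oi in zip(p, origin)]
    return (_dot(d, u), _dot(d, v))
```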

The mapping onto the two-dimensional plane may be executed in response to a user instruction or based on a determination by the velocity acquisition unit 135. For example, the velocity acquisition unit 135 determines whether the variation Fl in one dimension (for example, the vertical dimension) of the three-dimensional positions in the corresponding frames Fs to Fe is below a threshold δ3 (third threshold), that is, whether Fl < δ3. If Fl is below δ3, the velocity acquisition unit 135 preferably maps the positions onto the two-dimensional plane corresponding to the other two dimensions (for example, the horizontal plane spanned by the front-back and left-right directions). With this configuration, the need for mapping onto a two-dimensional plane is determined automatically, which improves convenience for the user.
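
The automatic decision can be sketched as follows. The source does not define the variation Fl. This sketch assumes Fl is the range (max - min) of the chosen coordinate over the corresponding frames, and that index 1 (Y) is the vertical axis. Both are hypothetical conventions.

```python
from typing import List, Sequence, Tuple


def variation(points: Sequence[Sequence[float]], axis: int) -> float:
    """Variation Fl of one coordinate, taken here as its range (max - min)."""
    values = [p[axis] for p in points]
    return max(values) - min(values)


def maybe_map_to_2d(points: Sequence[Sequence[float]], delta3: float,
                    axis: int = 1) -> List[Tuple[float, ...]]:
    """Drop `axis` if its variation is below delta3; otherwise keep the 3D positions.

    axis=1 assumes Y is the vertical (height) axis, a hypothetical convention.
    """
    if variation(points, axis) < delta3:
        return [tuple(c for k, c in enumerate(p) if k != axis) for p in points]
    return [tuple(p) for p in points]
```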

Alternatively, the velocity acquisition unit 135 may perform pattern recognition on the subject image in the reference frame Fb. The velocity acquisition unit 135 preferably performs the mapping described above when pattern recognition classifies the subject image into an object type (category) corresponding to a specific category. A specific category denotes objects likely to move on a plane, for example "human" or "automobile". Various techniques can be applied to this pattern recognition. For example, the velocity acquisition unit 135 may recognize the subject image using a trained model obtained by training a neural network on learning data belonging to the specific category (learning data labeled with that category). With this configuration, the need for mapping can be determined from a property of the subject itself rather than from its three-dimensional position.

The appearance of the subject image can change across the plurality of frames F. Therefore, in step S303, if the similarity drops relatively during template matching, the tracking unit 132 may use the matched subject image as a new template (that is, update the template) and continue the processing. Alternatively, the tracking unit 132 may perform the tracking processing based on the similarity between the template of the reference frame Fb and an object image detected in a frame F. That object image may be detected by general object recognition processing or by deep-learning-based pattern recognition processing.
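
The template-update strategy can be illustrated with a deliberately small, pure-Python tracker. The source does not name a similarity measure; this sketch scores candidate windows by zero-mean normalized cross-correlation, which is one common choice. It searches exhaustively and replaces the template with the matched patch whenever the best score falls below a hypothetical threshold `update_below`. Images are plain lists of grayscale rows; a real implementation would use an optimized library routine.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[float]]


def ncc(a: Image, b: Image) -> float:
    """Zero-mean normalized cross-correlation of two equal-size patches (range -1..1)."""
    fa = [x for row in a for x in row]
    fb = [y for row in b for y in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    num = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    den = math.sqrt(sum((x - ma) ** 2 for x in fa) * sum((y - mb) ** 2 for y in fb))
    return num / den if den > 0 else 0.0  # flat patches carry no structure to match


def crop(img: Image, top: int, left: int, h: int, w: int) -> Image:
    return [row[left:left + w] for row in img[top:top + h]]


def match(img: Image, template: Image) -> Tuple[float, int, int]:
    """Exhaustive search; returns (best score, top, left)."""
    h, w = len(template), len(template[0])
    best = (-2.0, 0, 0)  # below the minimum possible NCC of -1
    for top in range(len(img) - h + 1):
        for left in range(len(img[0]) - w + 1):
            score = ncc(crop(img, top, left, h, w), template)
            if score > best[0]:
                best = (score, top, left)
    return best


def track(frames: Sequence[Image], template: Image,
          update_below: float = 0.8) -> List[Tuple[int, int]]:
    """Track the subject through frames, updating the template when similarity drops."""
    positions = []
    for img in frames:
        score, top, left = match(img, template)
        positions.append((top, left))
        if score < update_below:
            # Appearance has drifted: adopt the matched patch as the new template.
            template = crop(img, top, left, len(template), len(template[0]))
    return positions
```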

In step S304 of the embodiment described above, the distance acquisition unit 133 may select part of the rectangular area 403 and obtain the representative distance from the pixels corresponding to the selected part. The selected part may be, for example, a partial region including the center point, a plurality of regions repeated at a predetermined interval, or an edge region located at the border of the rectangle.
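
Selecting part of the rectangular area 403 before computing the representative distance could look like the sketch below. Several details are assumptions made for illustration: the median as the representative statistic, the 50% central window, the sampling step, and treating 0 as a missing distance. The depth map is a list of rows of distances, and the function names are hypothetical.

```python
import statistics
from typing import List

DepthMap = List[List[float]]


def central_region(depth: DepthMap, top: int, left: int, h: int, w: int,
                   frac: float = 0.5) -> List[float]:
    """Distances inside a centered sub-rectangle covering `frac` of each side of the box."""
    ch, cw = max(1, int(h * frac)), max(1, int(w * frac))
    t0, l0 = top + (h - ch) // 2, left + (w - cw) // 2
    return [depth[r][c] for r in range(t0, t0 + ch) for c in range(l0, l0 + cw)]


def sampled_grid(depth: DepthMap, top: int, left: int, h: int, w: int,
                 step: int = 2) -> List[float]:
    """Distances sampled at a fixed pixel interval inside the box."""
    return [depth[r][c] for r in range(top, top + h, step) for c in range(left, left + w, step)]


def representative_distance(values: List[float]) -> float:
    """Median of valid distances; 0 is treated as 'not measured' (assumption).

    Raises statistics.StatisticsError if no valid distance remains.
    """
    valid = [v for v in values if v > 0]
    return statistics.median(valid)
```

Restricting the statistic to the center of the box keeps background pixels near its border from pulling the representative distance away from the subject.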

Preferred embodiments of the present invention have been described above. The present invention is not limited to these embodiments, however, and various modifications and changes are possible within the scope of its gist. For example, the present invention can also be implemented as follows: a program implementing one or more functions of the above embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors of a computer in that system or apparatus read and execute the program. The storage medium is a computer-readable storage medium. The present invention can also be implemented by a circuit (for example, an ASIC) that implements one or more of the functions.

Reference Signs List
1 Imaging device (acquisition device)
10 Imaging optical system
11 Image sensor
12 Control unit
13 Image processing device
130 Image generation unit
132 Tracking unit
133 Distance acquisition unit
134 Position acquisition unit
135 Velocity acquisition unit

Claims

1. An image processing device comprising:
a tracking unit that selects, from a plurality of captured images, a plurality of corresponding images each including a subject image corresponding to a subject image in a reference image included in the plurality of captured images;
a distance acquisition unit that acquires, for each of the reference image and the corresponding images, distance information regarding the distance between an acquisition device that acquired the reference image or the corresponding image and a subject corresponding to the subject image;
a position acquisition unit that acquires position information regarding the position of the subject for each of the reference image and the corresponding images, and acquires three-dimensional position information of the subject with the reference image as a reference, using the distance information and the position information; and
a velocity acquisition unit that acquires the velocity of the subject corresponding to the subject image using the three-dimensional position information in the plurality of corresponding images selected based on the reliability of each of the corresponding images.

2. The image processing device according to claim 1, wherein the reliability is a difference value between the distance between the acquisition device and the subject in each of the plurality of corresponding images and the distance between the acquisition device and the subject in the reference image,
and the velocity acquisition unit selects the corresponding images in which the difference value exceeds a first threshold.

3. The image processing device according to claim 1, wherein the position acquisition unit acquires, for each of the plurality of corresponding images, motion information indicating the position and orientation of the acquisition device, and acquires the three-dimensional position information of the subject with the reference image as a reference, using the distance information, the position information, and the motion information.

4. The image processing device according to claim 3, wherein the reliability is the amount of change indicated by the motion information,
and the velocity acquisition unit selects the corresponding images in which the amount of change is less than a second threshold.

5. The image processing device according to any one of claims 1 to 4, wherein the velocity acquisition unit smooths a movement trajectory of the subject indicated by the three-dimensional position information of the subject, and acquires the velocity of the subject using the smoothed movement trajectory.

6. The image processing device according to any one of claims 1 to 5, wherein the velocity acquisition unit maps the three-dimensional positions indicated by the three-dimensional position information of the subject onto a two-dimensional plane, and acquires the velocity of the subject based on the two-dimensional positions after mapping.

7. The image processing device according to claim 6, wherein the velocity acquisition unit performs the mapping when the amount of variation in one dimension of the three-dimensional positions in the plurality of corresponding images is below a third threshold.

8. The image processing device according to claim 6, wherein the velocity acquisition unit performs the mapping onto the two-dimensional plane when the object type into which the subject image in the reference image is classified by pattern recognition corresponds to a specific category.

9. The image processing device according to any one of claims 1 to 8, wherein the tracking unit selects the reference image based on at least one of a focal length, an aperture value, and a distance to the subject indicated by imaging information attached to the captured images.

10. The image processing device according to any one of claims 1 to 9, wherein the tracking unit selects the corresponding images from the plurality of captured images based on the similarity of the subject images included in the plurality of captured images.

11. An imaging device comprising:
the image processing device according to any one of claims 1 to 10; and
an image sensor that outputs an image signal corresponding to the subject image,
wherein the image processing device acquires the captured images by performing signal processing on the image signal.

12. An image processing method comprising:
a step of selecting, from a plurality of captured images, a plurality of corresponding images each including a subject image corresponding to a subject image in a reference image included in the plurality of captured images;
a step of acquiring, for each of the reference image and the corresponding images, distance information regarding the distance between an acquisition device that acquired the reference image or the corresponding image and a subject corresponding to the subject image;
a step of acquiring position information regarding the position of the subject for each of the reference image and the corresponding images, and acquiring three-dimensional position information of the subject with the reference image as a reference, using the distance information and the position information; and
a step of acquiring the velocity of the subject corresponding to the subject image using the three-dimensional position information in the plurality of corresponding images selected based on the reliability of each of the corresponding images.

13. A program that causes a computer to execute:
a step of selecting, from a plurality of captured images, a plurality of corresponding images each including a subject image corresponding to a subject image in a reference image included in the plurality of captured images;
a step of acquiring, for each of the reference image and the corresponding images, distance information regarding the distance between an acquisition device that acquired the reference image or the corresponding image and a subject corresponding to the subject image;
a step of acquiring position information regarding the position of the subject for each of the reference image and the corresponding images, and acquiring three-dimensional position information of the subject with the reference image as a reference, using the distance information and the position information; and
a step of acquiring the velocity of the subject corresponding to the subject image using the three-dimensional position information in the plurality of corresponding images selected based on the reliability of each of the corresponding images.

14. A computer-readable storage medium storing a program that causes a computer to execute:
a step of selecting, from a plurality of captured images, a plurality of corresponding images each including a subject image corresponding to a subject image in a reference image included in the plurality of captured images;
a step of acquiring, for each of the reference image and the corresponding images, distance information regarding the distance between an acquisition device that acquired the reference image or the corresponding image and a subject corresponding to the subject image;
a step of acquiring position information regarding the position of the subject for each of the reference image and the corresponding images, and acquiring three-dimensional position information of the subject with the reference image as a reference, using the distance information and the position information; and
a step of acquiring the velocity of the subject corresponding to the subject image using the three-dimensional position information in the plurality of corresponding images selected based on the reliability of each of the corresponding images.