JP7406695B2

JP7406695B2 - Image processing device and image processing program

Info

Publication number: JP7406695B2
Application number: JP2019227633A
Authority: JP
Inventors: 一幸三浦; 篤志長
Original assignee: Takenaka Corp; Yamaguchi University NUC
Current assignee: Takenaka Corp; Yamaguchi University NUC
Priority date: 2019-12-17
Filing date: 2019-12-17
Publication date: 2023-12-28
Anticipated expiration: 2039-12-17
Also published as: JP2021096166A

Description

The present invention relates to an image processing device and an image processing program.

When the health of a building is evaluated immediately after a major earthquake, a primary diagnosis for deciding whether evacuation is necessary is important before a detailed diagnosis of the building is carried out. The inventors of the present invention have therefore proposed, in Patent Document 1, a system for diagnosing the health of a building that can make such a simple primary diagnosis possible without relying on acceleration sensors or the like.

This system can quantify the health of a building based on, for example, the rate of change in the building's natural frequency before and after an earthquake, which is obtained by analyzing a moving image of the building captured by a photographing device.

When the natural frequency of a target building is calculated by detecting micro-vibration components in a moving image, for example with the techniques disclosed in Patent Documents 2 and 3 and Non-Patent Document 1, the vibration of the photographing device must be distinguished from the vibration of the building if the photographing device itself is in a micro-vibration environment.

If the temporal frequency characteristics and spatial frequency characteristics of the vibration of the photographing device differ sufficiently from those of the building, the two can be separated in the spatio-temporal frequency domain, as also described in Patent Document 1. However, if the vibration characteristics of the photographing device and those of the building overlap in the spatio-temporal frequency domain, they cannot be separated simply, and characteristics such as the natural frequency of the building vibration, which is the actual measurement target, may not be evaluated correctly.

On the other hand, physical anti-vibration measures are also conceivable as a way to reduce the vibration of the photographing device, such as increasing the weight of the entire photographing system, including the photographing device and accessories such as a tripod, or introducing anti-vibration rubber, springs, and similar equipment. However, such measures cannot be expected to have a dramatic effect in the frequency band of roughly several Hz or less, which is contained in ground vibration and the like.

Software processing that can detect (and further remove) the vibration component of the photographing device is therefore needed. Such a software measure is not limited to diagnosing building health by mobile photographing as described in Patent Document 1. It is broadly useful for vibration analysis methods that rely solely on moving image processing based on spatio-temporal filtering of the subject, as typified by Patent Documents 2 and 3 and Non-Patent Document 1, including the diagnosis of building health by fixed photographing.

As a technique for software-based detection and removal of vibration components, Patent Document 4 discloses a digital image stabilization technique, as opposed to a mechanical or optical one. This technique can remove image fluctuations caused by the movement of the photographing device itself in visual uses such as viewing and recording, and in image-matching uses such as panorama synthesis and three-dimensional reconstruction.

Patent Document 1: Japanese Patent Application Publication No. 2018-136191
Patent Document 2: US Patent Application Publication No. 2014/0072190
Patent Document 3: US Patent No. 9324005
Patent Document 4: Japanese Patent Application Publication No. 2019-004451

Non-Patent Document 1: J.G. Chen, A. Davis, N. Wadhwa, F. Durand, W.T. Freeman, and O. Buyukozturk, “Video Camera-based Vibration Measurement for Condition Assessment of Civil Infrastructure”, International Symposium Non-Destructive Testing in Civil Engineering (2015)

However, in visual uses, the targets that mainly have to be detected or removed are fluctuation components spanning several pixels or more, and the technique disclosed in Patent Document 4 says nothing about detecting extremely fine, sub-pixel-level fluctuations. Matching uses may require sub-pixel accuracy, but they are often based on geometric transformations derived from correspondences between multiple pixels. They therefore do not necessarily function correctly when the vibration component of the photographing device is mixed into a subject that appears at first glance to have no motion, such as a moving image of the constant microtremors of a building.

The present disclosure has been made in view of the above circumstances, and aims to provide an image processing device and an image processing program that can accurately detect minute vibration components of a photographing device from a moving image.

The image processing device according to claim 1 of the present invention includes: an acquisition unit that acquires a moving image that includes a plurality of objects as subjects and that is obtained by photographing with a photographing device; a derivation unit that derives, between the frame images of the moving image acquired by the acquisition unit, a physical quantity indicating the moving speed of the objects for each divided region obtained by dividing each frame image into a plurality of regions; an extraction unit that extracts, from the physical quantities derived for the divided regions by the derivation unit, the divided region corresponding to the most frequent physical quantity; and an identification unit that identifies the vibration component of the divided region extracted by the extraction unit as the vibration component of the photographing device. The derivation unit performs complex spatial filtering on the moving image to generate a phase image, and derives, as the physical quantity, a phase difference signal indicating the difference between temporally adjacent frame images of the phase image for each divided region. The identification unit identifies a time-series signal of a representative value of the phase difference signal in the divided region extracted by the extraction unit as the vibration component of the photographing device. The vibration component includes the amplitude and temporal frequency of the vibration of the photographing device.

According to the image processing device of claim 1, a physical quantity indicating the moving speed of the objects is derived for each divided region between the frame images of a moving image obtained by photographing with a photographing device, and the vibration component of the divided region corresponding to the most frequent of the derived physical quantities is identified as the vibration component of the photographing device. This makes it possible to accurately detect minute vibration components of the photographing device from the moving image.

In the image processing device according to claim 1, the derivation unit performs complex spatial filtering on the moving image to generate a phase image, and derives, as the physical quantity, a phase difference signal indicating the difference between temporally adjacent frame images of the phase image for each divided region.

According to the image processing device of claim 1, generating a phase image by complex spatial filtering of the moving image and deriving, as the physical quantity, a phase difference signal indicating the difference between temporally adjacent frame images of the phase image for each divided region makes it possible to derive the physical quantity indicating the moving speed of the objects more simply.

The image processing device according to claim 2 of the present invention is the image processing device according to claim 1, in which, when the phase image holds wrapped phase, the derivation unit derives the phase difference signal after unwrapping the phase of each pixel of the phase image.

According to the image processing device of claim 2, when the phase image holds wrapped phase, deriving the phase difference signal after unwrapping the phase of each pixel of the phase image makes it possible to derive the phase difference signal with higher accuracy.

In the image processing device according to claim 1, the identification unit identifies the time-series signal of the representative value of the phase difference signal in the divided region extracted by the extraction unit as the vibration component of the photographing device.

The image processing device according to claim 3 of the present invention is the image processing device according to claim 1 or claim 2, in which the representative value of the phase difference signal is the average value, the median value, the maximum value, or the minimum value of the phase difference signal.

According to the image processing device of claims 1 and 3, identifying the time-series signal of the representative value of the phase difference signal in the extracted divided region as the vibration component of the photographing device makes it possible to identify that vibration component more simply than when no representative value is applied.

The image processing device according to claim 4 of the present invention is the image processing device according to any one of claims 1 to 3, in which the derivation unit derives the physical quantity only for a predetermined partial region of the moving image.

According to the image processing device of claim 4, deriving the physical quantity only for a predetermined partial region of the moving image reduces the computational load.

The image processing device according to claim 5 of the present invention is the image processing device according to claim 4, in which the partial region is a region of the moving image whose S/N ratio is equal to or higher than a predetermined level.

According to the image processing device of claim 5, setting the partial region to a region of the moving image whose S/N ratio is equal to or higher than a predetermined level makes it possible to detect the vibration component accurately while reducing the computational load more effectively.
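The text does not say how the S/N ratio is measured. The sketch below assumes, purely for illustration, that the amplitude of a region's filter response stands in for its S/N ratio and that the "predetermined level" is a fraction of the peak amplitude. The function name `high_snr_mask` and the parameter `threshold_ratio` are hypothetical.

```python
def high_snr_mask(amplitudes, threshold_ratio):
    """Flag regions whose response amplitude reaches a fraction of the peak.

    Assumption: amplitude is used as a stand-in for the S/N ratio.
    """
    peak = max(amplitudes)
    return [a >= threshold_ratio * peak for a in amplitudes]


# One amplitude per divided region (made-up values).
amplitudes = [0.02, 0.90, 1.00, 0.05, 0.60]
mask = high_snr_mask(amplitudes, threshold_ratio=0.5)

# Only the flagged regions would be passed on to the derivation step.
kept_regions = [i for i, keep in enumerate(mask) if keep]
```

Restricting later steps to `kept_regions` is what reduces the computational load in this sketch: low-texture regions, where phase is unreliable, are never processed.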

The image processing device according to claim 6 of the present invention is the image processing device according to any one of claims 1 to 5, in which each divided region is a region consisting of one pixel of the moving image or a region consisting of a group of pixels.

According to the image processing device of claim 6, setting each divided region to a single-pixel region or a pixel-group region of the moving image makes it possible to detect the vibration component of the photographing device more simply and with higher accuracy.

The image processing program according to claim 7 of the present invention causes a computer to execute processing that: acquires a moving image that includes a plurality of objects as subjects and that is obtained by photographing with a photographing device; derives, between the frame images of the acquired moving image, a physical quantity indicating the moving speed of the objects for each divided region obtained by dividing each frame image into a plurality of regions; extracts, from the physical quantities derived for the divided regions, the divided region corresponding to the most frequent physical quantity; and identifies the vibration component of the extracted divided region as the vibration component of the photographing device. The program performs complex spatial filtering on the moving image to generate a phase image, derives, as the physical quantity, a phase difference signal indicating the difference between temporally adjacent frame images of the phase image for each divided region, and identifies a time-series signal of a representative value of the phase difference signal in the extracted divided region as the vibration component of the photographing device. The vibration component includes the amplitude and temporal frequency of the vibration of the photographing device.

According to the image processing program of claim 7, a physical quantity indicating the moving speed of the objects is derived for each divided region between the frame images of a moving image obtained by photographing with a photographing device, and the vibration component of the divided region corresponding to the most frequent of the derived physical quantities is identified as the vibration component of the photographing device. This makes it possible to accurately detect minute vibration components of the photographing device from the moving image.
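The claims state that the identified vibration component includes an amplitude and a temporal frequency, but this part of the text does not say how they are read off the time-series signal. A minimal sketch, assuming a plain discrete Fourier transform and a single dominant sinusoid, is given below. The function name `dominant_component` and the sample values are hypothetical.

```python
import cmath
import math


def dominant_component(signal, fps):
    """Return (frequency in Hz, amplitude) of the strongest non-DC DFT bin.

    Assumption: the series is dominated by one sinusoid that fits the
    record length; the Nyquist bin is skipped so 2*|X|/n is the amplitude.
    """
    n = len(signal)
    mean = sum(signal) / n
    best_k, best_mag = 0, 0.0
    for k in range(1, (n + 1) // 2):
        coeff = sum(
            (s - mean) * cmath.exp(-2j * math.pi * k * t / n)
            for t, s in enumerate(signal)
        )
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * fps / n, 2.0 * best_mag / n


# Synthetic representative-value series: 3 Hz, amplitude 0.05, 60 frames/s.
fps = 60.0
series = [0.05 * math.sin(2.0 * math.pi * 3.0 * t / fps) for t in range(120)]
frequency_hz, amplitude = dominant_component(series, fps)
```

The amplitude here is in the units of the input series (phase difference per frame in this setting); converting it to pixels or physical length would need the filter's spatial frequency and the imaging scale, which are outside this sketch.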

As described above, according to the present invention, minute vibration components of the photographing device can be detected with high accuracy from a moving image.

A block diagram showing an example of the hardware configuration of the image processing device according to the embodiment.
A block diagram showing an example of the functional configuration of the image processing device according to the embodiment.
A schematic diagram showing an example of the configuration of the moving image database according to the embodiment.
A flowchart showing an example of the vibration component identification processing according to the embodiment.
A front view showing an example of a moving image (representative image) according to the embodiment.
A front view showing an example of a phase image (horizontal component) according to the embodiment.
A front view showing an example of a phase image (vertical component) according to the embodiment.
A graph used to explain the unwrapping processing according to the embodiment.
A graph showing an example of the frequency distribution of the phase difference signal according to the embodiment.
A front view showing an example of a sample image (sine wave image) used in a verification experiment according to the embodiment.
A graph showing an example of the average phase difference signal of the pixel group whose phase difference has the highest frequency in the verification experiment according to the embodiment.
A graph showing an example of a pixel movement amount signal used to explain the verification experiment according to the embodiment.
A graph showing an example of a temporal frequency spectrum used to explain the verification experiment according to the embodiment.
A graph showing an example of a temporal frequency spectrum used to explain the verification experiment according to the embodiment.

Hereinafter, embodiments for carrying out the present invention will be described in detail with reference to the drawings. This embodiment describes a case in which the present invention is applied to an image processing device that processes moving images of buildings in a microtremor state under the influence of wind excitation or ground vibration.

First, the configuration of the image processing device 10 according to this embodiment will be described with reference to FIGS. 1 and 2. Examples of the image processing device 10 include information processing devices such as personal computers and server computers.

As shown in FIG. 1, the image processing device 10 according to this embodiment includes a CPU (Central Processing Unit) 11, a memory 12 serving as a temporary storage area, a nonvolatile storage unit 13, an input unit 14 such as a keyboard and mouse, a display unit 15 such as a liquid crystal display, a medium read/write device (R/W) 16, and a communication interface (I/F) unit 18. The CPU 11, memory 12, storage unit 13, input unit 14, display unit 15, medium read/write device 16, and communication I/F unit 18 are connected to one another via a bus B1. The medium read/write device 16 reads information written on a recording medium 17 and writes information to the recording medium 17.

The storage unit 13 is realized by an HDD (Hard Disk Drive), an SSD (Solid State Drive), a flash memory, or the like. The storage unit 13, serving as a storage medium, stores a vibration component identification program 13A. The vibration component identification program 13A is stored in the storage unit 13 when the recording medium 17 on which the program is written is set in the medium read/write device 16 and the medium read/write device 16 reads the program from the recording medium 17. The CPU 11 reads the vibration component identification program 13A from the storage unit 13, loads it into the memory 12, and sequentially executes the processes of the program. The storage unit 13 also stores various databases such as a moving image database 13B and a complex spatial filter database 13C.

In the image processing device 10 according to this embodiment, a photographing device 20 that captures moving images is connected to the communication I/F unit 18. The photographing device 20 captures images so that a plurality of buildings are included in the shot. The photographing method may be any of aerial photographing, mobile photographing by a person on the ground, fixed photographing using a tripod or a fixing member, and so on. In this embodiment, a device that captures color images is used as the photographing device 20, but this is not a limitation. For example, a device that captures monochrome images may be used as the photographing device 20.

Next, the functional configuration of the image processing device 10 according to this embodiment will be described with reference to FIG. 2. As shown in FIG. 2, the image processing device 10 includes an acquisition unit 11A, a derivation unit 11B, an extraction unit 11C, and an identification unit 11D. The CPU 11 of the image processing device 10 functions as the acquisition unit 11A, the derivation unit 11B, the extraction unit 11C, and the identification unit 11D by executing the vibration component identification program 13A.

The acquisition unit 11A according to this embodiment acquires a moving image that includes a plurality of objects as subjects and that is obtained by photographing with the photographing device 20. In this embodiment, buildings are used as the objects, but this is not a limitation. For example, the objects may be structures other than buildings, such as bridges and towers; natural objects such as mountains and trees; biological information such as pulsation; equipment such as air-conditioning ducts and transformers; or a combination of several of these types.

The derivation unit 11B derives, between the frame images of the moving image acquired by the acquisition unit 11A, a physical quantity indicating the moving speed of the objects for each divided region obtained by dividing each frame image into a plurality of regions. In this embodiment, regions each consisting of a group of pixels are used as the divided regions, but this is not a limitation. For example, each single pixel of the moving image obtained by photographing with the photographing device 20 may be used as a divided region.
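As a rough illustration of pixel-group regions, the sketch below reduces a 2-D array to one value per non-overlapping square block. Averaging within a block and the square block shape are assumptions made for the example; the text only says that a region may be one pixel or a group of pixels. The function name `block_means` is hypothetical.

```python
def block_means(image, block):
    """Average a 2-D list over non-overlapping block x block pixel groups.

    Blocks at the right and bottom edges may be smaller than block x block.
    """
    rows, cols = len(image), len(image[0])
    means = []
    for top in range(0, rows, block):
        row_means = []
        for left in range(0, cols, block):
            cells = [
                image[y][x]
                for y in range(top, min(top + block, rows))
                for x in range(left, min(left + block, cols))
            ]
            row_means.append(sum(cells) / len(cells))
        means.append(row_means)
    return means


image = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
    [2, 2, 8, 8],
    [2, 2, 8, 8],
]
per_group = block_means(image, block=2)   # one value per 2x2 pixel group
per_pixel = block_means(image, block=1)   # degenerates to one value per pixel
```

Setting `block=1` corresponds to the single-pixel alternative mentioned above.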

The derivation unit 11B according to this embodiment performs complex spatial filtering on the moving image to generate a phase image, and derives, as the physical quantity, a phase difference signal indicating the difference between temporally adjacent frame images of the phase image for each divided region. The "phase" here is a variable that represents the "position" of a signal, so the phase image holds position information indicating spatial position. The phase difference signal, which is the difference between the phase images of temporally adjacent frames, therefore represents the "displacement", that is, the change in position over the time interval of one frame. It is thus a displacement per unit time, in other words a variable correlated with the moving speed.
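The relation between phase difference and displacement can be checked on a synthetic signal. The sketch below is a 1-D illustration under stated assumptions, not the embodiment's implementation: it filters two rows with a complex Gabor kernel, takes the per-pixel phase, and differences the phases of the two frames. The kernel parameters, the sinusoidal test pattern, and all function names are hypothetical.

```python
import cmath
import math


def gabor_kernel(sigma, omega, half_width):
    """Complex 1-D Gabor kernel: Gaussian envelope times exp(j*omega*x)."""
    return [
        math.exp(-x * x / (2.0 * sigma * sigma)) * cmath.exp(1j * omega * x)
        for x in range(-half_width, half_width + 1)
    ]


def phase_row(row, kernel):
    """Phase of the complex filter response at each fully covered pixel."""
    h = len(kernel) // 2
    phases = []
    for i in range(h, len(row) - h):
        response = sum(row[i + k - h] * kernel[k] for k in range(len(kernel)))
        phases.append(cmath.phase(response))
    return phases


def wrap(angle):
    """Map an angle back into [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def phase_difference(prev_phases, next_phases):
    """Per-pixel phase difference between two temporally adjacent frames."""
    return [wrap(b - a) for a, b in zip(prev_phases, next_phases)]


# Synthetic pattern: a sinusoid shifted by 0.1 pixel between two frames.
omega = 2.0 * math.pi / 16.0   # spatial frequency the kernel is tuned to
shift = 0.1                    # true sub-pixel displacement in pixels
kernel = gabor_kernel(sigma=6.0, omega=omega, half_width=18)
frame0 = [math.cos(omega * x) for x in range(128)]
frame1 = [math.cos(omega * (x - shift)) for x in range(128)]

dphi = phase_difference(phase_row(frame0, kernel), phase_row(frame1, kernel))
mean_dphi = sum(dphi) / len(dphi)
estimated_shift = mean_dphi / omega   # phase difference -> pixels per frame
```

With this sign convention a shift toward increasing x gives a positive phase difference, and dividing by the tuned spatial frequency recovers a displacement well below one pixel.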

When the phase image holds wrapped phase, the derivation unit 11B according to this embodiment derives the phase difference signal after unwrapping the phase of each pixel of the phase image. The derivation unit 11B also derives the physical quantity only for a predetermined partial region of the moving image. In this embodiment, a region of the moving image whose S/N ratio (signal-to-noise ratio) is equal to or higher than a predetermined level is used as the partial region, but this is not a limitation.
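Unwrapping can be sketched for a single 1-D phase sequence as follows. This is the standard step-by-step method that assumes the true phase changes by less than pi between neighbouring samples. Whether the embodiment unwraps along the time axis or a spatial axis is not stated in this part of the text, so the sketch leaves the axis open. The function name `unwrap` is hypothetical.

```python
import math


def unwrap(phases):
    """Remove 2*pi jumps from a wrapped 1-D phase sequence."""
    if not phases:
        return []
    out = [phases[0]]
    for prev, cur in zip(phases, phases[1:]):
        # Bring the step into [-pi, pi) before accumulating it.
        step = (cur - prev + math.pi) % (2.0 * math.pi) - math.pi
        out.append(out[-1] + step)
    return out


# A steadily growing phase, wrapped into [-pi, pi] and then restored.
true_phase = [0.5 * k for k in range(20)]
wrapped = [math.atan2(math.sin(p), math.cos(p)) for p in true_phase]
restored = unwrap(wrapped)
```

Differencing `wrapped` directly would produce jumps of about 2*pi wherever the phase crosses the branch cut, which is the error that unwrapping first is meant to avoid.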

The extraction unit 11C according to this embodiment extracts, from the physical quantities derived for the divided regions by the derivation unit 11B, the divided region corresponding to the most frequent physical quantity. The identification unit 11D according to this embodiment then identifies the vibration component of the divided region extracted by the extraction unit 11C as the vibration component of the photographing device 20.
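One way to picture the extraction step is a histogram over the per-region values: motion of the photographing device moves every region of the frame by roughly the same amount, so the most populated bin is taken to reflect it. The sketch below assumes a fixed bin width, which the text does not specify. The function name `most_frequent_regions` and the sample values are hypothetical.

```python
from collections import Counter


def most_frequent_regions(region_values, bin_width):
    """Return indices of regions whose value falls in the most populated bin."""
    bins = [round(v / bin_width) for v in region_values]
    mode_bin, _count = Counter(bins).most_common(1)[0]
    return [i for i, b in enumerate(bins) if b == mode_bin]


# Made-up phase differences for nine regions. Most share a value near 0.04
# (attributed to the photographing device); a few regions differ.
region_values = [0.041, 0.039, 0.040, 0.120, 0.042, -0.300, 0.038, 0.118, 0.040]
selected = most_frequent_regions(region_values, bin_width=0.01)
```

The bin width trades robustness against resolution: too wide and distinct motions merge into one bin, too narrow and noise spreads the common motion over several bins.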

The identification unit 11D according to this embodiment identifies the time-series signal of the representative value of the phase difference signal in the divided region extracted by the extraction unit 11C as the vibration component of the photographing device 20. In this embodiment, the average value of the phase difference signal is used as the representative value, but this is not a limitation. For example, the median value, the maximum value, or the minimum value may be used as the representative value instead of the average value.
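The reduction to a representative-value time series can be sketched as follows. For brevity the example reuses one fixed set of selected region indices for every frame pair, which is a simplifying assumption. The function name `representative_series` and the data are hypothetical.

```python
import statistics

# The four representative values named in the text.
REPRESENTATIVES = {
    "mean": statistics.mean,
    "median": statistics.median,
    "max": max,
    "min": min,
}


def representative_series(dphi_frames, selected, kind="mean"):
    """One representative value per frame pair, over the selected regions."""
    reduce_fn = REPRESENTATIVES[kind]
    return [reduce_fn([frame[i] for i in selected]) for frame in dphi_frames]


# Two frame pairs, three regions each; region 2 is an outlier and not selected.
dphi_frames = [
    [0.10, 0.30, 9.0],
    [0.20, 0.40, -9.0],
]
selected = [0, 1]
mean_series = representative_series(dphi_frames, selected, "mean")
max_series = representative_series(dphi_frames, selected, "max")
```

Each call returns one sample per frame pair, so the result is the time series that is then treated as the vibration of the photographing device.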

Next, the moving image database 13B according to this embodiment will be described with reference to FIG. 3. As shown in FIG. 3, the moving image database 13B stores, for each pre-assigned moving image ID (identification), the moving image information obtained by capturing a moving image with the photographing device 20. In this embodiment, the moving image information is thus imported from the photographing device 20 in advance and registered in the moving image database 13B, but this is not a limitation. For example, the photographing device 20 may capture images continuously, and the moving image information obtained from it when vibration of a predetermined level or higher occurs may be used online, in real time or non-real time.

一方、本実施形態に係る複素空間フィルタデータベース１３Ｃは、予め定められた複素空間フィルタ（本実施形態では、複素ガボールフィルタ（Gabor Filter））を示す情報が登録されている。但し、複素空間フィルタは複素ガボールフィルタに限定されるものではなく、空間位相特性が９０度だけ異なり、空間振幅特性が等しい空間フィルタ（実部フィルタ、虚部フィルタ）を組とした複素空間フィルタであれば、他のフィルタを複素空間フィルタとして適用してもよい。 On the other hand, information indicating a predetermined complex spatial filter (in the present embodiment, a complex Gabor filter) is registered in the complex spatial filter database 13C according to the present embodiment. However, the complex spatial filter is not limited to the complex Gabor filter. Any other filter may be used as the complex spatial filter, provided that it is formed by a pair of spatial filters (a real part filter and an imaginary part filter) whose spatial phase characteristics differ by exactly 90 degrees and whose spatial amplitude characteristics are equal.

次に、図４～図８を参照して、本実施形態に係る画像処理装置１０の作用を説明する。ユーザによって振動成分特定プログラム１３Ａの実行を開始する指示入力が入力部１４を介して行われた場合に、画像処理装置１０のＣＰＵ１１が当該振動成分特定プログラム１３Ａを実行することにより、図４に示す振動成分特定処理が実行される。なお、ここでは、錯綜を回避するために、動画像データベース１３Ｂ及び複素空間フィルタデータベース１３Ｃが構築済みであり、処理対象とする動画像情報がユーザによって指定されている場合について説明する。 Next, the operation of the image processing device 10 according to the present embodiment will be described with reference to FIGS. 4 to 8. When the user inputs, via the input unit 14, an instruction to start execution of the vibration component identification program 13A, the CPU 11 of the image processing device 10 executes the vibration component identification program 13A, whereby the vibration component identification processing shown in FIG. 4 is executed. To avoid complication, the description here assumes that the moving image database 13B and the complex spatial filter database 13C have already been constructed and that the user has specified the moving image information to be processed.

図４のステップ２００で、取得部１１Ａは、ユーザによって指定された動画像情報（以下、「処理対象動画像情報」という。）を動画像データベース１３Ｂから読み出すことにより取得する。 At step 200 in FIG. 4, the acquisition unit 11A acquires moving image information specified by the user (hereinafter referred to as "processing target moving image information") by reading it from the moving image database 13B.

ステップ２０２で、導出部１１Ｂは、複素空間フィルタデータベース１３Ｃから複素空間フィルタを示す情報を読み出し、処理対象動画像情報に対して当該複素空間フィルタ（本実施形態では、複素ガボールフィルタ）による複素空間フィルタリング処理を行って位相画像を生成する。 In step 202, the derivation unit 11B reads the information indicating the complex spatial filter from the complex spatial filter database 13C and generates a phase image by applying complex spatial filtering with that filter (in the present embodiment, the complex Gabor filter) to the processing target moving image information.

即ち、まず、導出部１１Ｂは、読み出した複素空間フィルタを用いた複素空間フィルタリング処理を行うことで、処理対象動画像情報により示される動画像の各フレーム画像から、実部画像Ｉ_ｒｅと虚部画像Ｉ_ｉｍを算出する。次いで、導出部１１Ｂは、次の式（１）による演算を画素毎に行うことにより、位相画像Ｉ_θを算出する。 Specifically, the derivation unit 11B first performs complex spatial filtering with the read complex spatial filter to calculate a real part image I_re and an imaginary part image I_im from each frame image of the moving image indicated by the processing target moving image information. The derivation unit 11B then calculates the phase image I_θ by evaluating the following equation (1) for each pixel.

Ｉ_θ＝ｔａｎ^－１（Ｉ_ｉｍ／Ｉ_ｒｅ）（１） I_θ = tan⁻¹(I_im / I_re) (1)
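As an illustration of equation (1), the following is a minimal sketch in plain Python of how a phase value could be obtained for each pixel of one image row. The function names `gabor_kernel` and `phase_row`, the kernel parameters (wavelength, sigma, half width), and the test row are assumptions introduced here and do not appear in the original description. The sketch uses `math.atan2`, the four-quadrant form of tan⁻¹(I_im / I_re), so that the result covers the full range of -π to +π.

```python
import cmath
import math

def gabor_kernel(wavelength=8.0, sigma=4.0, half_width=12):
    """Complex 1-D Gabor kernel: Gaussian envelope times a complex sinusoid.
    All parameter values are illustrative assumptions."""
    return [math.exp(-u * u / (2.0 * sigma ** 2))
            * cmath.exp(2j * math.pi * u / wavelength)
            for u in range(-half_width, half_width + 1)]

def phase_row(row, kernel):
    """Filter one image row with the complex kernel and return the phase
    (radians) at each pixel, following equation (1)."""
    half = len(kernel) // 2
    phases = []
    for x in range(len(row)):
        acc = 0j
        for j, k in enumerate(kernel):
            src = x - (j - half)  # convolution index; pixels outside the row count as zero
            if 0 <= src < len(row):
                acc += row[src] * k
        i_re, i_im = acc.real, acc.imag  # real part and imaginary part responses
        phases.append(math.atan2(i_im, i_re))  # four-quadrant tan^-1(I_im / I_re)
    return phases

# Hypothetical input: one row of a cosine pattern with a spatial wavelength of 8 pixels.
row = [math.cos(2.0 * math.pi * x / 8.0) for x in range(100)]
phase = phase_row(row, gabor_kernel())
```

Away from the row ends, the phase of this pattern advances by 2π/8 per pixel and wraps at ±π, which is the wrapped behaviour that step 206 addresses.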

以上の処理を処理対象動画像情報により示される動画像の全フレーム画像に実行する。複素空間フィルタリング処理は、空間領域上での畳み込みカーネルのコンボリューションによる方法と、空間周波数領域上でのフィルタ積による方法との何れの方法を適用してもよい。 The above processing is performed on all frame images of the moving image indicated by the processing target moving image information. The complex spatial filtering may be performed either by convolution with a kernel in the spatial domain or by multiplication with the filter in the spatial frequency domain.

例えば、処理対象動画像情報により示される動画像のうちの１枚の画像が、一例として図５に示す画像である場合、上述した複素空間フィルタリング処理によって得られる水平成分の位相画像が図６Ａに示すものとなり、垂直成分の位相画像が図６Ｂに示すものとなる。なお、図５に示す画像は、便宜上、建物を被写体としたものではなく、本発明の発明者らが制作した構造物を被写体として撮影したものを適用している。 For example, when one frame of the moving image indicated by the processing target moving image information is the image shown in FIG. 5, the complex spatial filtering described above yields the horizontal-component phase image shown in FIG. 6A and the vertical-component phase image shown in FIG. 6B. For convenience, the image shown in FIG. 5 is not of a building but of a structure built by the inventors of the present invention.

ステップ２０４で、導出部１１Ｂは、導出した位相画像Ｉ_θにおける撮影装置２０の振動成分の検出対象とする領域（以下、「処理対象領域」という。）を決定する。本実施形態では、処理対象領域の決定方法として、Ｓ／Ｎ比が所定レベル以上である画素群を処理対象領域とする方法を適用している。ここで、Ｓ／Ｎ比が所定レベル以上である画素群の一例としては、取得部１１Ａによって取得した段階の処理対象動画像情報が示す動画像において、フレーム画像の空間１次微分フィルタ（例えば、Ｓｏｂｅｌフィルタ）、空間２次微分フィルタ（例えば、ラプラシアンフィルタ）、あるいはエッジ検出処理を施した出力画像が相当する。スパイク的なノイズ成分を除去して、領域の塊を確保する必要があれば、メディアンフィルタや、膨張処理及び収縮処理を併用する。導出部１１Ｂは、最終的に閾値処理により二値化することで処理対象領域を決定する。 In step 204, the derivation unit 11B determines the region of the derived phase image I_θ in which the vibration component of the imaging device 20 is to be detected (hereinafter referred to as the "processing target region"). In the present embodiment, the processing target region is determined as the group of pixels whose S/N ratio is at or above a predetermined level. One example of such a pixel group is given by the output image obtained by applying, to a frame image of the moving image indicated by the processing target moving image information as acquired by the acquisition unit 11A, a spatial first-order differential filter (for example, a Sobel filter), a spatial second-order differential filter (for example, a Laplacian filter), or edge detection processing. If spike-like noise components need to be removed to obtain contiguous regions, a median filter and dilation and erosion processing are used in combination. Finally, the derivation unit 11B determines the processing target region by binarizing the result through threshold processing.
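The region selection of step 204 can be sketched as follows, assuming a Sobel filter as the spatial first-order differential filter and a fixed threshold for the binarization. The function names `sobel_magnitude` and `target_region`, the threshold value, and the test frame are hypothetical; the optional median filter and dilation and erosion processing are omitted from this sketch.

```python
import math

def sobel_magnitude(frame):
    """Gradient magnitude from the 3x3 Sobel kernels.
    Border pixels are left at zero because the kernel does not fit there."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = (frame[r - 1][c + 1] + 2 * frame[r][c + 1] + frame[r + 1][c + 1]
                  - frame[r - 1][c - 1] - 2 * frame[r][c - 1] - frame[r + 1][c - 1])
            gy = (frame[r + 1][c - 1] + 2 * frame[r + 1][c] + frame[r + 1][c + 1]
                  - frame[r - 1][c - 1] - 2 * frame[r - 1][c] - frame[r - 1][c + 1])
            out[r][c] = math.hypot(gx, gy)
    return out

def target_region(frame, threshold):
    """Binarize the gradient magnitude; True marks the processing target region.
    The threshold is an assumed tuning parameter."""
    return [[g >= threshold for g in row] for row in sobel_magnitude(frame)]

# Hypothetical 8x8 frame with a vertical step edge between columns 3 and 4.
frame = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
mask = target_region(frame, threshold=1.0)
```

In this example only the two columns adjacent to the edge are selected, which reflects the idea that pixels with strong spatial structure carry a more reliable phase.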

なお、処理対象領域の決定方法は以上の方法に限定されるものではなく、例えば、ユーザによって予め指定された注目領域を処理対象領域として決定する形態としてもよい。 Note that the method for determining the region to be processed is not limited to the above method, and for example, a region of interest designated in advance by the user may be determined as the region to be processed.

ステップ２０６で、導出部１１Ｂは、ステップ２０２の処理によって得られた位相画像Ｉ_θに対して位相アンラップ処理を行う。即ち、位相情報は、一般的には－π～＋πの範囲で折り返される形でラッピングされている（即ち、例えばπ＋π／４→－π／４となる。）。そこで、本実施形態では、位相アンラップ処理（位相接続処理）を行う。位相アンラップ処理としては、例えば、インターネット（URL:https://www.researchgate.net/publication/265151826）、（URL:http://retrofocus28.blogspot.com/2013/12/phase-unwrapping_26.html）、（URL:https://jp.mathworks.com/help/dsp/ref/unwrap.html#f5-1119858）等に記載の既知のアルゴリズムを適用することができる。なお、導出した位相画像Ｉ_θがラッピングされていないものであれば、本ステップ２０６の処理は実行する必要がないことは言うまでもない。 In step 206, the derivation unit 11B performs phase unwrapping on the phase image I_θ obtained in step 202. Phase information is generally wrapped, that is, folded back into the range of -π to +π (for example, π + π/4 is folded back to -π + π/4, that is, -3π/4). The present embodiment therefore performs phase unwrapping (phase connection). For the phase unwrapping, known algorithms can be used, such as those described on the Internet at (URL: https://www.researchgate.net/publication/265151826), (URL: http://retrofocus28.blogspot.com/2013/12/phase-unwrapping_26.html), and (URL: https://jp.mathworks.com/help/dsp/ref/unwrap.html#f5-1119858). Needless to say, step 206 need not be executed if the derived phase image I_θ is not wrapped.
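A minimal sketch of one common phase unwrapping approach for a single pixel's time series is given below. It is not necessarily the algorithm used for FIG. 7: whenever the change between consecutive samples exceeds half a rotation, the nearest multiple of 2π is removed from that change. The function name `unwrap` and the test signal are assumptions introduced here.

```python
import math

def unwrap(phases):
    """Remove 2*pi jumps from a time series of wrapped phases (radians)."""
    out = [phases[0]]
    for p in phases[1:]:
        step = p - out[-1]
        # Bring the frame-to-frame change back into the range -pi..+pi.
        step -= 2.0 * math.pi * round(step / (2.0 * math.pi))
        out.append(out[-1] + step)
    return out

# Hypothetical phase that grows steadily past one full rotation.
true_phase = [0.2 * k for k in range(40)]
wrapped = [math.atan2(math.sin(p), math.cos(p)) for p in true_phase]
unwrapped = unwrap(wrapped)
```

Because each step is corrected relative to the already unwrapped previous sample, this variant follows the phase through more than one rotation, provided that the true change between consecutive frames stays below π.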

図７には、処理対象とする画像が図５に示す画像の一部領域の画像である場合における、ステップ２０６の位相アンラップ処理による位相接続の前後における位相画像の一例が示されている。なお、図７に示す例では、位相接続前の位相画像の時系列の変化を破線で示し、位相接続後の位相画像の時系列の変化を実線で示している。 FIG. 7 shows an example of phase images before and after the phase connection by the phase unwrapping process in step 206 when the image to be processed is an image of a partial region of the image shown in FIG. 5. In the example shown in FIG. 7, the time-series changes in the phase images before phase connection are shown by broken lines, and the time-series changes in the phase images after phase connection are shown by solid lines.

図７に示すように、位相接続により、＋１８０度を超えて－１８０度に折り返された信号が＋１８０度を超えて連続的に表される。但し、図７の例で用いた位相アンラップ処理は、位相が２回転以上することを想定していないため、図７における横軸の値が１２０フレーム付近で＋３６０度を超えた信号の折り返しが残ったままとなっている。これは、アンラップ後の信号に位相アンラップ処理を再度施すことで解消される。しかし、本発明はサブピクセル級の微弱な振動を対象としているが、位相ラッピングが生じる場合はサブピクセルを超えるような大きな動きを生じていると解釈することも可能なため、ラッピングが生じた画素群または画像領域は処理対象領域から除外することを検知する目的で位相アンラップ処理を利用することも可能である。 As shown in FIG. 7, phase connection allows a signal that exceeded +180 degrees and was folded back to -180 degrees to be represented continuously beyond +180 degrees. However, the phase unwrapping used in the example of FIG. 7 does not assume that the phase makes two or more full rotations, so a fold-back remains where the signal exceeds +360 degrees, near frame 120 on the horizontal axis of FIG. 7. This can be resolved by applying the phase unwrapping again to the unwrapped signal. On the other hand, the present invention targets weak vibration at the sub-pixel level, and the occurrence of phase wrapping can be interpreted as a sign of a large movement exceeding the sub-pixel level. Phase unwrapping can therefore also be used to detect pixel groups or image regions in which wrapping has occurred, so that they can be excluded from the processing target region.

ステップ２０８で、導出部１１Ｂは、以上の処理を経て得られた位相画像Ｉ_θにおいて、ステップ２０４の処理によって決定した処理対象領域の各画素における、時間的に隣接するフレーム画像間の差分を示す時系列の信号である、上述した位相差信号を算出する。 In step 208, the deriving unit 11B indicates the difference between temporally adjacent frame images in each pixel of the processing target area determined by the processing in step 204 in the phase image I _θ obtained through the above processing. The above-described phase difference signal, which is a time-series signal, is calculated.
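The phase difference signal of step 208 amounts to a frame-to-frame difference taken at every pixel of the processing target region. The following sketch assumes that the unwrapped phase images are held as nested lists indexed as `phase_stack[t][r][c]` and that the region is a Boolean mask; this data layout and the function name `phase_difference` are assumptions made for illustration.

```python
def phase_difference(phase_stack, mask):
    """Return {(r, c): [phase[t + 1] - phase[t] for each t]} for every pixel
    inside the processing target region (mask[r][c] is True)."""
    signals = {}
    for r, mask_row in enumerate(mask):
        for c, selected in enumerate(mask_row):
            if selected:
                series = [frame[r][c] for frame in phase_stack]
                signals[(r, c)] = [b - a for a, b in zip(series, series[1:])]
    return signals

# Hypothetical data: 4 frames of 2x2 phase images whose phase grows linearly in time.
phase_stack = [[[0.1 * t * (c + 1) for c in range(2)] for _ in range(2)]
               for t in range(4)]
mask = [[True, False], [False, True]]
signals = phase_difference(phase_stack, mask)
```

A sequence of N frames yields N - 1 difference values per pixel, one for each pair of temporally adjacent frames.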

ステップ２１０で、抽出部１１Ｃは、任意のフレームにおいて、ステップ２０８の処理によって導出した位相差信号の度数分布を算出する。図８には、ステップ２１０の処理によって得られた１フレームにおける度数分布の一例が示されている。なお、図８に示す度数分布では、隣り合うフレーム間の位相差を計算していることから、極端に大きな位相差は無視して、－９０度～＋９０度の範囲内のみを１度刻みで計数している。 In step 210, the extraction unit 11C calculates, for an arbitrary frame, the frequency distribution of the phase difference signal derived in step 208. FIG. 8 shows an example of the frequency distribution for one frame obtained in step 210. Because the phase difference is calculated between adjacent frames, the frequency distribution shown in FIG. 8 ignores extremely large phase differences and counts only values within the range of -90 degrees to +90 degrees, in 1-degree increments.

ステップ２１２で、抽出部１１Ｃは、算出した度数分布において度数が最大となる範囲の位相差を有する画素群における位相差信号の代表値を全フレームで求める。なお、本実施形態では、上記代表値として位相差信号の平均値を適用しているが、これに限定されるものではない。例えば、上記代表値として、位相差信号の中央値、又は、最大値、又は、最小値を適用する形態としてもよいことは上述した通りである。 In step 212, the extraction unit 11C obtains, for all frames, the representative value of the phase difference signal in the pixel group having the phase difference in the range where the frequency is maximum in the calculated frequency distribution. Note that in this embodiment, the average value of the phase difference signals is used as the representative value, but the present invention is not limited to this. For example, as described above, the median value, maximum value, or minimum value of the phase difference signal may be used as the representative value.
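Steps 210 and 212 can be read as follows: build a histogram of the phase differences in one reference frame, select the pixels that fall into the most populated bin, and then take the mean of those pixels in every frame. The sketch below follows that reading under stated assumptions. The function name `camera_component`, the layout `diff_deg[t][n]` (frame index, then pixel index), and the sample values are hypothetical, and the mean is used as the representative value.

```python
import math
from collections import Counter

def camera_component(diff_deg, ref_frame=0):
    """diff_deg[t][n]: phase difference in degrees of pixel n between frames t and t + 1.
    Returns one representative value (the mean) per frame for the pixel group
    that falls into the most populated 1-degree bin of the reference frame."""
    row = diff_deg[ref_frame]
    # 1-degree bins over -90..+90 degrees; larger phase differences are ignored.
    bins = {n: math.floor(v) for n, v in enumerate(row) if -90.0 <= v < 90.0}
    mode_bin, _ = Counter(bins.values()).most_common(1)[0]
    group = [n for n, b in bins.items() if b == mode_bin]
    return [sum(frame[n] for n in group) / len(group) for frame in diff_deg]

# Hypothetical data: pixels 0 to 3 move together (camera motion); pixels 4 and 5 do not.
diff_deg = [
    [5.2, 5.4, 5.6, 5.8, 20.5, -30.5],
    [-3.0, -3.2, -2.8, -3.0, 10.0, 7.0],
    [1.0, 1.0, 1.0, 1.0, -8.0, 4.0],
]
component = camera_component(diff_deg)
```

The design choice here is to fix the pixel group from a single reference frame and reuse it for all frames. Replacing the mean with a median, maximum, or minimum changes only the last line of the function.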

ステップ２１４で、特定部１１Ｄは、ステップ２１２の処理によって得られた代表値の時系列信号を撮影装置２０の振動成分であるものと特定し、特定した振動成分を示す情報を記憶部１３の所定領域に記憶した後に本振動成分特定処理を終了する。 In step 214, the identifying unit 11D identifies the time series signal of the representative value obtained by the process in step 212 as a vibration component of the imaging device 20, and stores information indicating the identified vibration component in a predetermined location in the storage unit 13. After storing it in the area, the main vibration component specifying process ends.

次に、図９～図１２を参照して、本実施形態に係る画像処理装置１０による撮影装置の振動成分の特定に関する検証実験について説明する。 Next, with reference to FIGS. 9 to 12, a verification experiment regarding identification of the vibration component of the photographing device by the image processing device 10 according to the present embodiment will be described.

ここでは、図９に示す空間波長８画素で、かつ、１００画素×１００画素の正弦波画像を、撮影装置の振動成分に見立てた振幅０．５画素、時間周波数２Ｈｚで振動させた動画像で検証した。複素空間フィルタとしては、空間波長λ＝８画素にピークを持つガウス関数型のバンドパスフィルタを用いた。 Here, the verification used a moving image in which the 100 pixel × 100 pixel sine wave image with a spatial wavelength of 8 pixels shown in FIG. 9 was vibrated with an amplitude of 0.5 pixels and a temporal frequency of 2 Hz to simulate the vibration component of the imaging device. As the complex spatial filter, a Gaussian bandpass filter with a peak at a spatial wavelength of λ = 8 pixels was used.
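A test sequence of this kind could be generated along the following lines. The frame rate and the number of frames are not stated in the description and are assumptions here, as is the simplification that the sine pattern varies only horizontally, so that a single row represents each frame. The function name `synthetic_rows` is hypothetical.

```python
import math

def synthetic_rows(n_frames=60, fps=30.0, size=100, wavelength=8.0,
                   amplitude=0.5, freq_hz=2.0):
    """Return (rows, shifts): one sine-wave image row per frame and the
    sub-pixel shift in pixels that was applied to that frame."""
    rows, shifts = [], []
    for t in range(n_frames):
        # Simulated camera vibration: 0.5-pixel amplitude at 2 Hz.
        shift = amplitude * math.sin(2.0 * math.pi * freq_hz * t / fps)
        rows.append([math.sin(2.0 * math.pi * (x - shift) / wavelength)
                     for x in range(size)])
        shifts.append(shift)
    return rows, shifts

rows, shifts = synthetic_rows()
```

Because the shift is applied analytically inside the sine function, the pattern moves by fractions of a pixel without any interpolation.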

この場合、度数分布が最大となる位相差θ_ｍａｘとなる画素群における平均位相差信号ΔＩ_{θ＿ａｖｅ}（ｔ）は図１０に示すものとなった。この信号は位相差θ_ｍａｘとなる画素群の平均角速度［ｒａｄ／ｓ］に相当するため、画素の移動量信号ｄ（ｔ）［画素］に変換するために以下の数式に示す操作を行った。この数式におけるｄ（ｔ）が撮影装置の振動成分の信号と推測される（図１１参照。）。 In this case, the average phase difference signal ΔI_{θ_ave}(t) of the pixel group whose phase difference is θ_max, the phase difference at which the frequency distribution peaks, was as shown in FIG. 10. Because this signal corresponds to the average angular velocity [rad/s] of the pixel group with the phase difference θ_max, the operation shown in the following formula was performed to convert it into a pixel displacement signal d(t) [pixels]. The signal d(t) in this formula is presumed to be the vibration component of the imaging device (see FIG. 11).

以上より、全画素において上記の振動成分が撮影装置の振動成分として存在すると仮定することができる。そこで、画像中の任意の１画素のみを抽出して振動検出を行った信号Ａ（ｔ）と、信号Ａ（ｔ）から上記の撮影装置の振動成分ｄ（ｔ）を差し引いた信号との、それぞれの時間周波数スペクトルを図１２Ａ及び図１２Ｂに示す。図１２Ａより信号Ａ（ｔ）は２Ｈｚにピークを持つことが分かる。検証用の動画像は撮影装置の振動に見立てた振動成分以外は含まれないため、図１２Ｂより、本実施形態に係る手法によって撮影装置の振動成分が除去されていることが確認された。 From the above, it can be assumed that the above vibration component exists as a vibration component of the photographing device in all pixels. Therefore, a signal A(t) obtained by extracting only one arbitrary pixel in the image and performing vibration detection, and a signal obtained by subtracting the vibration component d(t) of the photographing device from the signal A(t), The respective time-frequency spectra are shown in FIGS. 12A and 12B. It can be seen from FIG. 12A that the signal A(t) has a peak at 2 Hz. Since the verification moving image does not contain vibration components other than vibrations of the photographing device, it was confirmed from FIG. 12B that the vibration component of the photographing device was removed by the method according to the present embodiment.
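The comparison of FIG. 12A and FIG. 12B can be imitated with a small numerical sketch. The sampling rate and duration are assumptions, and a 5 Hz component standing in for building vibration is added purely for illustration; the verification moving image in the description contains no such component. The function name `amplitude_spectrum` is hypothetical, and a direct DFT is used for brevity.

```python
import cmath
import math

def amplitude_spectrum(signal, fps):
    """One-sided amplitude spectrum by a direct DFT.
    Scaled so that a sine of amplitude A on an exact frequency bin gives A
    (the DC and Nyquist bins are not rescaled)."""
    n = len(signal)
    freqs, amps = [], []
    for k in range(n // 2 + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(signal))
        freqs.append(k * fps / n)
        amps.append(2.0 * abs(coeff) / n)
    return freqs, amps

fps, n = 30.0, 60  # assumed: 2 seconds at 30 frames per second
d = [0.5 * math.sin(2.0 * math.pi * 2.0 * i / fps) for i in range(n)]         # camera component d(t)
building = [0.1 * math.sin(2.0 * math.pi * 5.0 * i / fps) for i in range(n)]  # hypothetical addition
a = [x + y for x, y in zip(d, building)]   # signal A(t) observed at one pixel
residual = [x - y for x, y in zip(a, d)]   # A(t) - d(t)

freqs, amps_a = amplitude_spectrum(a, fps)
_, amps_res = amplitude_spectrum(residual, fps)
peak_a = freqs[amps_a.index(max(amps_a))]
peak_res = freqs[amps_res.index(max(amps_res))]
```

In this sketch the spectrum of A(t) peaks at 2 Hz, while the spectrum of the residual no longer contains the 2 Hz component and peaks at the hypothetical 5 Hz component instead.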

以上説明したように、本実施形態によれば、複数の物体が被写体として含まれ、かつ、撮影装置による撮影によって得られた動画像を取得する取得部１１Ａと、取得部１１Ａによって取得された動画像における各フレーム画像間の、各フレーム画像を複数に分割した分割領域毎の上記物体の移動速度を示す物理量を導出する導出部１１Ｂと、導出部１１Ｂによって導出された分割領域毎の物理量のうち、最も頻度が高い物理量に対応する分割領域を抽出する抽出部１１Ｃと、抽出部１１Ｃによって抽出された分割領域の振動成分を撮影装置の振動成分であるとして特定する特定部１１Ｄと、を備えている。従って、動画像から撮影装置の微細な振動成分を精度良く検出することができる。 As described above, the present embodiment includes: an acquisition unit 11A that acquires a moving image which contains a plurality of objects as subjects and which is obtained by imaging with an imaging device; a derivation unit 11B that derives, between frame images of the moving image acquired by the acquisition unit 11A, a physical quantity indicating the moving speed of the objects for each divided region obtained by dividing each frame image into a plurality of regions; an extraction unit 11C that extracts the divided region corresponding to the most frequent physical quantity among the physical quantities derived for the divided regions by the derivation unit 11B; and an identifying unit 11D that identifies the vibration component of the divided region extracted by the extraction unit 11C as the vibration component of the imaging device. Minute vibration components of the imaging device can therefore be detected accurately from the moving image.

また、本実施形態によれば、動画像に対して複素空間フィルタリング処理を行って位相画像を生成し、当該位相画像の時間的に隣接するフレーム画像間で、かつ、上記分割領域毎の差分を示す位相差信号を上記物理量として導出している。従って、より簡易に物体の移動速度を示す物理量を導出することができる。 Further, according to the present embodiment, a phase image is generated by performing complex space filtering processing on a moving image, and the difference between temporally adjacent frame images of the phase image and for each of the divided regions is calculated. The phase difference signal shown is derived as the above-mentioned physical quantity. Therefore, the physical quantity indicating the moving speed of the object can be derived more easily.

また、本実施形態によれば、位相画像がラッピングされた位相である場合、当該位相画像の各画素の位相に対してアンラップ処理を行った後に位相差信号を導出している。従って、より高精度に当該位相差信号を導出することができる。 Furthermore, according to the present embodiment, when the phase image has a wrapped phase, the phase difference signal is derived after unwrapping the phase of each pixel of the phase image. Therefore, the phase difference signal can be derived with higher accuracy.

また、本実施形態によれば、抽出された分割領域における位相差信号の代表値の時系列信号を、撮影装置の振動成分であるとして特定している。従って、上記代表値を適用しない場合に比較して、より簡易に撮影装置の振動成分を特定することができる。 Further, according to the present embodiment, the time series signal of the representative value of the phase difference signal in the extracted divided region is specified as being the vibration component of the photographing device. Therefore, the vibration component of the imaging device can be specified more easily than when the representative value is not applied.

また、本実施形態によれば、動画像における予め定められた一部の領域のみを対象として上記物理量を導出している。従って、演算負荷を低減することができる。 Further, according to the present embodiment, the physical quantity is derived only for a predetermined part of the moving image. Therefore, calculation load can be reduced.

また、本実施形態によれば、上記一部の領域を、動画像におけるＳ／Ｎ比が所定レベル以上である領域としている。従って、より効果的に演算負荷を低減しながら振動成分を精度よく検出することができる。 Further, according to the present embodiment, the above-mentioned part of the area is an area where the S/N ratio in the moving image is equal to or higher than a predetermined level. Therefore, vibration components can be detected with high accuracy while reducing the calculation load more effectively.

さらに、本実施形態によれば、上記分割領域を、動画像における１画素毎の領域、又は、複数の画素群毎の領域としている。従って、より簡易かつ高精度に撮影装置の振動成分を検出することができる。 Furthermore, according to the present embodiment, the divided areas are areas for each pixel in a moving image or areas for each group of pixels. Therefore, the vibration component of the photographing device can be detected more simply and with high precision.

なお、上記実施形態では、処理対象動画像情報により示される動画像の全体に対して位相画像を導出する場合について説明したが、これに限定されない。例えば、畳み込みカーネルによる手法に限定すれば、図４におけるステップ２０４の処理をステップ２０２の処理に先立って実行した上で、特定の注目画素とコンボリューションに必要な周辺画素のみの空間情報に限定して位相差信号まで算出するまでの処理を行い、注目画素を移動しながらステップ２０２の処理とステップ２０８の処理を繰り返して実行し、必要な画素全体の位相差信号を算出する形態としてもよい。この形態では、最終判定に至るまでの中間処理情報を逐次破棄しながら処理できるため、一連の処理の実施中に必要な記憶容量を最小限に抑えることが可能となる。 Note that in the above embodiment, a case has been described in which a phase image is derived for the entire moving image indicated by processing target moving image information, but the present invention is not limited to this. For example, if the method is limited to a convolution kernel, the process of step 204 in FIG. 4 is executed prior to the process of step 202, and the spatial information is limited to only the surrounding pixels necessary for convolution with a specific pixel of interest. It is also possible to perform the process up to calculating the phase difference signal, and then repeat the process of step 202 and step 208 while moving the pixel of interest to calculate the phase difference signal of all the necessary pixels. In this embodiment, intermediate processing information up to the final determination can be sequentially discarded during processing, making it possible to minimize the storage capacity required during execution of a series of processing.

また、上記実施形態において、例えば、取得部１１Ａ、導出部１１Ｂ、抽出部１１Ｃ及び特定部１１Ｄの各処理を実行する処理部（processing unit）のハードウェア的な構造としては、次に示す各種のプロセッサ（processor）を用いることができる。上記各種のプロセッサには、前述したように、ソフトウェア（プログラム）を実行して処理部として機能する汎用的なプロセッサであるＣＰＵに加えて、ＦＰＧＡ（Field-Programmable Gate Array）等の製造後に回路構成を変更可能なプロセッサであるプログラマブルロジックデバイス（Programmable Logic Device：PLD）、ＡＳＩＣ（Application Specific Integrated Circuit）等の特定の処理を実行させるために専用に設計された回路構成を有するプロセッサである専用電気回路等が含まれる。 Further, in the above embodiment, for example, the hardware structure of a processing unit that executes each process of the acquisition unit 11A, derivation unit 11B, extraction unit 11C, and identification unit 11D includes the following various types. A processor can be used. As mentioned above, the various processors mentioned above include the CPU, which is a general-purpose processor that executes software (programs) and functions as a processing unit, as well as circuit configurations such as FPGA (Field-Programmable Gate Array) after manufacturing. A programmable logic device (PLD), which is a processor that can be changed, and a dedicated electric circuit, which is a processor that has a circuit configuration specifically designed to execute a specific process, such as an ASIC (Application Specific Integrated Circuit) etc. are included.

処理部は、これらの各種のプロセッサのうちの１つで構成されてもよいし、同種又は異種の２つ以上のプロセッサの組み合わせ（例えば、複数のＦＰＧＡの組み合わせや、ＣＰＵとＦＰＧＡとの組み合わせ）で構成されてもよい。また、処理部を１つのプロセッサで構成してもよい。 The processing unit may be configured with one of these various processors, or a combination of two or more processors of the same or different types (for example, a combination of multiple FPGAs or a combination of a CPU and an FPGA). It may be composed of. Further, the processing section may be configured with one processor.

処理部を１つのプロセッサで構成する例としては、第１に、クライアント及びサーバ等のコンピュータに代表されるように、１つ以上のＣＰＵとソフトウェアの組み合わせで１つのプロセッサを構成し、このプロセッサが処理部として機能する形態がある。第２に、システムオンチップ（System On Chip：SoC）等に代表されるように、処理部を含むシステム全体の機能を１つのＩＣ（Integrated Circuit）チップで実現するプロセッサを使用する形態がある。このように、処理部は、ハードウェア的な構造として、上記各種のプロセッサの１つ以上を用いて構成される。 As an example of configuring the processing unit with one processor, first, as typified by computers such as clients and servers, one processor is configured with a combination of one or more CPUs and software, and this processor is There is a form that functions as a processing section. Second, there is a form of using a processor, such as a system on chip (SoC), in which the functions of the entire system including a processing section are realized by one IC (Integrated Circuit) chip. In this way, the processing section is configured as a hardware structure using one or more of the various processors described above.

更に、これらの各種のプロセッサのハードウェア的な構造としては、より具体的には、半導体素子などの回路素子を組み合わせた電気回路（circuitry）を用いることができる。 Furthermore, as the hardware structure of these various processors, more specifically, an electric circuit (circuitry) that is a combination of circuit elements such as semiconductor elements can be used.

１０画像処理装置 10 Image processing device
１１ＣＰＵ 11 CPU
１１Ａ取得部 11A Acquisition unit
１１Ｂ導出部 11B Derivation unit
１１Ｃ抽出部 11C Extraction unit
１１Ｄ特定部 11D Identifying unit
１２メモリ 12 Memory
１３記憶部 13 Storage unit
１３Ａ振動成分特定プログラム 13A Vibration component identification program
１３Ｂ動画像データベース 13B Moving image database
１３Ｃ複素空間フィルタデータベース 13C Complex spatial filter database
１４入力部 14 Input unit
１５表示部 15 Display unit
１６媒体読み書き装置 16 Media reading/writing device
１７記録媒体 17 Recording medium
１８通信Ｉ／Ｆ部 18 Communication I/F unit
２０撮影装置 20 Imaging device

Claims

An image processing device comprising:
an acquisition unit that acquires a moving image that includes a plurality of objects as subjects and that is obtained by imaging with an imaging device;
a derivation unit that derives, between frame images of the moving image acquired by the acquisition unit, a physical quantity indicating a moving speed of the objects for each divided region obtained by dividing each frame image into a plurality of regions;
an extraction unit that extracts the divided region corresponding to the most frequent physical quantity among the physical quantities derived for the divided regions by the derivation unit; and
an identifying unit that identifies a vibration component of the divided region extracted by the extraction unit as a vibration component of the imaging device,
wherein the derivation unit generates a phase image by performing complex spatial filtering on the moving image, and derives, as the physical quantity, a phase difference signal indicating the difference between temporally adjacent frame images of the phase image for each of the divided regions,
the identifying unit identifies a time-series signal of a representative value of the phase difference signal in the divided region extracted by the extraction unit as the vibration component of the imaging device, and
the vibration component includes the amplitude and temporal frequency of vibration of the imaging device.

The image processing device according to claim 1, wherein, when the phase image has a wrapped phase, the derivation unit derives the phase difference signal after unwrapping the phase of each pixel of the phase image.

The image processing device according to claim 1 or claim 2, wherein the representative value of the phase difference signal is an average value, a median value, a maximum value, or a minimum value of the phase difference signal.

The image processing device according to any one of claims 1 to 3, wherein the derivation unit derives the physical quantity only for a predetermined partial region of the moving image.

The image processing device according to claim 4, wherein the partial region is a region of the moving image in which the S/N ratio is at or above a predetermined level.

The image processing device according to any one of claims 1 to 5, wherein each divided region is a region of one pixel of the moving image or a region of a group of a plurality of pixels.

An image processing program for causing a computer to execute processing comprising:
acquiring a moving image that includes a plurality of objects as subjects and that is obtained by imaging with an imaging device;
deriving, between frame images of the acquired moving image, a physical quantity indicating a moving speed of the objects for each divided region obtained by dividing each frame image into a plurality of regions;
extracting the divided region corresponding to the most frequent physical quantity among the physical quantities derived for the divided regions; and
identifying a vibration component of the extracted divided region as a vibration component of the imaging device,
wherein a phase image is generated by performing complex spatial filtering on the moving image, and a phase difference signal indicating the difference between temporally adjacent frame images of the phase image for each of the divided regions is derived as the physical quantity,
a time-series signal of a representative value of the phase difference signal in the extracted divided region is identified as the vibration component of the imaging device, and
the vibration component includes the amplitude and temporal frequency of vibration of the imaging device.