JP7357836B2

JP7357836B2 - Image processing device and image processing program

Info

Publication number: JP7357836B2
Application number: JP2019227634A
Authority: JP
Inventors: 一幸三浦; 篤志長
Original assignee: Takenaka Corp; Yamaguchi University NUC
Current assignee: Takenaka Corp; Yamaguchi University NUC
Priority date: 2019-12-17
Filing date: 2019-12-17
Publication date: 2023-10-10
Anticipated expiration: 2039-12-17
Also published as: JP2021096167A

Description

The present invention relates to an image processing device and an image processing program.

When evaluating the structural health of a building immediately after a major earthquake, a primary diagnosis for deciding whether evacuation is necessary is important and must precede any detailed diagnosis of the building. To this end, the inventors of the present invention proposed, in Patent Document 1, a system for diagnosing the health of a building that can enable a simple primary diagnosis without depending on acceleration sensors or the like.

This system can quantify the health of a building based on, for example, the rate of change in the building's natural frequency before and after an earthquake, which is obtained by analyzing a moving image of the building captured by a photographing device.

When the natural frequency of a target building is calculated by detecting micro-vibration components in a moving image using, for example, the techniques disclosed in Patent Documents 2 and 3 and Non-Patent Document 1, the vibration of the photographing device must be separated from the vibration of the building if the photographing device itself is in a micro-vibration environment.

If the temporal-frequency and spatial-frequency characteristics of the vibration of the photographing device differ sufficiently from those of the building, the two can be separated in the spatio-temporal frequency domain, as also described in Patent Document 1. However, if the vibration characteristics of the photographing device and those of the building overlap in the spatio-temporal frequency domain, they cannot be separated so simply, and characteristics such as the natural frequency of the building vibration that is the actual measurement target may not be evaluated correctly.

Physical anti-vibration measures are also conceivable as a way to reduce the vibration of the photographing device, such as increasing the weight of the entire photographing system, including the photographing device and accessories such as a tripod, or introducing anti-vibration rubber, springs, or similar equipment. However, such measures cannot be expected to have a dramatic effect in the frequency band of roughly several Hz or less, which is contained in ground vibration and the like.

Software-based processing that can detect (and, further, remove) the vibration component of the photographing device is therefore needed. Such a software measure is not limited to diagnosing building health from mobile shooting as described in Patent Document 1. It is generally useful for vibration analysis methods that rely solely on moving image processing based on spatio-temporal filtering of the subject, as typified by Patent Documents 2 and 3 and Non-Patent Document 1, including the diagnosis of building health from fixed shooting.

As a technique for software-based detection and removal of vibration components, Patent Document 4 discloses digital image stabilization, as opposed to mechanical or optical stabilization. This technique can remove image fluctuations caused by movement of the photographing device itself, both in visual applications such as viewing and recording and in image-matching applications such as panorama stitching and three-dimensional reconstruction.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2018-136191
Patent Document 2: US Patent Application Publication No. 2014/0072190
Patent Document 3: US Patent No. 9324005
Patent Document 4: Japanese Unexamined Patent Application Publication No. 2019-004451

Non-Patent Document 1: J.G. Chen, A. Davis, N. Wadhwa, F. Durand, W.T. Freeman, and O. Buyukozturk, “Video Camera-based Vibration Measurement for Condition Assessment of Civil Infrastructure”, International Symposium Non-Destructive Testing in Civil Engineering (2015)

However, what mainly has to be detected and removed in visual applications is fluctuation spanning several pixels or more, and the technique disclosed in Patent Document 4 says nothing about detecting extremely fine, sub-pixel-level fluctuation. Matching applications may require sub-pixel accuracy, but they are often based on geometric transformations derived from correspondences between multiple pixels. They therefore do not necessarily work correctly when the vibration component of the photographing device is mixed into footage of a subject that at first glance appears motionless, such as a moving image of the constant microtremor of a building.

The present disclosure has been made in view of the above circumstances, and its object is to provide an image processing device and an image processing program capable of accurately detecting minute vibration components of a photographing device from a moving image.

The image processing device according to claim 1 comprises: an acquisition unit that acquires a moving image that includes a plurality of objects as subjects and that was obtained by photographing with a photographing device; an extraction unit that extracts, from the moving image acquired by the acquisition unit, a plurality of object regions, each of which is a region of one of the plurality of objects; and an identification unit that performs vibration analysis on each of the plurality of object regions extracted by the extraction unit in the moving image acquired by the acquisition unit, and identifies a vibration component common to the plurality of object regions as the vibration component of the photographing device.

According to the image processing device of claim 1, a plurality of object regions, each of which is a region of one of a plurality of objects, are extracted from a moving image obtained by photographing with a photographing device, vibration analysis is performed on each of the object regions in the moving image, and a vibration component common to the object regions is identified as the vibration component of the photographing device. Minute vibration components of the photographing device can thereby be detected accurately from the moving image.

The image processing device according to claim 2 is the image processing device according to claim 1, wherein the extraction unit detects regions of the moving image in which the S/N ratio is at or above a predetermined level, and extracts each region formed by a spatially contiguous group of pixels within the detected regions as one of the plurality of object regions.

According to the image processing device of claim 2, regions of the moving image in which the S/N ratio is at or above a predetermined level are detected, and each region formed by a spatially contiguous group of pixels within the detected regions is extracted as one of the plurality of object regions. The object regions can thereby be extracted more simply and with high reliability.

The image processing device according to claim 3 is the image processing device according to claim 1 or claim 2, further comprising: a generation unit that generates a phase image by performing complex spatial filtering on the moving image; a derivation unit that derives, for the plurality of object regions extracted by the extraction unit, a phase variation signal, which is a signal indicating the variation, between the frame images of the moving image, of the phase image generated by the generation unit; a conversion unit that converts the phase variation signal derived by the derivation unit into a temporal frequency spectrum by frequency analysis; and a calculation unit that uses the temporal frequency spectrum obtained by the conversion unit to calculate, for each of the plurality of object regions, a phase variation spectrum obtained by averaging the temporal frequency spectra within the same region, wherein the identification unit identifies, in the phase variation spectra calculated by the calculation unit, a predetermined frequency range including a peak frequency common to the plurality of object regions as the vibration component of the photographing device.

According to the image processing device of claim 3, a phase image is generated by performing complex spatial filtering on the moving image; a phase variation signal, which indicates the variation of the generated phase image between the frame images of the moving image, is derived for the plurality of object regions; the derived phase variation signal is converted into a temporal frequency spectrum by frequency analysis; a phase variation spectrum obtained by averaging the temporal frequency spectra within the same region is calculated for each of the object regions; and, in the calculated phase variation spectra, a predetermined frequency range including a peak frequency common to the object regions is identified as the vibration component of the photographing device. The vibration component of the photographing device can thereby be identified with higher accuracy.

The image processing device according to claim 4 is the image processing device according to claim 3, wherein, when the phase image contains wrapped phase, the derivation unit derives the phase variation signal after performing unwrapping on the phase of each pixel of the phase image.

According to the image processing device of claim 4, when the phase image contains wrapped phase, the phase variation signal is derived after unwrapping is performed on the phase of each pixel of the phase image. The phase variation signal can thereby be derived with higher accuracy.

The image processing program according to claim 5 causes a computer to execute processing of: acquiring a moving image that includes a plurality of objects as subjects and that was obtained by photographing with a photographing device; extracting, from the acquired moving image, a plurality of object regions, each of which is a region of one of the plurality of objects; performing vibration analysis on each of the extracted object regions in the acquired moving image; and identifying a vibration component common to the plurality of object regions as the vibration component of the photographing device.

According to the image processing program of claim 5, a plurality of object regions, each of which is a region of one of a plurality of objects, are extracted from a moving image obtained by photographing with a photographing device, vibration analysis is performed on each of the object regions in the moving image, and a vibration component common to the object regions is identified as the vibration component of the photographing device. Minute vibration components of the photographing device can thereby be detected accurately from the moving image.

As described above, according to the present invention, minute vibration components of the photographing device can be detected accurately from a moving image.

FIG. 1 is a block diagram showing an example of the hardware configuration of the image processing device according to the embodiment.
FIG. 2 is a block diagram showing an example of the functional configuration of the image processing device according to the embodiment.
FIG. 3 is a schematic diagram showing an example of the configuration of the moving image database according to the embodiment.
FIG. 4 is a flowchart showing an example of the vibration component identification processing according to the embodiment.
FIG. 5 is a front view showing an example of a moving image (representative image) according to the embodiment.
FIG. 6A is a front view showing an example of a phase image (horizontal component) according to the embodiment.
FIG. 6B is a front view showing an example of a phase image (vertical component) according to the embodiment.
FIG. 7A is a front view showing an example of a partial image of the moving image according to the embodiment.
FIG. 7B is a front view showing an example of the result of labeling processing performed on the image shown in FIG. 7A.
FIG. 8 is a graph showing an example of a phase variation signal according to the embodiment.
FIG. 9 is a front view showing an example of a sample image (sine wave image) according to the embodiment.
FIG. 10 is a graph showing an example of a temporal frequency spectrum according to the embodiment.
FIG. 11 is a graph showing an example of a temporal frequency spectrum used to explain a demonstration experiment according to the embodiment.

Hereinafter, an example embodiment for carrying out the present invention will be described in detail with reference to the drawings. The present embodiment describes a case in which the present invention is applied to an image processing device that processes moving images of buildings in a microtremor state under the influence of wind excitation or ground vibration.

First, the configuration of the image processing device 10 according to the present embodiment will be described with reference to FIGS. 1 and 2. Examples of the image processing device 10 include information processing devices such as personal computers and server computers.

As shown in FIG. 1, the image processing device 10 according to the present embodiment includes a CPU (Central Processing Unit) 11, a memory 12 serving as a temporary storage area, a nonvolatile storage unit 13, an input unit 14 such as a keyboard and mouse, a display unit 15 such as a liquid crystal display, a medium read/write device (R/W) 16, and a communication interface (I/F) unit 18. The CPU 11, memory 12, storage unit 13, input unit 14, display unit 15, medium read/write device 16, and communication I/F unit 18 are connected to one another via a bus B1. The medium read/write device 16 reads information written on a recording medium 17 and writes information to the recording medium 17.

The storage unit 13 is realized by an HDD (Hard Disk Drive), an SSD (Solid State Drive), flash memory, or the like. The storage unit 13, serving as a storage medium, stores a vibration component identification program 13A. The vibration component identification program 13A is stored in the storage unit 13 when a recording medium 17 on which the program has been written is set in the medium read/write device 16 and the medium read/write device 16 reads the program from the recording medium 17. The CPU 11 reads the vibration component identification program 13A from the storage unit 13, loads it into the memory 12, and sequentially executes the processes of the program. The storage unit 13 also stores various databases, such as a moving image database 13B and a complex spatial filter database 13C.

In the image processing device 10 according to the present embodiment, a photographing device 20 that captures moving images is connected to the communication I/F unit 18. The photographing device 20 is used to shoot so that a plurality of buildings are included in the frame. The shooting method may be any of aerial shooting, mobile shooting by a person on the ground, fixed shooting using a tripod or a fixing member, and so on. In the present embodiment, a device that captures color images is used as the photographing device 20, but this is not a limitation. For example, a device that captures monochrome images may be used as the photographing device 20.

Next, the functional configuration of the image processing device 10 according to the present embodiment will be described with reference to FIG. 2. As shown in FIG. 2, the image processing device 10 includes an acquisition unit 11A, an extraction unit 11B, a generation unit 11C, a derivation unit 11D, a conversion unit 11E, a calculation unit 11F, and an identification unit 11G. The CPU 11 of the image processing device 10 functions as these units by executing the vibration component identification program 13A.

The acquisition unit 11A according to the present embodiment acquires a moving image that includes a plurality of objects as subjects and that was obtained by photographing with the photographing device 20. In the present embodiment, buildings are used as the objects, but this is not a limitation. For example, the objects may be structures other than buildings, such as bridges and towers; natural objects such as mountains and trees; biological information such as pulsation; equipment such as air-conditioning ducts and transformers; or a combination of several of these types.

The extraction unit 11B according to the present embodiment extracts, from the moving image acquired by the acquisition unit 11A, a plurality of object regions, each of which is a region of one of the plurality of objects. The identification unit 11G according to the present embodiment performs vibration analysis on each of the object regions extracted by the extraction unit 11B in the moving image acquired by the acquisition unit 11A, and identifies a vibration component common to the object regions as the vibration component of the photographing device 20. The extraction unit 11B detects regions of the moving image in which the S/N ratio (signal-to-noise ratio) is at or above a predetermined level, and extracts each region formed by a spatially contiguous group of pixels within the detected regions as one of the object regions.

The generation unit 11C according to the present embodiment generates a phase image by performing complex spatial filtering on the moving image. The derivation unit 11D derives, for the plurality of object regions extracted by the extraction unit 11B, a phase variation signal, which is a signal indicating the variation, between the frame images of the moving image, of the phase image generated by the generation unit 11C. The conversion unit 11E converts the phase variation signal derived by the derivation unit 11D into a temporal frequency spectrum by frequency analysis. The calculation unit 11F then uses the temporal frequency spectrum obtained by the conversion unit 11E to calculate, for each of the object regions, a phase variation spectrum obtained by averaging the temporal frequency spectra within the same region.
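The chain from phase image to per-region phase variation spectrum can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name `phase_variation_spectrum`, the array layout, the choice of the first frame as the reference for the variation, and the use of a magnitude FFT as the "frequency analysis" are all assumptions introduced here.

```python
import numpy as np

def phase_variation_spectrum(phase_stack, region_mask, fps):
    """Average temporal spectrum of the phase variation inside one object region.

    phase_stack : (T, H, W) array of phase images, assumed already unwrapped in time
    region_mask : (H, W) boolean array marking the pixels of one object region
    fps         : frame rate of the moving image in frames per second
    """
    # One time series per pixel of the region: shape (T, N).
    signals = phase_stack[:, region_mask]
    # Phase variation relative to the first frame (assumed reference).
    signals = signals - signals[0]
    # Remove the per-pixel mean so the DC bin does not dominate the spectrum.
    signals = signals - signals.mean(axis=0)
    # Frequency analysis along the time axis (magnitude of the real FFT).
    spectra = np.abs(np.fft.rfft(signals, axis=0))
    freqs = np.fft.rfftfreq(phase_stack.shape[0], d=1.0 / fps)
    # Averaging the spectra of all pixels in the region gives one spectrum per region.
    return freqs, spectra.mean(axis=1)
```

Averaging the magnitude spectra rather than the signals themselves keeps pixels with opposite phase sign from cancelling each other, which is one plausible reason to average in the spectral domain.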

The identification unit 11G according to the present embodiment identifies, in the phase variation spectra calculated by the calculation unit 11F, a predetermined frequency range including a peak frequency common to the plurality of object regions as the vibration component of the photographing device 20. When the phase image contains wrapped phase, the derivation unit 11D derives the phase variation signal after performing unwrapping on the phase of each pixel of the phase image.
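As a rough sketch of these two steps, the code below unwraps the phase and then looks for spectral peaks shared by every region. Several choices are assumptions made only for illustration: unwrapping is done along the time axis with `np.unwrap`, a peak is a local maximum above a relative height, peaks must fall in exactly the same frequency bin to count as common, and the names `half_width_hz` and `rel_height` are hypothetical parameters standing in for the "predetermined frequency range".

```python
import numpy as np

def unwrap_in_time(phase_stack):
    """Unwrap each pixel's phase along the time axis (axis 0)."""
    return np.unwrap(phase_stack, axis=0)

def common_peak_bands(freqs, region_spectra, half_width_hz=0.2, rel_height=0.5):
    """Return frequency bands around peaks that appear in every region's spectrum.

    freqs          : 1-D array of frequencies shared by all spectra
    region_spectra : list of 1-D phase variation spectra, one per object region
    """
    freqs = np.asarray(freqs, dtype=float)
    common = np.ones(len(freqs), dtype=bool)
    for spectrum in region_spectra:
        s = np.asarray(spectrum, dtype=float)
        is_peak = np.zeros(len(s), dtype=bool)
        # A peak is a local maximum that reaches rel_height times the largest value.
        is_peak[1:-1] = (
            (s[1:-1] > s[:-2])
            & (s[1:-1] >= s[2:])
            & (s[1:-1] >= rel_height * s.max())
        )
        # Keep only the bins that are peaks in all regions seen so far.
        common &= is_peak
    # Each common peak is widened into a band attributed to the photographing device.
    return [(f - half_width_hz, f + half_width_hz) for f in freqs[common]]
```

Peaks that appear in only one region are left out of the result, on the reasoning that they belong to that object's own vibration rather than to the photographing device. A practical version would probably match peaks within a tolerance of a few bins instead of requiring identical bins.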

Next, the moving image database 13B according to the present embodiment will be described with reference to FIG. 3. As shown in FIG. 3, the moving image database 13B stores, for each pre-assigned moving image ID (identification), the moving image information obtained by capturing a moving image with the photographing device 20. In the present embodiment, the moving image information is thus imported from the photographing device 20 in advance and registered in the moving image database 13B, but this is not a limitation. For example, the photographing device 20 may shoot continuously, and the moving image information obtained from it when vibration at or above a predetermined level occurs may be used online, either in real time or in non-real time.

The complex spatial filter database 13C according to the present embodiment holds information indicating a predetermined complex spatial filter (in the present embodiment, a complex Gabor filter). The complex spatial filter is not limited to a complex Gabor filter, however. Any complex spatial filter consisting of a pair of spatial filters (a real-part filter and an imaginary-part filter) whose spatial phase characteristics differ by 90 degrees and whose spatial amplitude characteristics are equal may be used.

Next, the operation of the image processing device 10 according to the present embodiment will be described with reference to FIGS. 4 to 10. When the user enters, via the input unit 14, an instruction to start execution of the vibration component identification program 13A, the CPU 11 of the image processing device 10 executes the program, and the vibration component identification processing shown in FIG. 4 is carried out. To avoid complication, the following description assumes that the moving image database 13B and the complex spatial filter database 13C have already been built and that the user has specified the moving image information to be processed.

In step 200 of FIG. 4, the acquisition unit 11A acquires the moving image information specified by the user (hereinafter, "processing target moving image information") by reading it from the moving image database 13B.

In step 202, the generation unit 11C reads the information indicating the complex spatial filter from the complex spatial filter database 13C, and generates a phase image by applying complex spatial filtering with that filter (in the present embodiment, a complex Gabor filter) to the processing target moving image information.

Specifically, the generation unit 11C first performs complex spatial filtering with the filter it has read, and thereby calculates a real-part image I_re and an imaginary-part image I_im from each frame image of the moving image indicated by the processing target moving image information. The generation unit 11C then calculates the phase image I_θ by evaluating the following equation (1) for each pixel.

$$I_{\theta} = \tan^{-1}\left(\frac{I_{im}}{I_{re}}\right) \tag{1}$$

The above processing is performed on all frame images of the moving image indicated by the processing target moving image information. The complex spatial filtering may be carried out either by convolution with a kernel in the spatial domain or by multiplication with the filter in the spatial frequency domain.
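The following sketch shows one way equation (1) could be evaluated for a single grayscale frame, using the frequency-domain product mentioned above. It is illustrative only: the kernel parameters (`size`, `wavelength`, `sigma`, `theta`) are hypothetical values, the FFT product implies circular boundary handling, and `np.arctan2` is used in place of a plain arctangent so that the quadrant of the phase is resolved, which is an implementation choice not stated in the source.

```python
import numpy as np

def gabor_kernel(size=21, wavelength=8.0, sigma=4.0, theta=0.0):
    """Complex Gabor kernel: a Gaussian envelope multiplied by a complex carrier.

    theta = 0 gives a carrier along x (horizontal component);
    theta = pi / 2 gives a carrier along y (vertical component).
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    along = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    carrier = np.exp(1j * 2.0 * np.pi * along / wavelength)
    return envelope * carrier

def phase_image(frame, kernel):
    """Apply the complex filter to one frame and return the phase image I_theta."""
    h, w = frame.shape
    # Filtering as a product in the spatial frequency domain (circular convolution).
    response = np.fft.ifft2(np.fft.fft2(frame) * np.fft.fft2(kernel, s=(h, w)))
    i_re, i_im = response.real, response.imag
    # Equation (1), evaluated per pixel with quadrant-aware arctangent.
    return np.arctan2(i_im, i_re)
```

For a sinusoidal pattern of wavelength λ that shifts by d pixels between two frames, the phase returned here changes by about 2πd/λ in magnitude, which is why sub-pixel motion remains visible in the phase even when the intensity image looks static.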

For example, when one frame of the moving image indicated by the processing target moving image information is the image shown in FIG. 5, the complex spatial filtering described above yields the horizontal-component phase image shown in FIG. 6A and the vertical-component phase image shown in FIG. 6B. For convenience, the image shown in FIG. 5 is not of a building but of a structure built by the inventors of the present invention.

In step 204, the extraction unit 11B determines the region of the derived phase image I_θ in which the vibration component of the imaging device 20 is to be detected (hereinafter, "processing target region"). In this embodiment, the processing target region is determined as the group of pixels whose S/N ratio is at or above a predetermined level. One example of such a pixel group is the output image obtained by applying, to the frame images of the moving image indicated by the processing target moving image information as acquired by the acquisition unit 11A, a spatial first-order differential filter (for example, a Sobel filter), a spatial second-order differential filter (for example, a Laplacian filter), or an edge detection process that detects an edge image. If spike-like noise components need to be removed to secure contiguous regions, a median filter, or dilation and erosion processing, is used in combination. The extraction unit 11B finally determines the processing target region by binarizing the result through threshold processing.

The method for determining the processing target region is not limited to the one described above. For example, a region of interest designated in advance by the user may be used as the processing target region.
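One way to picture the edge-based determination of the processing target region is the following NumPy sketch. The helper names, the 3 × 3 structuring element, the closing step (dilation followed by erosion), and the threshold value are assumptions chosen for illustration; the embodiment does not fix any of them, and the optional median filter is omitted here.

```python
import numpy as np

def sobel_magnitude(frame):
    """Gradient magnitude from 3x3 Sobel kernels, with replicated borders."""
    p = np.pad(frame.astype(float), 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
        - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
        - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)

def dilate(mask):
    """3x3 binary dilation built from shifted copies of the mask."""
    h, w = mask.shape
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(3):
        for dx in range(3):
            out |= p[dy:dy + h, dx:dx + w]
    return out

def erode(mask):
    """3x3 binary erosion, expressed through dilation of the complement."""
    return ~dilate(~mask)

def processing_target_region(frame, threshold):
    """Binarize the edge strength, then close small gaps (hypothetical recipe)."""
    mask = sobel_magnitude(frame) >= threshold
    return erode(dilate(mask))

# Synthetic frame with a single vertical step edge between columns 7 and 8.
frame = np.zeros((16, 16))
frame[:, 8:] = 1.0
region = processing_target_region(frame, threshold=2.0)
```

In this toy frame only the two columns adjacent to the step edge survive the threshold, so `region` marks columns 7 and 8.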

In step 206, the extraction unit 11B performs labeling on the binarized image corresponding to the processing target region obtained in step 204 and separates each group of spatially contiguous pixels into a single cluster, thereby dividing the processing target region into a plurality of partial pixel groups. The partial pixel groups obtained in this way correspond to the plurality of object regions described above, and are hereinafter referred to as object regions.
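The labeling step can be sketched as a breadth-first connected-component search. The function name `label_regions` and the use of 4-connectivity are assumptions; the text only requires that spatially contiguous pixels end up in the same object region.

```python
from collections import deque

import numpy as np

def label_regions(mask):
    """Assign labels 1..count to 4-connected groups of True pixels."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    count = 0
    for sy in range(h):
        for sx in range(w):
            if not mask[sy, sx] or labels[sy, sx]:
                continue
            count += 1
            labels[sy, sx] = count
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny, nx] and not labels[ny, nx]):
                        labels[ny, nx] = count
                        queue.append((ny, nx))
    return labels, count

mask = np.array([[1, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 0, 1]], dtype=bool)
labels, count = label_regions(mask)
```

Each positive label then plays the role of one object region in the later steps.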

FIG. 7B shows an example of the object regions obtained by the processing in step 206 when the image to be processed is the one shown in FIG. 7A, which is a part of the image shown in FIG. 5. In FIG. 7B, each object region is drawn with a different density; in this example, six object regions are extracted.

In step 208, the derivation unit 11D performs phase unwrapping on the phase image I_θ obtained in step 202. Phase information is generally wrapped so that it folds back within the range −π to +π (for example, π + π/4 becomes −3π/4). In this embodiment, phase unwrapping (phase connection) is therefore performed. Known algorithms such as those described at https://www.researchgate.net/publication/265151826, http://retrofocus28.blogspot.com/2013/12/phase-unwrapping_26.html, and https://jp.mathworks.com/help/dsp/ref/unwrap.html#f5-1119858 can be applied as the phase unwrapping. Needless to say, step 208 need not be executed if the derived phase image I_θ is not wrapped.
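As a small illustration of phase connection, the sketch below uses NumPy's `numpy.unwrap` along the frame axis. Unwrapping in the time direction, the array layout `(n_frames, height, width)`, and the function name `unwrap_over_time` are assumptions for this example rather than details stated in the embodiment.

```python
import numpy as np

def unwrap_over_time(phase_stack):
    """Unwrap a (n_frames, height, width) stack of wrapped phases along time."""
    return np.unwrap(phase_stack, axis=0)

# Synthetic phase that rises steadily to 3*pi, then is wrapped into [-pi, pi].
true_phase = np.linspace(0.0, 3.0 * np.pi, 200)
wrapped = np.angle(np.exp(1j * true_phase))
stack = wrapped[:, None, None] * np.ones((1, 2, 2))
restored = unwrap_over_time(stack)
```

`numpy.unwrap` removes every 2π jump as long as the true frame-to-frame change stays below π, so the restored series matches the original ramp.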

In step 210, for one of the object regions obtained in step 206 (hereinafter, "processing target object region"), the derivation unit 11D calculates the phase fluctuation signal described above from the phase image I_θ obtained through the preceding processing. The phase fluctuation signal is the time-series data obtained by taking the values of the phase image I_θ, as they are, along the time direction at each pixel of the processing target object region.
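Extracting the per-pixel time series for one object region amounts to boolean indexing, as in the sketch below. The array layout `(n_frames, height, width)`, the label image, and the function name are assumptions for illustration.

```python
import numpy as np

def phase_fluctuation_signals(phase_stack, labels, region_id):
    """Return an (n_frames, n_pixels) array: one phase time series per pixel."""
    mask = labels == region_id
    return phase_stack[:, mask]

# Toy data: 5 frames of a 4x4 phase image and a label image with one region.
phase_stack = np.arange(5 * 4 * 4, dtype=float).reshape(5, 4, 4)
labels = np.zeros((4, 4), dtype=int)
labels[1, 1:4] = 1  # hypothetical object region of three pixels
signals = phase_fluctuation_signals(phase_stack, labels, region_id=1)
```

Each column of `signals` is the phase of one pixel cut out along the time direction, with no further processing applied.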

FIG. 8 shows an example of the phase fluctuation signals obtained in step 210, before and after phase connection by the phase unwrapping, when the image to be processed is the one shown in FIG. 7A. In FIG. 8, the signal before phase connection is drawn with a broken line and the signal after phase connection with a solid line.

As shown in FIG. 8, phase connection causes a signal that exceeded +180 degrees and was folded back to −180 degrees to be represented continuously beyond +180 degrees. However, the phase unwrapping used in the example of FIG. 8 does not assume that the phase makes two or more full rotations, so the fold-back of the signal that exceeds +360 degrees near frame 120 on the horizontal axis of FIG. 8 remains. This can be resolved by applying the phase unwrapping again to the unwrapped signal. That said, the present invention targets weak, sub-pixel-level vibration, and the occurrence of phase wrapping can be interpreted as indicating motion larger than a sub-pixel. Phase unwrapping can therefore also be used to detect pixel groups or image regions in which wrapping has occurred so that they can be excluded from the processing target region.
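The exclusion idea can be sketched by flagging pixels whose wrapped phase jumps by more than π between consecutive frames. The criterion, the function name `wrapped_pixels`, and the synthetic signals are assumptions for illustration only.

```python
import numpy as np

def wrapped_pixels(signals):
    """signals: (n_frames, n_pixels) wrapped phase in radians.

    Returns one boolean per pixel, True where a fold-back was detected.
    """
    jumps = np.abs(np.diff(signals, axis=0))
    return (jumps > np.pi).any(axis=0)

t = np.arange(100)
small = 0.2 * np.sin(0.3 * t)             # sub-pixel-like motion, never wraps
ramp = np.angle(np.exp(1j * 0.2 * t))     # large drift, wraps several times
signals = np.stack([small, ramp], axis=1)
flags = wrapped_pixels(signals)
kept = signals[:, ~flags]                 # pixels retained for analysis
```

Only the first pixel is retained here; the second is treated as exhibiting motion too large for the sub-pixel analysis.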

In step 212, the conversion unit 11E converts the phase fluctuation signal of the processing target object region obtained in step 210, by frequency analysis, into a temporal frequency spectrum, that is, a signal in the temporal frequency domain.

FIG. 10 shows an example of the temporal frequency spectrum obtained in step 212 for a moving image in which the 100 × 100 pixel sinusoidal image with a spatial wavelength of 8 pixels shown in FIG. 9 is vibrated with an amplitude of 0.5 pixels and a temporal frequency of 2 Hz to simulate the vibration component of the imaging device. In the example of FIG. 10, a Gaussian band-pass filter with its peak at the spatial wavelength λ = 8 pixels is used as the complex spatial filter.
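The relationship between this test pattern and its spectrum can be checked with a simplified one-pixel model. For a sinusoid of wavelength 8 pixels displaced by d(t), the local phase changes by −2πd(t)/8, so a 0.5-pixel, 2 Hz displacement should give a spectral peak at 2 Hz. The frame rate of 60 fps, the 120-frame duration, the FFT-based analysis, and the function name are assumptions; the text does not state them.

```python
import numpy as np

def temporal_frequency_spectrum(signal, fps):
    """Amplitude spectrum of a phase fluctuation signal (mean removed)."""
    n = signal.shape[0]
    centered = signal - signal.mean(axis=0)
    spectrum = np.abs(np.fft.rfft(centered, axis=0)) * 2.0 / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fps)
    return freqs, spectrum

fps = 60.0                                   # hypothetical frame rate
t = np.arange(120) / fps                     # 2 seconds of frames
displacement = 0.5 * np.sin(2.0 * np.pi * 2.0 * t)   # 0.5 px at 2 Hz
phase = -2.0 * np.pi * displacement / 8.0    # phase shift for wavelength 8 px
freqs, spectrum = temporal_frequency_spectrum(phase, fps)
peak_frequency = freqs[np.argmax(spectrum)]
```

Under these assumptions the peak sits at 2 Hz with an amplitude of about π/8 rad, the phase equivalent of a 0.5-pixel shift at an 8-pixel wavelength.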

In step 214, the calculation unit 11F averages the temporal frequency spectra obtained in step 212 over the pixels within the processing target object region, and takes the averaged temporal frequency spectrum as the phase fluctuation spectrum described above.
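A minimal sketch of the averaging follows. It averages amplitude spectra rather than complex spectra, which is an assumption: the text says only that the spectra within a region are averaged. The function name, the frame rate, and the synthetic signals are likewise hypothetical.

```python
import numpy as np

def phase_fluctuation_spectrum(signals, fps):
    """Average the per-pixel amplitude spectra of one object region.

    signals: (n_frames, n_pixels) phase fluctuation signals.
    """
    centered = signals - signals.mean(axis=0)
    amplitude = np.abs(np.fft.rfft(centered, axis=0))
    freqs = np.fft.rfftfreq(signals.shape[0], d=1.0 / fps)
    return freqs, amplitude.mean(axis=1)

fps = 60.0                                   # hypothetical frame rate
t = np.arange(120) / fps
# Two pixels on opposite edge polarities: same 8 Hz motion, opposite sign.
signals = np.stack([0.1 * np.sin(2.0 * np.pi * 8.0 * t),
                    -0.1 * np.sin(2.0 * np.pi * 8.0 * t)], axis=1)
freqs, region_spectrum = phase_fluctuation_spectrum(signals, fps)
peak_frequency = freqs[np.argmax(region_spectrum)]
```

Averaging amplitudes keeps the 8 Hz peak even though the two pixels move in antiphase; averaging the complex spectra would cancel it in this example.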

In step 216, the calculation unit 11F determines whether the processing in steps 210 to 214 has been completed for all object regions. If the determination is negative, the process returns to step 210; once the determination is affirmative, the process proceeds to step 218. When steps 210 to 216 are repeated, the CPU 11 selects as the processing target object region an object region that has not yet been processed.

In step 218, the identification unit 11G estimates that a peak frequency common to the object regions in the phase fluctuation spectra of all object regions is a peak caused by the vibration component of the imaging device 20, and identifies a predetermined frequency range including that peak frequency as the vibration component of the imaging device 20. After storing information indicating the identified vibration component in a predetermined area of the storage unit 13, the identification unit 11G ends the vibration component identification processing.
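The common-peak search can be sketched as an intersection of per-region peak sets. The peak criterion (local maxima at least half as high as the largest value), the exact-bin matching, the ±0.5 Hz band, and all function names are assumptions for illustration; the text does not specify how peaks are detected or how wide the "predetermined frequency range" is. The synthetic data use hypothetical 8 Hz and 13 Hz components.

```python
import numpy as np

def peak_bins(spectrum, rel_threshold=0.5):
    """Indices of local maxima at least rel_threshold times the largest value."""
    s = np.asarray(spectrum)
    k = np.arange(1, len(s) - 1)
    is_peak = (s[k] > s[k - 1]) & (s[k] >= s[k + 1])
    strong = s[k] >= rel_threshold * s.max()
    return set(k[is_peak & strong].tolist())

def common_peak_frequencies(freqs, spectra, rel_threshold=0.5):
    """Frequencies whose bins are peaks in every region's spectrum."""
    common = set.intersection(*(peak_bins(s, rel_threshold) for s in spectra))
    return [float(freqs[k]) for k in sorted(common)]

def camera_vibration_bands(freqs, spectra, half_width_hz=0.5):
    """Frequency ranges (low, high) around each common peak."""
    return [(f - half_width_hz, f + half_width_hz)
            for f in common_peak_frequencies(freqs, spectra)]

fps = 60.0                                       # hypothetical frame rate
t = np.arange(120) / fps
camera = 0.3 * np.sin(2.0 * np.pi * 8.0 * t)     # shared by every region
subject = 0.2 * np.sin(2.0 * np.pi * 13.0 * t)   # only on the moving subject
region_signals = [camera + subject, camera + subject, camera, camera]
freqs = np.fft.rfftfreq(len(t), d=1.0 / fps)
spectra = [np.abs(np.fft.rfft(s)) for s in region_signals]
bands = camera_vibration_bands(freqs, spectra)
```

In practice the exact-bin intersection would probably need a tolerance of a bin or two, since peaks from different regions rarely line up perfectly in noisy data.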

Next, a verification experiment on the identification of the vibration component of the imaging device by the image processing device 10 according to this embodiment will be described.

FIG. 11 shows an example of the analysis result obtained by the image processing device 10 when the moving image shown in FIG. 5 is used as the processing target. In this example, the subject vibrates at 13 Hz, the imaging device 20 vibrates at 8 Hz, and four regions are arbitrarily extracted as object regions.

As shown in FIG. 11, it was confirmed that 8 Hz, at which a peak was detected across the plurality of object regions, can be identified as the vibration component of the imaging device 20.

As described above, the present embodiment includes: the acquisition unit 11A, which acquires a moving image that includes a plurality of objects as subjects and was captured by an imaging device; the extraction unit 11B, which extracts from the moving image acquired by the acquisition unit 11A a plurality of object regions, each being a region of one of the plurality of objects; and the identification unit 11G, which performs vibration analysis on each of the plurality of object regions extracted by the extraction unit 11B in the moving image acquired by the acquisition unit 11A and identifies a vibration component common to the plurality of object regions as the vibration component of the imaging device. Minute vibration components of the imaging device can therefore be detected accurately from the moving image.

Furthermore, according to this embodiment, regions of the moving image whose S/N ratio is at or above a predetermined level are detected, and each spatially contiguous partial pixel group within the detected regions is extracted as one of the plurality of object regions. The plurality of object regions can therefore be extracted more simply and with high reliability.

Furthermore, according to this embodiment, a phase image is generated by applying complex spatial filtering to the moving image; for the plurality of object regions, a phase fluctuation signal, which is a signal indicating the variation of the generated phase image between the frame images of the moving image, is derived; the derived phase fluctuation signal is converted into a temporal frequency spectrum by frequency analysis; a phase fluctuation spectrum, obtained by averaging the temporal frequency spectra within the same region, is calculated for each of the plurality of object regions; and, in the calculated phase fluctuation spectra, a predetermined frequency range including a peak frequency common to the plurality of object regions is identified as the vibration component of the imaging device. The vibration component of the imaging device can therefore be identified with higher accuracy.

Moreover, according to this embodiment, when the phase image has a wrapped phase, the phase fluctuation signal is derived after unwrapping the phase of each pixel of the phase image. The phase fluctuation signal can therefore be derived with higher accuracy.

In the above embodiment, the following various processors can be used as the hardware structure of the processing unit that executes the processing of, for example, the acquisition unit 11A, the extraction unit 11B, the generation unit 11C, the derivation unit 11D, the conversion unit 11E, the calculation unit 11F, and the identification unit 11G. As described above, these processors include, in addition to a CPU, which is a general-purpose processor that executes software (a program) to function as a processing unit: a programmable logic device (PLD), such as an FPGA (field-programmable gate array), which is a processor whose circuit configuration can be changed after manufacture; and a dedicated electric circuit, such as an ASIC (application-specific integrated circuit), which is a processor having a circuit configuration designed exclusively to execute specific processing.

The processing unit may be constituted by one of these various processors, or by a combination of two or more processors of the same or different types (for example, a combination of a plurality of FPGAs, or a combination of a CPU and an FPGA). The processing unit may also be constituted by a single processor.

There are two typical ways of constituting the processing unit with a single processor. First, as typified by computers such as clients and servers, one processor may be constituted by a combination of one or more CPUs and software, with this processor functioning as the processing unit. Second, as typified by a system on chip (SoC), a processor may be used that realizes the functions of the entire system, including the processing unit, on a single IC (integrated circuit) chip. In this way, the processing unit is constituted, as a hardware structure, by one or more of the various processors described above.

More specifically, an electric circuit (circuitry) combining circuit elements such as semiconductor elements can be used as the hardware structure of these various processors.

10 Image processing device
11 CPU
11A Acquisition unit
11B Extraction unit
11C Generation unit
11D Derivation unit
11E Conversion unit
11F Calculation unit
11G Identification unit
12 Memory
13 Storage unit
13A Vibration component identification program
13B Moving image database
13C Complex spatial filter database
14 Input unit
15 Display unit
16 Medium reading/writing device
17 Recording medium
18 Communication I/F unit
20 Imaging device

Claims

1. An image processing device comprising:
an acquisition unit that acquires a moving image that includes a plurality of objects as subjects and was captured by an imaging device;
an extraction unit that extracts, from the moving image acquired by the acquisition unit, a plurality of object regions, each being a region of one of the plurality of objects; and
an identification unit that performs vibration analysis on each of the plurality of object regions extracted by the extraction unit in the moving image acquired by the acquisition unit, and identifies a vibration component common to the plurality of object regions as a vibration component of the imaging device.

2. The image processing device according to claim 1, wherein the extraction unit detects a region of the moving image whose S/N ratio is at or above a predetermined level, and extracts each spatially contiguous partial pixel group in the detected region as one of the plurality of object regions.

3. The image processing device according to claim 1 or claim 2, further comprising:
a generation unit that generates a phase image by applying complex spatial filtering to the moving image;
a derivation unit that derives, for the plurality of object regions extracted by the extraction unit, a phase fluctuation signal, which is a signal indicating the variation of the phase image generated by the generation unit between the frame images of the moving image;
a conversion unit that converts the phase fluctuation signal derived by the derivation unit into a temporal frequency spectrum by frequency analysis; and
a calculation unit that uses the temporal frequency spectrum obtained by the conversion unit to calculate, for each of the plurality of object regions, a phase fluctuation spectrum obtained by averaging the temporal frequency spectra within the same region,
wherein the identification unit identifies, in the phase fluctuation spectra calculated by the calculation unit, a predetermined frequency range including a peak frequency common to the plurality of object regions as the vibration component of the imaging device.

4. The image processing device according to claim 3, wherein, when the phase image has a wrapped phase, the derivation unit derives the phase fluctuation signal after unwrapping the phase of each pixel of the phase image.

5. An image processing program for causing a computer to execute processing comprising:
acquiring a moving image that includes a plurality of objects as subjects and was captured by an imaging device;
extracting, from the acquired moving image, a plurality of object regions, each being a region of one of the plurality of objects; and
performing vibration analysis on each of the plurality of extracted object regions in the acquired moving image, and identifying a vibration component common to the plurality of object regions as a vibration component of the imaging device.