JP7246900B2

JP7246900B2 - Image processing device, image processing system, imaging device, image processing method, program, and storage medium

Info

Publication number: JP7246900B2
Application number: JP2018219876A
Authority: JP
Inventors: 良範木村
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2018-11-26
Filing date: 2018-11-26
Publication date: 2023-03-28
Anticipated expiration: 2038-11-26
Also published as: US11488279B2; JP2020086891A; US20200167885A1

Description

The present invention relates to an image processing device that uses a convolutional neural network (CNN) to measure the intensity of a disturbance with high accuracy.

Methods have been proposed for recovering, by image processing, the image degradation caused by a disturbance (atmospheric fluctuation). Non-Patent Document 1 discloses a method that recovers such degradation by correcting the positional shift of each frame of a video caused by the disturbance, correcting the blur that varies from place to place within each frame, and then removing the remaining blur by blind deconvolution.

Non-Patent Document 1: Xiang Zhu, Peyman Milanfar, "Removing atmospheric turbulence via space-invariant deconvolution", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, 2016, pp. 157-170
Non-Patent Document 2: Xia-Jiao Mao, Chunhua Shen, Yu-Bin Yang, "Image restoration using convolutional auto-encoders with symmetric skip connections", arXiv:1606.08921, 2016
Non-Patent Document 3: Xavier Glorot, Yoshua Bengio, "Understanding the difficulty of training deep feedforward neural networks", Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, 2010, pp. 249-256

However, although the method disclosed in Non-Patent Document 1 can remove atmospheric fluctuation, it cannot measure the intensity of the atmospheric fluctuation, that is, the degree of image degradation the fluctuation causes. Estimating the deformation vectors of non-rigid registration from a video affected by atmospheric fluctuation could be regarded as a measurement of the disturbance intensity, but when the video contains a moving object, positional-shift correction is difficult, so the deformation vectors cannot be estimated with high accuracy.

Accordingly, an object of the present invention is to provide an image processing device, an image processing system, an imaging device, an image processing method, a program, and a storage medium that can measure the intensity of a disturbance with high accuracy.

An image processing device according to one aspect of the present invention includes: an image acquisition unit that acquires a plurality of temporally different images degraded by a disturbance; a parameter acquisition unit that acquires network parameters of a neural network generated by learning that uses a plurality of image groups obtained on the basis of known disturbance intensities; and a measurement unit that generates a plurality of normalized images by subtracting the average image of the plurality of images from each of the plurality of images, and that measures the intensity of the disturbance from the normalized images using the neural network having the network parameters.

An image processing system according to another aspect of the present invention includes the image processing device and a client device connected to the image processing device via a network. The client device has an image output unit that outputs the plurality of temporally different images degraded by the disturbance to the image processing device, and the image processing device further has a disturbance intensity output unit that outputs the intensity of the disturbance to the client device.

An imaging device according to another aspect of the present invention includes an image sensor and the image processing device.

An image processing method according to another aspect of the present invention includes: an image acquisition step of acquiring a plurality of temporally different images degraded by a disturbance; a parameter acquisition step of acquiring network parameters of a neural network generated by learning that uses a plurality of image groups obtained on the basis of known disturbance intensities; and a measurement step of generating a plurality of normalized images by subtracting the average image of the plurality of images from each of the plurality of images, and measuring the intensity of the disturbance from the normalized images using the neural network having the network parameters.

A program according to another aspect of the present invention causes a computer to execute the image processing method.

A storage medium according to another aspect of the present invention stores the program.

Other objects and features of the present invention are described in the following embodiments.

According to the present invention, it is possible to provide an image processing device, an image processing system, an imaging device, an image processing method, a program, and a storage medium that can measure the intensity of a disturbance with high accuracy.

The drawings show, in order:
A block diagram of the image processing device in this embodiment.
A configuration diagram of the image processing system in this embodiment.
A configuration diagram of the imaging device in this embodiment.
A block diagram of the image processing system in this embodiment.
A flowchart of the image processing method in this embodiment.
A diagram showing the network structure in Example 1.
A diagram showing the numerical calculation results in Example 1.
A diagram showing the network structure in Example 2.
A diagram showing the numerical calculation results in Example 2 qualitatively.
A diagram showing the numerical calculation results in Example 2 quantitatively.

Embodiments of the present invention are described in detail below with reference to the drawings.

First, the disturbance is explained. A captured image is degraded by a disturbance (turbulence) of the medium between the imaging device and the subject. For example, when shooting under a blazing sun or shooting a distant subject, the captured image is degraded by fluctuation (disturbance) of the atmosphere. Similarly, when a subject on the bottom of a body of water is shot from above the surface, the captured image is degraded by fluctuation (disturbance) of the water.

Degradation of a captured image by a disturbance of the medium arises because the refractive index of the medium changes with position and time. As a result, the degree of degradation in a captured image generally differs from place to place. This is because the thickness, temperature distribution, flow, and other properties of the medium between the imaging device and the subject differ from place to place, so the refractive index of the medium also differs from place to place. For the same reason, the degree of degradation of the captured image differs from one moment to the next.

The correction of the positional shift (registration) of each video frame caused by the disturbance (atmospheric fluctuation), as disclosed in Non-Patent Document 1, is performed using non-rigid registration. Non-rigid registration is a simple atmospheric fluctuation (correction) model that is widely used in the image processing field. In brief, control points are first placed coarsely on the original image that is to be degraded by atmospheric fluctuation. Next, a deformation vector, which represents the amount of deformation applied at each control point, is drawn from a normal distribution, and the control points are displaced accordingly. The larger the variance of the normal random numbers used for the deformation vectors, the stronger the degradation of the resulting image by atmospheric fluctuation.

Next, from the displaced control points, the amount of deformation that atmospheric fluctuation applies to each pixel of the original image is determined according to Equation (1).

In Equation (1), Δx is the amount of deformation applied to each pixel of the original image, p is the amount of deformation applied to the control points (the deformation vector), A(x) is the matrix that converts p into Δx, and ε_x and ε_y are the spacings of the control points in the x and y directions, respectively. Further, (x_c, y_c) are the coordinates of an arbitrary control point and (x_i, y_i) are the coordinates of the i-th pixel of the original image. Put simply, Equation (1) applies a smooth deformation to each pixel of the original image that follows the displaced control points.

Finally, the pixel value at each deformed pixel position is determined from the original image by interpolation, which yields the image degraded by atmospheric fluctuation. This model is also called B-Spline. It is independent of the type of medium (air or water) and can be applied to image degradation caused by any disturbance.
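
The generation procedure described above (coarse control points, normally distributed deformation vectors, a smooth per-pixel deformation, then interpolation) can be sketched as follows. This is only an illustration in plain Python, not the patent's implementation: Equation (1) is not reproduced in this text, so the cubic B-spline weighting, the bilinear interpolation, and all names (`cubic_bspline`, `displacement_field`, `add_turbulence`, `spacing`, `sigma`) are assumptions.

```python
import random

def cubic_bspline(t):
    """Cubic B-spline basis function; it is nonzero only for |t| < 2."""
    t = abs(t)
    if t < 1.0:
        return 2.0 / 3.0 - t * t + 0.5 * t ** 3
    if t < 2.0:
        return (2.0 - t) ** 3 / 6.0
    return 0.0

def displacement_field(height, width, spacing, sigma, rng):
    """Draw one Gaussian deformation vector per control point and spread it to all pixels."""
    # Coarse control-point grid, padded by two cells so border pixels are fully covered.
    xs = range(-2 * spacing, width + 2 * spacing + 1, spacing)
    ys = range(-2 * spacing, height + 2 * spacing + 1, spacing)
    points = [(xc, yc, rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
              for yc in ys for xc in xs]
    field = []
    for y in range(height):
        row = []
        for x in range(width):
            dx = dy = 0.0
            for xc, yc, px, py in points:
                w = cubic_bspline((x - xc) / spacing) * cubic_bspline((y - yc) / spacing)
                dx += w * px
                dy += w * py
            row.append((dx, dy))
        field.append(row)
    return field

def bilinear(image, x, y):
    """Sample the image at a fractional position, clamping to the image border."""
    h, w = len(image), len(image[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def add_turbulence(image, spacing=4, sigma=1.0, seed=0):
    """Return a copy of the image warped by a random smooth deformation.

    sigma is the standard deviation of the deformation vectors, so sigma ** 2
    plays the role of the deformation-vector variance mentioned in the text.
    """
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    field = displacement_field(h, w, spacing, sigma, rng)
    return [[bilinear(image, x + field[y][x][0], y + field[y][x][1])
             for x in range(w)] for y in range(h)]
```

Calling `add_turbulence` once per frame with a different `seed` gives temporally uncorrelated distortions, which corresponds to the simplification the description makes when building a degraded video.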

In positional-shift correction, the process runs in the opposite direction: the deformation vectors are estimated by iterative calculation from the image degraded by atmospheric fluctuation and a reference image, and the estimated deformation vectors are used to correct the deformation applied to the degraded image. The reference image is a baseline image that can be regarded as free of degradation by atmospheric fluctuation, and it is obtained, for example, by averaging a plurality of degraded images. The details are disclosed in Non-Patent Document 1.

To create a video degraded by atmospheric fluctuation, the process described above is applied to each frame of the original video. In this case there is no correlation between the fluctuations applied to different frames, so the resulting video differs from a real video of atmospheric fluctuation. Qualitatively, however, the resulting video closely resembles a real one. Moreover, the method disclosed in Non-Patent Document 1 recovers image degradation caused by atmospheric fluctuation well. The B-Spline-based disturbance model is therefore considered close to reality, and the present invention also uses it to create the disturbance-degraded training videos (training image groups) for the CNN learning described later.

The blur that varies from place to place within each frame can also be corrected by selecting, for a region of interest, the sharpest instance (the one with the largest pixel-value variance) over a certain period (number of frames), repeating this over the whole image, and stitching the selected regions together. This process is called lucky imaging. Blind deconvolution is performed by simultaneously estimating both the PSF (Point Spread Function) that represents the blur caused by atmospheric fluctuation and the image from which the atmospheric fluctuation has been removed.
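
The lucky-imaging selection described above can be sketched as follows. The tiling into square regions and the names `tile_values` and `lucky_image` are assumptions made for illustration; only the sharpness criterion (largest pixel-value variance) comes from the text.

```python
from statistics import pvariance

def tile_values(frame, top, left, tile):
    """Pixel values of one tile, clipped at the image border."""
    return [v for row in frame[top:top + tile] for v in row[left:left + tile]]

def lucky_image(frames, tile):
    """For every tile, copy the pixels from the frame in which that tile is sharpest."""
    height, width = len(frames[0]), len(frames[0][0])
    result = [[0] * width for _ in range(height)]
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            # Sharpness criterion from the text: largest pixel-value variance.
            best = max(frames, key=lambda f: pvariance(tile_values(f, top, left, tile)))
            for y in range(top, min(top + tile, height)):
                result[y][left:left + tile] = best[y][left:left + tile]
    return result
```

When several frames tie for a tile, `max` keeps the earliest one, so flat regions are simply taken from the first frame.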

Next, the image processing device in this embodiment is described with reference to FIG. 1. FIG. 1 is a block diagram of the image processing device 100. The image processing device 100 includes an image acquisition unit 101, a parameter acquisition unit 102, a measurement unit 103, and a correction unit 104. The image acquisition unit 101 acquires a plurality of images (input images, a video) captured by an imaging device. The imaging device is capable of acquiring digital video data and is, for example, a digital video camera or a digital camera.

The frames of a video are generally degraded. In the case of a digital camera, for example, the degradation factors include blur caused by the imaging optical system and the optical low-pass filter, noise caused by the image sensor, demosaicing errors, and noise caused by data compression. It is desirable that these video degradation processes be known, because the large number of training image groups (training videos) needed for the CNN learning described later can then be generated by numerical calculation. The format of the video data is not limited as long as it is digital data that a computer can read; examples are AVI (Audio Video Interleave) and MPEG (Moving Picture Experts Group). In this embodiment the video may be color or monochrome, but for simplicity the following description assumes a monochrome video.

The image acquisition unit 101 also acquires, as shooting conditions of the plurality of input images, the optical conditions (focal length, aperture value, and so on) of the optical system (imaging optical system) used for shooting, the pixel pitch of the image sensor used for shooting, or the frame rate. This is so that the learning conditions of the CNN described later can be matched to the shooting conditions (the conditions under which the input images were acquired).

The parameter acquisition unit 102 acquires learned network parameters. The network parameters include the filters and biases that are the parameters of the CNN described later. Put simply, a CNN is a computation that uses learned parameters, and it is implemented on, for example, a PC (Personal Computer), a workstation, an FPGA (Field Programmable Gate Array), or a server. The parameter acquisition unit 102 is therefore constituted by, for example, the HDD (Hard Disk Drive) of a PC. Alternatively, the parameter acquisition unit 102 may acquire the network parameters from a storage medium through an interface device such as a CD-ROM drive or a USB interface. In that case, the parameter acquisition unit 102 is configured to include the interface device.

Learned network parameters are the network parameters of the CNNs that constitute the measurement unit 103 and the correction unit 104 described later, generated in advance by learning. The parameters to be acquired may be selected as those whose learning conditions are close to the conditions (shooting conditions) under which the input images supplied by the image acquisition unit 101 were acquired. Here, the learning conditions are the shooting conditions (optical conditions of the optical system, pixel pitch, frame rate, and so on) under which the training image groups used for the CNN learning described later are numerically generated (or acquired).
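
One way to select parameters whose learning conditions are close to the shooting conditions is sketched below. The description does not say how closeness is judged, so the distance measure (a sum of relative differences), the dictionary layout, and the names `closest_parameters`, `conditions`, and the condition keys are all hypothetical.

```python
def closest_parameters(shooting, candidates):
    """Return the candidate whose learning conditions are nearest to the shooting conditions.

    shooting:   dict of positive numeric shooting conditions.
    candidates: list of dicts, each with a "conditions" dict using the same keys.
    """
    def distance(learning):
        # Hypothetical measure: sum of relative differences over all conditions.
        return sum(abs(learning[key] - value) / abs(value)
                   for key, value in shooting.items())
    return min(candidates, key=lambda c: distance(c["conditions"]))
```

For example, with stored parameter sets trained at 30 and 120 frames per second, input shot at 60 frames per second and otherwise identical conditions is matched to whichever set has the smaller relative difference.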

Next, the CNN is described briefly. A CNN is a learning-based image processing technique that repeatedly convolves an image with filters generated by learning (training) and then applies a nonlinear operation. The image obtained by convolving an image with a filter and then applying the nonlinear operation is called a feature map. Learning is performed with training images (datasets) consisting of pairs of an input image and an output image. Put simply, learning means generating, from the training images, filter values that convert an input image into the corresponding output image with high accuracy. Details are given later.

When the image has RGB color channels or consists of a plurality of images (a video), or when the feature map consists of a plurality of images, the filter used for the convolution has a corresponding number of channels. That is, the convolution filter is represented by a four-dimensional array whose dimensions are the filter height, the filter width, and the number of filters, plus the number of channels. The processing that convolves an image (or feature map) with a filter and then applies the nonlinear operation is expressed in units called layers. One speaks, for example, of the feature map of the m-th layer or the filter of the n-th layer. A CNN that repeats the filter convolution and the nonlinear operation three times is called a network with a three-layer structure. This processing can be formulated as Equation (2).

In Equation (2), W_n is the filter of the n-th layer, b_n is the bias of the n-th layer, f is the nonlinear operator, X_n is the feature map of the n-th layer, and * is the convolution operator. The superscript (l) indicates the l-th filter or feature map. The filters and biases are generated by the learning described later and are collectively called network parameters. The sigmoid function and ReLU (Rectified Linear Unit) are commonly used as the nonlinear operation.
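
Equation (2) itself is not reproduced in this text. Read from the symbol definitions, one layer presumably computes X_n^(l) = f(sum over k of W_n^(l) * X_(n-1)^(k) + b_n^(l)), where k runs over the input channels. The sketch below implements that reading for a single layer with ReLU as f. The function names, the "valid" border handling, and the use of cross-correlation (the usual convention in CNN libraries) in place of a flipped-kernel convolution are assumptions.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding (cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)] for y in range(out_h)]

def relu(value):
    """Rectified Linear Unit, one of the nonlinear operations named in the text."""
    return max(value, 0.0)

def cnn_layer(inputs, filters, biases):
    """Compute the feature maps of one layer.

    inputs:  list of K input maps (2-D lists), i.e. X_(n-1)^(k).
    filters: filters[l][k] is the 2-D kernel of output map l for input channel k.
    biases:  biases[l] is the scalar bias of output map l.
    """
    outputs = []
    for kernels, bias in zip(filters, biases):
        partial = [conv2d_valid(x, k) for x, k in zip(inputs, kernels)]
        out_h, out_w = len(partial[0]), len(partial[0][0])
        outputs.append([[relu(sum(p[y][x] for p in partial) + bias)
                         for x in range(out_w)] for y in range(out_h)])
    return outputs
```

Stacking three calls to `cnn_layer`, each with its own filters and biases, corresponds to the three-layer network mentioned in the description.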

Next, the learning of the CNN is described. For training images (a training image group) consisting of pairs of an input training image (for example, a degraded image) and the corresponding output training image (for example, a sharp ground-truth image), the CNN is generally trained by minimizing the objective function expressed by Equation (3).

In Equation (3), L is the loss function that measures the error between the ground truth and its estimate. Y_i is the i-th output training image and X_i is the i-th input training image. F is a function that collectively represents the operations performed in each layer of the CNN (Equation (2)). θ denotes the network parameters (filters and biases). ∥Z∥_2 is the L2 norm, that is, the square root of the sum of the squares of the elements of the vector Z.
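
Equation (3) is not reproduced in this text. A form consistent with the definitions above is L(θ) = (1/n) Σ_i ∥Y_i − F(X_i; θ)∥_2², and the sketch below evaluates that form; the averaging over n and the squaring of the norm are assumptions, as are the names `l2_norm` and `objective`. Images are represented as flat lists of pixel values.

```python
import math

def l2_norm(z):
    """Square root of the sum of squared elements of the vector z."""
    return math.sqrt(sum(v * v for v in z))

def objective(model, theta, inputs, targets):
    """Mean squared L2 error between the targets Y_i and the estimates F(X_i; theta)."""
    total = 0.0
    for x, y in zip(inputs, targets):
        estimate = model(x, theta)
        residual = [yi - fi for yi, fi in zip(y, estimate)]
        total += l2_norm(residual) ** 2
    return total / len(inputs)
```

Here `model` stands in for F: any callable that maps an input and the parameters θ to an estimate of the same length as the target.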

Input and output images with a known correspondence are used as the training images, for example a sharp output image and an input image degraded by adding the blur caused by an optical system to it. When the output of the CNN is a scalar (a value) rather than an image, the loss function is defined and the network parameters are determined in the same way. In that case, the training data consist of input images and the corresponding output values. A CNN that outputs a scalar is a special kind, called a fully connected neural network, which is described in detail later.

In Equation (3), n is the total number of training images used for learning. Because this number is generally large (up to tens of thousands), stochastic gradient descent (SGD) randomly selects a subset of the training images and uses it for learning. This reduces the computational load of learning with many training images.
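
The random selection of a subset of the training data at each step can be illustrated on a deliberately tiny problem. The sketch below fits a single parameter θ of the model y = θx by minibatch SGD; the model, the learning rate, the batch size, and the name `sgd_fit` are illustrative assumptions and have nothing to do with the network of the embodiment.

```python
import random

def sgd_fit(xs, ys, batch_size=4, learning_rate=0.1, steps=300, seed=0):
    """Fit y = theta * x by minimizing the mean squared error with minibatch SGD."""
    rng = random.Random(seed)
    theta = 0.0
    indices = range(len(xs))
    for _ in range(steps):
        # Use only a random subset of the training data for this update.
        batch = rng.sample(indices, batch_size)
        # Gradient of the batch-averaged squared error with respect to theta.
        gradient = sum(2.0 * xs[i] * (theta * xs[i] - ys[i]) for i in batch) / batch_size
        theta -= learning_rate * gradient
    return theta
```

Each update touches `batch_size` samples instead of all n, which is the reduction in computational load the description refers to.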

Various methods are known for minimizing (optimizing) the objective function, such as the momentum method, AdaGrad, AdaDelta, and Adam. At present, however, there is no guideline for choosing an optimization method for learning. Basically any of them may be used, but it is known that they differ in convergence behavior and therefore in learning time.

With the network parameters learned by the above procedure and the CNN, image processing becomes possible that converts, for example, a degraded image into a sharp image with high accuracy. This image processing is also called deep learning.

The measurement unit 103 uses the learned network parameters and the CNN to measure the intensity of the disturbance from the input images (a plurality of images) and outputs it. The measurement unit 103 is the CNN described above and is implemented on, for example, a PC, a workstation, an FPGA, or a server, but it is not limited to these; any computer that can carry out the CNN computation described above may be used. The intensity of the disturbance is a scalar that represents how strongly the input images are disturbed; specifically, it is the temporal or spatial dispersion of the pixel values of the input images. Dispersion here refers to statistics such as the variance and the standard deviation. For example, when the B-Spline model described above is used as the disturbance model, the variance of the deformation vectors may be used as the intensity of the disturbance. Expressing the intensity of the disturbance in the input images as a scalar, namely the temporal or spatial dispersion of pixel values, is one of the features of the present invention.
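
To make the notion of a scalar dispersion concrete, the sketch below computes one possible statistic: the temporal standard deviation of each pixel, averaged over the image. This only illustrates the kind of quantity meant. In the embodiment the value is measured by the CNN rather than computed directly, and the choice of statistic and the name `temporal_dispersion` are assumptions.

```python
from statistics import pstdev

def temporal_dispersion(frames):
    """Average, over all pixels, of the standard deviation of each pixel across frames."""
    height, width = len(frames[0]), len(frames[0][0])
    per_pixel = [pstdev([frame[y][x] for frame in frames])
                 for y in range(height) for x in range(width)]
    return sum(per_pixel) / len(per_pixel)
```

A static scene gives zero, and the value grows as pixel values fluctuate more strongly from frame to frame.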

As described above, the output of a CNN is an image. To output the intensity of the disturbance, which is a scalar, a fully connected neural network that converts the image output by the CNN into a scalar is therefore added to the output part. The fully connected neural network can be formulated as Equation (4).

In Equation (4), X_n is a vector representing the feature map of the n-th layer and W_n is a matrix of the weights applied to each element of X_(n-1). It follows that the image output by the CNN must be converted into a vector before it is input to the fully connected neural network. For example, an image of 50×50 pixels is converted into a 2500-dimensional vector. The image size that can be input to the fully connected neural network is determined by the size of that network. The size of the input image to the CNN therefore has to be adjusted so that the CNN produces an output image that can be input to the fully connected neural network.
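
The vectorization step and one fully connected layer can be sketched as follows. Equation (4) is not reproduced in this text; the sketch assumes the form X_n = f(W_n X_(n-1) + b_n), where the bias term and the default ReLU activation are assumptions carried over from Equation (2), and the names `flatten` and `fully_connected` are illustrative.

```python
def flatten(image):
    """Convert a 2-D image (list of rows) into a vector, row by row."""
    return [value for row in image for value in row]

def fully_connected(x, weights, biases, activation=lambda v: max(v, 0.0)):
    """Apply one fully connected layer to the vector x.

    weights: matrix with one row per output element and len(x) columns.
    biases:  one scalar per output element (an assumption, see the lead-in).
    """
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]
```

A weight matrix with a single row turns the vector into one scalar, which is the shape of output needed for the disturbance intensity.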

A CNN to which a fully connected neural network has been added can also be trained by the method described above. This is because, historically, fully connected neural networks were studied first and CNNs were studied later as a derivative of them; the details are omitted here. Strictly speaking, the measurement unit 103 consists of a CNN and a fully connected neural network added to its output part, but it is referred to simply as "the CNN of the measurement unit 103".

The network parameters of the CNN of the measurement unit 103 are learned with training images (a training image group) consisting of pairs of an input training image and its disturbance intensity. In general, however, it is difficult to obtain input training images degraded by a disturbance of known intensity. They may therefore be generated by numerical calculation, for example with the B-Spline disturbance model described above. In that case, the variance of the deformation vectors described above can be used as the intensity of the disturbance. For example, the network parameters are generated by learning that uses a training image group (training video) consisting of pairs of a plurality of temporally different first image groups (first videos) and a plurality of second image groups (second videos) obtained by degrading the first image groups with a disturbance of known intensity. A plurality of "temporally different" images includes a plurality of images acquired at different times.

Using input training images that contain moving objects makes it possible to measure the intensity of the disturbance robustly (with high accuracy) even when moving objects are present. Likewise, if the video degradation process of the image acquisition unit 101 is known, generating input training images that include that degradation by numerical calculation and using them as training images makes it possible to measure the intensity of the disturbance robustly (with high accuracy) despite the degradation.

(Normalization)
The measurement unit 103 normalizes the input images or the input training images. The purpose is to keep the measurement result from depending on the absolute pixel values of the input images. Normalization is performed, for example, by generating the average image of the plurality of input images and subtracting it from each of the input images. This normalization method can be formulated as Equation (5).

In equation (5), I_i is the i-th input image and m is the number of input images. The bar over I_i indicates the normalized image.

Normalization can also be performed, for example, by generating difference images between temporally adjacent pairs of the plurality of input images. This normalization method can be formulated as equation (6) below.

The symbols in equation (6) have the same meanings as in equation (5). The normalization method can be selected as appropriate according to the definition of the disturbance intensity, the input images, the required measurement accuracy, and so on; this embodiment basically uses the method given by equation (5). Thus, in this embodiment, the disturbance intensity is preferably measured using normalized input images. This removes the influence of the absolute pixel values of the input images on the measurement result, so a highly accurate measurement result can be obtained.
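The two normalization options described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `mean_subtract` and `adjacent_difference` are hypothetical, frames are represented as plain nested lists, and the sign convention of the adjacent-frame difference (later frame minus earlier frame) is an assumption, since the text only says that a difference of temporally adjacent images is taken.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame as rows of pixel values


def mean_subtract(frames: List[Frame]) -> List[Frame]:
    """Mean-image normalization: subtract the temporal average image from every frame."""
    m = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    mean = [[sum(f[y][x] for f in frames) / m for x in range(width)]
            for y in range(height)]
    return [[[f[y][x] - mean[y][x] for x in range(width)]
             for y in range(height)] for f in frames]


def adjacent_difference(frames: List[Frame]) -> List[Frame]:
    """Difference normalization: subtract each frame from its temporal successor.

    Assumption: later minus earlier. The result has one frame fewer than the input.
    """
    return [[[b - a for a, b in zip(row_prev, row_next)]
             for row_prev, row_next in zip(prev, nxt)]
            for prev, nxt in zip(frames, frames[1:])]
```

Both functions return the same result when a constant offset is added to every pixel, which is the property the normalization is meant to provide.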

(Input image size adjustment)
The numbers of vertical and horizontal pixels of the images input to the measurement unit 103 are fixed by the fully connected neural network added to the output part of the network. The input images and the input training images must therefore be adjusted to these pixel counts by cropping, interpolation, or decimation.

The number of images (number of frames) input to the measurement unit 103 is determined according to the number of frames of the input training images. The input images must therefore be temporally interpolated or decimated before being input to the measurement unit 103. For example, when the input images are acquired at a high frame rate, they are decimated to match the frame rate of the input training images and then input to the measurement unit 103. This prevents the measurement result from being affected by a difference in frame rate between the input images and the input training images. The conditions under which the input images were acquired may be obtained from the image acquisition unit 101, and the number of frames of the input training images may be obtained from the parameter acquisition unit 102.

Thus, in this embodiment, the disturbance intensity is preferably measured using input images whose size (in particular, frame rate) has been adjusted. This removes the influence of a frame-rate difference between the input images and the input training images on the measurement result, so a highly accurate measurement result can be obtained.
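As an illustration of the temporal decimation described above, the following sketch keeps only the frames that fall on the sampling instants of the training frame rate. The function name `thin_frames` and its parameters are hypothetical, and picking the frame at or just before each sampling instant is an assumed policy; the text does not specify how frames are chosen, and temporal interpolation (needed when the input is slower than the training data) is not covered here.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def thin_frames(frames: Sequence[T], input_fps: float, training_fps: float) -> List[T]:
    """Decimate a frame sequence so its frame rate matches the training data."""
    if input_fps < training_fps:
        # Decimation cannot raise the frame rate; interpolation would be needed.
        raise ValueError("input frame rate is lower than the training frame rate")
    step = input_fps / training_fps          # input frames per training frame
    count = int(len(frames) / step)          # frames available after decimation
    return [frames[int(k * step)] for k in range(count)]
```

For example, a sequence captured at 120 frames per second and a network trained at 30 frames per second would keep every fourth frame.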

(Measurement at multiple locations)
The measurement unit 103 may measure the disturbance intensity at multiple locations in the input images and determine a final disturbance intensity from them. Here, multiple locations means multiple positions in the space and time (vertical, horizontal, frame) of the input images. More specifically, a block with the required numbers of vertical pixels, horizontal pixels, and frames is extracted from a given spatial and temporal position of the input images by the method described above and is input to the measurement unit 103 as the input images, and the disturbance intensity at that position is measured.

The final disturbance intensity may be, for example, the average of the disturbance intensities measured at the multiple locations in the input images. Alternatively, the median, minimum, maximum, or mode of the disturbance intensities measured at the multiple locations may be used as the final disturbance intensity. This prevents the measurement result from being dominated by locations in the input images where degradation due to the disturbance is locally large.

Thus, in this embodiment, the disturbance intensity is preferably measured at multiple locations in the input images to determine the final disturbance intensity. A highly accurate measurement result can then be obtained even if the input images contain locations where degradation due to the disturbance is locally large.
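The multi-location measurement can be sketched as two small steps: choosing random space-time positions for the blocks, and reducing the per-block intensities to one value. The names `sample_patch_origins`, `final_intensity`, and `AGGREGATORS` are hypothetical, and the per-block measurement itself (the CNN) is outside the scope of this sketch.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

Origin = Tuple[int, int, int]  # (top row, left column, first frame) of a block


def sample_patch_origins(height: int, width: int, frames: int,
                         patch_h: int, patch_w: int, patch_frames: int,
                         count: int, rng: random.Random) -> List[Origin]:
    """Pick random block positions that lie fully inside the input moving image."""
    return [(rng.randrange(height - patch_h + 1),
             rng.randrange(width - patch_w + 1),
             rng.randrange(frames - patch_frames + 1))
            for _ in range(count)]


# The reductions named in the text: average, median, minimum, maximum, mode.
AGGREGATORS: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": statistics.mean,
    "median": statistics.median,
    "min": min,
    "max": max,
    "mode": statistics.mode,
}


def final_intensity(local_values: Sequence[float], how: str = "mean") -> float:
    """Reduce the intensities measured at several locations to a single value."""
    return AGGREGATORS[how](local_values)
```

Which reduction to use is a design choice: the median is less sensitive than the mean to a few blocks with unusually strong local degradation.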

The correction unit 104 corrects the degradation of the input images caused by the disturbance, based on the measured disturbance intensity. The correction method is not limited, but from the viewpoint of image processing accuracy it is preferable to use the CNN described above. In the following, the correction unit 104 is therefore described as a CNN by way of example. The correction unit 104 is implemented on, for example, a PC, a workstation, an FPGA, or a server, but is not limited to these; any computer capable of performing the CNN computation described above may be used. The correction unit 104 performs the correction processing using the learned network parameters provided by the parameter acquisition unit 102. To learn the network parameters of the CNN of the correction unit 104, training images consisting of pairs of an output training image and an input training image, obtained by degrading the output training image with a known disturbance intensity, are used. In general, however, such training images are difficult to obtain. The training images may therefore be generated numerically, for example using the B-Spline disturbance model described above. In this case, the variance of the deformation vectors described above can be used as the disturbance intensity.

Using input and output training images that contain moving objects as training images makes the disturbance correction robust to moving objects. Similarly, if the moving-image degradation process of the image acquisition unit 101 is known, input training images that include this degradation can be generated numerically and used as training images, which makes the disturbance correction robust to that degradation.

(Network parameter selection)
The correction unit 104 selects, based on the measured disturbance intensity, one of the learned network parameter sets provided by the parameter acquisition unit 102 and performs the correction processing with it. Using network parameters learned from training images with the same disturbance intensity as the input images yields highly accurate disturbance correction. For example, the network parameters learned from training images degraded with the disturbance intensity closest to the measured one may be selected and used for the correction processing. Selecting the learned network parameters based on the measured disturbance intensity in this way enables highly accurate disturbance correction.
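A nearest-intensity lookup of this kind can be sketched as below. The function name `select_parameters` and the dictionary layout (training disturbance intensity mapped to a parameter set) are assumptions made for illustration; the text does not describe how the parameter sets are stored.

```python
from typing import Dict, TypeVar

P = TypeVar("P")  # whatever object represents one learned parameter set


def select_parameters(measured: float, trained: Dict[float, P]) -> P:
    """Return the parameter set trained at the intensity closest to the measured one."""
    nearest = min(trained, key=lambda intensity: abs(intensity - measured))
    return trained[nearest]
```

With parameter sets trained at intensities 0.5, 1.0, and 2.0, a measured intensity of 1.2 would select the set trained at 1.0.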

(Number of frames)
The correction unit 104 determines the number of input images (number of frames) based on the measured disturbance intensity. A large disturbance requires many input images for correction, whereas a small disturbance does not. When learning the network parameters of the CNN of the correction unit 104, the number of input training images is adjusted according to the disturbance intensity applied to the training images. Specifically, the number of input training images is increased when the disturbance is large and decreased when it is small. Determining the number of input images (number of frames) from the measured disturbance intensity in this way determines the data required for correction.
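One simple way to realize such a monotone mapping from disturbance intensity to frame count is a threshold table, sketched below. The thresholds and frame counts are purely hypothetical placeholder values chosen for illustration; the text states only that stronger disturbance calls for more frames, and gives no numbers.

```python
import bisect
from typing import Sequence


def frames_for_intensity(measured: float,
                         thresholds: Sequence[float] = (0.5, 1.0, 2.0),
                         frame_counts: Sequence[int] = (5, 11, 21, 41)) -> int:
    """Map a measured intensity to a frame count that grows with the intensity.

    Hypothetical table: `thresholds` must be sorted ascending and `frame_counts`
    must have exactly one more entry than `thresholds`.
    """
    return frame_counts[bisect.bisect_right(thresholds, measured)]
```

The same table would have to be used when preparing the training data, so that each parameter set is trained with the frame count it will see at correction time.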

The output image, which is the image processing result obtained by the correction unit 104, can be stored in a storage unit (not shown) provided inside the image processing apparatus 100. The output image may also be displayed on a display unit (not shown) provided outside the image processing apparatus 100. Alternatively, it may be stored in a storage medium (not shown) outside the image processing apparatus 100 via an interface device (not shown) such as a CD-ROM drive or a USB interface. A description of the wiring and wireless links used to exchange information (data) among the image acquisition unit 101, the parameter acquisition unit 102, the measurement unit 103, and the correction unit 104 is omitted.

The functions of the image processing apparatus 100 may be realized on a computer by having the computer execute a program that describes the functions of the image acquisition unit 101, the parameter acquisition unit 102, the measurement unit 103, and the correction unit 104. Similarly, some of the functions of the image processing apparatus 100 may be realized by implementing a program that describes the function of at least one of the measurement unit 103 and the correction unit 104 as an electronic circuit on a VLSI.

Next, an image processing system according to this embodiment will be described with reference to FIG. 2. FIG. 2 is a configuration diagram of the image processing system 200. The image processing system 200 includes an image processing apparatus 100a and an imaging apparatus (digital camera) 201. The imaging apparatus 201 has an imaging optical system and an image sensor, and acquires captured images. The captured images acquired by the imaging apparatus 201 are output to the image processing apparatus 100a. The image processing apparatus 100a has a PC and a display. The PC has the image acquisition unit 101, the parameter acquisition unit 102, the measurement unit 103, and the correction unit 104. The display shows the output image obtained as the image processing result.

Next, an imaging apparatus according to this embodiment will be described with reference to FIG. 3. FIG. 3 is a configuration diagram of the imaging apparatus 300. The imaging apparatus 300 includes a camera body 301 and a lens apparatus (interchangeable lens) 302. The camera body 301 has an image sensor 303, an image processing engine (image processing apparatus) 304, and a monitor 305. The image processing engine 304 has the image acquisition unit 101, the parameter acquisition unit 102, the measurement unit 103, and the correction unit 104. The monitor 305 displays the output image obtained as the image processing result.

Next, another image processing system according to this embodiment will be described with reference to FIG. 4. FIG. 4 is a block diagram of the image processing system 400. The image processing system 400 has a client apparatus 401 and a server apparatus 402 connected to the client apparatus 401 via a network 403. The client apparatus 401 has an image output unit 404, which outputs a plurality of temporally different images degraded by the disturbance to the server apparatus 402. The server apparatus 402 has a parameter acquisition unit 405, a measurement unit 406, and a disturbance intensity output unit 407. The parameter acquisition unit 405 acquires learned network parameters. The measurement unit 406 measures the disturbance intensity from the plurality of images using the network parameters and a neural network. The disturbance intensity output unit 407 outputs the disturbance intensity to the client apparatus 401. The client apparatus 401 or the server apparatus 402 may also have a correction unit (not shown) that corrects the plurality of images based on the disturbance intensity.

Next, an image processing method according to this embodiment will be described with reference to FIG. 5. FIG. 5 is a flowchart of the image processing method and shows the program flow when the method is implemented on a VLSI or the like. Each step in FIG. 5 is executed by, for example, the image acquisition unit 101, the parameter acquisition unit 102, the measurement unit 103, and the correction unit 104 of the image processing apparatus 100.

First, in step S501, the image acquisition unit 101 acquires a plurality of temporally different images (input images, a moving image) degraded by the disturbance. Next, in step S502, the parameter acquisition unit 102 acquires learned network parameters. Next, in step S503, the measurement unit 103 measures the disturbance intensity from the plurality of images using the network parameters and the neural network. Finally, in step S504, the correction unit 104 corrects the plurality of images based on the disturbance intensity.
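The four steps can be read as a simple sequential pipeline. The sketch below only fixes the order of the calls and the data passed between them; the function `run_pipeline` and its four callable arguments are hypothetical stand-ins for the image acquisition, parameter acquisition, measurement, and correction units, whose internals are not modeled.

```python
from typing import Any, Callable, Sequence, Tuple


def run_pipeline(acquire_images: Callable[[], Sequence[Any]],
                 acquire_parameters: Callable[[], Any],
                 measure: Callable[[Sequence[Any], Any], float],
                 correct: Callable[[Sequence[Any], float], Sequence[Any]],
                 ) -> Tuple[Sequence[Any], float]:
    """Run steps S501 to S504 in order and return (corrected images, intensity)."""
    images = acquire_images()                  # S501: degraded, temporally different images
    parameters = acquire_parameters()          # S502: learned network parameters
    intensity = measure(images, parameters)    # S503: disturbance intensity
    corrected = correct(images, intensity)     # S504: correction based on the intensity
    return corrected, intensity
```

Returning the measured intensity alongside the corrected images keeps it available to callers that only need the measurement, as in the client and server arrangement of FIG. 4.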

Next, each example will be described in detail.

First, Example 1 of the present invention will be described with reference to FIGS. 6 and 7. This example describes the results of a numerical calculation in which a program describing the functions of the image processing apparatus 100 was used to measure the disturbance intensity of input images degraded by a known disturbance intensity.

FIG. 6 shows the network structure 600 (the CNN of the measurement unit 103) in this example. In FIG. 6, conv denotes a convolution layer and deconv denotes a deconvolution (transposed convolution) layer. The string of numbers above each layer gives the vertical and horizontal size, the number of channels, and the number of filters. For example, "3×3×1×8" in FIG. 6 means that convolution or transposed convolution is performed with 8 filters of size 3×3 and 1 channel. Transposed convolution is a kind of convolution and is, simply put, the reverse operation of convolution. Details are disclosed in, for example, Non-Patent Document 2. The + sign inside a circle in FIG. 6 denotes the element-wise sum of feature maps.

FC at the output of the network structure 600 denotes a fully connected network. The string of numbers above the fully connected network gives its input size and output size. For example, "2500×2500" in FIG. 6 means that a 2500-dimensional vector is input and a 2500-dimensional vector is output. More specifically, the 50×50 pixel image output by the CNN is converted into a 2500-dimensional vector and input to the fully connected network. As described above, the image size that can be input to the fully connected network is fixed, and the input image size is determined accordingly. In this example, the input image size is 50×50 pixels and 11 frames.

The network structure 600 shown in FIG. 6 is only an example, and the present invention is not limited to it. The training images consist of pairs of an input training image with a known disturbance intensity and that disturbance intensity. To match the network structure 600 of the measurement unit 103, the input training image size is 50×50 pixels and 11 frames. The input training images are generated numerically using the B-Spline disturbance model described above, with the variance of the deformation vectors described above as the disturbance intensity. Input training images containing moving objects are used as training images in order to show that, as described above, the present invention enables disturbance intensity measurement that is robust to moving objects.

The input images are images that can be regarded as having been acquired under the same conditions as the input training images (optical conditions of the optical system, pixel pitch of the image sensor, and frame rate). The frame rate of the input images is therefore not adjusted. The input image size is 400×400 pixels and 40 frames. From these, 20 blocks of 50×50 pixels and 11 frames are extracted at random spatial and temporal positions, and the average of the disturbance intensities calculated for the blocks is taken as the final disturbance intensity. Input images containing a moving object (a car) were used.

The input images and the input training images are normalized by the method given by equation (5). That is, an average image of the plurality of input (training) images is generated and subtracted from each of the plurality of input (training) images. All images are monochrome, and pixel values are scaled to the range [0, 1].

Training uses stochastic gradient descent (SGD; see Non-Patent Document 2) with the Adam method as the optimizer. The Adam parameters are α = 10⁻⁴, β₁ = 0.9, β₂ = 0.999, and ε = 10⁻⁸. For SGD, 128 images are randomly selected from the total of 76,800 training images. The number of training iterations is 18×10⁴ (300 epochs, at 76,800 / 128 = 600 iterations per epoch). The initial values of the network parameters (filters and biases) are set by Xavier initialization (see Non-Patent Document 3) in all layers.
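For reference, the standard Adam update rule with the hyperparameters listed above can be written for a single scalar parameter as follows, together with the iteration count implied by the batch size and the number of epochs. This is the textbook form of Adam, not code from the patent; the function name `adam_step` and the scalar setting are simplifications for illustration.

```python
import math
from typing import Tuple

ALPHA, BETA1, BETA2, EPS = 1e-4, 0.9, 0.999, 1e-8   # values stated in the text

TRAINING_IMAGES, BATCH_SIZE, EPOCHS = 76_800, 128, 300
ITERATIONS_PER_EPOCH = TRAINING_IMAGES // BATCH_SIZE  # 600
TOTAL_ITERATIONS = ITERATIONS_PER_EPOCH * EPOCHS      # 180,000 = 18 x 10^4


def adam_step(theta: float, grad: float, m: float, v: float, t: int
              ) -> Tuple[float, float, float]:
    """One Adam update of a scalar parameter; t is the 1-based step index."""
    m = BETA1 * m + (1.0 - BETA1) * grad            # first-moment estimate
    v = BETA2 * v + (1.0 - BETA2) * grad * grad     # second-moment estimate
    m_hat = m / (1.0 - BETA1 ** t)                  # bias correction
    v_hat = v / (1.0 - BETA2 ** t)
    theta = theta - ALPHA * m_hat / (math.sqrt(v_hat) + EPS)
    return theta, m, v
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by approximately α regardless of the gradient magnitude.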

The size, number of frames, and normalization method of the input images and input training images input to the measurement unit 103 are examples, and the present invention is not limited to them. Likewise, the definition of the disturbance intensity output from the measurement unit 103 and the method of calculating the final disturbance intensity are examples, and the present invention is not limited to them.

As shown in FIG. 6, the network structure (neural network) 600 has a main unit 601, an input unit 602, a conversion unit 603, and an output unit 604. The main unit 601 converts the plurality of images into first feature amounts 611a, 611b, and 611c using first network parameters and a first convolutional neural network (CNN) having at least two layers. The input unit 602 converts the plurality of images into second feature amounts 612a, 612b, and 612c using second network parameters and a second CNN. The conversion unit 603 adds the first feature amounts and the second feature amounts to generate third feature amounts 613a, 613b, and 613c, and converts the third feature amounts into a fourth feature amount 614 using third network parameters and a third CNN. The output unit 604 outputs the disturbance intensity from the fourth feature amount using fourth network parameters and a fully connected neural network.

FIG. 7 shows the numerical calculation results (disturbance intensity measurement results) of this example. In FIG. 7, the horizontal axis is the disturbance intensity applied to the input images (the true disturbance intensity), and the vertical axis is the disturbance intensity measured from the input images. The error bars represent the standard deviation of the disturbance intensities measured at the 20 locations randomly extracted from the input images. The results show that the measured disturbance intensity correlates strongly with the disturbance intensity applied to the input images and that the measurement is robust to moving objects.

Next, Example 2 of the present invention will be described with reference to FIGS. 8 to 10. This example describes the results of a numerical calculation in which a program describing the functions of the image processing apparatus 100 was used to measure the disturbance intensity of input images with an unknown disturbance intensity and then correct the disturbance. The CNN of the measurement unit 103 is the same as in Example 1, so its description is omitted.

FIG. 8 shows the network structure 800 (the CNN of the correction unit 104) in this example. Except that it has no fully connected network at its output, the basic configuration of the network structure 800 in FIG. 8 is the same as that of the network structure 600 of the measurement unit 103 described in Example 1, so a detailed description is omitted.

The network structure 800 has a main unit 801, an input unit 802, and an output unit 803. The main unit 801 converts the plurality of images into fifth feature amounts 811a, 811b, and 811c using learned fifth network parameters and a fifth convolutional neural network (CNN) having at least two layers. The input unit 802 converts the plurality of images into sixth feature amounts 812a, 812b, and 812c using learned sixth network parameters and a sixth CNN. The output unit 803 adds the fifth feature amounts and the sixth feature amounts to generate seventh feature amounts 813a, 813b, and 813c, and converts the seventh feature amounts into an output image using learned seventh network parameters and a seventh CNN. The network structure 800 shown in FIG. 8 is only an example, and the present invention is not limited to it.

The training images consist of pairs of an output training image and an input training image obtained by degrading the output training image with a known disturbance intensity. The input training images are generated numerically from the output training images using the B-Spline disturbance model described above, with the variance of the deformation vectors described above as the disturbance intensity. Because the network structure 800 has no fully connected network at its output, training images of any size can be used. In this example, as in Example 1, the input and output training image size is 50×50 pixels. The number of input training images (number of frames) can be determined according to the disturbance intensity, but in this example, for simplicity, 11 frames are used regardless of the disturbance intensity.

The input images are images that can be regarded as having been acquired under the same conditions as the input training images (optical conditions of the optical system, pixel pitch of the image sensor, and frame rate). The frame rate is therefore not adjusted. The input image size is 400×400 pixels and 80 frames, and the output image size is the same as the input image size. All images are monochrome, and pixel values are scaled to the range [0, 1].

As above, training uses SGD with the Adam method as the optimizer. The Adam parameters are α = 10⁻⁴, β₁ = 0.9, β₂ = 0.999, and ε = 10⁻⁸. For SGD, 128 images are randomly selected from the total of 76,800 training images. The number of training iterations is 18×10⁴ (300 epochs). The initial values of the network parameters (filters and biases) are set by Xavier initialization (see Non-Patent Document 3) in all layers. The size and number of frames of the input images and input training images input to the measurement unit 103 are examples, and the present invention is not limited to them. Likewise, the size and number of frames of the output images and output training images output from the measurement unit 103 are examples, and the present invention is not limited to them.

FIG. 9 shows the numerical calculation results of this example qualitatively, as disturbance correction results. FIG. 9(a) is one frame of the images degraded by the disturbance (input images), and FIG. 9(b) is the corresponding frame of the disturbance-corrected images (output images). For ease of understanding, below each figure is a diagram in which one cross-section of the image is stacked along the time direction. The fluctuation in the cross-sectional diagram is reduced after correction, which shows qualitatively that the disturbance has been corrected appropriately.

FIG. 10 shows the numerical calculation results of this example quantitatively: the disturbance intensities of the images degraded by the disturbance (input images) and of the disturbance-corrected images (output images), as measured in this example. The disturbance intensity is measured by the method described in Example 1. The disturbance intensity of the output images is smaller than that of the input images, which shows quantitatively that the disturbance has been corrected appropriately.

(Other Embodiments)
The present invention can also be realized by a process in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

According to each embodiment, it is possible to provide an image processing device, an image processing system, an imaging device, an image processing method, a program, and a storage medium capable of measuring the intensity of a disturbance with high accuracy.

Although preferred embodiments of the present invention have been described above, the present invention is not limited to these embodiments, and various modifications and changes are possible within the scope of its gist.

100 image processing device
101 image acquisition unit
102 parameter acquisition unit
103 measurement unit

Claims

1. An image processing apparatus comprising:
an image acquisition unit that acquires a plurality of temporally different images degraded by a disturbance;
a parameter acquisition unit that acquires network parameters of a neural network generated by learning using a plurality of image groups obtained based on known disturbance intensities; and
a measurement unit that generates a plurality of normalized images by subtracting an average image of the plurality of images from each of the plurality of images, and measures the intensity of the disturbance from the plurality of normalized images.
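The normalization recited in claim 1 (subtracting the average image of the frames from each frame) can be illustrated with a minimal plain-Python sketch. The nested-list layout `frames[t][y][x]` and the function names `mean_image` and `normalize_frames` are assumptions made for illustration only and do not come from the patent.

```python
def mean_image(frames):
    """Pixel-wise average over time; frames[t][y][x] is a pixel value."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [sum(f[y][x] for f in frames) / n for x in range(width)]
        for y in range(height)
    ]


def normalize_frames(frames):
    """Subtract the temporal average image from every frame."""
    avg = mean_image(frames)
    return [
        [[f[y][x] - avg[y][x] for x in range(len(avg[0]))] for y in range(len(avg))]
        for f in frames
    ]
```

After this step the temporal mean of every pixel is zero, so only the frame-to-frame fluctuation remains as input for the measurement.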

2. The image processing apparatus according to claim 1, wherein the intensity of the disturbance is a temporal or spatial variance of pixel values.
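As a rough illustration of the quantity named in claim 2, the following sketch computes a per-pixel temporal variance map and a per-frame spatial variance directly from pixel values. It only shows what the two variances mean; it is not the neural-network measurement of the embodiments. The layout `frames[t][y][x]`, the function names, and the choice of the population variance are assumptions.

```python
from statistics import pvariance


def temporal_variance_map(frames):
    """Per-pixel variance along the time axis; frames[t][y][x] is a pixel value."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [pvariance([f[y][x] for f in frames]) for x in range(width)]
        for y in range(height)
    ]


def spatial_variance(frame):
    """Variance over all pixels of a single frame."""
    return pvariance([pixel for row in frame for pixel in row])
```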

3. The image processing apparatus according to claim 1, wherein the plurality of images are images acquired by photographing using an optical system and an imaging device, and
the parameter acquisition unit selects the network parameters to be acquired based on a photographing condition of the photographing.

4. The image processing apparatus according to claim 3, wherein the photographing condition is an optical condition of the optical system, a pixel pitch of the imaging device, or a frame rate.

5. The image processing apparatus according to claim 4, wherein the measurement unit generates a plurality of images whose sizes are adjusted based on the frame rate and the learning conditions of the neural network, and measures the intensity of the disturbance from the adjusted images.

6. The image processing apparatus according to claim 1, wherein the neural network includes:
a main unit that transforms the plurality of images into a first feature using a first convolutional neural network having first network parameters;
an input unit that transforms the plurality of images into a second feature using a second convolutional neural network having second network parameters;
a conversion unit that generates a third feature by adding the first feature and the second feature, and converts the third feature into a fourth feature using a third convolutional neural network having third network parameters; and
an output unit that outputs the intensity of the disturbance from the fourth feature using a fully connected network having fourth network parameters,
and wherein the network parameters include the first to fourth network parameters.

7. The image processing apparatus according to any one of claims 1 to 6, wherein the measurement unit measures the intensity of the disturbance at a plurality of locations in the plurality of images.

8. The image processing apparatus according to any one of claims 1 to 7, wherein the known disturbance is generated using a B-spline-based disturbance model, and its intensity is the variance of the normal random numbers that assign deformation amounts to control points in the image.
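The idea in claim 8 can be sketched in one dimension: control points receive normally distributed displacements whose variance plays the role of the known disturbance intensity, and a uniform cubic B-spline blends them into a smooth displacement field. This is a simplified sketch under stated assumptions, not the model of the embodiments: the 1-D setting, the control-point spacing, and the names `cubic_bspline_basis` and `random_displacement_field` are all hypothetical.

```python
import math
import random


def cubic_bspline_basis(u):
    """Four uniform cubic B-spline weights for a fractional offset u in [0, 1)."""
    return (
        (1 - u) ** 3 / 6.0,
        (3 * u**3 - 6 * u**2 + 4) / 6.0,
        (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6.0,
        u**3 / 6.0,
    )


def random_displacement_field(length, spacing, variance, rng):
    """Dense 1-D displacement field from normally distributed control-point shifts.

    `variance` is the variance of the normal random numbers (the known intensity).
    """
    sigma = math.sqrt(variance)
    n_ctrl = length // spacing + 4  # enough control points to cover every sample
    ctrl = [rng.gauss(0.0, sigma) for _ in range(n_ctrl)]
    field = []
    for x in range(length):
        i = x // spacing                 # index of the first contributing control point
        u = (x % spacing) / spacing      # fractional position inside the interval
        weights = cubic_bspline_basis(u)
        field.append(sum(w * ctrl[i + k] for k, w in enumerate(weights)))
    return field
```

For example, `random_displacement_field(64, 8, 4.0, random.Random(0))` returns 64 smoothly varying shifts; a larger variance gives stronger simulated distortion, and a variance of zero gives no distortion.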

9. The image processing apparatus according to any one of claims 1 to 8, further comprising a correction unit that corrects the plurality of images based on the intensity of the disturbance.

10. The image processing apparatus according to any one of claims 1 to 8, further comprising a correction unit that selects network parameters for correcting the plurality of images based on the intensity of the disturbance.

11. An image processing system comprising the image processing apparatus according to any one of claims 1 to 10 and a client device connected to the image processing apparatus via a network, wherein
the client device has an image output unit that outputs, to the image processing apparatus, a plurality of temporally different images degraded by the disturbance, and
the image processing apparatus further includes a disturbance intensity output unit that outputs the intensity of the disturbance to the client device.

12. An imaging apparatus comprising:
an imaging device; and
the image processing apparatus according to any one of claims 1 to 10.

13. An image processing method comprising:
an image acquisition step of acquiring a plurality of temporally different images degraded by a disturbance;
a parameter acquisition step of acquiring network parameters of a neural network generated by learning using a plurality of image groups obtained based on known disturbance intensities; and
a measuring step of generating a plurality of normalized images by subtracting an average image of the plurality of images from each of the plurality of images, and measuring the intensity of the disturbance from the plurality of normalized images.

14. A program that causes a computer to execute the image processing method according to claim 13.

15. A storage medium storing the program according to claim 14.