JP7309520B2

JP7309520B2 - Image processing method, image processing device, imaging device, program, storage medium, image processing system, and learned model manufacturing method

Info

Publication number: JP7309520B2
Application number: JP2019152000A
Authority: JP
Inventors: 智暁井上; 良範木村
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2018-09-26
Filing date: 2019-08-22
Publication date: 2023-07-18
Anticipated expiration: 2039-08-22
Also published as: JP2020057373A

Description

The present invention relates to an image processing method that processes a plurality of input images using a convolutional neural network.

In recent years, image processing techniques using convolutional neural networks (CNNs) have begun to be used to improve image quality. A CNN is a learning-based image processing technique that converts an input image into a desired output image by repeatedly convolving the input with filters generated by learning, adding biases generated by learning, and then applying a nonlinear operation. The learning is performed using training images, each consisting of a pair of an input training image and an output training image. Put simply, learning means preparing a large number (for example, tens of thousands) of input training images corresponding to input images and output training images corresponding to output images, and learning the relationship between input and output images from these training images.

For example, Non-Patent Document 1 discloses a CNN for which the input training images are acquired with a smartphone camera and the output training images with a digital single-lens camera, and which converts a smartphone camera image, as the input image, to the image quality of a digital single-lens camera. This makes it possible for a small smartphone camera to obtain images whose quality approaches that of a digital single-lens camera, which is a large imaging device.

Andrey Ignatov, Nikolay Kobyshev, Radu Timofte, Kenneth Vanhoey, Luc Van Gool, "DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks", arXiv:1704.02470v2, United States, 2017

However, the CNN-based method disclosed in Non-Patent Document 1 uses only an image acquired by a single imaging unit as the input image, so its restoration of high-frequency components and its noise reduction are insufficient. In general, the amount of noise in a captured image depends on the pixel size of the image sensor: the larger the pixel size, the less noise the acquired image contains. Conversely, the smaller the pixel size, the better the high-frequency components of the subject can be reproduced. In other words, the reproducibility of high-frequency components and the reduction of noise are in a trade-off relationship, and it is difficult to improve both using only an image acquired by a single imaging unit.

Accordingly, an object of the present invention is to provide an image processing method, an image processing device, an imaging device, a program, a storage medium, an image processing system, and a learned model manufacturing method that can reduce the amount of noise while restoring the high-frequency components of a captured image.

An image processing method as one aspect of the present invention includes the steps of: acquiring a first image of a subject; acquiring a second image that has a larger amount of noise than the first image and contains more high-frequency components of the subject than the first image; inputting the first image and the second image into a neural network; and generating, using the neural network, a third image that contains more high-frequency components of the subject than the first image and has a smaller amount of noise than the second image.

An image processing device as another aspect of the present invention includes: an acquisition unit that acquires a first image of a subject and a second image that has a larger amount of noise than the first image and contains more high-frequency components of the subject than the first image; and a calculation unit that inputs the first image and the second image into a neural network and uses the neural network to generate a third image that contains more high-frequency components of the subject than the first image and has a smaller amount of noise than the second image.

An imaging device as another aspect of the present invention includes the image processing device described above, a first imaging unit that captures the first image, and a second imaging unit that captures the second image.

A program as another aspect of the present invention causes a computer to execute the image processing method.

A storage medium as another aspect of the present invention stores the program.

An image processing system as another aspect of the present invention is an image processing system having a first device and a second device. The first device has transmitting means for transmitting a request that causes the second device to execute image processing using a first image of a subject and a second image that has a larger amount of noise than the first image and contains more high-frequency components of the subject than the first image. The second device has: receiving means for receiving the request transmitted by the transmitting means; acquisition means for acquiring the first image and the second image; and a calculation unit that inputs the first image and the second image into a neural network and uses the neural network to generate a third image that contains more high-frequency components of the subject than the first image and has a smaller amount of noise than the second image.

A learned model manufacturing method as another aspect of the present invention includes the steps of: acquiring a first image of a subject; acquiring a second image that has a larger amount of noise than the first image and contains more high-frequency components of the subject than the first image; acquiring a third image that contains more high-frequency components of the subject than the first image and has a smaller amount of noise than the second image; and learning parameters of a neural network using the first image, the second image, and the third image.

Other objects and features of the present invention are described in the following examples.

According to the present invention, it is possible to provide an image processing method, an image processing device, an imaging device, a program, a storage medium, an image processing system, and a learned model manufacturing method that can reduce the amount of noise while restoring the high-frequency components of a captured image.

FIG. 1 is an external view of the imaging device in Example 1.
FIG. 2 is a block diagram of the imaging device in Example 1.
FIG. 3 is an explanatory diagram of the network structure for correcting an image in Examples 1 and 2.
FIG. 4 is a flowchart showing the image correction processing in Examples 1 and 2.
FIG. 5 is a flowchart showing the learning of learning information in Examples 1 and 2.
FIG. 6 shows image processing results in Example 1.
FIG. 7 shows numerical calculation results of the image processing in Example 1.
FIG. 8 is a block diagram of the image processing system in Example 2.
FIG. 9 is an external view of the image processing system in Example 2.

Hereinafter, examples of the present invention will be described in detail with reference to the drawings. In each figure, the same members are denoted by the same reference numerals, and duplicate descriptions are omitted.

Before the examples are described in detail, the gist of the present invention is outlined. The present invention uses a convolutional neural network (CNN), a form of deep learning, to reduce the amount of noise while restoring high-frequency components in captured images acquired by a plurality of different imaging units. In particular, each example described below generates a high-quality image with high resolution and a small amount of noise (third image) from a low-resolution image with a small amount of noise (first image) and a high-resolution image with a large amount of noise (second image).

First, the imaging device in Example 1 of the present invention will be described with reference to FIGS. 1 and 2. FIG. 1 is an external view of the imaging device 1; FIG. 1(a) is an overhead view and FIG. 1(b) is a front view. FIG. 2 is a block diagram of the imaging device 1. In this example, the imaging device 1 executes the image processing method and reduces the amount of noise while restoring high-frequency components in the captured image.

The imaging device 1 has a main imaging unit 100 used for imaging a subject at a wide angle of view and a sub imaging unit 110 used for imaging a subject at a narrow angle of view. The main imaging unit 100 includes an imaging optical system 101 and an image sensor (first image sensor) 102. The imaging optical system 101 includes one or more lenses, an aperture stop 101A, and a focus lens 101F (focusing mechanism), and forms an image of light from a subject (not shown) on the image sensor 102. The imaging optical system 101 may also be a variable-magnification optical system whose focal length changes when one or more of its internal lenses are driven. In FIG. 1, the imaging optical system 101 is configured as part of the imaging device 1 (integrally with the imaging device 1), but it may instead be an interchangeable imaging optical system (attachable to and detachable from the imaging device body), as in a single-lens reflex camera.

The image sensor 102 is a solid-state image sensor such as a CMOS sensor or a CCD sensor. It photoelectrically converts the optical image (subject image) formed via the imaging optical system 101 and outputs an analog electrical signal (image signal). The aperture stop 101A and the focus lens 101F in the imaging optical system 101 are driven mechanically by the imaging control unit 40 (focus control unit 41) in accordance with control instructions from the system controller 30. The imaging control unit 40 controls the aperture diameter of the aperture stop 101A according to the set aperture value (F-number). The focus control unit 41 performs focus adjustment by controlling the position of the focus lens 101F according to the subject distance.

The A/D converter 10 converts the analog electrical signal generated by photoelectric conversion in the image sensor 102 into a digital signal and outputs it to the image processing unit 20. The image processing unit 20 generates an image by performing so-called development processing, such as pixel interpolation, luminance signal processing, and color signal processing, on the digital signal output from the A/D converter 10. The image generated by the image processing unit 20 is recorded on an image recording medium 60 such as a semiconductor memory or an optical disk. The image generated by the image processing unit 20 may also be displayed on the display unit 70. The information input unit 50 inputs various information according to user operations. Examples of this information are the imaging conditions at the time of image acquisition, specifically the F-number and ISO sensitivity of the main imaging unit 100.

The sub imaging unit 110 has an imaging optical system 111 and an image sensor (second image sensor) 112. The imaging optical system 111 is a fixed-focal-length imaging optical system that forms an image of light from a subject (not shown) on the image sensor 112. The imaging optical system 111 has a narrower angle of view (telephoto) than the imaging optical system 101. The imaging optical system 111 also has a focus lens 111F. The analog electrical signal (image signal) generated by photoelectric conversion in the image sensor 112 is handled in the same way as the analog electrical signal (image signal) generated by the image sensor 102, and the image processing unit 20 generates an image based on the image signal output from the image sensor 112. The image generated by the image processing unit 20 can be displayed on the display unit 70 in the same way as for the main imaging unit 100. The sub imaging unit 110 may be attachable to and detachable from the imaging device 1, and a sub imaging unit suited to the main imaging unit 100 may be selected from a plurality of sub imaging units 110 and attached to the imaging device 1.

The sub imaging unit 110 is a telephoto imaging unit that captures a narrower angle of view than the main imaging unit 100. Compared with the image sensor 102 provided in the main imaging unit 100, the image sensor 112 provided in the sub imaging unit 110 has a smaller imaging area (the area in which the pixels constituting the image sensor are arranged) and a smaller pixel size (pixel pitch). That is, the image acquired by the main imaging unit 100 (first input image) is a wide-angle image with a small amount of noise, whereas the image acquired by the sub imaging unit 110 (second input image) is, relatively, a telephoto image with a large amount of noise.

Here, the amount of noise in this example is described. The amount of noise $\sigma_0$ contained in an image is obtained by measurement or estimation from the image. When the noise is white Gaussian noise that is uniform in real space and in frequency space, the noise contained in the input image can be estimated from the MAD (Median Absolute Deviation) given by equation (1) below.

$$\mathrm{MAD} = \mathrm{median}\left(\left|w_{HH1} - \mathrm{median}(w_{HH1})\right|\right) \tag{1}$$

The MAD is obtained using the median of the wavelet coefficients $w_{HH1}$ in the HH1 subband image obtained by wavelet-transforming the input image. Because the standard deviation (the amount of noise $\sigma_0$) and the MAD are related by equation (2) below, the standard deviation of the noise component can be estimated.

$$\sigma_0 = \mathrm{MAD} / 0.6745 \tag{2}$$

In the wavelet transform of an image, a wavelet transform is first applied in the horizontal direction of the image to decompose it into a low-frequency component and a high-frequency component, and a wavelet transform is then applied in the vertical direction to each of the resulting low-frequency and high-frequency components. The wavelet transform thus divides the image into four parts, frequency-decomposing it into four subband images with mutually different frequency bands. The upper-left subband image, which holds the low-frequency band component (scaling coefficients), is called LL1, and the lower-right subband image, which holds the high-frequency band component (wavelet coefficients), is called HH1. The upper-right (HL1) and lower-left (LH1) subband images hold, respectively, the component that is high-frequency in the horizontal direction and low-frequency in the vertical direction, and the component that is low-frequency in the horizontal direction and high-frequency in the vertical direction.
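As an illustration of equations (1) and (2), the following is a minimal sketch that estimates the noise standard deviation of a single-channel image. It assumes a one-level orthonormal Haar wavelet, which the text does not specify, and the function names `haar_hh1` and `estimate_noise_sigma` are hypothetical.

```python
import random
import statistics


def haar_hh1(image):
    """Return the HH1 coefficients of a one-level 2-D Haar transform.

    `image` is a list of rows (even height and width assumed).
    The Haar wavelet is an assumption; the text only says "wavelet transform".
    """
    coeffs = []
    for r in range(0, len(image) - 1, 2):
        top, bottom = image[r], image[r + 1]
        for c in range(0, len(top) - 1, 2):
            # High-pass horizontally, then high-pass vertically.
            # Dividing by 2 keeps the transform orthonormal, so white noise
            # keeps its standard deviation in the HH1 subband.
            coeffs.append((top[c] - top[c + 1] - bottom[c] + bottom[c + 1]) / 2.0)
    return coeffs


def estimate_noise_sigma(image):
    """Estimate sigma_0 with equation (1) (MAD) and equation (2) (MAD / 0.6745)."""
    w = haar_hh1(image)
    med = statistics.median(w)
    mad = statistics.median(abs(x - med) for x in w)
    return mad / 0.6745
```

For a synthetic flat image with added Gaussian noise of standard deviation 5, for example `[[128.0 + rng.gauss(0.0, 5.0) for _ in range(256)] for _ in range(256)]` with `rng = random.Random(0)`, the estimate should come out close to 5. On real images, scene texture also leaks into HH1, so the estimate tends to be somewhat high.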

The image sensor 112 provided in the sub imaging unit 110 and the image sensor 102 provided in the main imaging unit 100 have the same number of pixels. Therefore, for the same subject area (within the angle of view), the image acquired by the sub imaging unit 110 has a relatively higher resolution (higher reproducibility of high-frequency components) than the image acquired by the main imaging unit 100. Put another way, the image acquired by the sub imaging unit 110 contains more high-frequency components of the subject than the image acquired by the main imaging unit 100.
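The reasoning above can be checked with simple arithmetic: with equal pixel counts, the unit with the narrower angle of view spends more pixels on each part of the subject. The sketch below uses hypothetical numbers (4000 pixels across, 80 and 40 degree angles of view) that do not come from the text, and it ignores lens projection and optical blur.

```python
def pixels_per_degree(pixel_count_across, angle_of_view_deg):
    """Average number of pixels that sample one degree of the scene."""
    return pixel_count_across / angle_of_view_deg


# Hypothetical values: both sensors are 4000 pixels wide.
main_density = pixels_per_degree(4000, 80.0)  # wide-angle main imaging unit
sub_density = pixels_per_degree(4000, 40.0)   # telephoto sub imaging unit
ratio = sub_density / main_density            # how much more finely the sub unit samples
```

With these assumed values the sub imaging unit samples the shared subject area twice as finely, which is why its image can carry higher spatial frequencies of the subject.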

The image processing unit 20 uses the input images to perform high-frequency component restoration processing and noise reduction processing (together also referred to as correction processing). The image processing unit 20 has a learning unit 21 and a correction unit 22. The correction unit 22 has an acquisition unit 22a and a calculation unit 22b. When executing the correction processing, the image processing unit 20 retrieves and uses learning information stored in the memory (storage unit) 80. Details of the correction processing are described later.

An output image, such as a corrected image, is displayed on the display unit 70, such as a liquid crystal display, or saved on the image recording medium 60. Alternatively, the captured images may be saved on the image recording medium 60 and corrected at any later time. The captured images may also be a moving image, in which case the correction is performed on each frame. The series of controls described above is performed by the system controller 30.

Next, the high-frequency component restoration processing and noise reduction processing (image correction processing) performed by the image processing unit 20 will be described with reference to FIG. 4. FIG. 4 is a flowchart showing the high-frequency component restoration processing and noise reduction processing. Each step in FIG. 4 is executed mainly by the image processing unit 20 (correction unit 22) based on commands from the system controller 30. The restoration and noise reduction processing uses learning information that has been learned in advance; details of the learning are described later.

First, in step S101, the image processing unit 20 (correction unit 22) acquires a first input image with low resolution and a small amount of noise, a second input image with high resolution and a large amount of noise (the two input images), and the learning information. The learning information is information learned in advance by the learning unit 21 in order to associate the two captured images with an image in which high-frequency components are restored and the amount of noise is reduced.

Next, in step S102, the correction unit 22 (acquisition unit 22a) acquires partial images from the two input images. That is, the correction unit 22 (acquisition unit 22a) acquires, from the first input image, a first image based on a first partial region that is part of the first input image, and, from the second input image, a second image based on a second partial region that is part of the second input image. The high-frequency component restoration processing and the noise reduction processing are performed in units of partial regions (the first partial region and the second partial region), that is, for each partial region. In this example, the first partial region and the second partial region are mutually corresponding regions of the first input image and the second input image, that is, regions corresponding to the same subject area. The first partial region and the second partial region may also be the whole of the first input image and the whole of the second input image, respectively.

Next, in step S103, the correction unit 22 uses the learning information, the first partial region, and the second partial region to generate a corrected partial region, which is a partial region in which high-frequency components have been restored and the amount of noise has been reduced (correction processing). The correction processing is now described in detail with reference to FIG. 3. FIG. 3 shows the network structure of a CNN (Convolutional Neural Network), which is one form of deep learning.

A CNN has a multi-layer structure, and a linear transformation and a nonlinear transformation using the learning information are executed in each layer. With n an integer from 1 to N, the n-th layer is called the n-th layer, and the linear and nonlinear transformations in the n-th layer are called the n-th linear transformation and the n-th nonlinear transformation, respectively. Here, N is an integer of 2 or more. In the first layer, the partial region 201 is convolved with each of a plurality of filters 202 (first linear transformation by a plurality of linear functions). A transformation using a nonlinear function called an activation function is then executed (first nonlinear transformation). In FIG. 3, the activation function is denoted AF. The partial region 201 is drawn as multiple sheets because the input image (captured image) has multiple channels. In this example, each partial region has the three RGB (Red, Green, Blue) channels. However, the number of channels is not limited to three. Even when a partial region has multiple channels, the channels may be input to the CNN individually, one at a time.

There are a plurality of filters 202. The correction unit 22 separately calculates the convolution of each of the filters 202 with the partial region 201. The coefficients of the filters 202 are determined based on the learning information. The learning information may be the coefficients of the filters 202 (filter coefficients) themselves, or the coefficients obtained when the filters 202 are fitted with a predetermined function. The number of channels of each filter 202 matches the number of channels of the partial region 201. When the partial region 201 has two or more channels, each filter is three-dimensional (the third dimension represents the channels). A constant determined from the learning information (which may be negative) may also be added to the convolution result.
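The linear transformation of one layer, as described above, can be sketched as follows. This is an illustrative stdlib-only sketch, not the patent's implementation: the function name `conv_layer` and the nested-list data layout are assumptions, no padding is applied, and the sum is written in the cross-correlation form that CNN code conventionally calls convolution.

```python
def conv_layer(region, filters, biases):
    """Apply one linear transformation: multi-channel convolution plus a constant.

    region  : [channel][row][col] input partial region
    filters : [filter][channel][row][col] learned coefficients
    biases  : one learned constant per filter (may be negative)
    Returns [filter][row][col], i.e. one output channel per filter.
    """
    in_ch, height, width = len(region), len(region[0]), len(region[0][0])
    output = []
    for kernel, bias in zip(filters, biases):
        # Each filter must have as many channels as the input region.
        assert len(kernel) == in_ch, "filter channels must match input channels"
        kh, kw = len(kernel[0]), len(kernel[0][0])
        out_channel = []
        # Only positions where the filter fits entirely inside the region.
        for r in range(height - kh + 1):
            row = []
            for c in range(width - kw + 1):
                acc = bias
                for ch in range(in_ch):
                    for i in range(kh):
                        for j in range(kw):
                            acc += kernel[ch][i][j] * region[ch][r + i][c + j]
                row.append(acc)
            out_channel.append(row)
        output.append(out_channel)
    return output
```

Because each filter produces exactly one output channel, the number of output channels equals the number of filters, independent of the number of input channels.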

Examples of the activation function f(x) include equations (1) to (3) below.

$$f(x) = \frac{1}{1 + e^{-x}} \tag{1}$$

$$f(x) = \tanh x \tag{2}$$

$$f(x) = \max(x, 0) \tag{3}$$

Equation (1) is called the sigmoid function, equation (2) the hyperbolic tangent function, and equation (3) the ReLU (Rectified Linear Unit). The max in equation (3) denotes the MAX function, which outputs the largest of its arguments. The activation functions f(x) in equations (1) to (3) are all monotonically increasing functions. Maxout may also be used as the activation function. Maxout is a MAX function that outputs, at each pixel, the maximum signal value among the plurality of images output by the n-th linear transformation.
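The activation functions named above can be written down directly. The following is a minimal sketch for scalar inputs, plus a Maxout over same-sized 2-D feature maps; the function names are illustrative, and the sigmoid is not hardened against overflow for very large negative inputs.

```python
import math


def sigmoid(x):
    """Sigmoid function: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def hyperbolic_tangent(x):
    """Hyperbolic tangent function."""
    return math.tanh(x)


def relu(x):
    """ReLU (Rectified Linear Unit): max(x, 0)."""
    return max(x, 0.0)


def maxout(feature_maps):
    """Per-pixel maximum over several same-sized 2-D feature maps.

    feature_maps : [map][row][col]
    Returns a single [row][col] map.
    """
    return [[max(values) for values in zip(*rows)] for rows in zip(*feature_maps)]
```

Unlike the three scalar functions, Maxout reduces several output images of the linear transformation to one, so it changes the channel count as well as the values.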

In FIG. 3, the partial region that has undergone the first linear transformation and the first nonlinear transformation is called the first transformed partial region 203. Each channel component of the first transformed partial region 203 is generated from the convolution of the partial region 201 with one of the filters 202. The number of channels of the first transformed partial region 203 is therefore the same as the number of filters 202.

In the second layer, as in the first layer, the first transformed partial region 203 is convolved with a plurality of filters 204 determined from the learning information (second linear transformation) and then subjected to a nonlinear transformation by the activation function (second nonlinear transformation). The filters 204 used in the second layer are generally not the same as the filters 202 used in the first layer. The size and number of the filters 204 also need not match those of the filters 202. However, the number of channels of each filter 204 matches the number of channels of the first transformed partial region 203. The correction unit 22 obtains intermediate data 210 by repeating the same operations up to the N-th layer (executing the n-th linear transformation and the n-th nonlinear transformation for n = 1 to N).

Finally, in the (N+1)-th layer, the corrected partial region 212 is obtained by adding a constant to the convolution of the intermediate data 210 with each of a plurality of filters 211 ((N+1)-th linear transformation). The filters 211 and the constants used here are also determined based on the learning information. The number of channels of the corrected partial region 212 is the same as that of the partial region 201, so the number of filters 211 is also the same as the number of channels of the partial region 201. The component of each channel of the corrected partial region 212 is obtained from an operation that includes the convolution of the intermediate data 210 with the corresponding filter 211 (there may be only one filter 211). The sizes of the partial region 201 and the corrected partial region 212 need not match. Because no data exists outside the partial region 201, a convolution computed only where data exists yields a smaller result. The size can be preserved, however, by setting a boundary condition such as a periodic boundary condition.
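The size reduction described above follows a simple rule: an unpadded convolution with a k-wide filter shortens each side by k - 1 samples. The sketch below applies that rule to a chain of layers; the function name `valid_output_size` and the example sizes are hypothetical, and square filters with a stride of 1 are assumed.

```python
def valid_output_size(input_size, filter_sizes):
    """Side length left after a chain of unpadded convolutions (stride 1 assumed).

    input_size   : side length of the input partial region
    filter_sizes : side length of the filters used in each layer, in order
    """
    size = input_size
    for k in filter_sizes:
        size -= k - 1  # each k-wide filter trims k - 1 samples in total
        if size < 1:
            raise ValueError("partial region is too small for these filters")
    return size
```

For instance, a hypothetical 64-pixel-wide partial region passed through three layers of 3x3 filters leaves a 58-pixel-wide corrected partial region, so neighbouring partial regions would need to overlap, or the boundary would need padding, to cover the whole image.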

Deep learning achieves high performance because applying nonlinear transformations repeatedly in a multilayer structure yields strong nonlinearity. If there were no activation function to provide the nonlinear transformation and the network consisted only of linear transformations, an equivalent single-layer linear transformation would exist no matter how many layers were stacked, so a multilayer structure would be pointless. Deep learning is said to perform better with more layers because more layers give stronger nonlinearity. In general, a network with at least three layers is referred to as deep learning.
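
The collapse of stacked linear transformations into a single one can be seen in a short worked equation. The symbols are assumptions introduced for illustration: each convolution is written as a matrix W_n acting on the flattened input x, and each added constant as a vector b_n.

```latex
\begin{aligned}
y &= W_2\,(W_1 x + b_1) + b_2 \\
  &= (W_2 W_1)\,x + (W_2 b_1 + b_2)
\end{aligned}
```

The result is again a single linear transformation with matrix W_2 W_1 and constant W_2 b_1 + b_2. Inserting a nonlinear function between the two layers prevents this simplification.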

Subsequently, in step S104 of FIG. 4, the correction unit 22 determines whether the high-frequency component restoration and noise reduction processing (the correction processing, that is, the generation of corrected partial regions) has been completed for all of the predetermined regions of the first input image and the second input image. If the correction processing has not been completed for all of the predetermined regions, the process returns to step S102, and the correction unit 22 acquires from the captured images a partial region that has not yet been corrected. If the correction processing has been completed for all of the predetermined regions (that is, corrected partial regions have been generated for all of them), the process proceeds to step S105.

In step S105, the correction unit 22 (calculation unit 22b) outputs an image (corrected image) in which the high-frequency components have been restored and the amount of noise has been reduced. The corrected image is generated by combining the generated corrected partial regions. However, when the partial region is the entire captured image (input image), the corrected partial region is used as the corrected image as is.
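
The loop over partial regions (steps S102 to S104) and the combination in step S105 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure of this embodiment: the function name `correct_by_tiles` is hypothetical, the two input images are assumed to be already aligned and of equal size, the tiles do not overlap, and the `correct` callback is assumed to return a region of the same size as its inputs.

```python
from typing import Callable, List

Image = List[List[float]]


def correct_by_tiles(first: Image, second: Image, tile: int,
                     correct: Callable[[Image, Image], Image]) -> Image:
    """Cut both images into tiles, correct each pair of tiles,
    and paste the corrected tiles back into one output image."""
    h, w = len(first), len(first[0])
    out = [[0.0] * w for _ in range(h)]
    for top in range(0, h, tile):            # repeat until every region is done
        for left in range(0, w, tile):
            p1 = [row[left:left + tile] for row in first[top:top + tile]]
            p2 = [row[left:left + tile] for row in second[top:top + tile]]
            fixed = correct(p1, p2)          # stand-in for the network
            for di, row in enumerate(fixed): # combine the corrected regions
                out[top + di][left:left + len(row)] = row
    return out
```

If `tile` is at least as large as the image, the loop runs once and the single corrected region is the corrected image.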

The above processing yields an image in which the high-frequency components of the captured image are restored and the amount of noise is reduced. In this embodiment, only the subject region common to the first input image and the second input image is corrected. That is, an image whose angle of view is equal to or smaller than that of the second input image is generated as the output image. In other words, using the images captured by two imaging units, the main imaging unit 100 with a large imaging unit size and the sub imaging unit 110 with a small imaging unit size, it is possible to output a high-quality telephoto image with an angle of view equivalent to that of the sub imaging unit 110. Obtaining a high-quality telephoto image by ordinary shooting would require the imaging optical system 101 of the main imaging unit 100 to be a telephoto lens with an angle of view equivalent to that of the imaging optical system 111 of the sub imaging unit 110, but a telephoto lens for an image sensor with a large imaging area is generally large. According to this embodiment, a device that includes the main imaging unit 100, which can capture images with little noise, and the small sub imaging unit 110, which can capture telephoto images, and that executes the image processing method described above can output a high-quality telephoto image while keeping the device small.

Next, the learning of the learning information (the method of manufacturing a learned model) in this embodiment is described with reference to FIG. 5. FIG. 5 is a flowchart showing the learning of the learning information. Each step in FIG. 5 is performed mainly by the learning unit 21 of the imaging device 1 (image processing unit 20). However, this embodiment is not limited to this configuration: as long as the learning precedes the restoration of high-frequency components and the reduction of noise, the learning information may be learned by a learning unit provided in a device (arithmetic device) separate from the imaging device 1. This embodiment describes the case in which the learning unit 21 of the imaging device 1 learns the learning information.

First, in step S201, the learning unit 21 acquires at least one set of learning images. A set of learning images consists of a plurality of images of the same subject and includes a first input learning image, which is a wide-angle image with little noise; a second input learning image, which is a telephoto image with much noise; and an output learning image, which is a telephoto image with little noise. The first and second input learning images may correspond one-to-one with the output learning image, or a plurality of them may exist for a single output learning image. In the latter case, the first input learning images and the second input learning images are a plurality of images that differ in, for example, the amount of noise.

Learning images can be prepared by simulation or from actually photographed images. When simulation is used, the input learning images are generated by applying to the output learning image an imaging simulation that takes into account the factors that degrade image quality in the imaging units. When photographed images are used, images of the same subject captured under the same conditions by the main imaging unit 100 and the sub imaging unit 110 of the imaging device 1 are used. The learning images preferably include subjects with a wide variety of features, because an image with features absent from the learning images cannot be corrected with high accuracy.
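
One way to picture the simulation route is the sketch below, which derives two degraded inputs from a clean output learning image. The degradation model is an assumption made purely for illustration and is not specified in this document: block averaging stands in for the loss of high-frequency components in the first input learning image, and additive Gaussian noise stands in for the noise of the second input learning image. All function names are hypothetical.

```python
import random
from typing import List, Tuple

Image = List[List[float]]


def add_noise(img: Image, sigma: float, seed: int = 0) -> Image:
    """Add Gaussian noise (assumed noise model) and clip to [0, 1]."""
    rng = random.Random(seed)
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row]
            for row in img]


def lose_detail(img: Image, factor: int = 2) -> Image:
    """Replace each factor x factor block by its mean, a crude stand-in
    for an image that lacks high-frequency components."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for top in range(0, h, factor):
        for left in range(0, w, factor):
            rows = range(top, min(top + factor, h))
            cols = range(left, min(left + factor, w))
            mean = sum(img[i][j] for i in rows for j in cols) / (len(rows) * len(cols))
            for i in rows:
                for j in cols:
                    out[i][j] = mean
    return out


def simulate_inputs(output_image: Image, sigma: float) -> Tuple[Image, Image]:
    """Return (first input learning image, second input learning image)."""
    return lose_detail(output_image), add_noise(output_image, sigma)
```

A realistic simulation would model the actual optics and sensors of the two imaging units; this sketch only shows the direction of the data flow, from the output learning image to the input learning images.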

Subsequently, in step S202, the learning unit 21 acquires a plurality of learning pairs from the learning images acquired in step S201. A learning pair consists of a learning partial region (learning region) and a learning correction partial region. The learning correction partial region is acquired from the first input learning image and the second input learning image, and has the same size as the partial region of the captured image acquired in step S102. The learning partial region is acquired from the output learning image, and its center is at the same position in the image as the center of the learning correction partial region. Its size is the same as that of the corrected partial region generated in step S103. As with the learning images, the learning partial regions and learning correction partial regions need not correspond one-to-one: one learning correction partial region may be paired (grouped) with a plurality of learning partial regions.
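
The geometry of a learning pair can be sketched as two crops that share a center. The sketch assumes square regions, input and output sizes whose difference is even, and input learning images already aligned with the output learning image. The function names are hypothetical. Following the terminology above, the network input is the learning correction partial region and the target is the learning partial region.

```python
from typing import List, Tuple

Image = List[List[float]]


def crop(img: Image, top: int, left: int, size: int) -> Image:
    """Cut a size x size region whose upper-left corner is (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]


def learning_pair(first_in: Image, second_in: Image, target: Image,
                  top: int, left: int, in_size: int,
                  out_size: int) -> Tuple[List[Image], Image]:
    """Return (learning correction partial region, learning partial region).

    The first element is cut from both input learning images at in_size.
    The second is cut from the output learning image at out_size around
    the same center."""
    margin = (in_size - out_size) // 2
    inputs = [crop(first_in, top, left, in_size),
              crop(second_in, top, left, in_size)]
    label = crop(target, top + margin, left + margin, out_size)
    return inputs, label
```

With `in_size` equal to the size of the partial region from step S102 and `out_size` equal to the size of the corrected partial region from step S103, the network output and the target can be compared element by element.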

Subsequently, in step S203, the learning unit 21 acquires (generates) the learning information by learning from the plurality of learning pairs (learning partial regions and learning correction partial regions). Learning uses the same network structure as the one that performs the restoration of high-frequency components and the reduction of noise. In this embodiment, a learning correction partial region is input to the network structure shown in FIG. 3, and the error between the output and the learning partial region is calculated. The parameters (learning information), such as the coefficients of the filters used in the first to (N+1)-th layers and the constants to be added, are updated and optimized so as to minimize this error, for example by error backpropagation. The initial values of the filter coefficients and constants may be set arbitrarily and are determined, for example, from random numbers. Alternatively, pre-training such as an autoencoder, which learns initial values for each layer in advance, may be performed.

The method of inputting all of the learning pairs into the network structure and updating the learning information using all of that information is called batch learning. Its computational load, however, becomes enormous as the number of learning pairs grows. In contrast, the method that uses only one learning pair per update of the learning information, with a different learning pair for each update, is called online learning. Its computational cost does not grow with the number of learning pairs, but it is strongly affected by the noise in a single learning pair. It is therefore preferable to learn with the mini-batch method, which lies between these two. The mini-batch method extracts a small number of learning pairs from all of the learning pairs and uses them to update the learning information; the next update extracts and uses a different small set of learning pairs. Repeating this reduces the disadvantages of both batch learning and online learning and makes a high correction effect easier to obtain.
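
The mini-batch method can be sketched on a toy problem. To stay short, the sketch replaces the CNN with a single scalar parameter `w` and a squared error, so that the gradient can be written directly instead of being obtained by backpropagation. The function names, the learning rate, and the shuffling scheme are all assumptions for illustration.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Pair = Tuple[float, float]   # (input, target) stand-in for a learning pair


def minibatches(pairs: Sequence[Pair], batch_size: int,
                seed: int = 0) -> Iterator[List[Pair]]:
    """Shuffle the learning pairs and yield them in small groups."""
    rng = random.Random(seed)
    order = list(range(len(pairs)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [pairs[i] for i in order[start:start + batch_size]]


def train(pairs: Sequence[Pair], batch_size: int, lr: float,
          epochs: int) -> float:
    """Fit target = w * input by gradient descent on mini-batches."""
    w = 0.0
    for epoch in range(epochs):
        # A new seed per epoch gives each pass a different grouping.
        for batch in minibatches(pairs, batch_size, seed=epoch):
            # Mean gradient of the squared error (w * x - y) ** 2 over the batch.
            grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
    return w
```

Setting `batch_size` to the number of pairs gives batch learning, and setting it to 1 gives online learning, so the same loop covers all three methods described above.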

Subsequently, in step S204, the learning unit 21 outputs the learned learning information. In this embodiment, the learning information is stored in the memory 80. The above processing makes it possible to learn the learning information for restoring high-frequency components and reducing the amount of noise, that is, to manufacture a learned model for restoring high-frequency components and reducing the amount of noise.

In addition to the above processing, techniques for improving the performance of the CNN may also be used. For example, dropout or pooling (a form of downsampling) may be performed in each layer of the network to improve robustness. Alternatively, to improve learning accuracy, ZCA whitening, which normalizes the pixels of the learning images to a mean of 0 and a variance of 1 and removes the redundancy between adjacent pixels, may be used.

FIG. 6 shows the image processing results of this embodiment: FIG. 6(a) shows the first image, FIG. 6(b) the second image, FIG. 6(c) the output image obtained by the image processing of this embodiment, and FIG. 6(d) the ground-truth image. All of the images in FIGS. 6(a) to 6(d) are 256×256-pixel monochrome images whose pixel values are normalized to the range [0, 1]. All of the images are actually photographed images.

FIG. 7 shows the numerical results of the image processing of this embodiment. It expresses the image quality of the first image (main image), the second image (sub image), and the output image obtained by the image processing of this embodiment using the image quality index SSIM. SSIM takes a value from 0 to 1, and a value closer to 1 means greater similarity to the ground-truth image. FIG. 7 shows that the SSIM value of the output image obtained by the image processing of this embodiment is closer to 1 than that of either the first image or the second image. This shows quantitatively that a CNN can convert a wide-angle image with little noise and a telephoto image with much noise into a telephoto image in which high-frequency components are restored and the amount of noise is reduced.
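
For orientation, the SSIM formula can be sketched in a simplified form. The sketch computes the index once over the whole image, whereas the standard index averages the same expression over local windows, so its values will differ from those in FIG. 7. The function name is hypothetical, and the stabilizing constants 0.01 and 0.03 are the commonly used defaults, assumed here rather than taken from this document.

```python
from typing import List

Image = List[List[float]]


def global_ssim(x: Image, y: Image, data_range: float = 1.0) -> float:
    """Single-window SSIM between two equally sized monochrome images."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    vx = sum((a - mx) * (a - mx) for a in xs) / n          # variance of x
    vy = sum((b - my) * (b - my) for b in ys) / n          # variance of y
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    c1 = (0.01 * data_range) ** 2   # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2   # stabilizes the contrast/structure term
    return ((2.0 * mx * my + c1) * (2.0 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2))
```

With pixel values normalized to [0, 1] as in FIG. 6, `data_range` is 1.0, and comparing an image with itself returns 1.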

本実施例によれば、撮影画像の高周波成分の復元処理およびノイズ量の低減処理を実行することが可能な撮像装置を提供することができる。 According to this embodiment, it is possible to provide an imaging apparatus capable of executing high-frequency component restoration processing and noise amount reduction processing of a captured image.

Next, Embodiment 2, in which the image processing method of the present invention is applied to an image processing system, is described with reference to FIGS. 8 and 9. In this embodiment, the image processing device that corrects the captured images, the imaging devices that acquire the captured images, and the server that performs the learning exist separately. In addition, in this embodiment, the learning information to be used is switched according to the type of imaging device used for shooting. Preparing learning information individually for each combination of imaging devices used for shooting enables more accurate image correction.
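
The switching of learning information by device combination can be pictured as a table lookup. This is only a sketch: the function name, the use of model-name strings as keys, and the decision to raise an error for an unknown combination are assumptions, and this document does not specify how the device type is determined.

```python
from typing import Dict, Tuple, TypeVar

T = TypeVar("T")


def select_learning_info(table: Dict[Tuple[str, str], T],
                         main_model: str, sub_model: str) -> T:
    """Return the learning information prepared for this pair of devices."""
    key = (main_model, sub_model)
    if key not in table:
        # No learning information was prepared for this combination.
        raise KeyError(f"no learning information for {key}")
    return table[key]
```

Keying the table on the pair rather than on each device separately reflects the statement above that learning information is prepared for each combination of imaging devices.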

FIG. 8 is a block diagram of the image processing system 200, and FIG. 9 is an external view of the image processing system 200. As shown in FIGS. 8 and 9, the image processing system 200 includes a plurality of imaging devices 300, an image processing device 301, a server 305, a display device 308, a recording medium 309, and an output device 310.

The imaging devices 300 include a plurality of imaging devices 300a, 300b, ..., 300n. In this embodiment, for example, a general single-lens reflex camera among the imaging devices 300 can serve as the main imaging unit described in Embodiment 1, and a compact camera can serve as the sub imaging unit. A small camera mounted on a smartphone or the like may also serve as the sub imaging unit. This embodiment is not limited to using two imaging devices as the plurality of imaging devices; three or more imaging devices may also be used.

Captured images (input images) captured using the plurality of imaging devices 300a to 300n are stored in a storage unit 302 provided in the image processing device 301. The image processing device 301 is connected to a network 304 by wire or wirelessly and can access the server 305 via the network 304. The server 305 includes a learning unit 307, which learns the learning information for reducing the amount of noise while restoring the high-frequency components of the captured images, and a storage unit 306, which stores the learning information. A correction unit 303 (image processing unit) provided in the image processing device 301 acquires the learning information from the storage unit 306 of the server 305 via the network 304, and generates an output image by restoring the high-frequency components of the captured images while reducing the amount of noise. The generated output image is output to at least one of the display device 308, the recording medium 309, and the output device 310. The display device 308 is, for example, a liquid crystal display or a projector. Via the display device 308, the user can work while checking the image being processed. The recording medium 309 is, for example, a semiconductor memory, a hard disk, or a server on a network. The output device 310 is, for example, a printer. The image processing device 301 may have a function of performing development processing and other image processing as necessary. The restoration of high-frequency components, the noise reduction processing, and the learning of the learning information are the same as in Embodiment 1, so their description is omitted.

Thus, in each embodiment, the image processing method includes a step of acquiring a first image and a step of acquiring a second image that has a larger amount of noise than the first image and contains more high-frequency components of the subject (S102). The image processing method also includes a step of inputting the first image and the second image into a neural network to generate a third image that contains more high-frequency components of the subject than the first image and has a smaller amount of noise than the second image (S103 to S105).

好ましくは、第一の画像および第二の画像はそれぞれ、第一の入力画像および第二の入力画像の少なくとも一部である。より好ましくは、第一の入力画像は、第二の入力画像よりも広い画角を有する。より好ましくは、第三の画像の画角は、第二の入力画像の画角以下である。 Preferably, the first image and the second image are at least part of the first input image and the second input image, respectively. More preferably, the first input image has a wider angle of view than the second input image. More preferably, the angle of view of the third image is less than or equal to the angle of view of the second input image.

Preferably, the second image is an image obtained by imaging using the image sensor 112, which has a smaller pixel pitch than the image sensor 102 used to acquire the first image. Also preferably, the second image is an image obtained by imaging using the image sensor 112, which has a smaller imaging area than the image sensor 102 used to acquire the first image. Also preferably, the neural network has at least one convolutional layer.

Preferably, the image processing method includes a step of acquiring learning information about the neural network learned in advance. The step of generating the third image includes the following first and second steps, where N is an integer of 2 or more and n is an integer from 1 to N. The first step generates intermediate data by executing, on the first image and the second image, an n-th linear transformation by each of a plurality of linear functions based on the learning information and an n-th nonlinear transformation by a nonlinear function, in order from n = 1 to n = N. The second step executes, on the intermediate data, an (N+1)-th linear transformation by at least one linear function based on the learning information. More preferably, the learning information is information learned using a plurality of learning images in which the same subject is present: a first input learning image, a second input learning image, and an output learning image. Here, the second input learning image has a larger amount of noise than the first input learning image and contains more high-frequency components of the subject. The output learning image contains more high-frequency components of the subject than the first input learning image and has a smaller amount of noise than the second input learning image. More preferably, at least one of the first input learning image and the second input learning image is an image generated by simulation.

(Other Embodiments)
The present invention can also be implemented as an image processing system composed of a first device (a user terminal such as an imaging device, a smartphone, or a PC), which issues requests for image processing and substantially governs the control of the entire system, and a second device (such as a server), which performs image processing in response to those requests. For example, the correction unit 303 of the image processing system 200 of Embodiment 2 may be provided on the server 305 side as the second device, and the image processing device 301 as the first device may request the server 305 to execute image processing using the first image and the second image. In this case, the first device (user terminal) includes a transmitting means for transmitting a request for image processing to the second device (server), and the second device (server) includes a receiving means for receiving the request transmitted from the first device (user terminal).

In this case, the first device may transmit the first image and the second image to the second device together with the image processing request. Alternatively, the second device may, in response to the request from the first device, acquire the first image and the second image stored in a location other than the first device (an external storage device). After the second device has performed the image processing on the first image and the second image, it may transmit the output image to the first device. Configuring the image processing system in this way allows the processing by the correction unit, which has a relatively heavy processing load, to be performed on the second device side, reducing the load on the user terminal side.
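
The two ways of delivering the images to the second device can be sketched as follows. The sketch runs in a single process and leaves out the actual network transport. The `Request` type, its field names, the `handle` function, and the dictionary standing in for the external storage device are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Image = List[float]   # flattened stand-in for image data


@dataclass
class Request:
    """Image processing request sent by the first device (user terminal)."""
    first: Optional[Image] = None                  # images sent with the request
    second: Optional[Image] = None
    storage_keys: Optional[Tuple[str, str]] = None  # or where to fetch them


def handle(req: Request, storage: Dict[str, Image],
           correct: Callable[[Image, Image], Image]) -> Image:
    """Second device (server): obtain both images, run the correction,
    and return the output image to be sent back to the first device."""
    if req.first is not None and req.second is not None:
        first, second = req.first, req.second
    elif req.storage_keys is not None:
        first, second = (storage[k] for k in req.storage_keys)
    else:
        raise ValueError("request carries neither images nor storage keys")
    return correct(first, second)
```

Either path ends in the same call to `correct`, which is where the relatively heavy processing of the correction unit would run on the second device.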

The present invention can also be realized by processing in which a program that implements one or more functions of the above embodiments is supplied to a system or device via a network or a storage medium, and one or more processors in a computer of the system or device read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

Each embodiment can provide an image processing method, an image processing device, an imaging device, a program, a storage medium, an image processing system, and a method of manufacturing a learned model that are capable of reducing the amount of noise while restoring the high-frequency components of a captured image.

以上、本発明の好ましい実施形態について説明したが、本発明はこれらの実施形態に限定されず、その要旨の範囲内で種々の変形及び変更が可能である。 Although preferred embodiments of the present invention have been described above, the present invention is not limited to these embodiments, and various modifications and changes are possible within the scope of the gist.

20 Image processing unit (image processing device)
22a Acquisition unit
22b Calculation unit

Claims

1. An image processing method comprising:
obtaining a first image of a subject;
obtaining a second image that has a larger amount of noise than the first image and contains more high-frequency components of the subject than the first image;
inputting the first image and the second image into a neural network; and
generating, using the neural network, a third image that contains more high-frequency components of the subject than the first image and has a smaller amount of noise than the second image.

2. The image processing method according to claim 1, wherein said first image and said second image are at least parts of a first input image and a second input image, respectively.

3. The image processing method according to claim 2, wherein said first input image has a wider angle of view than said second input image.

4. The image processing method according to claim 3, wherein the angle of view of said third image is equal to or less than the angle of view of said second input image.

5. The image processing method according to any one of claims 1 to 4, wherein the second image is an image obtained by imaging using an image sensor having a smaller pixel pitch than the image sensor used to acquire the first image.

6. The image processing method according to any one of claims 1 to 5, wherein the second image is an image obtained by imaging using an image sensor having a smaller imaging area than the image sensor used to acquire the first image.

7. The image processing method according to claim 1, wherein said neural network has at least one convolutional layer.

8. The image processing method according to any one of claims 1 to 7, further comprising acquiring learning information about the neural network learned in advance,
wherein generating the third image comprises, where N is an integer of 2 or more and n is an integer from 1 to N:
generating intermediate data by executing, on the first image and the second image, an n-th linear transformation by each of a plurality of linear functions based on the learning information and an n-th nonlinear transformation by a nonlinear function, in order from n = 1 to n = N; and
executing, on the intermediate data, an (N+1)-th linear transformation by at least one linear function based on the learning information.

9. The image processing method according to claim 8, wherein the learning information is information learned using a plurality of learning images in which the same subject is present, the learning images including a first input learning image, a second input learning image that has a larger amount of noise than the first input learning image and contains more high-frequency components of the subject, and an output learning image that contains more high-frequency components of the subject than the first input learning image and has a smaller amount of noise than the second input learning image.

10. The image processing method according to claim 9, wherein at least one of said first input learning image and said second input learning image is an image generated by simulation.

11. The image processing method according to claim 1, further comprising:
obtaining a first partial image from the first image;
obtaining a second partial image from the second image; and
obtaining a third partial image by inputting the first partial image and the second partial image into the neural network,
wherein the third image is generated by combining the third partial images.

12. An image processing apparatus comprising:
an acquisition unit that acquires a first image of a subject and a second image that has a larger amount of noise than the first image and contains more high-frequency components of the subject than the first image; and
a calculation unit that inputs the first image and the second image into a neural network and generates, using the neural network, a third image that contains more high-frequency components of the subject than the first image and has a smaller amount of noise than the second image.

13. The image processing apparatus according to claim 12, further comprising a storage unit for storing learning information related to the neural network learned in advance.

14. The image processing apparatus according to claim 12, wherein the acquisition unit acquires a first partial image from the first image and a second partial image from the second image, and the calculation unit acquires a third partial image by inputting the first partial image and the second partial image into the neural network and generates the third image by combining the third partial images.

15. An imaging apparatus comprising:
the image processing apparatus according to any one of claims 12 to 14;
a first imaging unit that captures the first image; and
a second imaging unit that captures the second image.

16. A program for causing a computer to execute the image processing method according to any one of claims 1 to 11.

17. A storage medium storing the program according to claim 16.

18. An image processing system comprising a first device and a second device, wherein:
the first device includes transmitting means for transmitting, to the second device, a request to cause the second device to perform image processing using a first image of a subject and a second image that has a larger amount of noise than the first image and contains more high-frequency components of the subject than the first image; and
the second device includes:
receiving means for receiving the request transmitted by the transmitting means;
acquisition means for acquiring the first image and the second image; and
calculation means for inputting the first image and the second image into a neural network and generating, using the neural network, a third image that contains more high-frequency components of the subject than the first image and has a smaller amount of noise than the second image.
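
The division of roles between the two devices can be sketched as below. This is a minimal in-process illustration, not a networked implementation. The class names, the use of image identifiers in the request, the dictionary used as image storage, and the return of the result to the first device are all assumptions made for this example. The `network` callable is a hypothetical placeholder for the trained neural network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Image = List[List[float]]


@dataclass
class ProcessingRequest:
    """Request naming the two images to be processed (assumed format)."""
    first_image_id: str
    second_image_id: str


class SecondDevice:
    """Receives a request, acquires both images, and runs the network."""

    def __init__(self, storage: Dict[str, Image],
                 network: Callable[[Image, Image], Image]) -> None:
        self.storage = storage
        self.network = network

    def receive(self, request: ProcessingRequest) -> Image:
        first = self.storage[request.first_image_id]    # acquisition
        second = self.storage[request.second_image_id]
        return self.network(first, second)              # third image


class FirstDevice:
    """Sends the processing request to the second device."""

    def __init__(self, server: SecondDevice) -> None:
        self.server = server

    def send_request(self, first_id: str, second_id: str) -> Image:
        return self.server.receive(ProcessingRequest(first_id, second_id))


def pixel_mean(a: Image, b: Image) -> Image:
    """Hypothetical placeholder for the trained network."""
    return [[(p + q) / 2 for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]
```

In a real deployment the call from `FirstDevice` to `SecondDevice` would cross a network boundary, for example between a camera or handset and a server. The direct method call here only shows which device performs which step.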

19. The image processing system according to claim 18, wherein:
the acquisition means acquires a first partial image from the first image and acquires a second partial image from the second image; and
the calculation means acquires a third partial image by inputting the first partial image and the second partial image into the neural network, and generates the third image by synthesizing the third partial images.

20. A method of manufacturing a trained model, the method comprising:
obtaining a first image of a subject;
obtaining a second image that has a larger amount of noise than the first image and contains more high-frequency components of the subject than the first image;
obtaining a third image that contains more high-frequency components of the subject than the first image and has a smaller amount of noise than the second image; and
learning parameters of a neural network using the first image, the second image, and the third image.
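
To illustrate what learning parameters from the three images can mean in the simplest case, the sketch below fits a two-weight linear fusion of the first and second images to the third image by gradient descent on the mean squared error. This is a deliberately reduced stand-in chosen for this example. The claimed method trains a neural network, and the function name, learning rate, and step count here are hypothetical.

```python
from typing import List, Tuple


def train_fusion_weights(first: List[float], second: List[float],
                         target: List[float],
                         lr: float = 0.1,
                         steps: int = 500) -> Tuple[float, float]:
    """Fit w1, w2 so that w1 * first + w2 * second approximates target.

    Images are given as flattened pixel lists of equal length.
    """
    w1, w2 = 0.5, 0.5          # initial parameters
    n = len(target)
    for _ in range(steps):
        g1 = g2 = 0.0
        for a, b, t in zip(first, second, target):
            err = w1 * a + w2 * b - t       # prediction error per pixel
            g1 += 2.0 * err * a / n         # d(MSE)/d(w1)
            g2 += 2.0 * err * b / n         # d(MSE)/d(w2)
        w1 -= lr * g1                       # gradient descent update
        w2 -= lr * g2
    return w1, w2


if __name__ == "__main__":
    first = [1.0, 0.0, 1.0, 0.0]
    second = [0.0, 1.0, 0.0, 1.0]
    third = [0.3 * a + 0.7 * b for a, b in zip(first, second)]
    print(train_fusion_weights(first, second, third))
```

The same loop structure (predict from the two inputs, compare with the third image, update the parameters) carries over to a convolutional network, where the two weights are replaced by learned filters and biases.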

21. The method of manufacturing a trained model according to claim 20, further comprising:
obtaining a first partial image from the first image;
obtaining a second partial image from the second image; and
obtaining a third partial image by inputting the first partial image and the second partial image into the neural network,
wherein the third image is generated by synthesizing the third partial images.