JP2023179838A

JP2023179838A - Image processing method, image processor, image processing system, and image processing program

Info

Publication number: JP2023179838A
Application number: JP2022092681A
Authority: JP
Inventors: Norito Hiasa
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2022-06-08
Filing date: 2022-06-08
Publication date: 2023-12-20

Abstract

To provide an image processing method that highly accurately increases image resolution and reduces noise while suppressing the training load of a machine learning model.

SOLUTION: An image processing method comprises steps of: generating a first noise reduction output using a captured image; generating a high-resolution output based on first input data according to the first noise reduction output using a first machine learning model; generating a second noise reduction output based on second input data according to the high-resolution output using a second machine learning model; and generating an output image based on the second noise reduction output. The second input data is an image whose difference from a second addition image is larger than its difference from a first addition image. The first addition image is an image obtained by adding the captured image and a high-resolution residual map based on the high-resolution output. The second addition image is an image obtained by adding the high-resolution residual map and the noise-reduced captured image based on the first noise reduction output.

SELECTED DRAWING: Figure 4

Description

The present invention relates to an image processing method that performs resolution enhancement and noise reduction on a captured image.

Patent Document 1 discloses a method of performing noise reduction and upscaling of an image with a single neural network by using weights that correspond to a noise reduction level specified by the user.

Patent Document 1: US Patent No. 10552944

However, in the method of Patent Document 1, the dataset becomes enormous, and the training load of the neural network increases.

When resolution enhancement and noise reduction are performed by separate machine learning models, the adverse effect that each process has on image quality degrades the final image. When resolution enhancement is performed first, noise is amplified, and the amplified noise remains even after noise reduction. When noise reduction is performed first, the blurred image changes, so the subsequent resolution enhancement over-corrects or under-corrects.

An object of the present invention is to provide an image processing method that performs resolution enhancement and noise reduction on a captured image with high accuracy while suppressing the training load of a machine learning model.

An image processing method as one aspect of the present invention includes the steps of: generating an output related to first noise reduction using a captured image; generating, using a first machine learning model, an output related to resolution enhancement based on first input data that depends on the output related to the first noise reduction; generating, using a second machine learning model, an output related to second noise reduction based on second input data that depends on the output related to resolution enhancement; and generating an output image based on the output related to the second noise reduction. The second input data is an image whose difference from a second addition image is larger than its difference from a first addition image. The first addition image is an image obtained by adding the captured image and a resolution enhancement residual map based on the output related to resolution enhancement. The second addition image is an image obtained by adding the resolution enhancement residual map and a noise-reduced captured image based on the output related to the first noise reduction.
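The condition on the second input data can be illustrated with a minimal sketch. It assumes NumPy arrays for the images and uses mean squared difference as the measure of "difference"; the source does not specify a metric, so that choice, the function names, and the toy data are all hypothetical.

```python
import numpy as np

def addition_images(captured, denoised, residual_map):
    """Return the first and second addition images described in the text."""
    first_addition = captured + residual_map    # noisy capture + enhancement residual
    second_addition = denoised + residual_map   # noise-reduced capture + enhancement residual
    return first_addition, second_addition

def satisfies_condition(second_input, captured, denoised, residual_map):
    """True if second_input is closer to the first addition image than to the second.

    Mean squared difference is an assumed metric, not one given in the source.
    """
    first_addition, second_addition = addition_images(captured, denoised, residual_map)
    d_first = np.mean((second_input - first_addition) ** 2)
    d_second = np.mean((second_input - second_addition) ** 2)
    return bool(d_first < d_second)

# Toy data: every array below is an illustrative stand-in.
rng = np.random.default_rng(0)
scene = np.linspace(0.0, 1.0, 64).reshape(8, 8)
captured = scene + rng.normal(0.0, 0.05, scene.shape)   # noisy captured image
denoised = scene.copy()                                 # idealized first noise reduction
residual_map = 0.1 * (scene - scene.mean())             # made-up enhancement residual

second_input = captured + residual_map                  # equals the first addition image
ok = satisfies_condition(second_input, captured, denoised, residual_map)
```

In this toy case the second input data equals the first addition image, so its difference from that image is zero and the condition holds.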

According to the present invention, it is possible to provide an image processing method that can perform resolution enhancement and noise reduction on a captured image with high accuracy while suppressing the training load of a machine learning model.

FIG. 1 is an external view of the image processing system of Embodiment 1.
FIG. 2 is a block diagram of the image processing system of Embodiment 1.
FIG. 3 is a flowchart showing training of a machine learning model.
FIG. 4 is a flowchart showing generation of an output image in Embodiment 1.
FIG. 5 is an external view of the image processing system of Embodiment 2.
FIG. 6 is a block diagram of the image processing system of Embodiment 2.
FIG. 7 is a flowchart showing generation of an output image in Embodiment 2.

Embodiments of the present invention are described in detail below with reference to the drawings. In each figure, the same members are given the same reference numerals, and duplicate descriptions are omitted.

Before the embodiments are described in detail, the gist of the present invention is briefly explained. In the present invention, resolution enhancement and noise reduction are performed on a captured image. Resolution enhancement includes blur correction or upscaling. Upscaling includes enlarging the entire captured image (increasing the pixel count) and enlarging a part of the captured image (digital zoom, etc.). Noise reduction includes reducing noise generated in the image sensor and reducing compression noise that arises when an image is lossy-compressed (JPEG compression, etc.).

When training a neural network that performs only noise reduction, the dataset needs to cover only the variations of noise. The same applies to a network that performs only upscaling. However, when a single neural network performs both noise reduction and upscaling, its dataset must contain both noise and degradation of resolving performance, and the number of variations grows combinatorially. As a result, the dataset becomes enormous and the training load increases.
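The combinatorial growth can be made concrete with a small counting sketch. The noise levels and blur states listed here are invented for illustration only and do not come from the source.

```python
from itertools import product

# Hypothetical degradation conditions; the labels and counts are illustrative only.
noise_levels = ["ISO100", "ISO800", "ISO6400"]
blur_states = ["f/2.8", "f/5.6", "f/11", "f/22"]

# A single joint model must see every (noise, blur) combination.
joint_conditions = list(product(noise_levels, blur_states))

# Two separate models each cover only their own kind of degradation.
separate_conditions = len(noise_levels) + len(blur_states)

print(len(joint_conditions), separate_conditions)
```

With 3 noise levels and 4 blur states, the joint dataset has to cover 12 conditions, while the two separate datasets together cover 7.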

Therefore, in the present invention, resolution enhancement and noise reduction of a captured image are performed using separate machine learning models. This keeps the size of the datasets used for training small, and thus suppresses the training load of the machine learning models.

Further, in the present invention, resolution enhancement is performed on the noise-reduced captured image, and a resolution-enhanced image that has not been noise-reduced is generated. For example, the resolution enhancement residual map thus obtained is added to the captured image that has not been noise-reduced, which yields a resolution-enhanced image that has not been noise-reduced. Noise reduction is then performed again on this image to generate the final output image. This configuration prevents the adverse effect of either noise reduction or resolution enhancement from degrading the quality of the final output image. The processing is explained below.

The key point is that noise reduction estimation depends strongly on the signal distribution of the input image, whereas resolution enhancement estimation depends on it far less. To simplify the explanation, assume that noise reduction targets Gaussian noise with a specific variance and that resolution enhancement targets blur expressed by a Gaussian function with a specific variance. In resolution enhancement, the same blur always acts on the input image, so a similar correction is applied across a wide range of signal distributions. In noise reduction, however, only the variance of the noise in the input image is fixed, and the actual noise realization is entirely unknown. Noise reduction must therefore distinguish the subject from the noise based on the signal distribution of the input image, and its estimation result is strongly affected by that distribution.

Suppose that noise reduction is performed on a captured image after resolution enhancement. As a side effect of resolution enhancement, some strong noise (or all of the noise) is emphasized. Because the emphasized noise is larger than the specific variance that noise reduction assumes, it is regarded as subject rather than noise and remains without being reduced.
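The noise amplification described above can be reproduced with a small sketch. An unsharp mask built from a 3x3 box blur is used here as a hypothetical stand-in for resolution enhancement; it is not the machine learning model of the source, and all names and parameters are assumptions.

```python
import numpy as np

def box_blur(img):
    """3x3 box blur with wrap-around borders."""
    acc = np.zeros_like(img, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            acc += np.roll(img, (dy, dx), axis=(0, 1))
    return acc / 9.0

def unsharp_mask(img, amount=1.0):
    """Hypothetical stand-in for resolution enhancement (not the patent's model)."""
    return img + amount * (img - box_blur(img))

rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.05, (128, 128))   # Gaussian noise on a flat scene
sharpened_noise = unsharp_mask(noise)

noise_std = noise.std()
sharpened_std = sharpened_noise.std()
print(f"noise std before: {noise_std:.4f}, after sharpening: {sharpened_std:.4f}")
```

Under these assumptions the sharpened noise has a clearly larger standard deviation than the input noise, which is why a denoiser tuned to the original variance may leave it in place.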

Conversely, when resolution enhancement is performed on a captured image after noise reduction, part of the blurred image is deformed as a side effect of noise reduction. However, as described above, resolution enhancement estimation depends little on the signal distribution of the input image, so the correction applied does not change much even when the blurred image changes. Consequently, if noise reduction has widened the blurred image, the correction is insufficient, and if it has narrowed the blurred image, the result is over-corrected.

Therefore, in the present invention, for example, the resolution enhancement residual map obtained for the noise-reduced captured image is added to the captured image that has not been noise-reduced, which generates a captured image on which only resolution enhancement has been performed. The residual map is the component obtained by subtracting the captured image on which only noise reduction has been performed from the captured image on which both noise reduction and resolution enhancement have been performed. Because resolution enhancement is performed on the noise-reduced captured image, the noise amplification component contained in the residual map is small. In addition, because resolution enhancement estimation depends little on the signal distribution of the input image, adding the residual map obtained for the noise-reduced captured image to the original captured image causes only a small loss of correction accuracy. This suppresses both over- or under-correction and noise amplification, and yields a resolution-enhanced captured image. In the resolution-enhanced captured image, the resolving performance and contrast of the subject are improved, so the subject and the noise are easier to tell apart. Performing noise reduction again on the resolution-enhanced captured image therefore reduces noise while suppressing deformation of the subject.
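The order of operations described above can be sketched as follows. The `denoise` and `enhance` functions are simple classical filters standing in for the machine learning models; they, the function names, and the test scene are assumptions made for illustration and are not the source's implementation.

```python
import numpy as np

def box_blur(img):
    """3x3 box blur with wrap-around borders."""
    acc = np.zeros_like(img, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            acc += np.roll(img, (dy, dx), axis=(0, 1))
    return acc / 9.0

def denoise(img):
    """Hypothetical stand-in for a noise reduction model."""
    return box_blur(img)

def enhance(img, amount=1.0):
    """Hypothetical stand-in for a resolution enhancement model (unsharp mask)."""
    return img + amount * (img - box_blur(img))

def process(captured):
    first_image = denoise(captured)            # first noise reduction
    enhanced = enhance(first_image)            # enhancement of the denoised image
    residual_map = enhanced - first_image      # resolution enhancement residual map
    second_input = captured + residual_map     # enhanced, but not noise-reduced
    output = denoise(second_input)             # second noise reduction
    return first_image, residual_map, second_input, output

rng = np.random.default_rng(1)
scene = np.zeros((64, 64))
scene[:, 32:] = 1.0                            # a single vertical edge
captured = box_blur(scene) + rng.normal(0.0, 0.05, scene.shape)
first_image, residual_map, second_input, output = process(captured)
```

Because the residual map is computed from the denoised image, its noise component is much smaller than it would be if the enhancement were applied directly to the noisy captured image.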

For the above reasons, resolution enhancement and noise reduction of a captured image can be performed with high accuracy while suppressing the training load of the machine learning models.

FIG. 1 is an external view of the image processing system 100 of this embodiment. FIG. 2 is a block diagram of the image processing system 100 of this embodiment. The image processing system 100 includes a training device 101, an image processing device 102, and an imaging device 103, which are connected to one another via a wired or wireless network.

The imaging device 103 includes an imaging optical system 131, an image sensor 132, a storage unit 133, and a communication unit 134. The imaging optical system 131 forms an image of the subject from light in the subject space. The image sensor 132 includes a plurality of pixels and converts the image of the subject into a captured image by photoelectric conversion. Because the image of the subject is blurred by aberrations and diffraction in the imaging optical system 131, the captured image is also degraded by that blur. In this embodiment, this blur is corrected as the resolution enhancement. In addition, shot noise, dark current noise, readout noise, and the like occur in the image sensor 132, so noise is present in the captured image. In this embodiment, this noise is reduced as the noise reduction. The storage unit 133 stores the captured image. Note that although blur correction is performed as the resolution enhancement in this embodiment, upscaling or the like may be performed instead. Likewise, although the target of noise reduction in this embodiment is noise generated in the image sensor, it may be compression noise or the like.

The image processing device 102 includes a storage unit 121, a communication unit 122, an acquisition unit 123, a noise reduction unit (first generation unit, third generation unit) 124, a resolution enhancement unit (second generation unit) 125, a calculation unit (output unit) 126, and a display unit 127. The communication unit 122 acquires a captured image via the communication unit 134 of the imaging device 103. Alternatively, the captured image may be recorded on a recording medium, and the image processing device 102 may acquire it when the recording medium is connected to the image processing device 102. The image processing device 102 uses a first machine learning model and a second machine learning model to generate an output image in which resolution enhancement and noise reduction have been performed on the captured image. The first and second machine learning models use sets of weights generated in advance by the training device 101. The image processing device 102 acquires the sets of weights from the training device 101 via the communication unit 122 and stores them in the storage unit 121. The generated output image is presented to the user via the display unit 127 or stored in the storage unit 121.

The training device 101 includes a storage unit 111, an acquisition unit 112, a calculation unit 113, and an update unit 114. The training device 101 uses a dataset to train the first machine learning model, which performs resolution enhancement, and generates a trained first set of weights. The training device 101 also uses a dataset to train the second machine learning model, which performs noise reduction, and generates a trained second set of weights.

The training of the machine learning models (determination of the sets of weights) executed by the training device 101 is described below with reference to FIG. 3. FIG. 3 is a flowchart showing the training of a machine learning model. In this embodiment, a CNN (Convolutional Neural Network) is used as the machine learning model. However, the present invention is not limited to this. Machine learning models include neural networks, genetic programming, Bayesian networks, and the like. Neural networks include CNNs, GANs (Generative Adversarial Networks), RNNs (Recurrent Neural Networks), and the like.

First, the training of the second machine learning model, which performs noise reduction, is described.

In step S101, the acquisition unit 112 acquires one or more pairs of a correct image and a training image from the storage unit 111. Because the second machine learning model is meant to acquire a noise reduction effect through training, the training image is an image containing noise, and the correct image is an image of the same scene as the training image with less noise than the training image (or none). The storage unit 111 stores a dataset containing a plurality of correct images and training images. The noise present in the training images is similar to the noise generated by the image sensor 132. So that the second machine learning model can handle captured images of various subjects, the correct images and training images used for training preferably contain a variety of subjects (edges of different orientations and strengths, textures, gradations, flat areas, and so on).

The correct images and training images used in this embodiment are generated by adding noise to an original image. Alternatively, the same scene may be actually photographed both with the image sensor 132 and under a condition with a better S/N ratio, and the results used as the training image and the correct image. The original image is an undeveloped RAW image (in which light intensity and signal value are linearly related), and the training image is generated by adding to it the noise generated by the image sensor 132. If the image sensor 132 can be of multiple types or can take multiple ISO sensitivities, the dataset should include training images with the noise of various strengths that these produce. The correct image may be the original image itself, or an image to which noise smaller than the noise of the training image has been added. The noise added to the correct image is preferably correlated with that of the training image. If uncorrelated noise is added, training over many images in the dataset averages out the influence of the noise in the correct images, and the result is almost the same as when no noise is added. The original image may be an actually photographed RAW image or CG (Computer Graphics). An actually photographed RAW image already contains noise, but in training the machine learning model that noise is treated as part of the subject, so it poses no particular problem. In this embodiment, the second machine learning model performs noise reduction on RAW images. The correct images and training images are therefore RAW images. To perform noise reduction on developed images, the correct images and training images may be developed.
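One way to build such a training pair can be sketched as follows. The noise model (a Gaussian approximation with a signal-dependent shot term plus a read term), its parameter values, and the choice to reuse a scaled copy of the same noise realization for the correct image are all assumptions for illustration; the source only requires that the correct image carry less noise, preferably correlated with that of the training image.

```python
import numpy as np

def make_noise_pair(original, rng, gain=0.01, read_sigma=0.002, gt_noise_scale=0.2):
    """Return (training, correct) images generated from a linear RAW-like original.

    gain, read_sigma, and gt_noise_scale are hypothetical parameters.
    """
    # Signal-dependent standard deviation: shot-like term plus read noise term.
    sigma = np.sqrt(gain * np.clip(original, 0.0, None) + read_sigma ** 2)
    noise = rng.normal(0.0, 1.0, original.shape) * sigma
    training = original + noise
    # Scaled copy of the same realization: smaller noise, fully correlated.
    correct = original + gt_noise_scale * noise
    return training, correct

rng = np.random.default_rng(0)
original = rng.random((32, 32))          # stand-in for a linear RAW image in [0, 1)
training, correct = make_noise_pair(original, rng)
```

Setting `gt_noise_scale=0.0` gives the case in which the correct image is the original image itself.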

In step S102, the calculation unit 113 uses the second machine learning model to generate an output (information) related to the second noise reduction based on the training image. Specifically, the output is generated by inputting the training image to the second machine learning model. The output related to the second noise reduction includes a noise-reduced training image, a noise reduction residual map for the training image (second noise reduction residual map), and the like. Adding the noise reduction residual map to the training image yields the noise-reduced image. In this embodiment, the output related to the second noise reduction is the noise-reduced training image (estimated image). The second machine learning model is a CNN and has a plurality of weights (the second set of weights). The initial value of each weight may be determined by random numbers or the like.

In step S103, the update unit 114 updates each weight of the second machine learning model based on the difference between the correct image and the estimated image. In this embodiment, the mean squared error between the correct image and the estimated image is used as the loss function, and the weights are updated by backpropagation. However, the present invention is not limited to this. By optimizing so that the difference between the correct image and the estimated image becomes small, the second machine learning model acquires a noise reduction effect on input images. When the output related to the second noise reduction is a residual map, it suffices to minimize the difference between that residual map and the difference between the training image and the correct image.
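The two loss variants can be written out in a short sketch. The function names are hypothetical, and the sign convention for the residual target (correct image minus training image) is chosen so that adding the residual map to the training image gives the noise-reduced image, as described for step S102.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays."""
    return float(np.mean((a - b) ** 2))

def loss_image_output(estimated, correct):
    """Loss when the model outputs the noise-reduced image directly."""
    return mse(estimated, correct)

def loss_residual_output(residual_map, training, correct):
    """Loss when the model outputs a residual map; the target is correct - training."""
    return mse(residual_map, correct - training)

rng = np.random.default_rng(0)
correct = rng.random((16, 16))
training = correct + rng.normal(0.0, 0.05, correct.shape)
residual_map = rng.normal(0.0, 0.05, correct.shape)     # arbitrary model output
estimated = training + residual_map                     # image implied by the residual

loss_a = loss_image_output(estimated, correct)
loss_b = loss_residual_output(residual_map, training, correct)
```

For an estimated image defined as the training image plus the residual map, the two losses are the same quantity, so either output form can be trained with the same criterion.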

In step S104, the update unit 114 determines whether the training of the second machine learning model is complete. If the training is determined to be incomplete, the process returns to step S101, and one or more new pairs of a correct image and a training image are acquired. If the training is determined to be complete, the information of the trained second set of weights is stored in the storage unit 111.
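Steps S101 to S104 can be sketched as a loop. To keep the example self-contained, a single learnable 3x3 linear filter replaces the CNN, the gradient is written out by hand instead of using backpropagation through a network, and completion is judged by an iteration limit or a loss threshold. All of these are assumptions; the source does not state the completion criterion.

```python
import numpy as np

SHIFTS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def apply_filter(weights, img):
    """Apply a 3x3 linear filter with wrap-around borders (stand-in for a CNN)."""
    return sum(w * np.roll(img, s, axis=(0, 1)) for w, s in zip(weights, SHIFTS))

def train(pairs, lr=0.1, max_iters=300, tol=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    weights = rng.normal(0.0, 0.1, len(SHIFTS))        # random initial weights
    history = []
    for it in range(max_iters):
        training, correct = pairs[it % len(pairs)]     # S101: acquire a pair
        estimated = apply_filter(weights, training)    # S102: generate the output
        err = estimated - correct
        grad = np.array([2.0 * np.mean(err * np.roll(training, s, axis=(0, 1)))
                         for s in SHIFTS])             # gradient of the MSE loss
        weights -= lr * grad                           # S103: update the weights
        history.append(float(np.mean(err ** 2)))
        if history[-1] < tol:                          # S104: hypothetical criterion
            break
    return weights, history

rng = np.random.default_rng(1)
pairs = []
for _ in range(8):
    scene = rng.random((32, 32))
    pairs.append((scene + rng.normal(0.0, 0.1, scene.shape), scene))

weights, history = train(pairs)
```

The recorded loss falls well below its initial value over the iterations; in practice the returned weights would be what is stored as the trained set.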

Next, the training of the first machine learning model, which performs resolution enhancement, is described. Descriptions of the parts that are the same as in the training of the second machine learning model are omitted.

In step S101, the training image is an image blurred by the aberrations and diffraction of the imaging optical system 131. The correct image is an image of the same scene as the training image with less blur than the training image (or none). In this embodiment, the training image and the correct image are generated by adding blur to an original image. The training image is generated by applying to the original image the blur caused by the aberrations and diffraction of the imaging optical system 131. If necessary, blur caused by an optical low-pass filter, the pixel aperture of the image sensor 132, or the like may also be applied. If the imaging optical system 131 has multiple types or states (focal length, aperture value, focus distance, etc.) that can cause different blurs to act on the captured image, the dataset should include training images to which those different blurs have been applied. The blur can vary with the position of each pixel of the image sensor 132 (image height and azimuth with respect to the optical axis of the imaging optical system 131), and, when the imaging optical system 131 can take various states (focal length, aperture value, focus distance, etc.), it also varies with the state. Furthermore, when the imaging optical system 131 can be of multiple types, as with interchangeable lenses, the blur also varies with the type. Note that the blur applied to the original image may be the blur generated by the imaging optical system 131 itself or a blur that approximates it. The correct image is generated either by applying no blur to the original image or by applying a blur smaller than the blur applied to the training image.
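A blurred training pair can be generated along these lines. A Gaussian kernel is used as an approximation of the optical blur, which the text permits; the sigma values that stand in for different lens states, the function names, and the reflect padding are assumptions for illustration.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel."""
    ax = np.arange(-radius, radius + 1)
    k = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur with reflect padding; sigma <= 0 means no blur."""
    if sigma <= 0:
        return img.astype(float).copy()
    radius = int(np.ceil(3 * sigma))
    k = gaussian_kernel(sigma, radius)
    padded = np.pad(img.astype(float), radius, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def make_blur_pair(original, train_sigma, correct_sigma=0.0):
    """Training image gets the larger blur; correct image gets a smaller one or none."""
    return blur(original, train_sigma), blur(original, correct_sigma)

rng = np.random.default_rng(0)
original = rng.random((32, 32))
# Hypothetical lens states, each approximated by one Gaussian sigma.
dataset = [make_blur_pair(original, s) for s in (1.0, 2.0)]
training, correct = dataset[0]
```

A position-dependent blur (varying with image height and azimuth) would need a different kernel per region, which this sketch does not attempt.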

In this embodiment, no noise is added to the training images and correct images used to train the first machine learning model. This is because, in the present invention, resolution enhancement is performed on a noise-reduced captured image. However, mutually correlated noise may be added to both. When correlated noise is added, the first machine learning model can acquire through training, along with blur correction, the effect of suppressing changes in noise.

In this embodiment, the first machine learning model performs blur correction on RAW images. The correct images and training images are therefore RAW images. To perform blur correction on developed images, the correct images and training images may be developed.

In step S102, the calculation unit 113 uses the first machine learning model to generate an output (information) related to resolution enhancement based on the training image. Specifically, the output is generated by inputting the training image to the first machine learning model. The output related to resolution enhancement includes a resolution-enhanced (blur-corrected) training image, a resolution enhancement residual map for the training image, and the like. Adding the resolution enhancement residual map to the training image yields the resolution-enhanced image. In this embodiment, the output related to resolution enhancement is the resolution-enhanced training image (estimated image). The first machine learning model is a CNN and has a plurality of weights (the first set of weights).

In step S103, the update unit 114 updates each weight of the first machine learning model based on the difference between the correct image and the estimated image.

In step S104, the update unit 114 determines whether the training of the first machine learning model is complete. If the training is determined to be incomplete, the process returns to step S101. If the training is determined to be complete, the information of the trained first set of weights is stored in the storage unit 111.

Note that in this embodiment, the first machine learning model and the second machine learning model are trained independently. Because a captured image is processed by resolution enhancement with the first machine learning model followed by noise reduction with the second machine learning model, one could also train the second machine learning model on training images based on the output related to resolution enhancement. However, coupling the two models in training increases the training load: generating the training images requires computation with the first machine learning model, and any change to the first machine learning model forces the second machine learning model to be retrained as well.

The noise reduction and resolution enhancement of a captured image executed by the image processing device 102 are described below with reference to the flowchart of FIG. 4. FIG. 4 is a flowchart showing the generation of an output image in this embodiment.

In step S201, the acquisition unit 123 acquires a captured image and weight sets for the machine learning models. In this embodiment, three weight sets are acquired: a first weight set used in the first machine learning model, a second weight set used in the second machine learning model, and a third weight set used in the third machine learning model. The third machine learning model performs noise reduction before the resolution enhancement by the first machine learning model. In this embodiment, the third machine learning model is identical to the second machine learning model, so the third weight set is also the same as the second weight set. However, the present invention is not limited to this.

In step S202, the noise reduction unit 124 uses the third machine learning model to generate an output (information) related to first noise reduction based on the captured image. Specifically, the output related to the first noise reduction is generated by inputting the captured image to the third machine learning model. In this embodiment, the output related to the first noise reduction is a noise-reduced captured image (first image). Instead of the first image, a noise reduction residual map for the captured image (first noise reduction residual map) may be generated. Adding the noise reduction residual map to the captured image yields the first image. The output related to the first noise reduction is therefore either the noise-reduced captured image or the noise reduction residual map for the captured image. Note that this step may instead use a noise reduction method that does not rely on a machine learning model, such as NLM (non-local means filter) or BM3D (block-matching and 3D filtering).

In step S203, the resolution enhancement unit 125 uses the first machine learning model to generate an output (information) related to resolution enhancement based on first input data that depends on the output related to the first noise reduction. In this embodiment, the first input data is the first image. However, the present invention is not limited to this. For example, the first input data may be data obtained by adding the first noise reduction residual map (the output related to the first noise reduction) to the captured image, or by concatenating the two in the channel direction. In that case, the first machine learning model is also trained using data in the same format as the first input data. The output related to resolution enhancement is generated by inputting the first input data to the first machine learning model. In this embodiment, the output related to resolution enhancement is the blur-corrected (resolution-enhanced) first image, referred to as the second image. Instead of the second image, a resolution-enhancement residual map for the first image may be generated. Adding the resolution-enhancement residual map to the first image yields the second image. The output related to resolution enhancement is therefore either the resolution-enhanced first image (the captured image after noise reduction and resolution enhancement) or the resolution-enhancement residual map for the first image.

In step S204, the calculation unit 126 uses the captured image and the first image to generate the noise reduction residual map for the captured image (first noise reduction residual map). The first noise reduction residual map is generated by subtracting the captured image from the first image. If this residual map has already been generated in step S202, step S204 need not be executed. Step S204 may also be executed before step S203.
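The subtraction in step S204 can be sketched as follows. This is a minimal illustration, assuming images are floating-point NumPy arrays of equal shape; the function name `noise_reduction_residual` and the stand-in "denoiser" are hypothetical and not part of the disclosure.

```python
import numpy as np

def noise_reduction_residual(captured: np.ndarray, first_image: np.ndarray) -> np.ndarray:
    """First noise reduction residual map: first image minus captured image."""
    return first_image.astype(np.float64) - captured.astype(np.float64)

rng = np.random.default_rng(0)
captured = rng.random((4, 4))
# Stand-in for the output of step S202 (NOT a real denoiser, illustration only).
first_image = 0.5 * captured + 0.25
residual = noise_reduction_residual(captured, first_image)
# Adding the residual map back to the captured image recovers the first image.
restored = captured + residual
```

With this sign convention, the residual map is what must be added to the captured image to obtain the noise-reduced first image.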

In step S205, the calculation unit 126 generates a third image using the second image and the first noise reduction residual map. The third image is obtained by subtracting the first noise reduction residual map from the second image; it cancels the noise reduction of step S202 and corresponds to the captured image with only resolution enhancement applied. The third image serves as the second input data for the second machine learning model. As stated before the description of this embodiment, this makes it possible to generate a resolution-enhanced captured image while suppressing noise amplification. Because the noise in the third image is almost unchanged from that in the captured image, the machine learning model can reduce it well. In addition, because the blurred image has already been resolution-enhanced, noise and subject are easier to tell apart, and deformation of the subject caused by noise reduction can be suppressed. The third image may also be generated by adding the resolution-enhancement residual map to the captured image, or by adding the resolution-enhancement residual map to the first image and then subtracting the first noise reduction residual map. The second input data (third image) in this embodiment is therefore generated based on the output related to resolution enhancement and on either the captured image or the output related to the first noise reduction.

Note that when generating the third image, the noise reduction effect of step S202 need not be cancelled completely; part of the effect may remain. Accordingly, the third image is an image whose difference from the captured image with resolution enhancement applied after noise reduction (second addition image) is larger than its difference from the captured image in which the noise reduction effect is cancelled and only resolution enhancement is applied (first addition image). The first addition image is obtained by adding the captured image and the resolution-enhancement residual map based on the output related to resolution enhancement (the map obtained by subtracting the first image from the second image); it is equal to the second image minus the noise reduction residual map. The second addition image is obtained by adding the resolution-enhancement residual map and the captured image that has been noise-reduced using the output related to the first noise reduction (the first image); that is, it is the second image. The difference between two images can be evaluated by MSE (mean squared error), MAE (mean absolute error), or the like. This property of the third image can also be restated as follows: the noise in the third image is closer to the noise in the captured image than to the noise in the first image.

The third image is generated, for example, by multiplying the first noise reduction residual map by a coefficient greater than 0.5 and not greater than 1, and subtracting the result from the second image. Note that if the output related to resolution enhancement in step S203 is the resolution-enhancement residual map for the first image, the third image can be generated by adding that residual map to the captured image.
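The role of the coefficient range (greater than 0.5, at most 1) can be checked numerically. The following is a minimal sketch, assuming NumPy arrays and synthetic residual maps; the names `make_third_image`, `mse`, and `k` are hypothetical. Writing r for the first noise reduction residual map, the third image differs from the first addition image by (1 - k)·r and from the second addition image by k·r, so any k above 0.5 keeps it closer to the first addition image.

```python
import numpy as np

def make_third_image(second_image, nr_residual, k=1.0):
    """Third image: second image minus k times the first noise reduction residual map."""
    if not 0.5 < k <= 1.0:
        raise ValueError("k must satisfy 0.5 < k <= 1")
    return second_image - k * nr_residual

def mse(a, b):
    """Mean squared error between two images."""
    return float(np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
captured = rng.random((8, 8))
nr_residual = rng.normal(0.0, 0.05, (8, 8))  # synthetic: first image - captured image
first_image = captured + nr_residual
hr_residual = rng.normal(0.0, 0.10, (8, 8))  # synthetic: second image - first image
second_image = first_image + hr_residual

first_addition = captured + hr_residual      # resolution enhancement only
second_addition = first_image + hr_residual  # equals the second image
third_image = make_third_image(second_image, nr_residual, k=0.8)
```

With k = 1 the third image coincides with the first addition image; with k = 0.8 a fifth of the step S202 noise reduction remains.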

In step S206, the noise reduction unit 124 uses the second machine learning model to generate an output related to second noise reduction based on the second input data. Specifically, the output related to the second noise reduction is generated by inputting the second input data to the second machine learning model. In this embodiment, the output related to the second noise reduction is a fourth image, which is the captured image after noise reduction and resolution enhancement. However, the output related to the second noise reduction may instead be a noise reduction residual map for the third image (second input data), referred to as the second noise reduction residual map.
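Steps S202 to S206 can be summarized in a short sketch. It assumes the three models are available as plain Python callables that map an image array to an image array; `run_pipeline` and the stand-in models below are hypothetical placeholders, not trained networks and not the disclosed implementation.

```python
import numpy as np
from typing import Callable, Tuple

Model = Callable[[np.ndarray], np.ndarray]

def run_pipeline(captured: np.ndarray,
                 denoise_first: Model,   # third machine learning model (S202)
                 enhance: Model,         # first machine learning model (S203)
                 denoise_second: Model,  # second machine learning model (S206)
                 k: float = 1.0) -> Tuple[np.ndarray, np.ndarray]:
    first_image = denoise_first(captured)          # S202: noise-reduced captured image
    second_image = enhance(first_image)            # S203: resolution-enhanced first image
    nr_residual = first_image - captured           # S204: first noise reduction residual map
    third_image = second_image - k * nr_residual   # S205: undo (most of) the first denoising
    fourth_image = denoise_second(third_image)     # S206: final noise reduction
    return third_image, fourth_image

# Toy stand-ins chosen only so the data flow can be checked by hand.
captured = np.linspace(0.0, 1.0, 16).reshape(4, 4)
identity = lambda x: x
gain = lambda x: 1.1 * x
third, fourth = run_pipeline(captured, identity, gain, identity)
```

Because the same callable can be passed for `denoise_first` and `denoise_second`, the sketch also covers the case in this embodiment where the third and second machine learning models are identical.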

In step S207, the calculation unit 126 generates an output image based on the captured image, the output related to resolution enhancement, and the output related to the second noise reduction. In this embodiment, the output related to resolution enhancement is the third image, and the output related to the second noise reduction is the fourth image. A second noise reduction residual map is generated by subtracting the fourth image from the third image; the strength of noise reduction can be adjusted by multiplying this residual map by a coefficient representing the noise reduction strength and adding the result to the captured image. Similarly, a resolution-enhancement residual map is generated by subtracting the captured image from the third image; the strength of resolution enhancement can be adjusted by multiplying this residual map by a coefficient representing the resolution enhancement strength and adding the result to the captured image. The output image with both strengths adjusted is displayed on the display unit 127, and the user can freely adjust the strengths of noise reduction and resolution enhancement while checking the displayed output image. The strengths are adjusted by changing the coefficients representing the respective strengths of noise reduction and resolution enhancement. With this configuration, the machine learning models need not be re-run each time the strength of noise reduction or resolution enhancement is changed, so the strength can be adjusted quickly with lightweight computation. If no strength adjustment is needed, the fourth image may be output as the output image as is.
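The strength adjustment of step S207 can be sketched as a linear combination of precomputed residual maps. This is a minimal illustration, assuming NumPy arrays; `adjust_output`, `hr_strength`, and `nr_strength` are hypothetical names. As an assumption, the second noise reduction residual map is taken here as the fourth image minus the third image, so that adding it to the third image gives the fourth image, following the same sign convention as step S204.

```python
import numpy as np

def adjust_output(captured, third_image, fourth_image,
                  hr_strength=1.0, nr_strength=1.0):
    """Blend residual maps into the captured image; no model is re-run."""
    hr_residual = third_image - captured      # resolution-enhancement residual map
    nr_residual = fourth_image - third_image  # assumed sign: third + residual = fourth
    return captured + hr_strength * hr_residual + nr_strength * nr_residual

rng = np.random.default_rng(1)
captured = rng.random((4, 4))
third_image = captured + rng.normal(0.0, 0.1, (4, 4))      # synthetic stand-in
fourth_image = third_image + rng.normal(0.0, 0.05, (4, 4))  # synthetic stand-in

full = adjust_output(captured, third_image, fourth_image, 1.0, 1.0)
sharpen_only = adjust_output(captured, third_image, fourth_image, 1.0, 0.0)
untouched = adjust_output(captured, third_image, fourth_image, 0.0, 0.0)
```

Since only multiplications and additions on stored arrays are involved, changing either coefficient in response to user input is cheap compared with re-running a network.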

As described above, the configuration of this embodiment makes it possible to perform resolution enhancement and noise reduction on a captured image with high accuracy while suppressing the training load of the machine learning models.

Desirable configurations that enhance the effects of the present invention are described below.

The output related to resolution enhancement generated in step S203 is desirably generated based on information about the optical system used to acquire the captured image. If various kinds of resolution degradation are mixed in the dataset used to train the first machine learning model, the model learns a resolution enhancement that is averaged over those degradations. To perform resolution enhancement matched to the actual degradation, the first machine learning model therefore needs to generate the output related to resolution enhancement based on information about the optical system used to capture the image. The information about the optical system is information on any of the following: the type of optical system used to acquire the captured image, the focal length at the time of imaging, the aperture value, the focus distance, the position of a pixel of the captured image relative to the optical axis, and the resolution performance of the optical system at a pixel of the captured image. The type of optical system refers to, for example, the type of the imaging optical system 131 or the type of optical low-pass filter. Once the type of optical system, focal length, aperture value, focus distance, and pixel position relative to the optical axis are specified, the blur caused by aberration and diffraction in the optical system at that pixel position is uniquely determined. By inputting information about the optical system in addition to the image, both during training and at inference of the first machine learning model, resolution enhancement suited to each pixel of the image can be performed. Maps whose channel components hold the values representing the information about the optical system (with the same two-dimensional pixel count as the image) may be concatenated with the image in the channel direction, for example, and input to the first machine learning model. Because the type of optical system, focal length, aperture value, and focus distance do not vary with pixel position, each of them yields a flat map with a single constant value. The position of a pixel relative to the optical axis yields a map holding gradients in two directions (for example, horizontal and vertical) in different channel components. Instead of these pieces of information, information about the resolution performance of the optical system at each pixel of the image may be used. A map representing resolution performance can be generated using, for example, the frequency up to which the blur acting on each pixel has an MTF (modulation transfer function) at or above a predetermined value. Information about resolution performance in two or more different directions (horizontal and vertical, or meridional and sagittal) may also be held in different channel components. The information about the optical system may be obtained directly from the metadata of the captured image, or may be computed from the metadata into the desired data format.
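The channel-wise concatenation of optical-system maps described above can be sketched as follows. This is a minimal illustration under stated assumptions: a channel-first (C, H, W) layout, the optical axis at the image center, pixel coordinates normalized to [-1, 1], and already-normalized scalar values for focal length, aperture value, and focus distance. The function names and the normalization are hypothetical, not taken from the disclosure.

```python
import numpy as np

def build_optics_maps(height, width, focal_length, f_number, focus_distance):
    """Return a (5, H, W) array: three flat maps plus x and y position gradients."""
    flat = [np.full((height, width), value, dtype=np.float32)
            for value in (focal_length, f_number, focus_distance)]
    ys = np.linspace(-1.0, 1.0, height, dtype=np.float32)  # vertical position
    xs = np.linspace(-1.0, 1.0, width, dtype=np.float32)   # horizontal position
    grid_y, grid_x = np.meshgrid(ys, xs, indexing="ij")
    return np.stack(flat + [grid_x, grid_y], axis=0)

def attach_optics_maps(image_chw, maps_chw):
    """Concatenate image channels and optics maps in the channel direction."""
    return np.concatenate([image_chw, maps_chw], axis=0)

image = np.zeros((3, 4, 6), dtype=np.float32)      # dummy RGB image, C x H x W
maps = build_optics_maps(4, 6, 0.5, 0.28, 0.1)     # illustrative normalized values
net_input = attach_optics_maps(image, maps)
```

A per-pixel resolution-performance map (for example, an MTF cutoff frequency per pixel) could be attached in the same way as an additional channel.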

In step S201, it is also desirable to select the first weight set used in the first machine learning model from a plurality of first weight sets based on information about resolution enhancement. It is further desirable to select the second weight set used in the second machine learning model from a plurality of second weight sets based on information about noise in the captured image. Rather than having a single machine learning model reduce noise across all ISO sensitivities available on the image sensor 132 (for example, ISO 100 to 25600), dividing the ISO range into several sub-ranges and reducing noise with a model trained individually for each sub-range gives higher accuracy. The same applies to resolution enhancement. If the imaging device 103 is an interchangeable-lens camera, the imaging optical system 131 can be of several types. For example, accuracy can be improved by training the second machine learning model for each type of imaging optical system 131. Suppose, for example, that the second machine learning model is trained individually according to noise strength, such as ISO 100 to 400, 400 to 1600, 1600 to 6400, and 6400 to 25600, producing four second weight sets. Suppose likewise that the first machine learning model is trained individually for each type of imaging optical system 131 (for example, eight types), producing eight first weight sets. In this case, the number of training runs and the number of weight sets to be stored are each 12. If, however, noise reduction and resolution enhancement are performed by a single machine learning model, the number of training runs and the number of weight sets are each 32, the product of 4 and 8. Splitting noise reduction and resolution enhancement into separate machine learning models therefore reduces both the training load and the amount of data to be stored. The second weight set is selected based on information about noise in the captured image. This information includes information on the type of image sensor 132, the ISO sensitivity at the time of imaging, the signal distribution (variance, etc.) of the optical black region of the captured image, the compression quality of the captured image (Q value of JPEG compression, etc.), and the like. Similarly, the first weight set is selected based on information about resolution enhancement. This is information related to the resolution performance of the captured image and includes information on the type of imaging optical system 131, the state of the imaging optical system 131 at the time of imaging (focal length, aperture value, focus distance, etc.), the upscaling magnification, and the like.
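The weight-set selection and the 12-versus-32 count can be illustrated with a small lookup sketch. The ISO ranges and the number of lens types come from the example above; the weight-set identifiers, the lens names, and the rule that a boundary value such as ISO 400 falls into the higher range are assumptions made only for this illustration.

```python
# ISO sub-ranges from the example in the text (four second weight sets).
ISO_BINS = [(100, 400), (400, 1600), (1600, 6400), (6400, 25600)]
# Eight hypothetical lens identifiers (eight first weight sets).
LENS_TYPES = [f"lens_{i}" for i in range(8)]

def select_second_weights(iso: int) -> str:
    """Pick the noise reduction weight set for a given ISO sensitivity."""
    last = len(ISO_BINS) - 1
    for index, (low, high) in enumerate(ISO_BINS):
        # Assumption: shared boundaries belong to the higher range,
        # except the very top value, which belongs to the last range.
        if low <= iso < high or (index == last and iso == high):
            return f"nr_weights_{low}_{high}"
    raise ValueError(f"ISO {iso} is outside the supported range")

def select_first_weights(lens_type: str) -> str:
    """Pick the resolution enhancement weight set for a given lens type."""
    if lens_type not in LENS_TYPES:
        raise ValueError(f"unknown lens type: {lens_type}")
    return f"hr_weights_{lens_type}"

separate_sets = len(ISO_BINS) + len(LENS_TYPES)  # separate models: 4 + 8
joint_sets = len(ISO_BINS) * len(LENS_TYPES)     # single joint model: 4 * 8
```

In practice the selection keys could also include the sensor type, optical black statistics, compression quality, or upscaling magnification mentioned in the text; they are omitted here to keep the sketch short.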

In step S205, it is also desirable to modify the first noise reduction residual map based on the captured image and the output related to resolution enhancement, and to generate the second input data based on the modified first noise reduction residual map. Because the noise generated by the image sensor 132 includes shot noise and the like, the noise strength varies with the amount of light. When flare in the blurred image is corrected by resolution enhancement, regions appear in the captured image where brightness changes locally. If the first noise reduction residual map is not modified, noise corresponding to the light amount in the blurred state remains in regions whose light amount has changed after resolution enhancement, so the subject brightness and the noise strength no longer match. Because this mismatch may lower the noise reduction effect of the second machine learning model, it is desirable to modify the noise reduction residual map according to the local brightness change in the captured image caused by resolution enhancement, thereby resolving the mismatch between brightness and noise strength.

In this embodiment, only the configurations that differ from Embodiment 1 are described, and descriptions of configurations shared with Embodiment 1 are omitted. In this embodiment, the target of resolution enhancement is blur generated in the optical system, and the target of noise reduction is noise generated in the image sensor.

FIG. 5 is an external view of the image processing system 300 of this embodiment. FIG. 6 is a block diagram of the image processing system 300 of this embodiment. The image processing system 300 includes a training device 301, an image processing device (second device) 302, a control device (first device) 303, and an imaging device 304.

The imaging device 304 includes an imaging optical system 341, an image sensor 342, a storage unit 343, and a communication unit 344. An image captured by the imaging device 304 is transmitted to the control device 303.

The control device 303 includes a storage unit 331, a communication unit (transmission unit) 332, a calculation unit (acquisition unit) 333, and a display unit 334. The control device 303 transmits the captured image, together with a request to perform noise reduction and resolution enhancement on the captured image, to the image processing device 302. The control device 303 also receives the output of the image processing device 302 (the resolution-enhanced captured image and the second noise reduction residual map, both described later) and generates an output image according to the user's instructions.

The image processing device 302 includes a storage unit 321, a communication unit 322, an acquisition unit 323, a noise reduction unit 324, a resolution enhancement unit 325, and a calculation unit 326. The image processing device 302 performs noise reduction and resolution enhancement on the captured image using machine learning models with weight sets trained by the training device 301.

The training device 301 includes a storage unit 311, an acquisition unit 312, a calculation unit 313, and an update unit 314. The training device 301 trains the machine learning models according to the flowchart of FIG. 3. In this embodiment, the training image and correct image used to train the first machine learning model differ from those in Embodiment 1. The training image is an image obtained by adding blur and noise to an original image. However, the data input to the first machine learning model is the training image concatenated in the channel direction with a map obtained by multiplying the added noise by -1 (corresponding to the first noise reduction residual map). The correct image is an image obtained by adding to the original image a blur smaller than that of the training image (or no blur), together with noise that is correlated with, and of comparable strength to, the noise added to the training image. By explicitly supplying the noise present in the training image to the first machine learning model, the model can easily separate the subject from the noise in the training image, and can therefore learn an effect of resolution enhancement alone, with changes to the noise suppressed. Note that the noise map does not necessarily have to be concatenated with the training image in the channel direction. For example, the training image and the noise map may each be input to a convolution layer, and the resulting feature maps may then be concatenated in the channel direction.
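The construction of one training pair for the first machine learning model in this embodiment can be sketched as follows. Several simplifications are assumptions made for illustration only: a single-channel image, a box blur standing in for the optical blur, Gaussian noise standing in for sensor noise, no blur on the correct image, and the identical noise realization reused for the correct image as the simplest case of "correlated noise of comparable strength". The function names are hypothetical.

```python
import numpy as np

def box_blur(image: np.ndarray, radius: int) -> np.ndarray:
    """Simple box blur with edge padding (stand-in for optical blur)."""
    if radius == 0:
        return image.astype(np.float64)
    size = 2 * radius + 1
    padded = np.pad(image, radius, mode="edge")
    out = np.zeros(image.shape, dtype=np.float64)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
    return out / (size * size)

def make_training_pair(original, noise_sigma, rng, blur_radius=2):
    """Return (model_input, correct_image) for one original image."""
    noise = rng.normal(0.0, noise_sigma, original.shape)
    training_image = box_blur(original, blur_radius) + noise
    # Channel 0: blurred, noisy image. Channel 1: added noise times -1,
    # which plays the role of the first noise reduction residual map.
    model_input = np.stack([training_image, -noise], axis=0)
    # Correct image: no blur here, same noise realization as the input.
    correct_image = original + noise
    return model_input, correct_image

rng = np.random.default_rng(0)
original = rng.random((6, 6))
model_input, correct_image = make_training_pair(original, 0.05, rng)
```

Because the correct image keeps the noise, a model trained on such pairs is rewarded for sharpening the subject while leaving the noise unchanged, which is the behavior described above.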

Noise reduction and resolution enhancement of the captured image, as executed by the control device 303 and the image processing device 302, are described below with reference to FIG. 7. FIG. 7 is a flowchart showing the generation of an output image in this embodiment.

In step S301, the communication unit 332 transmits the captured image, together with a request to perform noise reduction and resolution enhancement on the captured image, to the image processing device 302.

In step S302, the communication unit 322 acquires the captured image and the request to perform noise reduction and resolution enhancement on the captured image. Note that the captured image may be stored in advance in the storage unit 321 or may be read from another recording medium.

In step S303, the acquisition unit 323 acquires the weight sets of the machine learning models. Specifically, the acquisition unit 323 acquires the first to third weight sets used in the first to third machine learning models, respectively.

In step S304, the noise reduction unit 324 uses the third machine learning model to generate an output related to first noise reduction based on the captured image. In this embodiment, the output related to the first noise reduction is the first noise reduction residual map. The first noise reduction residual map is the component that cancels the noise in the captured image.

In step S305, the calculation unit 326 concatenates the captured image and the first noise reduction residual map in the channel direction to generate the first input data.

In step S306, the resolution enhancement unit 325 uses the first machine learning model to generate an output related to resolution enhancement based on the first input data. In this embodiment, the output related to resolution enhancement is the third image, which is the captured image resolution-enhanced without noise reduction. Unlike Embodiment 1, because the first input data includes both the captured image and the first noise reduction residual map, and because of how the first machine learning model is trained, the first machine learning model in this embodiment can generate the third image directly. Note that the first input data may instead be data obtained by concatenating, in the channel direction, the image obtained by adding the captured image and the first noise reduction residual map (the first image) with the first noise reduction residual map. It may also be data obtained by concatenating the captured image and the first image in the channel direction. In these cases, input in the same format is also used when training the first machine learning model. In this embodiment, the output related to resolution enhancement is generated using at least two of the following: the captured image, the noise-reduced captured image based on the output related to the first noise reduction, and the first noise reduction residual map based on the output related to the first noise reduction.

In step S307, the noise reduction unit 324 uses the second machine learning model to generate an output related to second noise reduction based on the second input data. In this embodiment, the second input data is the third image, and the output related to the second noise reduction is the second noise reduction residual map. Adding the third image and the second noise reduction residual map yields the captured image with both noise reduction and resolution enhancement applied.

In step S308, the communication unit 322 transmits the output related to high resolution (the third image) and the output related to the second noise reduction (the second noise reduction residual map) to the control device 303.

In step S309, the communication unit 332 acquires the output related to high resolution (the third image) and the output related to the second noise reduction (the second noise reduction residual map).

In step S310, the calculation unit 333 generates an output image based on the captured image, the output related to high resolution (the third image), and the output related to the second noise reduction (the second noise reduction residual map). The strength of the resolution enhancement can be adjusted by changing the weights of a weighted average of the captured image and the third image. The strength of the noise reduction can be adjusted by multiplying the second noise reduction residual map by a coefficient representing the noise reduction strength and adding the result to the weighted average of the captured image and the third image. While checking the output image displayed on the display unit 334, the user can quickly adjust the strengths of the resolution enhancement and the noise reduction.
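The adjustment in step S310 amounts to a weighted average followed by the addition of a scaled residual map. The following is a minimal sketch under assumptions: the function name `compose_output` and the parameter names `alpha` (weight of the third image) and `beta` (noise reduction coefficient) are hypothetical, and the captured image is assumed to have the same shape as the third image.

```python
import numpy as np

def compose_output(captured, third_image, residual_nr2, alpha=1.0, beta=1.0):
    """Blend the outputs of S306 and S307 into an output image.

    alpha : weight of the third image in the weighted average
            (0 = captured image only, 1 = third image only)
    beta  : coefficient applied to the second noise reduction residual map
            (0 = no noise reduction, 1 = residual map added in full)
    """
    # Weighted average of the captured image and the third image.
    blended = (1.0 - alpha) * captured + alpha * third_image
    # Add the residual map scaled by the noise reduction coefficient.
    return blended + beta * residual_nr2

captured = np.full((1, 2, 2), 0.2)
third_image = np.full((1, 2, 2), 0.6)
residual_nr2 = np.full((1, 2, 2), -0.1)
out_half = compose_output(captured, third_image, residual_nr2, alpha=0.5, beta=0.5)
out_full = compose_output(captured, third_image, residual_nr2)
print(out_half[0, 0, 0], out_full[0, 0, 0])
```

Because this step involves only element-wise arithmetic on data that has already been received, changing `alpha` or `beta` would not require running either model again, which is presumably why the adjustment can be made quickly.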

The disclosure of this embodiment includes the following methods and configurations.
(Method 1)
An image processing method comprising:
generating an output related to first noise reduction using a captured image;
generating, using a first machine learning model, an output related to high resolution based on first input data corresponding to the output related to the first noise reduction;
generating, using a second machine learning model, an output related to second noise reduction based on second input data corresponding to the output related to high resolution; and
generating an output image based on the output related to the second noise reduction,
wherein the second input data is an image that differs more from a second addition image than from a first addition image,
the first addition image is an image obtained by adding the captured image and a high-resolution residual map based on the output related to high resolution, and
the second addition image is an image obtained by adding the high-resolution residual map and the noise-reduced captured image based on the output related to the first noise reduction.
(Method 2)
The image processing method according to method 1, wherein the output regarding the high resolution is generated based on information regarding an optical system used when acquiring the captured image.
(Method 3)
The image processing method according to method 2, wherein the information regarding the optical system is information regarding any of the type of the optical system, the focal length of the optical system, the aperture value of the optical system, the focus distance of the optical system, the position of a pixel of the captured image relative to the optical axis of the optical system, and the resolution performance of the optical system at a pixel of the captured image.
(Method 4)
The image processing method according to any one of methods 1 to 3, wherein the resolution enhancement includes at least one of blur correction and upscaling.
(Method 5)
The image processing method according to any one of methods 1 to 4, wherein the resolution enhancement includes correction of blur caused by an optical system used when acquiring the captured image.
(Method 6)
The image processing method according to any one of methods 1 to 5, wherein the output image is generated based on the captured image and the output related to high resolution.
(Method 7)
The image processing method according to any one of methods 1 to 6, further comprising:
selecting a first set of weights to be used in the first machine learning model based on information regarding the resolution enhancement; and
selecting a second set of weights to be used in the second machine learning model based on information regarding noise in the captured image.
(Method 8)
The image processing method according to any one of methods 1 to 7, wherein the output related to the first noise reduction is generated using a machine learning model.
(Method 9)
The image processing method according to method 8, wherein the machine learning model is the same as the second machine learning model.
(Method 10)
The image processing method according to any one of methods 1 to 9, wherein the second input data is generated using a first noise reduction residual map that is based on the output related to the first noise reduction and that has been corrected based on the captured image and the output related to high resolution.
(Method 11)
The image processing method according to any one of methods 1 to 10, wherein the second input data is generated based on the output related to high resolution and on either the captured image or the output related to the first noise reduction.
(Method 12)
The image processing method according to any one of methods 1 to 11, wherein the output related to high resolution is generated using at least two of the captured image, the noise-reduced captured image based on the output related to the first noise reduction, and the first noise reduction residual map based on the output related to the first noise reduction.
(Configuration 1)
An image processing device comprising:
a first generation unit that generates an output related to first noise reduction using a captured image;
a second generation unit that uses a first machine learning model that performs resolution enhancement to generate an output related to high resolution based on first input data corresponding to the output related to the first noise reduction;
a third generation unit that uses a second machine learning model that performs noise reduction to generate an output related to second noise reduction based on second input data corresponding to the output related to high resolution; and
an output unit that outputs an output image based on the output related to the second noise reduction,
wherein the second input data is an image that differs more from a second addition image than from a first addition image,
the first addition image is an image obtained by adding the captured image and a high-resolution residual map based on the output related to high resolution, and
the second addition image is an image obtained by adding the high-resolution residual map and the noise-reduced captured image based on the output related to the first noise reduction.
(Configuration 2)
An image processing system comprising a first device and a second device,
wherein the first device includes:
a transmitter that transmits a request regarding execution of processing to the second device; and
an acquisition unit that acquires an output image using an output acquired from the second device,
the second device includes:
a first generation unit that generates an output related to first noise reduction using a captured image;
a second generation unit that uses a first machine learning model that performs resolution enhancement to generate an output related to high resolution based on first input data corresponding to the output related to the first noise reduction;
a third generation unit that uses a second machine learning model that performs noise reduction to generate an output related to second noise reduction based on second input data corresponding to the output related to high resolution; and
an output unit that outputs the output based on the output related to the second noise reduction,
the second input data is an image that differs more from a second addition image than from a first addition image,
the first addition image is an image obtained by adding the captured image and a high-resolution residual map based on the output related to high resolution, and
the second addition image is an image obtained by adding the high-resolution residual map and the noise-reduced captured image based on the output related to the first noise reduction.
(Configuration 3)
A program that causes a computer to execute the image processing method described in any one of methods 1 to 12.
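The condition that Method 1 places on the second input data can be checked numerically once a measure of difference is chosen. The text does not fix such a measure, so the sketch below assumes the mean squared difference; the function names are hypothetical, and the high-resolution residual map is simply passed in as an array.

```python
import numpy as np

def mean_sq_diff(a, b):
    # Assumed difference measure; the text does not specify one.
    return float(np.mean((a - b) ** 2))

def satisfies_method1(second_input, captured, denoised, residual_sr):
    """True if second_input differs less from the first addition image
    than from the second addition image."""
    first_addition = captured + residual_sr    # captured image + high-resolution residual map
    second_addition = denoised + residual_sr   # noise-reduced captured image + the same map
    return mean_sq_diff(second_input, first_addition) < mean_sq_diff(second_input, second_addition)

rng = np.random.default_rng(2)
clean = rng.random((1, 16, 16))
captured = clean + 0.05 * rng.standard_normal(clean.shape)  # noisy capture
denoised = clean                                             # idealized first noise reduction
residual_sr = 0.1 * rng.standard_normal(clean.shape)         # stand-in residual map
print(satisfies_method1(captured + residual_sr, captured, denoised, residual_sr))  # True
print(satisfies_method1(denoised + residual_sr, captured, denoised, residual_sr))  # False
```

Read this way, the condition appears to require that the second machine learning model receive an image that still carries the noise of the captured image, rather than one built from the already noise-reduced image.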
[Other Embodiments]
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or device via a network or a storage medium, and one or more processors in a computer of the system or device read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

Although preferred embodiments of the present invention have been described above, the present invention is not limited to these embodiments, and various modifications and changes can be made within the scope of its gist.

102 Image processing device
124 Noise reduction unit
125 Resolution enhancement unit
126 Calculation unit

Claims

1. An image processing method comprising:
generating an output related to first noise reduction using a captured image;
generating, using a first machine learning model, an output related to high resolution based on first input data corresponding to the output related to the first noise reduction;
generating, using a second machine learning model, an output related to second noise reduction based on second input data corresponding to the output related to high resolution; and
generating an output image based on the output related to the second noise reduction,
wherein the second input data is an image that differs more from a second addition image than from a first addition image,
the first addition image is an image obtained by adding the captured image and a high-resolution residual map based on the output related to high resolution, and
the second addition image is an image obtained by adding the high-resolution residual map and the noise-reduced captured image based on the output related to the first noise reduction.

2. The image processing method according to claim 1, wherein the output related to high resolution is generated based on information regarding an optical system used when acquiring the captured image.

3. The image processing method according to claim 2, wherein the information regarding the optical system is information regarding any of the type of the optical system, the focal length of the optical system, the aperture value of the optical system, the focus distance of the optical system, the position of a pixel of the captured image relative to the optical axis of the optical system, and the resolution performance of the optical system at a pixel of the captured image.

4. The image processing method according to claim 1, wherein the resolution enhancement includes at least one of blur correction and upscaling.

5. The image processing method according to claim 1, wherein the resolution enhancement includes correction of blur caused by an optical system used when acquiring the captured image.

6. The image processing method according to claim 1, wherein the output image is generated based on the captured image and the output related to high resolution.

7. The image processing method according to claim 1 or 2, further comprising:
selecting a first set of weights to be used in the first machine learning model based on information regarding the resolution enhancement; and
selecting a second set of weights to be used in the second machine learning model based on information regarding noise in the captured image.

8. The image processing method according to claim 1, wherein the output related to the first noise reduction is generated using a machine learning model.

9. The image processing method according to claim 8, wherein the machine learning model is the same as the second machine learning model.

10. The image processing method according to claim 1 or 2, wherein the second input data is generated using a first noise reduction residual map that is based on the output related to the first noise reduction and that has been corrected based on the captured image and the output related to high resolution.

11. The image processing method, wherein the second input data is generated based on the output related to high resolution and on either the captured image or the output related to the first noise reduction.

12. The image processing method according to claim 1 or 2, wherein the output related to high resolution is generated using at least two of the captured image, the noise-reduced captured image based on the output related to the first noise reduction, and the first noise reduction residual map based on the output related to the first noise reduction.

13. An image processing device comprising:
a first generation unit that generates an output related to first noise reduction using a captured image;
a second generation unit that uses a first machine learning model that performs resolution enhancement to generate an output related to high resolution based on first input data corresponding to the output related to the first noise reduction;
a third generation unit that uses a second machine learning model that performs noise reduction to generate an output related to second noise reduction based on second input data corresponding to the output related to high resolution; and
an output unit that outputs an output image based on the output related to the second noise reduction,
wherein the second input data is an image that differs more from a second addition image than from a first addition image,
the first addition image is an image obtained by adding the captured image and a high-resolution residual map based on the output related to high resolution, and
the second addition image is an image obtained by adding the high-resolution residual map and the noise-reduced captured image based on the output related to the first noise reduction.

14. An image processing system comprising a first device and a second device,
wherein the first device includes:
a transmitter that transmits a request regarding execution of processing to the second device; and
an acquisition unit that acquires an output image using an output acquired from the second device,
the second device includes:
a first generation unit that generates an output related to first noise reduction using a captured image;
a second generation unit that uses a first machine learning model that performs resolution enhancement to generate an output related to high resolution based on first input data corresponding to the output related to the first noise reduction;
a third generation unit that uses a second machine learning model that performs noise reduction to generate an output related to second noise reduction based on second input data corresponding to the output related to high resolution; and
an output unit that outputs the output based on the output related to the second noise reduction,
the second input data is an image that differs more from a second addition image than from a first addition image,
the first addition image is an image obtained by adding the captured image and a high-resolution residual map based on the output related to high resolution, and
the second addition image is an image obtained by adding the high-resolution residual map and the noise-reduced captured image based on the output related to the first noise reduction.

15. A program for causing a computer to execute the image processing method according to claim 1 or 2.