JP2023084092A

JP2023084092A - Image processing device, image processing method, generation method, and program

Info

Publication number: JP2023084092A
Application number: JP2022161834A
Authority: JP
Inventors: 廣輝中村; Hiroki Nakamura; 啓行長谷川; Hiroyuki Hasegawa; 一輝細井; Kazuki Hosoi; 竣介川原; Shunsuke Kawahara
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2021-12-06
Filing date: 2022-10-06
Publication date: 2023-06-16

Abstract

To provide an image processing device capable of suppressing gradation degradation in a processed image even when the number of bits representing a pixel value in a neural network is smaller than the number of bits representing a pixel value of an input image. SOLUTION: An image processing device according to the present disclosure includes: gradation compression means which compresses the gradation of first image data; processing means which outputs image data to which prescribed image processing has been applied, by applying a neural network that executes the prescribed image processing to the image data whose gradation has been compressed by the gradation compression means; and gradation expansion means which expands the gradation of the image data to which the prescribed image processing has been applied. The number of bits representing a pixel value in the neural network is smaller than the number of bits representing a pixel value of the first image data, and the gradation compression means compresses the gradation by using a characteristic in which more gradations are assigned as brightness is lower. SELECTED DRAWING: Figure 3

Description

The present invention relates to an image processing device, an image processing method, a generation method, and a program.

Conventionally, image processing techniques using a neural network having multiple intermediate layers (a deep neural network) are known. An image captured by an imaging device (for example, a camera) may contain noise depending on the settings of the imaging device and the shooting conditions at the time of capture, so the captured image can be input to a deep neural network to remove the noise. Patent Document 1 discloses a technique for outputting an image that has undergone processing such as compression noise removal and upsampling by a neural network.

Patent Document 1: JP 2019-121252 A

In general, processing that uses a neural network such as a convolutional neural network (CNN) requires a large amount of computation. One conceivable way to reduce that amount is to limit the arithmetic precision inside the network (for example, to INT8).

The technique proposed in Patent Document 1 does not consider, for example, the case where the image sensor outputs an image whose pixel values are represented by more than 8 bits and the CNN performs processing at INT8 precision. That is, in the technique proposed in Patent Document 1, when a CNN that represents pixel values with fewer bits than the image to be processed processes that image, the number of bits of the pixel values is reduced and gradation is lost.

The present invention has been made in view of the above problem, and its object is to enable appropriate processing even with a neural network that supports only a limited number of bits.

To solve this problem, an image processing device of the present invention has, for example, the following configuration. That is, the image processing device includes: gradation compression means for compressing the gradation of first image data; processing means for outputting image data on which predetermined image processing has been performed, by applying a neural network that performs the predetermined image processing to the image data whose gradation has been compressed by the gradation compression means; and gradation expansion means for expanding the gradation of the image data on which the predetermined image processing has been performed, wherein the number of bits representing a pixel value inside the neural network is smaller than the number of bits representing a pixel value of the first image data, and the gradation compression means compresses the gradation using a characteristic in which more gradations are assigned as brightness is lower.

According to the present invention, appropriate processing can be performed even with a neural network that supports only a limited number of bits.

Block diagram showing a functional configuration example of the imaging device according to Embodiment 1.
Block diagram showing a functional configuration example of the image processing system according to Embodiment 1.
Flowchart showing the operation of the inference processing of Embodiment 1.
Graph explaining gamma correction in the operation of the inference processing of Embodiment 1.
Flowchart showing the operation of the inference processing of Embodiment 2.
Graph explaining gamma correction in the operation of the inference processing of Embodiment 2.
Flowchart showing the operation of the inference processing of Embodiment 3.
Diagram conceptually showing region division in Embodiment 3.
Diagram conceptually showing region selection in a region-divided image in Embodiment 3.
Flowchart showing the operation of the learning processing of Embodiment 1.
Diagram conceptually explaining the neural network according to Embodiment 1.
Flowchart showing the operation of the learning processing of Embodiment 2.
Flowchart (1) showing the operation of the inference processing of Embodiment 4.
Flowchart (2) showing the operation of the inference processing of Embodiment 4.
Diagram explaining the EOTF and OETF according to Embodiment 4.
Flowchart (1) showing the operation of the inference processing of Embodiment 5.
Flowchart (2) showing the operation of the inference processing of Embodiment 5.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. The following embodiments do not limit the invention according to the claims. Although multiple features are described in the embodiments, not all of these features are essential to the invention, and the features may be combined arbitrarily. Furthermore, in the accompanying drawings, identical or similar configurations are given the same reference numerals, and redundant description is omitted.

<Configuration example of imaging device>
First, with reference to FIG. 1, a functional configuration example of an imaging device that executes the inference processing described later will be explained. The following description takes as an example a case where an imaging device such as a digital camera executes the inference processing. However, as long as an input image can be acquired and the inference processing can be executed, an image processing device that does not include imaging means can also implement this embodiment. The imaging device or image processing device may be an electronic device other than a digital camera, as long as it is capable of executing the inference processing. In the following embodiments, the labels "first" and "second" are used to facilitate understanding, and do not necessarily refer to the same items as those labeled "first" and "second" in the claims.

The imaging device 100 includes, for example, a processor 106, a ROM 105, a RAM 107, an image processing unit 104, an optical lens 101, an image sensor 102, a frame memory 103, a video output driving unit 108, a display driving unit 110, and a metadata extraction unit 112. These units are connected to an internal bus 113 and can exchange data with one another via the internal bus 113.

The optical lens 101 includes a lens and a motor for driving the lens. The optical lens 101 operates based on a control signal and can optically enlarge or reduce the image and adjust the focal length and the like. When the amount of incident light is to be adjusted, it can be adjusted to achieve the desired brightness by controlling the aperture area of the diaphragm. Light transmitted through the lens forms an image on the image sensor 102.

The image sensor 102 is, for example, a CCD sensor or a CMOS sensor, and converts an optical signal into an electrical signal. The image sensor 102 is driven based on a control signal to reset the charges in the pixels and to control the readout timing. The image sensor 102 has functions for applying gain processing to pixel signals read out as electrical analog signals (voltage values) and for converting the analog signals into digital signals, but the gain processing and the conversion to digital signals may be performed outside the image sensor 102.

The image processing unit 104 performs various kinds of image processing on the image output from the image sensor 102. For example, the image processing unit 104 can correct the light falloff in the peripheral portion of the image caused by the characteristics of the optical lens 101, correct the pixel-to-pixel sensitivity variation of the image sensor 102, and perform color correction, flicker correction, and the like. The image processing unit 104 also has a function of performing noise reduction processing using a neural network. Details of the noise reduction processing will be described later. The image processing unit 104 may be implemented by the processor 106, or another processor such as a GPU (not shown), executing a program.

The frame memory 103 includes a volatile storage medium. The frame memory 103 is, for example, what is called a RAM (Random Access Memory): an element that temporarily stores video signals and allows them to be read out when needed. Because video signals involve an enormous amount of data, a high-speed, high-capacity memory is required. In recent years, for example, DDR4-SDRAM (Double Data Rate 4 Synchronous Dynamic RAM) is used. Using the frame memory 103 makes various kinds of processing possible. The frame memory 103 is useful for image processing such as compositing temporally different images or cutting out only a required region.

The processor 106 may be composed of one or more processors and includes, for example, a CPU (Central Processing Unit). In addition to the CPU, the processor 106 may include a GPU (Graphics Processing Unit) or a special-purpose processor for high-speed processing of specific operations such as machine learning. The processor 106 functions as a control unit for controlling each function of the imaging device 100. A ROM (Read Only Memory) and a RAM are connected to the processor 106. The ROM 105 is a non-volatile storage medium that stores programs for operating the processor 106, various adjustment parameters, and the like. The ROM 105 may also contain information on a trained model for executing the inference processing, for example the trained weight parameters and hyperparameters of a deep neural network (also simply called a neural network). A program read from the ROM 105 is loaded into the volatile RAM 107 and executed. The RAM 107 may be slower and of lower capacity than the frame memory 103. The neural network is configured, for example, to output an image in which the noise of the input image has been reduced, but it is not limited to this and may be configured to apply some predetermined image processing to the input image and output the resulting image.

The metadata extraction unit 112 extracts metadata information such as lens drive conditions and sensor drive conditions. An image generated by the image processing unit 104 is output to the outside of the imaging device 100 via the video output driving unit 108 and a video terminal 109. The interface for outputting images to the outside allows real-time video to be displayed on an external monitor or the like. The interface may be any of various interfaces, for example SDI (Serial Digital Interface), HDMI (registered trademark) (High Definition Multimedia Interface), or DisplayPort (registered trademark). An image generated by the image processing unit 104 is also displayed on a display device via the display driving unit 110 and a display unit 111.

The display unit 111 is a display device on which the user can view the displayed content. The display unit 111 can display, for example, video processed by the image processing unit 104 and setting menus, so that the user can check the operating status of the imaging device 100. As the display device, the display unit 111 can use a small, low-power device such as an LCD (Liquid Crystal Display) or an organic EL (Electroluminescence) display. The display unit 111 may also be combined with a resistive or capacitive thin-film element called a touch panel. The processor 106 generates character strings for informing the user of the setting state of the imaging device 100 and menus for configuring the imaging device 100, superimposes them on the video processed by the image processing unit 104, and displays the result on the display unit 111. In addition to character information, shooting-assist displays such as a histogram, vectorscope, waveform monitor, zebra, peaking, and false color may be superimposed.

<Overview of image processing system>
Next, the image processing system of this embodiment will be described with reference to FIG. 2. The image processing system is a system capable of executing the learning processing described later. It is composed of an imaging device 200, an image processing device 210, a display device 220, and a storage device 230. The optical lens 201 and the image sensor 202 of the imaging device 200 have substantially the same configuration as the optical lens 101 and the image sensor 102 of the imaging device 100. The frame memory 203, image processing unit 204, ROM 205, processor 206, and RAM 207 of the image processing device 210 have substantially the same configuration as the frame memory 103, image processing unit 104, ROM 105, processor 106, and RAM 107, respectively. The metadata extraction unit 208 and the internal bus 218 of the image processing device 210 have substantially the same configuration as the metadata extraction unit 112 and the internal bus 113 of the imaging device 100, respectively. Detailed description of the components that are substantially the same as those of the imaging device 100 is therefore omitted.

A camera control unit 209 in the imaging device 200 controls the driving of the optical lens 201 and the image sensor 202 based on a communication signal output from a camera communication connection unit 212 of the image processing device 210.

An image signal receiving unit 211 of the image processing device 210 receives the image signal output from the image sensor 202 of the imaging device 200. A GPU 213 includes one or more GPUs and can execute neural network learning processing in accordance with instructions from the image processing unit 204 or the processor 206. Because the learning processing requires an enormous amount of computation, this embodiment uses a GPU, which has higher processing capability for image processing than a CPU. The GPU 213 may also be used to generate images to be displayed on the display device 220. In that case, an image generated under the control of the GPU 213 is displayed on the display device 220 via a display driving unit 216 and a display device connection unit 217.

The storage device 230 can be used to store a large amount of image data as training images. The storage device 230 may also store network parameters (such as the weight parameters of the neural network) and hyperparameters updated by the learning processing. The image processing device 210 exchanges data with the storage device 230 via a storage driving unit 214 and a storage connection unit 215 of the system.

This embodiment is described using an example in which the image processing system shown in FIG. 2 is used for the learning processing and the imaging device 100 shown in FIG. 1 is used for the inference processing. However, use is not limited to this; for example, the inference processing may be executed in the image processing system shown in FIG. 2. Also, as an example, this embodiment assumes that the training images are Bayer-array images. However, images captured with a three-chip image sensor may be used, as may images captured with a vertical color separation image sensor such as a FOVEON sensor. The same applies not only to the Bayer array but also to other arrays (a honeycomb structure, the filter array of an X-Trans CMOS sensor, and so on). In the case of a Bayer-array image, the image may be kept as a single Bayer-array channel, or it may be separated into individual color channels and used as training images. Furthermore, although this embodiment is described using an example in which a single training image is input to the neural network and a single image is output from it, a neural network that inputs or outputs multiple images may be used.

(Embodiment 1)
<Operation of inference processing in imaging device>
Next, the operation of the inference processing in the imaging device 100 will be described with reference to FIGS. 3 and 4. The series of operations shown in FIG. 3 is realized, for example, by the processor 106 executing a program stored in the ROM 105 and thereby controlling each unit of the imaging device 100. The operations of the image processing unit 104 may be realized by the processor 106, or another processor such as a GPU (not shown), executing a program stored in the ROM 105.

In step S3001, the processor 106 sets the neural network parameters (network parameters) recorded in the ROM 105 in the neural network of the image processing unit 104. As described later, the network parameters are, for example, the weights and biases that make up the neural network. The network parameters set in step S3001 have been calculated in advance, for example, by the learning processing of the image processing system shown in FIG. 2.

In step S3002, the image sensor 102 acquires a first image and outputs the acquired image to the image processing unit 104. In step S3003, the image processing unit 104 performs correction processing on the first image. The correction processing here reduces variations attributable to the optical lens 101 and the image sensor 102, and includes, for example, correction of peripheral light falloff and correction of pixel-to-pixel sensitivity variation. This step may be skipped when no correction processing is needed.

In step S3004, the image processing unit 104 applies digital gain to the corrected image. In the digital gain processing, when the pixel values carry an offset, the image processing unit 104 subtracts the offset from each pixel value, applies the gain, and then adds the offset back to the gained pixel value.
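The offset-aware gain of step S3004 can be sketched as follows. This is a minimal illustration for a single pixel value; the function name `apply_digital_gain` and the numeric values (offset 512, gain 2.0) are assumptions chosen for the example and do not appear in the original text.

```python
def apply_digital_gain(pixel: float, gain: float, offset: float) -> float:
    """Apply gain to the signal component only, leaving the offset unchanged."""
    signal = pixel - offset          # remove the offset before scaling
    return signal * gain + offset    # scale, then restore the offset


# Hypothetical values: offset 512, gain 2.0.
print(apply_digital_gain(600.0, 2.0, 512.0))  # 688.0
```

A pixel exactly at the offset level is left unchanged by any gain, which is the point of subtracting the offset first.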

In step S3005, the image processing unit 104 generates a second image by subtracting the offset from each pixel value of the image to which digital gain was applied in step S3004. The offset here is the black level added by the image sensor 102. In step S3006, the image processing unit 104 generates a third image by normalizing each pixel value of the second image. In this embodiment, each pixel value of the first image acquired from the image sensor 102 in step S3002 is 14 bits, and each pixel value is represented by 14 bits of data up to and including the second image generated in step S3005. The normalization here divides each pixel value by 2 to the 14th power so that each 14-bit pixel value falls in the range of 0 to 1, and the result is handled in a format that retains the fractional part, such as float32.
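Steps S3005 and S3006 can be sketched for a single pixel as follows. The function names and the black level of 512 are hypothetical; only the 14-bit depth and the division by 2 to the 14th power come from the text.

```python
INPUT_BITS = 14  # bit depth of the sensor output in this embodiment


def subtract_black_level(pixel: int, black_level: int) -> int:
    """S3005: remove the black level added by the image sensor."""
    return pixel - black_level


def normalize(pixel: int, bits: int = INPUT_BITS) -> float:
    """S3006: map a `bits`-bit value into the range 0 to 1 by dividing by 2**bits."""
    return pixel / (1 << bits)


second = subtract_black_level(8704, 512)  # hypothetical black level of 512
third = normalize(second)
print(second, third)  # 8192 0.5
```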

In step S3007, the image processing unit 104 generates a fourth image by applying gamma correction to each pixel value of the third image. The gamma correction here is applied according to Equation (1). For example, the gamma correction in this embodiment has a characteristic in which more gradations are assigned as brightness is lower.
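Equation (1) is not reproduced in this text. As a hedged sketch only, the following assumes a plain power-law curve, `x ** (1 / gamma)`, which is one common form that gives dark values more output range when gamma is 1 or more. The function name `gamma_compress` and the default gamma of 2.2 are assumptions, and the actual Equation (1) may differ.

```python
def gamma_compress(x: float, gamma: float = 2.2) -> float:
    """Assumed power-law gradation compression for a normalized value x in [0, 1]."""
    return x ** (1.0 / gamma)


# Dark inputs are lifted much more than bright ones.
for x in (0.01, 0.1, 0.5):
    print(x, round(gamma_compress(x), 3))
```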

In step S3008, the image processing unit 104 generates a fifth image by denormalizing the fourth image so that each pixel value becomes 8 bits. In this denormalization, the image processing unit 104 multiplies each pixel value by 2 to the 8th power. The result is handled in a format such as INT8. That is, each pixel value of the fifth image is represented by 8 bits.
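A minimal sketch of the denormalization in step S3008 follows. The text specifies only the multiplication by 2 to the 8th power; truncating to an integer and clamping to 255 are assumptions added here so that the result always fits in 8 bits.

```python
NETWORK_BITS = 8  # bit depth handled inside the neural network


def denormalize_to_8bit(x: float) -> int:
    """Scale a normalized value by 2**8 and store it as an 8-bit code."""
    code = int(x * (1 << NETWORK_BITS))  # truncation is an assumption
    return min(code, 255)                # clamp is an added safeguard


print(denormalize_to_8bit(0.5))            # 128
print(denormalize_to_8bit(16383 / 16384))  # 255
```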

FIG. 4 shows the gamma correction characteristic, with the pixel value before gamma correction on the horizontal axis and the pixel value after gamma correction on the vertical axis. When γ in Equation (1) is 1 or more, the pixel values of the fourth image follow the gamma curve shown in FIG. 4. That is, this gamma correction yields a result in which many output values are assigned to the low-luminance range of the pixel values before gamma correction, so the gradation carried by the low-order bits can be preserved when the values are denormalized to 8 bits in step S3008.
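The effect described above can be checked numerically. The sketch below counts how many distinct 8-bit codes the darkest 1/16 of the 14-bit range occupies, with and without gradation compression. It assumes a power-law curve `x ** (1 / gamma)` with gamma 2.2 in place of Equation (1), which is not reproduced in this text; the helper names are hypothetical.

```python
def to_8bit(x: float) -> int:
    """Scale a normalized value to an 8-bit code (truncation and clamp assumed)."""
    return min(int(x * 256), 255)


def count_codes(values, gamma: float) -> int:
    """Count distinct 8-bit codes produced from 14-bit values under a given gamma."""
    return len({to_8bit((v / 16384) ** (1.0 / gamma)) for v in values})


dark = range(1024)                      # darkest 1/16 of the 14-bit range
linear_codes = count_codes(dark, 1.0)   # gamma = 1 means no compression
gamma_codes = count_codes(dark, 2.2)    # assumed gamma value
print(linear_codes, gamma_codes)
```

Without compression, these 1024 dark input levels share only 16 output codes; with the assumed curve they spread over several times as many, which is the gradation that would otherwise be lost.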

In step S3009, the image processing unit 104 inputs the fifth image to the neural network. The neural network here is a trained neural network that has been trained to optimally remove noise from images gamma-corrected as in step S3007.

In step S3010, the image processing unit 104 generates a seventh image by normalizing each pixel value of the sixth image output from the neural network. In this normalization, the image processing unit 104 divides each pixel value by 2 to the 8th power so that each 8-bit pixel value falls in the range of 0 to 1. The result is handled in a format that retains the fractional part, such as float32.

In step S3011, the image processing unit 104 generates an eighth image by applying degamma correction to each pixel value of the seventh image. The degamma correction here is applied, for example, according to Equation (2).
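Equation (2) is likewise not reproduced in this text. Assuming that the gradation compression is the power law `x ** (1 / gamma)`, its inverse would be `y ** gamma`. The sketch below shows that pairing and checks that the two cancel; both function names and the gamma value are assumptions, not the patent's equations.

```python
def gamma_compress(x: float, gamma: float = 2.2) -> float:
    """Assumed gradation compression: x ** (1 / gamma)."""
    return x ** (1.0 / gamma)


def degamma_expand(y: float, gamma: float = 2.2) -> float:
    """Assumed gradation expansion: the inverse power law, y ** gamma."""
    return y ** gamma


x = 0.3
round_trip = degamma_expand(gamma_compress(x))
print(abs(round_trip - x) < 1e-9)  # True
```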

In step S3012, the image processing unit 104 generates a ninth image by denormalizing each pixel value of the eighth image to 14 bits. In this denormalization, the image processing unit 104 multiplies each pixel value by 2 to the 14th power. The result is handled in a 14-bit format. Although this embodiment describes denormalization to 14 bits as an example, denormalization to a bit depth other than 14 bits may be used depending on the video output standard and bit depth of the output from the imaging device 100.

In step S3013, the image processing unit 104 generates a tenth image by adding the offset to each pixel value of the ninth image. Although this embodiment has been described using gamma correction as the gradation compression and degamma correction as the gradation expansion, other methods may be used. Also, this embodiment has been described using an example in which digital gain is applied before input to the neural network. However, the digital gain may instead be applied after the neural network. In that case, a neural network that has been optimally trained on images before digital gain is applied is used.
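Steps S3005 through S3013 can be strung together for a single pixel as in the following sketch. It is an illustration under stated assumptions, not the device's implementation: the trained network is replaced by a hypothetical identity function, the gradation curves are assumed to be the power laws `x ** (1 / gamma)` and `y ** gamma`, the black level of 512 is an example value, digital gain is omitted, and clamping is added as a safeguard.

```python
INPUT_BITS = 14   # bit depth of the sensor output in this embodiment
NETWORK_BITS = 8  # bit depth handled inside the neural network


def identity_network(code: int) -> int:
    """Placeholder for the trained denoising network (hypothetical)."""
    return code


def infer_pixel(raw: int, black_level: int = 512, gamma: float = 2.2,
                network=identity_network) -> int:
    # S3005-S3006: subtract the black level and normalize to [0, 1).
    # Clamping negatives to 0 is an added safeguard, not stated in the text.
    x = max(raw - black_level, 0) / (1 << INPUT_BITS)
    # S3007-S3008: gradation compression, then conversion to an 8-bit code.
    code = min(int(x ** (1.0 / gamma) * (1 << NETWORK_BITS)), 255)
    # S3009: processing by the low-precision network.
    out = network(code)
    # S3010-S3011: normalize the network output and expand the gradation.
    y = (out / (1 << NETWORK_BITS)) ** gamma
    # S3012-S3013: denormalize to 14 bits and restore the offset.
    return int(y * (1 << INPUT_BITS)) + black_level


print(infer_pixel(612))  # a dark pixel, 100 levels above the black level
```

With the identity placeholder, the output for a dark pixel stays within a few levels of the input, whereas dropping the six low-order bits directly would quantize the same signal in steps of 64.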

<Operation of learning processing in image processing system>
Next, the operation of the learning processing in the image processing system (the imaging device 200, image processing device 210, display device 220, and storage device 230) will be described with reference to FIGS. 10 and 11. The learning processing shown in FIG. 10 is realized by the processor 206 of the image processing device 210 loading a program stored in the ROM 205 into the RAM 207, executing it, and thereby controlling each unit of the image processing device 210 (the image processing unit 204, the GPU 213, and so on). The operations of the image processing unit 204 may be realized by the GPU 213 executing a program stored in the ROM 205.

In step S9001, the processor 206 acquires a training image (noise image) and a correct image (teacher image) from the storage device 230. Here, the training image is an image containing noise. The correct image shows the same subject as the training image and has no noise (or very little). A training image can be generated, for example, by adding noise through simulation to a correct image that is little affected by noise. Alternatively, an image of the same subject as the correct image, actually captured under conditions in which noise can occur (for example, a high-sensitivity setting), may be used. In this case, for example, the training image is an image captured at low sensitivity, and the correct image is an image captured at high sensitivity, or an image captured at low illuminance and sensitivity-corrected so that its brightness is comparable to that of the correct image. Noise patterns and subject structures (such as edges) that are not included in the learning processing cannot be inferred accurately in the later inference processing. For this reason, multiple training images and correct images are prepared, generated so as to include a variety of noise patterns and subject structures. The noise amount may be a single level, or multiple noise levels may be mixed.
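One way to generate a training image by simulation, as mentioned above, is to add synthetic noise to a clean correct image. The sketch below assumes additive Gaussian noise and a 14-bit value range; the text does not specify the noise model, so both choices and the function name are illustrative only.

```python
import random


def add_gaussian_noise(clean, sigma, max_value=16383, seed=0):
    """Return a noisy copy of `clean` (a flat list of pixel values).

    Gaussian noise and the 14-bit range are assumptions, not from the text.
    """
    rng = random.Random(seed)  # fixed seed so the pair is reproducible
    noisy = []
    for v in clean:
        n = int(round(v + rng.gauss(0.0, sigma)))
        noisy.append(min(max(n, 0), max_value))  # clip to the valid range
    return noisy
```

Calling the function with several values of `sigma` gives training images with mixed noise amounts for the same correct image.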

In step S9002, the processor 206 normalizes the training image and the correct image acquired in step S9001 by dividing them by the upper limit value of the signal (the saturation luminance value), and applies gamma correction to each pixel according to equation (1) above. In step S9003, the processor 206 selects at least one of the training images gamma-corrected in step S9002, inputs the selected training image to the neural network of the image processing unit 204, and generates an output image. The noise amount of the training image used in the learning processing may be the same as that of the other training images, or may differ.
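The normalization and gamma correction of step S9002 can be sketched as below. Equation (1) is defined earlier in the document and is not repeated in this section, so the power-law form y = x^(1/γ) used here is an assumption, chosen because it assigns more output range to dark input values when γ > 1. The function name and default γ are hypothetical.

```python
def normalize_and_gamma(pixels, saturation, gamma=2.2):
    """Divide by the saturation value, then apply y = x ** (1 / gamma).

    The power-law form stands in for equation (1) and is an assumption.
    """
    out = []
    for v in pixels:
        x = min(max(v / saturation, 0.0), 1.0)  # normalize to [0, 1]
        out.append(x ** (1.0 / gamma))          # gradation compression
    return out
```

The same function would be applied to both the training image and the correct image so that the pair stays consistent.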

The processing performed by the neural network will be described with reference to FIG. 11, which schematically shows processing by a neural network. Although the example in FIG. 11 uses a convolutional neural network (CNN), this embodiment is not limited to a CNN. A GAN (Generative Adversarial Network) may be used as the neural network that outputs an image. Alternatively, the neural network may have skip connections or the like, or may be a recurrent neural network such as an RNN (Recurrent Neural Network).

The input image 1001 shown in FIG. 11 represents an image input to the neural network, or a feature map described later. The operation symbol 1002 represents a convolution operation. The convolution matrix 1003 is a filter that performs the convolution operation on the input image 1001. The bias 1004 is added to the result of convolving the input image 1001 with the convolution matrix 1003. The feature map 1005 is the convolution result after the bias 1004 has been added. For simplicity, FIG. 11 shows only a small number of neurons, intermediate layers, and channels, but the numbers of neurons and layers, and the number and weights of connections between neurons, are not limited to those shown. When implementing on an FPGA or the like, the connections and weights between neurons may be reduced. In this embodiment, learning processing and inference processing are performed on multiple color channels together, but they may instead be performed separately for each color.

In a CNN, a feature map of the input image is obtained by convolving the input image with a filter. The filter may be of any size. In the next layer, a different feature map is obtained by convolving the feature map of the previous layer with another filter. In each layer, an input signal is multiplied by the filter weights, which represent connection strength, and a bias is added. Applying an activation function to this result gives the output signal of each neuron. The weights and biases in each layer are called network parameters, and their values are updated by the learning processing. Commonly used activation functions include the sigmoid function and the ReLU function. This embodiment uses a Leaky ReLU function according to equation (3), but is not limited to this.

In equation (3), max denotes the function that outputs the maximum of its arguments.
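The per-layer computation described above (convolution, bias addition, activation) can be sketched in plain Python. Equation (3) itself does not survive in this text; the form f(x) = max(x, a·x) below is the common Leaky ReLU definition and matches the description of max, but the slope a = 0.2 and all function names are assumptions for illustration.

```python
def leaky_relu(x, slope=0.2):
    """Leaky ReLU as max(x, slope * x); the slope value is an assumption."""
    return max(x, slope * x)


def conv2d_bias_act(image, kernel, bias, slope=0.2):
    """Single-channel 'valid' convolution, plus bias, then Leaky ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = []
    for i in range(oh):
        row = []
        for j in range(ow):
            acc = bias  # start from the bias, then add weighted inputs
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(leaky_relu(acc, slope))
        out.append(row)
    return out
```

As in most CNN implementations, the kernel is applied without flipping, so strictly speaking this is a cross-correlation.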

The CNN has multiple layers for repeatedly performing convolution operations. These may be followed by, for example, one or more fully connected layers, and an output layer may be connected after the fully connected layers.

In step S9004, the image processing unit 204 performs image processing on both the output image of the neural network and the correct image. Matching the conditions of the image processing performed in inference with those of the image processing performed in learning improves the accuracy of noise reduction at inference time. The image processing may be performed at any time before steps S9004 and S9005. For example, it may be performed on the input side of the neural network. When multiple patterns of noise amount are applied to the training images used in learning, noise can be removed effectively even if a captured image with a noise amount not seen in learning is input at inference time. If the number of training images is insufficient, data augmentation such as cropping, rotation, and flipping may be performed. In that case, the same processing must also be applied to the correct image.
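The requirement that augmentation be applied identically to the training image and the correct image can be sketched as follows. The function name and the particular choice of one horizontal flip and one 90-degree rotation are illustrative assumptions.

```python
def augment_pair(train, target):
    """Return the original pair plus flipped and rotated copies.

    Each transform is applied to both images so the pair stays aligned.
    """
    def hflip(img):
        return [row[::-1] for row in img]

    def rot90(img):  # 90 degrees clockwise
        return [list(r) for r in zip(*img[::-1])]

    return [
        (train, target),
        (hflip(train), hflip(target)),
        (rot90(train), rot90(target)),
    ]
```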

In step S9005, the image processing unit 204 calculates the error between the output image processed in step S9004 and the correct image. The correct image has its color components arranged in the same order as the training image. The error is generally calculated as the mean squared error over the pixels or as the sum of the absolute differences over the pixels, but other metrics may be used. In step S9006, the image processing unit 204 updates each parameter of the neural network by backpropagation so that the error calculated in step S9005 decreases. However, this embodiment is not limited to this. The update amount of each parameter may be fixed or may vary.
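The two error metrics named above, and a parameter update of the kind performed in step S9006, can be written out as a short sketch. The gradient-descent update rule and the function names are assumptions; the text only says that backpropagation is used and that the update amount may be fixed or variable. Computing the gradients themselves is outside this sketch.

```python
def mean_squared_error(output, target):
    """Mean of squared per-pixel differences."""
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(output)


def sum_absolute_error(output, target):
    """Sum of absolute per-pixel differences."""
    return sum(abs(o - t) for o, t in zip(output, target))


def sgd_step(params, grads, learning_rate):
    """Plain gradient-descent update (an assumed update rule)."""
    return [p - learning_rate * g for p, g in zip(params, grads)]
```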

Next, in step S9007, the processor 206 determines whether a predetermined termination condition is satisfied. If the condition is not satisfied, the process returns to step S9001 and learning continues. If it is satisfied, the process proceeds to step S9008. The termination condition may be that the number of learning iterations has reached a specified value, or that the error has fallen to or below a predetermined value. Alternatively, learning may end when the error has almost stopped decreasing, or at the user's discretion. Next, in step S9008, the processor 206 stores information such as the network parameters updated by learning and the structure of the neural network in the storage device 230. The storage device 230 may be used to save the output network parameters. Although this embodiment assumes storage in the storage device, another storage medium may be used.
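The automatic termination conditions listed for step S9007 can be combined in a single check, sketched below. The threshold values and the function name are hypothetical, and termination at the user's discretion is not modeled.

```python
def should_stop(iteration, errors, max_iterations=1000,
                error_threshold=1e-3, min_improvement=1e-6):
    """Return True if any termination condition holds.

    `errors` is the history of error values, most recent last.
    All default thresholds are illustrative assumptions.
    """
    if iteration >= max_iterations:              # iteration count reached
        return True
    if errors and errors[-1] <= error_threshold:  # error small enough
        return True
    if len(errors) >= 2 and errors[-2] - errors[-1] < min_improvement:
        return True                               # error no longer decreasing
    return False
```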

In step S9009, the processor 206 quantizes the parameters of the neural network, which were trained in FP32, to INT8. The bit width and data type are not limited to these: FP16 parameters may be used, or quantization to INT4 may be performed. In step S9010, the processor 206 stores the quantized network parameters in the parameter storage area. The processor 206 then ends the learning processing. This learning processing yields a trained neural network.
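The text does not say which quantization scheme step S9009 uses. As one common possibility, the sketch below shows symmetric per-tensor quantization of floating-point weights to signed 8-bit integers; the scheme, the function names, and the choice of the range [-127, 127] are all assumptions.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127].

    Returns the integer codes and the scale needed to recover the values.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Approximate reconstruction of the original weights."""
    return [c * scale for c in codes]
```

With this scheme the reconstruction error of each weight is at most half the scale, which is why a smaller weight range gives finer resolution.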

For processing other than noise reduction, the learning processing can likewise be performed by preparing pairs of training images and correct images through simulation. For super-resolution, for example, training images can be prepared by downsampling the correct images. The correct image and the training image may or may not be matched in size. For defocus-blur or motion-blur removal (deblurring), training images can be prepared by applying a blur function to the correct images. For white balance correction, the training image may be an image whose white balance is not properly adjusted, or not corrected, relative to a correct image captured with proper white balance. The same applies to color correction such as color matrix correction. For interpolation of missing data, training images are obtained by removing data from the correct images. For demosaicing, correct images may be prepared using, for example, a three-chip image sensor, and training images prepared by resampling the correct images in a Bayer array or the like. For inference of color components, training images can be prepared by reducing the color components of the correct images. For dehazing, training images can be prepared by adding scattered light, obtained by simulating the physical phenomenon, to correct images that are free of haze. When multiple frames are consecutive, as in a moving image, noise can be removed more effectively by stacking the desired number of frames in the depth direction and inputting them to the neural network together.
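As an illustration of the super-resolution case, a training image can be derived from a correct image by downsampling. The sketch below assumes 2x2 average pooling on a single-channel image with even dimensions; the text does not specify the downsampling method, and the function name is hypothetical.

```python
def downsample_2x(image):
    """Halve width and height by averaging each 2x2 block (assumed method)."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * i][2 * j] + image[2 * i][2 * j + 1]
             + image[2 * i + 1][2 * j] + image[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(w)
        ]
        for i in range(h)
    ]
```

The original image then serves as the correct image and the downsampled result as the training image.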

As described above, this embodiment addresses a configuration in which the number of bits representing pixel values inside the neural network is smaller than the number of bits representing pixel values of the image data to be processed. The gradation of the image data is first compressed. Specifically, the gradation is compressed so that more gradation levels are assigned as brightness decreases. An output image is then generated by applying a neural network that performs predetermined image processing to the gradation-compressed image data, and gradation expansion is applied to the image data output from the neural network. In this way, appropriate processing is possible even with a neural network that supports only a limited number of bits. In other words, a neural network with a lower computational load can be used while suppressing gradation loss in the image.
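The effect summarized above can be checked numerically with a small round-trip sketch: a 14-bit value is compressed, rounded to 8 bits (standing in for the limited precision inside the network), and expanded again. The neural network itself is replaced by an identity here, and the power-law form of the gamma curve, γ = 2.2, and the function name are assumptions.

```python
def round_trip(value_14bit, gamma=None, bits=8):
    """Pass a 14-bit value through an 8-bit bottleneck and back.

    With gamma=None the value is quantized linearly; otherwise it is
    compressed with x ** (1 / gamma) first and expanded with y ** gamma after.
    """
    saturation = 16383            # largest 14-bit value
    scale = 2 ** bits - 1         # largest code in the bottleneck
    x = value_14bit / saturation
    if gamma is not None:
        x = x ** (1.0 / gamma)    # gradation compression
    code = round(x * scale)       # limited-precision representation
    y = code / scale
    if gamma is not None:
        y = y ** gamma            # gradation expansion
    return round(y * saturation)


dark_values = range(0, 1024)      # darkest sixteenth of the 14-bit range
worst_linear = max(abs(round_trip(v) - v) for v in dark_values)
worst_gamma = max(abs(round_trip(v, gamma=2.2) - v) for v in dark_values)
```

For the dark values tested here, `worst_gamma` comes out smaller than `worst_linear`, which is the gradation-preserving behavior the embodiment relies on.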

(Embodiment 2)
Next, Embodiment 2 will be described. In Embodiment 1, gamma correction with a predetermined characteristic was applied for gradation compression. Embodiment 2 differs in that it applies gamma correction whose characteristic depends on the brightness of the image to be processed. The configuration of the imaging device 100 and the functional configuration example of the image processing system may be substantially the same as in Embodiment 1. Accordingly, substantially identical configurations and processes are given the same reference numerals and their description is omitted; the description focuses on the differences.

<Operation of Inference Processing in Imaging Device>
The operation of inference processing in the imaging device 100 will be described below with reference to FIGS. 5 and 6. The series of operations shown in FIG. 5 is realized, for example, by the processor 106 executing a program stored in the ROM 105 and thereby controlling each unit of the imaging device 100. The operation of the image processing unit 104 may be realized by the processor 106, or another processor such as a GPU (not shown), executing a program stored in the ROM 105. First, in steps S3002 to S3005, the processor 106 or the image processing unit 104 performs the same processing as in Embodiment 1 and generates a second image.

In step S6001, the image processing unit 104 calculates the brightness of the second image, from which the offset was removed in step S3005. Although this embodiment calculates the average of the pixel values of the second image as the brightness, the brightness may instead be calculated from values obtained by converting each pixel value to luminance.

In step S6002, the processor 106 refers to a first lookup table, recorded in the ROM 105, that gives the relationship between the average pixel value and the γ value for gamma correction. Based on the first lookup table, the processor 106 sets in the image processing unit 104 the γ value for gamma correction that corresponds to the average pixel value. In step S6003, the processor 106 refers to a second lookup table, recorded in the ROM 105, that gives the relationship between the average pixel value and the γ value for degamma correction. Based on the second lookup table, the processor 106 sets in the image processing unit 104 the γ value for degamma correction that corresponds to the average pixel value. Although this embodiment sets the degamma γ value (characteristic) according to the average pixel value, a degamma characteristic corresponding to the gamma-correction γ value set in S6002 may be set instead.

In step S6004, the processor 106 refers to a third lookup table, recorded in the ROM 105, that gives the relationship between the average pixel value and the parameters of the neural network. Based on the third lookup table, the processor 106 sets the neural network parameters corresponding to the average pixel value in the neural network of the image processing unit 104. Although this embodiment acquires network parameters according to the average pixel value, neural network parameters corresponding to each gamma-correction γ value may be set instead. For example, the processor 106 may refer to a lookup table that associates neural network parameters with stepwise different gamma-correction γ values, and set in the image processing unit 104 the neural network parameters corresponding to the γ value set in S6002.
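Steps S6001 to S6004 amount to computing a mean brightness and using it to index three tables. A minimal sketch follows; the brightness thresholds, the parameter-set labels, and the function names are hypothetical, and only the list of γ values (2.6 down to 1.4) is taken from the description of FIG. 6.

```python
import bisect

# Hypothetical bin edges on normalized mean brightness, in ascending order.
THRESHOLDS = [0.05, 0.10, 0.20, 0.35, 0.50, 0.70]
# Gamma values for each bin, darkest bin first.
GAMMAS = [2.6, 2.4, 2.2, 2.0, 1.8, 1.6, 1.4]
# Hypothetical labels standing in for stored network parameter sets.
PARAM_SETS = ["params_g%.1f" % g for g in GAMMAS]


def mean_brightness(pixels, saturation):
    """S6001: average pixel value, normalized to [0, 1]."""
    return sum(pixels) / (len(pixels) * saturation)


def select_settings(brightness):
    """S6002-S6004: pick gamma, degamma, and parameter set for a brightness.

    The degamma value is taken equal to the gamma value, on the assumption
    that expansion uses the inverse exponent of compression.
    """
    index = bisect.bisect_right(THRESHOLDS, brightness)
    gamma = GAMMAS[index]
    return gamma, gamma, PARAM_SETS[index]
```

Using one shared index for all three tables keeps the gamma curve, the degamma curve, and the network parameters consistent with each other.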

The processor 106 then executes steps S3006 to S3013 as in Embodiment 1, applying degamma processing and the like to the eighth image generated by the neural network, and generates the tenth image. After generating the tenth image, the processor 106 ends this processing.

The characteristics of the gamma correction applied in this embodiment will now be described with reference to FIG. 6. In FIG. 6, the horizontal axis shows pixel values before gamma correction and the vertical axis shows pixel values after gamma correction. FIG. 6 plots the gamma curves given by equation (1) above for γ = 2.6, 2.4, 2.2, 2.0, 1.8, 1.6, and 1.4. In this embodiment, the operations of steps S6001 to S6004 associate a specific gamma curve with the average pixel value. For example, the processor 106 selects the gamma curve (characteristic) with γ = 2.6 when the pixel values of the image are below a predetermined low-luminance threshold. This yields a gamma correction result in which many gradation levels are assigned to pixel values that were in the low-luminance range before gamma correction, so the gradation of low-luminance pixels can be maintained when the image is denormalized to 8 bits in step S3008. The processor 106 selects the gamma curve with γ = 1.4 when the pixel values of the image are above a predetermined high-luminance threshold. Compared with the other gamma curves, this yields a gamma correction result in which more gradation levels are assigned to pixel values that were in the high-luminance range before gamma correction. The gradation of high-luminance pixels can therefore be maintained when the image is denormalized to 8 bits in step S3008.
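How the choice of γ shifts gradation between dark and bright regions can be illustrated by counting how many 8-bit codes an input interval spans after gamma correction. The sketch assumes the power-law form y = x^(1/γ) for equation (1), which is not repeated in this section; the function name and the example intervals are hypothetical.

```python
def code_span(low, high, gamma, bits=8):
    """Number of output code steps covered by the input interval [low, high].

    Inputs are normalized to [0, 1]; the curve y = x ** (1 / gamma) is an
    assumed form of equation (1).
    """
    scale = 2 ** bits - 1
    top = round(scale * high ** (1.0 / gamma))
    bottom = round(scale * low ** (1.0 / gamma))
    return top - bottom


dark_26 = code_span(0.0, 0.1, 2.6)     # dark tenth of the range, gamma 2.6
dark_14 = code_span(0.0, 0.1, 1.4)     # same interval, gamma 1.4
bright_26 = code_span(0.5, 1.0, 2.6)   # bright half of the range, gamma 2.6
bright_14 = code_span(0.5, 1.0, 1.4)   # same interval, gamma 1.4
```

Under this assumed curve, `dark_26` exceeds `dark_14` while `bright_14` exceeds `bright_26`, matching the selection rule described for dark and bright images.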

<Operation of Learning Processing in Image Processing System>
Next, the operation of learning processing according to Embodiment 2 in the image processing system (imaging device 200, image processing device 210, display device 220, storage device 230) will be described with reference to FIG. 12. The series of operations shown in FIG. 12 is realized by the processor 206 of the image processing device 210 loading a program stored in the ROM 205 into the RAM 207, executing it, and controlling each unit of the image processing device 210 (the image processing unit 204, the GPU 213, and so on). The operation of the image processing unit 204 may be realized by the GPU 213 executing a program stored in the ROM 205. First, the processor 206 or the image processing unit 204 performs the processing of steps S9001 to S9008 as in Embodiment 1.

In step S10009, the processor 206 determines whether the network parameters of the neural network have been acquired for all conditions (for example, for each of the stepwise average pixel values). Matching the conditions of the learning processing to the parameter-switching operation in S6004 of the inference processing improves the accuracy of noise reduction at inference time. Therefore, when image processing may be performed under multiple conditions at inference time (or the conditions may be switched), it is advantageous to hold network parameters for each condition. If the processor 206 determines that network parameters have been acquired for all conditions, the process proceeds to step S9009. If it determines that they have not, the process proceeds to step S10010, where the condition is changed. The processor 206 then returns the process to step S9001 and performs the above processing again. The network parameters are stored in the parameter storage area for each condition. The parameter storage area may be the ROM 205 or the RAM 207. The network parameters stored in the parameter storage area may also be stored in the storage device 230 as necessary. The processor 206 then executes steps S9009 and S9010 in the same way as in FIG. 10.
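The outer loop formed by steps S10009 and S10010 can be sketched as follows. The training routine is passed in as a callable because its details are covered by steps S9001 to S9008; the function name and the use of γ values as condition keys are illustrative assumptions.

```python
def train_all_conditions(conditions, train_fn):
    """Train once per condition and keep the parameters for each.

    `train_fn(condition)` stands in for steps S9001 to S9008 and returns
    the network parameters learned under that condition.
    """
    stored = {}
    pending = list(conditions)
    while pending:                   # S10009: any condition still missing?
        condition = pending.pop(0)   # S10010: switch to the next condition
        stored[condition] = train_fn(condition)
    return stored
```

A usage example with a placeholder training function: `train_all_conditions([2.6, 2.2, 1.4], lambda g: {"gamma": g})` returns one parameter entry per γ value.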

As described above, in this embodiment, gradation is compressed using the gamma correction characteristic that corresponds to the brightness of the image, chosen from among multiple stepwise different gamma correction characteristics. Image processing (such as noise reduction) is performed using the network parameters that correspond to the brightness of the image, chosen from among the network parameters of multiple neural networks trained in advance. The gradation of the image data is expanded using the characteristic that corresponds to the brightness of the image, chosen from among multiple degamma correction characteristics. In the learning processing, optimal network parameters are acquired and stored for each condition (for example, for each image brightness). In this way, even for image processing that is affected by changes in conditions such as image brightness, a neural network whose inference accuracy is less susceptible to those changes can be obtained.

The above example described a neural network that reduces image noise. For processing other than noise reduction, however, the learning processing can likewise be performed by preparing pairs of training images and correct images through simulation. For super-resolution, training images can be prepared by downsampling the correct images. The correct image and the training image may or may not be matched in size. For defocus-blur or motion-blur removal (deblurring), training images can be generated by applying a blur function to the correct images. For white balance correction, the training image may be an image whose white balance is not properly adjusted, or not corrected, relative to a correct image captured with proper white balance. The same applies to color correction such as color matrix correction. For interpolation of missing data, training images are obtained by removing data from the correct images. For demosaicing, correct images may be prepared using, for example, a three-chip image sensor, and training images generated by resampling the correct images in a Bayer array or the like. For inference of color components, training images can be generated by reducing the color components of the correct images. For dehazing, training images can be generated by adding scattered light, obtained by simulating the physical phenomenon, to correct images that are free of haze. When multiple frames are consecutive, as in a moving image, noise can be removed more effectively by stacking the desired number of frames in the depth direction and inputting them to the neural network together.

This embodiment described the case in which a lookup table is used to uniquely select the γ value corresponding to the brightness. However, a large change in the γ value can cause a large change in the luminance of the output image and make the image hard to view. Therefore, instead of uniquely selecting the γ value from the lookup table according to the brightness, the current γ value may be changed, according to the brightness, to a neighboring γ value recorded in the ROM 205. That is, among the stepwise different gamma-correction γ values (characteristics) contained in the lookup table, as shown in FIG. 6, a γ value adjacent to the current γ value (for example, 2.4 or 2.0 when the current value is 2.2) is selected. In this case, the γ value does not change greatly at once but reaches the target γ value over time. In addition, by referring to a lookup table that associates neural network parameters with each γ value, the corresponding neural network parameters can be set when the neighboring γ value is set. For degamma correction, the degamma characteristic corresponding to the neighboring γ value may be used, chosen from among the stepwise different degamma characteristics. In this embodiment, the digital gain is applied before input to the neural network, but it may instead be applied after processing by the neural network. In that case, each neural network prepared is one that has been optimally trained on images to which the digital gain has not yet been applied.
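The gradual γ transition described above can be sketched as a function that moves one table entry per call toward the target value. The function name is hypothetical; the list of γ values is the one given for FIG. 6.

```python
# Stepwise gamma values from the description of FIG. 6, in ascending order.
GAMMA_STEPS = [1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6]


def step_toward(current, target):
    """Move from `current` to the adjacent table entry nearer `target`.

    Both arguments must be entries of GAMMA_STEPS.
    """
    i = GAMMA_STEPS.index(current)
    t = GAMMA_STEPS.index(target)
    if t > i:
        i += 1
    elif t < i:
        i -= 1
    return GAMMA_STEPS[i]
```

Calling this once per frame, for example, would take several frames to move from 2.2 to 2.6, which avoids an abrupt change in output luminance. The returned value can also serve as the key for looking up the matching degamma characteristic and network parameters.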

(Embodiment 3)
Next, Embodiment 3 will be described. In Embodiment 2, the brightness of the second image was acquired and the gamma correction value was set based on that brightness. Embodiment 3 differs from Embodiment 2 in that the brightness is acquired for each region of the second image and the gamma correction value is set based on the brightness of each region. The configuration of the imaging device 100 and the functional configuration example of the image processing system may be substantially the same as in Embodiment 1. Accordingly, substantially identical configurations and processes are given the same reference numerals and their description is omitted; the description focuses on the differences.

＜撮像装置１００における推論処理の動作＞
図７を参照して、撮像装置１００における推論処理の動作について説明する。なお、図７に示す一連の動作は、例えばプロセッサ１０６がＲＯＭ１０５に記憶されたプログラムを実行することにより、撮像装置１００の各部を制御して実現される。また、画像処理部１０４による動作は、プロセッサ１０６或いは不図示のＧＰＵなどの他のプロセッサがＲＯＭ１０５に記憶されたプログラムを実行することにより実現されてよい。まず、プロセッサ１０６又は画像処理部１０４は、ステップＳ３００２からステップＳ３００５の処理を、実施形態２（図５）と同様に行って、第２の画像を生成する。 <Operation of Inference Processing in Imaging Apparatus 100>
The operation of inference processing in the imaging device 100 will be described with reference to FIG. 7. The series of operations shown in FIG. 7 is realized, for example, by the processor 106 executing a program stored in the ROM 105 and thereby controlling each unit of the imaging device 100. The operations of the image processing unit 104 may be realized by the processor 106, or by another processor such as a GPU (not shown), executing a program stored in the ROM 105. First, the processor 106 or the image processing unit 104 performs the processing from step S3002 to step S3005 in the same manner as in Embodiment 2 (FIG. 5) to generate a second image.

ステップＳ７００１で、プロセッサ１０６は、ＲＯＭ１０５に記録されている、領域毎の明るさを算出するための領域分割用の座標情報を取得する。そして、プロセッサ１０６は、当該領域分割用の座標情報に基づいて、第２の画像に対する分割領域の座標を画像処理部１０４に設定する。領域分割用の座標情報は、例えば、画像の左上隅画素を画像の座標原点（Ｘ，Ｙ）＝（０，０）としたときの座標原点に基づいて、領域毎の始点座標と終点座標の（Ｘ，Ｙ）で構成されてよい。或いは、領域分割用の座標情報は、領域毎の始点座標（Ｘ，Ｙ）と領域の幅と高さで構成されてもよい。また、プロセッサ１０６が、第２の画像の幅、高さ情報と、Ｘ方向、Ｙ方向の分割数の情報から領域分割用の座標情報を算出してもよい。 In step S7001, the processor 106 acquires, from the ROM 105, coordinate information for region division that is used to calculate the brightness of each region. The processor 106 then sets the coordinates of the divided regions of the second image in the image processing unit 104 based on that coordinate information. The coordinate information for region division may consist of, for example, the start point and end point coordinates (X, Y) of each region, relative to a coordinate origin (X, Y) = (0, 0) at the upper left corner pixel of the image. Alternatively, it may consist of the start point coordinates (X, Y) of each region together with the width and height of the region. The processor 106 may also calculate the coordinate information for region division from the width and height of the second image and the number of divisions in the X and Y directions.
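The last option in step S7001, deriving the coordinate information from the image size and the division counts, might look like the sketch below. The function names, the row-major ordering, the inclusive end coordinates, and the 1920x1080 example size are assumptions made for illustration.

```python
def grid_regions(width, height, nx, ny):
    """Return one (x0, y0, x1, y1) tuple per region, row-major.

    The origin (0, 0) is the upper left pixel; end coordinates are inclusive.
    """
    regions = []
    for j in range(ny):
        y0 = j * height // ny
        y1 = (j + 1) * height // ny - 1
        for i in range(nx):
            x0 = i * width // nx
            x1 = (i + 1) * width // nx - 1
            regions.append((x0, y0, x1, y1))
    return regions


def to_start_size(region):
    """Convert (x0, y0, x1, y1) to the alternative (x0, y0, width, height) form."""
    x0, y0, x1, y1 = region
    return (x0, y0, x1 - x0 + 1, y1 - y0 + 1)


# 4 x 4 division of a hypothetical 1920 x 1080 image: 16 regions.
regions = grid_regions(1920, 1080, 4, 4)
```

Integer division keeps the regions contiguous and non-overlapping even when the image size is not an exact multiple of the division count.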

例えば、本実施形態では、図８に示すように、第２の画像をＸ方向、Ｙ方向にそれぞれ４分割することにより、合計で１６の領域（領域８０１～８１６）に分割する場合を例に説明する。図８は、撮像装置１００が、例えば、港湾監視などの用途で撮影された、暗い海と、灯台や電灯などの光とを含む画像の例を模式的に示している。画像全体の明るさを用いてガンマ補正の特性を設定する場合には灯台や電灯の光の影響を受けてしまうが、暗い海に応じた階調性を得ることが望ましい。このため、本実施形態では、ステップＳ７００２～７００４の処理により、暗い海の領域の明るさを取得する処理を行う。 For example, this embodiment describes a case in which, as shown in FIG. 8, the second image is divided into four in each of the X and Y directions, giving 16 regions in total (regions 801 to 816). FIG. 8 schematically shows an example of an image, captured by the imaging device 100 for a purpose such as harbor surveillance, that contains a dark sea and light from a lighthouse or electric lamps. If the gamma correction characteristic is set using the brightness of the entire image, the setting is affected by the light from the lighthouse and lamps, whereas it is desirable to obtain gradation suited to the dark sea. For this reason, in this embodiment, the brightness of the dark sea regions is acquired by the processing of steps S7002 to S7004.

ステップＳ７００２で、画像処理部１０４は、ステップＳ７００１で設定された分割領域の座標に基づいて、第２の画像の領域毎の明るさを算出する。本実施形態では、第２の画像の領域毎の各画素値の平均値を明るさとして算出する場合を例に説明するが、各画素値を輝度に変換した値から算出してもよい。ステップＳ７００３で、プロセッサ１０６は、ＲＯＭ１０５で記録されている領域選択条件に基づいて、（ステップＳ７００２で算出した領域毎の各画素値の平均値に応じて）第２の画像の領域を画像処理部１０４に設定する。なお、領域選択条件は、ユーザが１以上の任意の領域を選択するようにしてもよいし、使用する領域個数と各画素値の平均値の明暗の優先度から１以上の任意の領域を選択してもよい。或いは、各画素値の平均値が所定の閾値より低い１以上の領域を選択してもよい。例えば、領域選択条件として、使用する領域個数が８個であり且つ各画素値の平均値が暗い領域を優先する場合を考える。この場合、プロセッサ１０６は、図９に示す８つの領域（領域８０４、８０８、８０９、８１１、８１２、８１３、８１５及び８１６）をステップＳ７００４で使用する第２の画像の領域として選択する。 In step S7002, the image processing unit 104 calculates the brightness of each region of the second image based on the coordinates of the divided regions set in step S7001. This embodiment describes a case in which the average of the pixel values in each region of the second image is used as the brightness, but the brightness may instead be calculated from values obtained by converting each pixel value to luminance. In step S7003, the processor 106 sets, in the image processing unit 104, the regions of the second image to be used, based on the region selection condition recorded in the ROM 105 (and according to the per-region average pixel values calculated in step S7002). The region selection condition may let the user select one or more arbitrary regions, or may select one or more regions from the number of regions to use and a light/dark priority on the average pixel value. Alternatively, one or more regions whose average pixel value is lower than a predetermined threshold may be selected. For example, consider a region selection condition in which eight regions are used and regions with a dark average pixel value are prioritized. In this case, the processor 106 selects the eight regions shown in FIG. 9 (regions 804, 808, 809, 811, 812, 813, 815 and 816) as the regions of the second image to be used in step S7004.
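Steps S7002 and S7003 can be illustrated with a small sketch. It assumes a single-channel image stored as a list of rows and regions given as inclusive (x0, y0, x1, y1) tuples; the function names and the tiny 4x4 sample image are hypothetical.

```python
def region_means(image, regions):
    """Average pixel value of each region (step S7002)."""
    means = []
    for x0, y0, x1, y1 in regions:
        total = 0
        for y in range(y0, y1 + 1):
            total += sum(image[y][x0:x1 + 1])
        count = (x1 - x0 + 1) * (y1 - y0 + 1)
        means.append(total / count)
    return means


def select_darkest(means, count):
    """Indices of the `count` darkest regions, returned in ascending index order."""
    order = sorted(range(len(means)), key=lambda k: means[k])
    return sorted(order[:count])


def select_below(means, threshold):
    """Indices of regions whose average is lower than `threshold`."""
    return [k for k, m in enumerate(means) if m < threshold]


# Tiny sample: the top right quadrant is a bright lamp, the rest is dark.
image = [[10, 10, 200, 200],
         [10, 10, 200, 200],
         [30, 30, 20, 20],
         [30, 30, 20, 20]]
regions = [(0, 0, 1, 1), (2, 0, 3, 1), (0, 2, 1, 3), (2, 2, 3, 3)]
means = region_means(image, regions)      # [10.0, 200.0, 30.0, 20.0]
darkest_two = select_darkest(means, 2)    # [0, 3]
```

Selecting by rank (`select_darkest`) always yields a fixed number of regions, whereas selecting by threshold (`select_below`) can yield none or all of them, depending on the scene.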

なお、本実施形態では上述の領域選択条件によって、ステップＳ７００４に使用する第２の画像の領域を選択しているが、領域の選択はこれに限らない。撮像装置１００が、例えば監視カメラの用途で用いられるような場合に、以下のように画像の領域を選択してもよい。例えば、前日以前の時間情報と領域毎の明るさ情報に基づいて、時間毎に使用する第２の画像の領域を選択してもよい。また、その場合も使用する領域毎の明るさを算出し、前日以前の同じ領域の明るさとの乖離が所定値より大きい場合、前日以前とは異なる特殊な条件下であると判定し、使用する領域を変更してもよい。換言すれば、前日以前の同じ領域の明るさとの差が所定値以下である領域が選択される。また、使用する領域毎の明るさを算出し、前回算出した同じ領域からの明るさの変化が前日以前の同じ領域かつ同一時間帯での明るさの変化との乖離が所定値より大きい場合に、前日以前とは異なる特殊な条件下であると判定し、使用する領域を変更してもよい。例えば、日没から夜間の時間帯において、前日以前の同一時間帯の同じ領域では暗いのに対して、当日の同一時間帯の同じ領域において明るさが所定の閾値以上上がる場合には、照明が点灯したり、点灯した照明が移動してきたりすることが考えられる。上述の処理により、このような領域がガンマ補正の特性を選択する領域として適当ではないと判定することができる。 In this embodiment, the regions of the second image used in step S7004 are selected according to the region selection condition described above, but region selection is not limited to this. When the imaging device 100 is used, for example, as a surveillance camera, image regions may be selected as follows. For example, the regions of the second image to be used may be selected for each time of day based on time information and per-region brightness information from the previous day or earlier. In that case too, the brightness of each region to be used may be calculated, and if it deviates from the brightness of the same region on the previous day or earlier by more than a predetermined value, the situation may be judged to be a special condition that differs from the previous day or earlier, and the regions to be used may be changed. In other words, regions whose brightness differs from that of the same region on the previous day or earlier by no more than the predetermined value are selected. 
In addition, the brightness of each region to be used may be calculated, and if the change in brightness from the previous calculation for the same region deviates by more than a predetermined value from the change in brightness of the same region in the same time period on the previous day or earlier, the situation may be judged to be a special condition that differs from the previous day or earlier, and the regions to be used may be changed. For example, in the period from sunset into the night, if a region that was dark in the same time period on the previous day or earlier becomes brighter by a predetermined threshold or more in the same time period on the current day, it is likely that a light has been turned on or that a lit light has moved into the region. Through the processing described above, such a region can be judged unsuitable as a region for selecting the gamma correction characteristic.
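A sketch of the first history-based check, keeping only regions whose brightness stays close to the value recorded for the same region and time of day on earlier days, is shown below. The function name, the per-region reference list, and the sample numbers are hypothetical; how the reference is stored and aggregated over days is not specified in the text.

```python
def stable_regions(current, reference, max_deviation):
    """Indices of regions whose brightness is within `max_deviation` of the reference."""
    return [k for k, (now, ref) in enumerate(zip(current, reference))
            if abs(now - ref) <= max_deviation]


# Reference brightness for this time of day from earlier days (made-up values).
reference = [12.0, 15.0, 11.0, 14.0]
# Tonight, region 1 is much brighter, e.g. because a lamp was turned on there.
current = [13.0, 90.0, 10.5, 14.0]

usable = stable_regions(current, reference, 5.0)   # [0, 2, 3]
```

Region 1 is dropped, so its unusual brightness does not influence the choice of gamma correction characteristic.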

ステップＳ７００４で、画像処理部１０４は、ステップＳ７００３で設定された第２の画像の領域毎の明るさを算出する。明るさの算出は、使用する領域毎の各画素値の平均値の和を使用する領域数で除算してもよい。また、ステップＳ７００２で算出する領域毎の各画素値の平均値をＲＡＭ１０７に保持しておき、使用する領域の各画素値の平均値のみを読み出し、その和を使用する領域数で除算してもよい。 In step S7004, the image processing unit 104 calculates the brightness for the regions of the second image set in step S7003. The brightness may be calculated by dividing the sum of the average pixel values of the regions to be used by the number of regions to be used. Alternatively, the per-region average pixel values calculated in step S7002 may be held in the RAM 107, and only the averages of the regions to be used may be read out, summed, and divided by the number of regions to be used.
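The calculation in step S7004 reduces to a mean of per-region means, as in this short sketch (the function name and sample values are illustrative only):

```python
def selected_brightness(means, selected):
    """Sum of the selected regions' averages divided by the number of selected regions."""
    return sum(means[k] for k in selected) / len(selected)


means = [10.0, 200.0, 30.0, 20.0]    # per-region averages, e.g. cached from step S7002
brightness = selected_brightness(means, [0, 3])   # (10.0 + 20.0) / 2 = 15.0
```

Because every region is weighted equally, this equals the average over all selected pixels only when the selected regions have the same size, which holds for the equal grid division used in this embodiment.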

その後、画像処理部１０４は、ステップＳ６００２からステップＳ３０１３までの動作を図５で説明したように実行し、Ｓ３０１３の処理の後に本一連の処理を終了する。 After that, the image processing unit 104 executes the operations from step S6002 to step S3013 as described with reference to FIG. 5, and ends this series of processes after the process of step S3013.

以上説明したように、本実施形態では、画像内の所定の領域毎に明るさを算出し、算出した明るさに対応するガンマ補正と対応するニューラルネットワークを適用するようにした。このようにすることで、上述の実施形態の効果に加えて、領域ごとに明暗差の大きい画像を処理する場合に画像内の暗い領域において多くの階調が割り当てられるようになる。 As described above, in this embodiment, the brightness is calculated for each predetermined region in the image, and the gamma correction corresponding to the calculated brightness and the corresponding neural network are applied. By doing so, in addition to the effects of the above-described embodiments, when processing an image with a large difference in brightness for each area, many gradations are assigned to dark areas in the image.

なお、上述の説明では、装置内で画像或いは画像の領域内の明るさを算出する処理を例に説明したが、明るさを算出する代わりにルックアップテーブルから値を取得したり、外部の装置で算出した値を取得するようにしてもよい。 In the above description, processing that calculates the brightness of an image or of a region of an image within the device was described as an example. Instead of calculating the brightness, however, a value may be acquired from a lookup table, or a value calculated by an external device may be acquired.

（実施形態４）
次に、実施形態４について説明する。実施形態４では、出力する画像の設定に応じた階調補正を画像に適用したうえで、ニューラルネットワークによる画像処理を行う。なお、撮像装置１００の構成及び画像処理システムの機能構成例は、実施形態１と実質的に同一であってよい。従って、実質的に同一の構成及び処理については同一の参照符号を付して説明を省略し、相違点について重点的に説明する。 (Embodiment 4)
Next, Embodiment 4 will be described. In the fourth embodiment, image processing is performed using a neural network after tone correction is applied to the image according to the setting of the image to be output. Note that the configuration of the imaging device 100 and the functional configuration example of the image processing system may be substantially the same as those in the first embodiment. Therefore, substantially the same configurations and processes are denoted by the same reference numerals, and descriptions thereof are omitted, and differences are mainly described.

＜撮像装置１００における推論処理の動作＞
図１３Ａ及び図１３Ｂを参照して、撮像装置１００における推論処理の動作について説明する。なお、図１３Ａ及び図１３Ｂに示す一連の動作は、例えばプロセッサ１０６がＲＯＭ１０５に記憶されたプログラムを実行することにより、撮像装置１００の各部を制御して実現される。また、画像処理部１０４による動作は、プロセッサ１０６或いは不図示のＧＰＵなどの他のプロセッサがＲＯＭ１０５に記憶されたプログラムを実行することにより実現されてよい。 <Operation of Inference Processing in Imaging Apparatus 100>
The operation of inference processing in the imaging device 100 will be described with reference to FIGS. 13A and 13B. The series of operations shown in FIGS. 13A and 13B is realized, for example, by the processor 106 executing a program stored in the ROM 105 and thereby controlling each unit of the imaging device 100. The operations of the image processing unit 104 may be realized by the processor 106, or by another processor such as a GPU (not shown), executing a program stored in the ROM 105.

まず、ステップＳ１３００１で、プロセッサ１０６は、撮像装置１００に設定されている画像の出力モードがＨＤＲ（ＨｉｇｈＤｙｎａｍｉｃＲａｎｇｅ）モードであるかＳＤＲ（ＳｔａｎｄａｒｄＤｙｎａｍｉｃＲａｎｇｅ）モードであるかを判定する。プロセッサ１０６は、例えば、ＲＡＭ１０７に格納されている設定値を参照すること等により、画像の出力モードがＨＤＲモードであるかＳＤＲモードであるかを判定する。プロセッサ１０６は、設定されている画像の出力モードがＨＤＲモードであると判定した場合はステップＳ１３００２に処理を進め、そうでない場合にはステップＳ１３００３に処理を進める。なお、本実施形態では、出力する画像の設定は、例えば、撮像装置１００から出力される画像がＨＤＲであるかＳＤＲであるかを示す設定である場合を例に説明する。しかし、出力する画像の設定はこれに限らない。例えば、本一連の処理により出力される画像がＨＤＲであるかＳＤＲであるかを示す設定であってもよい。 First, in step S13001, the processor 106 determines whether the image output mode set in the imaging apparatus 100 is HDR (High Dynamic Range) mode or SDR (Standard Dynamic Range) mode. The processor 106 determines whether the image output mode is the HDR mode or the SDR mode, for example, by referring to the setting values stored in the RAM 107 . If the processor 106 determines that the set image output mode is the HDR mode, the process proceeds to step S13002; otherwise, the process proceeds to step S13003. In this embodiment, an example will be described in which the setting of the image to be output is a setting indicating whether the image output from the imaging device 100 is HDR or SDR. However, the setting of the image to be output is not limited to this. For example, the setting may indicate whether the image output by this series of processes is HDR or SDR.

ステップＳ１３００２で、プロセッサ１０６はＲＯＭ１０５に記録されている第１のニューラルネットワークのパラメータを画像処理部１０４にあるニューラルネットワークに設定する。ここで、第１のニューラルネットワークは、階調数が例えば１０２４段階であるＨＤＲの画像に対応する入力に最適化されたニューラルネットワークである。ステップＳ１３００３で、プロセッサ１０６はＲＯＭ１０５に記録されている第２のニューラルネットワークのパラメータを画像処理部１０４にあるニューラルネットワークに設定する。ここで、第２のニューラルネットワークは、ＨＤＲよりも階調数が少ない（例えば２５６段階）ＳＤＲの画像に対応する入力に最適化されたニューラルネットワークである。なお、本実施形態では、出力する画像の階調数についての設定に応じてニューラルネットワークを設定する場合を例に説明している。しかし、出力する画像の輝度の最大値（或いは上限値）についての設定に応じてニューラルネットワークを設定してもよい。本実施形態では、それぞれの画像の出力モードに関連付けられたニューラルネットワークを設定する場合を例に説明している。しかし、ＨＤＲとＳＤＲの両方を１つのニューラルネットワークで処理可能である場合には、ステップＳ１３００１の判定処理を行わなくてもよい。続いて、ステップＳ３００２からステップＳ３００５までの処理が上述の実施形態と同様に実行される。 In step S13002 , the processor 106 sets the parameters of the first neural network recorded in the ROM 105 to the neural network in the image processing unit 104 . Here, the first neural network is a neural network optimized for input corresponding to an HDR image having, for example, 1024 levels of gradation. In step S13003 , the processor 106 sets the parameters of the second neural network recorded in the ROM 105 to the neural network in the image processing unit 104 . Here, the second neural network is a neural network optimized for input corresponding to an SDR image having fewer gradations than HDR (for example, 256 levels). In the present embodiment, an example is described in which a neural network is set according to the setting of the number of gradations of an image to be output. However, the neural network may be set according to the setting of the maximum value (or upper limit value) of the brightness of the image to be output. In this embodiment, a case of setting a neural network associated with the output mode of each image will be described as an example. However, if both HDR and SDR can be processed by one neural network, the determination process in step S13001 does not have to be performed. Subsequently, the processing from step S3002 to step S3005 is executed in the same manner as in the above embodiment.

ステップＳ１３００４で、プロセッサ１０６は、ステップＳ１３００１と同様に、設定されている画像の出力モードがＨＤＲモードであるかＳＤＲモードであるかを判定する。プロセッサ１０６は、画像の出力モードがＨＤＲモードであると判定した場合にはステップＳ１３００５へ処理を進め、そうでない場合にはＳ１３００７に処理を進める。 In step S13004, the processor 106 determines whether the set image output mode is the HDR mode or the SDR mode, as in step S13001. If the processor 106 determines that the image output mode is the HDR mode, the process proceeds to step S13005; otherwise, the process proceeds to step S13007.

ステップＳ１３００５で、画像処理部１０４は、第２の画像の各画素値を正規化した第３の画像を生成する。なお、本実施形態の例では、例えばステップＳ３００２で撮像素子１０２から取得した第１の画像の各画素値が１４ビットであるのに対し、ステップＳ３００４でデジタルゲインをかけて生成される画像の各画素値が１８ビットで取り扱われる場合を例に説明する。本ステップの正規化は１８ビットの各画素値を０から１の範囲に対応付ける処理であり、画像処理部１０４は各画素値を２の１８乗で除算する。計算結果は小数点以下を含んだｆｌｏａｔ３２などの形式で取り扱われる。 In step S13005, the image processing unit 104 generates a third image by normalizing each pixel value of the second image. In the example of this embodiment, each pixel value of the first image acquired from the image sensor 102 in step S3002 is 14 bits, whereas each pixel value of the image generated by applying digital gain in step S3004 is handled in 18 bits. The normalization in this step maps each 18-bit pixel value to the range from 0 to 1, and the image processing unit 104 divides each pixel value by 2 to the 18th power. The result is handled in a format that includes a fractional part, such as float32.

ステップＳ１３００６で、画像処理部１０４は、第３の画像の各画素値にＰＱカーブのＯＥＴＦ（Ｏｐｔｏ－ＥｌｅｃｔｒｏｎｉｃＴｒａｎｓｆｅｒＦｕｎｃｔｉｏｎ）をかけた第４の画像を生成する。すなわち、画像処理部１０４は、第３の画像の各画素値にＰＱカーブのＯＥＴＦを適用することにより、第３の画像の階調を圧縮する。ＯＥＴＦについては、図１４（Ｂ）を参照して後述するが、ОＥＴＦは、明るさが低いほど多くの階調が割り当てられる特性を有する。また、ＯＥＴＦは、（ステップＳ１３００８について後述する）ＳＤＲモードの画像に対して適用されるガンマ補正の特性よりも、所定の低輝度領域において、多くの階調が割り当てられる特性を有する。つまり、画像処理部１０４は、画像の出力モードの設定において、処理対象の画像の画素値を表すビット数が大きい設定ほど、所定の低輝度領域において、より多くの階調が割り当てられる特性を用いる。 In step S13006, the image processing unit 104 generates a fourth image by applying the OETF (Opto-Electronic Transfer Function) of the PQ curve to each pixel value of the third image. That is, the image processing unit 104 compresses the gradation of the third image by applying the OETF of the PQ curve to each of its pixel values. The OETF is described later with reference to FIG. 14(B); it has the characteristic that more gradations are assigned as the brightness is lower. In addition, in a predetermined low-luminance region, the OETF assigns more gradations than the gamma correction characteristic applied to an image in SDR mode (described later for step S13008). In other words, the image processing unit 104 uses a characteristic that assigns more gradations in the predetermined low-luminance region as the image output mode setting specifies a larger number of bits for the pixel values of the image to be processed.
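The claim that the PQ OETF assigns more gradations to a low-luminance region than a gamma characteristic can be checked with a small sketch. It assumes the PQ constants of ITU-R BT.2100, a plain power-law gamma with γ = 2.2 as a stand-in for the SDR gamma correction, an 8-bit code range, and "low luminance" meaning the darkest 1 % of the normalized linear range. None of these specifics are taken from this document.

```python
# PQ constants as defined in ITU-R BT.2100.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_oetf(x):
    """Normalized linear value in [0, 1] -> normalized PQ code value in [0, 1]."""
    p = x ** M1
    return ((C1 + C2 * p) / (1 + C3 * p)) ** M2


def gamma_encode(x, gamma=2.2):
    """Plain power-law gamma, assumed here as the SDR characteristic."""
    return x ** (1 / gamma)


def codes_below(encode, limit, levels=255):
    """Number of 8-bit code steps spent on linear inputs in [0, limit]."""
    return round(encode(limit) * levels) - round(encode(0.0) * levels)


low = 0.01   # darkest 1 % of the linear range
pq_codes = codes_below(pq_oetf, low)
gamma_codes = codes_below(gamma_encode, low)
```

Under these assumptions `pq_codes` comes out several times larger than `gamma_codes`, which is the behavior the paragraph above attributes to the OETF.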

ＰＱカーブとは、ＩＴＵ－Ｒ（ＲａｄｉｏｃｏｍｍｕｎｉｃａｔｉｏｎＳｅｃｔｏｒｏｆＩＴＵ）ＢＴ．２１００で規定されたＥＯＴＦ（Ｅｌｅｃｔｒｏ－ＯｐｔｉｃａｌＴｒａｎｓｆｅｒＦｕｎｃｔｉｏｎ）に準拠した階調値である。 The PQ curve is a gradation characteristic conforming to the EOTF (Electro-Optical Transfer Function) specified in ITU-R (Radiocommunication Sector of ITU) BT.2100.

ＰＱカーブについて図１４を参照して説明する。図１４（Ａ）はＰＱカーブ（ＥＯＴＦ）の一例を示す。ＥＯＴＦは、画像信号である階調値（輝度階調値）を光出力の輝度に変換する関数に対応する。具体的には図１４（Ａ）のＥＯＴＦは以下の式４で表される。ｐ_ｉｎは、ＥＯＴＦの入力値であり、階調値（Ｒ値やＧ値、Ｂ値など）を０．０～１．０に正規化した値である。ｐ_ｉｎ＝１．０は階調値の上限（ビット数に応じた上限）に対応し、ｐ_ｉｎ＝０．０は階調値の下限に対応する。例えば、階調値のビット数が１０ビットである場合は、階調値の上限は１０２３、階調値の下限は０となる。ｐ_ｏｕｔは、ＥＯＴＦの出力値であり、輝度に比例する階調値（Ｒ値やＧ値、Ｂ値など）を０．０～１．０に正規化した階調値である。例えば、ｐ_ｏｕｔ＝０．０は０ｎｉｔに対応し、ｐ_ｏｕｔ＝１．０は１００００ｎｉｔに対応する。ｍａｘ［ｘ、ｙ］は、ｘとｙのうち大きい方の値を出力する関数である。

The PQ curve will be explained with reference to FIG. 14. FIG. 14(A) shows an example of the PQ curve (EOTF). The EOTF is a function that converts a gradation value (luminance gradation value) of an image signal into the luminance of the light output. Specifically, the EOTF in FIG. 14(A) is represented by Equation 4 below. p_in is the input value of the EOTF and is a gradation value (R value, G value, B value, etc.) normalized to 0.0 to 1.0. p_in = 1.0 corresponds to the upper limit of the gradation value (the upper limit for the given number of bits), and p_in = 0.0 corresponds to the lower limit. For example, when the gradation value has 10 bits, its upper limit is 1023 and its lower limit is 0. p_out is the output value of the EOTF and is a gradation value proportional to luminance (R value, G value, B value, etc.) normalized to 0.0 to 1.0. For example, p_out = 0.0 corresponds to 0 nits and p_out = 1.0 corresponds to 10000 nits. max[x, y] is a function that outputs the larger of x and y.
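Equation 4 itself is not reproduced in this text. Assuming it is the PQ EOTF of ITU-R BT.2100, which is consistent with the max[x, y] function mentioned above, it would take the following form. The constant names m_1, m_2, c_1, c_2, c_3 follow BT.2100 and are not taken from this document.

```latex
\begin{aligned}
p_{out} &= \left( \frac{\max\left[\, p_{in}^{1/m_2} - c_1,\ 0 \,\right]}
                       {c_2 - c_3\, p_{in}^{1/m_2}} \right)^{1/m_1} \\
m_1 &= \tfrac{2610}{16384}, \qquad m_2 = \tfrac{2523}{4096} \times 128, \\
c_1 &= \tfrac{3424}{4096}, \qquad c_2 = \tfrac{2413}{4096} \times 32, \qquad c_3 = \tfrac{2392}{4096} \times 32
\end{aligned}
```

With these constants, c_1 = c_3 - c_2 + 1, so p_in = 1.0 gives p_out = 1.0, and p_in = 0.0 gives p_out = 0.0 through the max[·, 0] term.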

図１４（Ｂ）は、図１４（Ａ）のＥＯＴＦとは真逆の特性を有するＯＥＴＦの一例を示している。ＯＥＴＦは、輝度を画像信号の階調値に変換する関数に対応する。具体的には、図１４（Ｂ）のＯＥＴＦは以下の式５で表される。ｑ_ｉｎは、ＯＥＴＦの入力値であり、輝度に比例する階調値（Ｒ値やＧ値、Ｂ値など）を０．０～１．０に正規化した階調値である。例えば、ｑ_ｉｎ＝０．０は０ｎｉｔに対応し、ｑ_ｉｎ＝１．０は１００００ｎｉｔに対応する。ｑ_ｏｕｔは、ＯＥＴＦの出力値であり、階調値（Ｒ値やＧ値、Ｂ値など）を０．０～１．０に正規化した値である。ｑ_ｏｕｔ＝１．０は階調値の上限（ビット数に応じた上限）に対応し、ｑ_ｏｕｔ＝０．０は階調値の下限に対応する。例えば、階調値のビット数が１０ビットである場合は、階調値の上限は１０２３、階調値の下限は０となる。

FIG. 14(B) shows an example of an OETF whose characteristic is the inverse of the EOTF of FIG. 14(A). The OETF is a function that converts luminance into a gradation value of an image signal. Specifically, the OETF in FIG. 14(B) is represented by Equation 5 below. q_in is the input value of the OETF and is a gradation value proportional to luminance (R value, G value, B value, etc.) normalized to 0.0 to 1.0. For example, q_in = 0.0 corresponds to 0 nits and q_in = 1.0 corresponds to 10000 nits. q_out is the output value of the OETF and is a gradation value (R value, G value, B value, etc.) normalized to 0.0 to 1.0. q_out = 1.0 corresponds to the upper limit of the gradation value (the upper limit for the given number of bits), and q_out = 0.0 corresponds to the lower limit. For example, when the gradation value has 10 bits, its upper limit is 1023 and its lower limit is 0.
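Equation 5 is likewise not reproduced in this text. Assuming it is the inverse of the PQ EOTF as given in ITU-R BT.2100, it would read as follows, with the constants named as in BT.2100 rather than taken from this document.

```latex
\begin{aligned}
q_{out} &= \left( \frac{c_1 + c_2\, q_{in}^{m_1}}{1 + c_3\, q_{in}^{m_1}} \right)^{m_2} \\
m_1 &= \tfrac{2610}{16384}, \qquad m_2 = \tfrac{2523}{4096} \times 128, \\
c_1 &= \tfrac{3424}{4096}, \qquad c_2 = \tfrac{2413}{4096} \times 32, \qquad c_3 = \tfrac{2392}{4096} \times 32
\end{aligned}
```

Because m_1 is much smaller than 1, q_in^{m_1} rises steeply near q_in = 0, which is why a large share of the output code range is spent on low luminance.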

ステップＳ１３００７で、画像処理部１０４は、ステップＳ３００５で生成した第２の画像の各画素値を所定値でクリップする処理を行い（例えば所定値以上の画素値を所定値とする）、処理後の値を正規化した第５の画像を生成する。所定値は、例えば、ＳＤＲで十分なダイナミックレンジの上限である。例えば、画像処理部１０４は１４ビットの１６３８３でクリップする。本ステップの正規化では、画像処理部１０４は、１４ビットの各画素値を０から１の範囲に正規化するために各画素値を２の１４乗で除算する。計算結果は小数点以下を含んだｆｌｏａｔ３２などの形式で取り扱われる。ステップＳ１３００８で、画像処理部１０４は、第５の画像の各画素値にガンマ補正をかけた第６の画像を生成する。ここでのガンマ補正は上述したステップＳ３００７と同様の処理となる。 In step S13007, the image processing unit 104 clips each pixel value of the second image generated in step S3005 at a predetermined value (for example, pixel values equal to or greater than the predetermined value are set to the predetermined value) and generates a fifth image by normalizing the clipped values. The predetermined value is, for example, the upper limit of a dynamic range that is sufficient for SDR. For example, the image processing unit 104 clips at 16383, the 14-bit maximum. In the normalization of this step, the image processing unit 104 divides each pixel value by 2 to the 14th power to normalize each 14-bit pixel value to the range of 0 to 1. The result is handled in a format that includes a fractional part, such as float32. In step S13008, the image processing unit 104 generates a sixth image by applying gamma correction to each pixel value of the fifth image. The gamma correction here is the same processing as in step S3007 described above.
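The SDR branch of steps S13007 and S13008 can be sketched as below. The clip value 16383 and the division by 2 to the 14th power come from the text; the power-law form of the gamma correction with γ = 2.2, the lower clamp at 0, and the function name are assumptions, since step S3007 is not detailed in this section.

```python
def sdr_front_end(pixels, clip_max=16383, bits=14, gamma=2.2):
    """Clip, normalize and gamma-correct a list of pixel values."""
    out = []
    for v in pixels:
        v = min(max(v, 0), clip_max)    # S13007: clip (lower clamp at 0 is an assumption)
        n = v / (1 << bits)             # S13007: normalize by 2**14
        out.append(n ** (1 / gamma))    # S13008: gamma correction (assumed power law)
    return out


# An 18-bit value far above the SDR range ends up at the same level as 16383.
levels = sdr_front_end([0, 16383, 200000])
```

Clipping before normalization is what lets the 14-bit divisor map the retained range onto 0 to 1 even though the second image is handled with 18 bits.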

ステップＳ１３００９で、画像処理部１０４は、ステップＳ１３００６で生成した第４の画像もしくはステップＳ１３００８で生成した第６の画像の各画素値を８ビットに正規化を解除した第７の画像を生成する。本ステップの正規化解除は、上述したステップＳ３００８と同様の処理となる。ステップＳ１３０１０で、画像処理部１０４は第７の画像をニューラルネットワークに入力する。本ステップで用いられるニューラルネットワークは、ステップＳ１３００２もしくはステップＳ１３００３で設定された第１又は第２のニューラルネットワークであり、画像に対してノイズ除去を行うニューラルネットワークである。すなわち、本実施形態では、画像の出力モードの設定に応じて、予め学習された複数の前記ニューラルネットワークのパラメータのうちの異なるパラメータを用いて、ニューラルネットワークを適用する。 In step S13009, the image processing unit 104 generates a seventh image by denormalizing each pixel value of the fourth image generated in step S13006 or the sixth image generated in step S13008 to 8 bits. The cancellation of normalization in this step is the same processing as in step S3008 described above. In step S13010, the image processing unit 104 inputs the seventh image to the neural network. The neural network used in this step is the first or second neural network set in step S13002 or S13003, and is a neural network that removes noise from the image. That is, in the present embodiment, a neural network is applied using different parameters among the plurality of previously learned neural network parameters according to the setting of the image output mode.

ステップＳ１３０１１で、画像処理部１０４は、ニューラルネットワークから出力される第８の画像の各画素値を正規化した第９の画像を生成する。本ステップの正規化は、ステップＳ３０１０と同様の処理となる。ステップＳ１３０１２で、プロセッサ１０６は、ステップＳ１３００１と同様に、設定されている画像の出力モードがＨＤＲモードかＳＤＲモードかを判定する。プロセッサ１０６は、画像の出力モードがＨＤＲモードであると判定した場合はステップＳ１３０１３へ処理を進め、そうでない場合には、Ｓ１３０１５に処理を進める。 In step S13011, the image processing unit 104 generates a ninth image by normalizing each pixel value of the eighth image output from the neural network. The normalization in this step is the same processing as in step S3010. In step S13012, the processor 106 determines whether the set image output mode is the HDR mode or the SDR mode, as in step S13001. If the processor 106 determines that the image output mode is the HDR mode, the process proceeds to step S13013; otherwise, the process proceeds to step S13015.

画像の出力モードがＨＤＲモードであるため、ステップＳ１３０１３で、画像処理部１０４は、第９の画像の各画素値にＰＱカーブのＥＯＴＦをかけた第１０の画像を生成する。そして、ステップＳ１３０１４で、画像処理部１０４は、第１０の画像の各画素値を１８ビットで正規化を解除した第１１の画像を生成する。本ステップの正規化解除では、画像処理部１０４は、１８ビットへ正規化解除するので各画素値に２の１８乗を乗算する。計算結果は１８ビットで取り扱われる。 Because the image output mode is the HDR mode, in step S13013 the image processing unit 104 generates a tenth image by applying the EOTF of the PQ curve to each pixel value of the ninth image. Then, in step S13014, the image processing unit 104 generates an eleventh image by denormalizing each pixel value of the tenth image to 18 bits. In this denormalization, the image processing unit 104 multiplies each pixel value by 2 to the 18th power. The result is handled in 18 bits.
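Steps S13013 and S13014 might be sketched as follows, assuming the PQ EOTF with the constants of ITU-R BT.2100. The rounding and the clamp to the largest 18-bit value are assumptions added so that an input of exactly 1.0 still fits in 18 bits; the text only states that each value is multiplied by 2 to the 18th power.

```python
# PQ constants as defined in ITU-R BT.2100.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_eotf(e):
    """Normalized PQ code value in [0, 1] -> normalized linear value in [0, 1]."""
    p = e ** (1 / M2)
    return (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)


def denormalize(value, bits=18):
    """Scale a normalized value by 2**bits, clamped to the bits-wide maximum (assumption)."""
    return min(int(round(value * (1 << bits))), (1 << bits) - 1)


def expand_hdr(pixels):
    """S13013 followed by S13014 for a list of normalized pixel values."""
    return [denormalize(pq_eotf(e)) for e in pixels]


expanded = expand_hdr([0.0, 0.5, 1.0])
```

A mid-scale code value of 0.5 expands to a small linear value, mirroring the way the OETF spent many codes on the dark end during compression.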

画像の出力モードがＳＤＲモードであるため、ステップＳ１３０１５で、画像処理部１０４は、第９の画像の各画素値にデガンマ補正をかけた第１２の画像を生成する。本ステップのデガンマ補正の処理はステップＳ３０１１と同様の処理となる。ステップＳ１３０１６で、画像処理部１０４は、第１２の画像の各画素値を１４ビットで正規化を解除した第１３の画像を生成する。本ステップの正規化解除はステップＳ３０１２と同様の処理となる。このように本実施形態では、画像の出力モードの設定に応じて、階調を伸長する複数の特性（ＥＯＴＦ及びデガンマ補正の特性）のうちの異なる特性を用いて画像データの階調を伸長する。 Since the image output mode is the SDR mode, in step S13015, the image processing unit 104 applies degamma correction to each pixel value of the ninth image to generate the twelfth image. The degamma correction process in this step is the same process as in step S3011. In step S13016, the image processing unit 104 generates a 13th image by denormalizing each pixel value of the 12th image to 14 bits. The cancellation of normalization in this step is the same processing as in step S3012. As described above, in this embodiment, the gradation of image data is expanded using different characteristics among a plurality of characteristics for expanding gradation (the characteristics of EOTF and degamma correction) according to the setting of the image output mode. .

ステップＳ１３０１７で、画像処理部１０４は、ステップＳ１３０１４で生成した第１１の画像もしくはステップＳ１３０１６で生成した第１３の画像と、第２の画像とをαブレンドした第１４の画像を生成する。αブレンドとは２つの画像を各画素ごとに設定された重み（α値）に基づいて合成することである。画像処理部１０４は、入力された画像の画素値に応じてαブレンドを実行する。画像処理部１０４は、第２の画像に係数（１―α）を乗算し、第１１の画像もしくは第１３の画像に係数αを乗算した後に、乗算後の結果を加算する。このときのα値は、画素値の大きさに応じて０～１の間でリニア変換される。ただし、このαブレンドは必ずしも行う必要はない。 In step S13017, the image processing unit 104 generates a 14th image by α-blending the 11th image generated in step S13014, or the 13th image generated in step S13016, with the second image. α-blending combines two images based on a weight (α value) set for each pixel. The image processing unit 104 performs the α-blend according to the pixel values of the input image. It multiplies the second image by the coefficient (1-α), multiplies the 11th or 13th image by the coefficient α, and adds the two products. The α value is converted linearly between 0 and 1 according to the magnitude of the pixel value. This α-blending is, however, not essential.
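A per-pixel sketch of the α-blend in step S13017 follows. The text says only that α varies linearly between 0 and 1 with the pixel value; the direction of the ramp (α rising with the pixel value of the image before the neural network), the `max_value` parameter, and the function names are assumptions.

```python
def alpha_from_pixel(value, max_value):
    """Linear ramp from 0 at value 0 to 1 at max_value (ramp direction is an assumption)."""
    return min(max(value / max_value, 0.0), 1.0)


def alpha_blend(processed, original, max_value):
    """Blend the neural network output with the image from before the network."""
    out = []
    for p, o in zip(processed, original):
        a = alpha_from_pixel(o, max_value)
        out.append(a * p + (1.0 - a) * o)   # alpha * processed + (1 - alpha) * original
    return out


blended = alpha_blend([100.0, 100.0, 100.0], [0.0, 50.0, 200.0], 200.0)
# blended is [0.0, 62.5, 100.0]
```

If the opposite ramp were intended, swapping `a` and `1.0 - a` in `alpha_blend` would favor the original image at high pixel values instead.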

ステップＳ１３０１８で、画像処理部１０４は、第１４の画像の各画素値にオフセットを足した第１５の画像を生成する。プロセッサ１０６は、第１５の画像を生成すると、本一連の処理を終了する。なお、本実施形態では、階調圧縮を行うために、ＰＱカーブのＯＥＴＦやガンマ補正を適用するようにした。また、階調伸長を行うために、ＰＱカーブのＥＯＴＦやデガンマ補正を適用するようにした。しかし、階調圧縮及び階調伸長のために、他の伝達関数や変換特性を用いてもよい。 In step S13018, the image processing unit 104 generates a fifteenth image by adding an offset to each pixel value of the fourteenth image. After generating the fifteenth image, the processor 106 terminates this series of processes. Note that, in this embodiment, OETF and gamma correction of the PQ curve are applied in order to perform gradation compression. In addition, EOTF and degamma correction of the PQ curve are applied in order to extend the gradation. However, other transfer functions and transformation characteristics may be used for tone compression and tone decompression.

以上、説明したように本実施形態では、出力する画像の設定に応じた階調補正を画像に適用したうえで、ニューラルネットワークによる画像処理を行うようにした。このようにすることで、出力する画像の設定（例えばＨＤＲ又はＳＤＲ）に応じて必要なダイナミックレンジを維持しながら、ビット数が限られたニューラルネットワークの処理を行うことができる。また、ニューラルネットワークを用いた処理を行う前の画像（例えば第２の画像等）を用いて、ニューラルネットワークを用いた処理を行った後の画像とαブレンドするようにした。このようにすることでビット縮小して失われた情報を復元することが可能となる。 As described above, in the present embodiment, tone correction corresponding to the setting of the image to be output is applied to the image, and image processing by the neural network is then performed. This makes it possible to run neural network processing with a limited number of bits while maintaining the dynamic range required by the setting of the image to be output (for example, HDR or SDR). In addition, the image before processing with the neural network (for example, the second image) is α-blended with the image after processing with the neural network. This makes it possible to restore information lost through the bit reduction.

なお、本実施形態では、出力する画像の設定が、ＨＤＲであるかＳＤＲである（すなわち、出力する画像の階調数が異なる）場合を例に説明した。しかし、出力する画像の設定は、画像の階調を圧縮又は伸長する特性（例えばＯＥＴＦ／ＥＯＴＦ、γ値）であってもよい。或いは、出力する画像の設定は、出力する画像の画素値を表すビット数、画素値の上限値であってもよい。 In this embodiment, the case where the setting of the image to be output is HDR or SDR (that is, the number of gradations of the image to be output is different) has been described as an example. However, the setting of the image to be output may be a characteristic (for example, OETF/EOTF, γ value) that compresses or expands the gradation of the image. Alternatively, the setting of the image to be output may be the number of bits representing the pixel value of the image to be output, or the upper limit value of the pixel value.

また、本実施形態では、出力する画像の設定に応じて、階調圧縮に用いる特性、ニューラルネットワークのパラメータ、及び階調伸長に用いる特性を制御する場合を例に説明した。しかし、階調圧縮に用いる特性、ニューラルネットワークのパラメータ、及び階調伸長に用いる特性は、他の情報に基づいて制御されてもよい。例えば、階調圧縮に用いる特性等は、ニューラルネットワークに入力される画像の設定に基づいて制御されてもよい。ニューラルネットワークに入力される画像の設定は、例えば、当該画像の画素値を表すビット数、当該画像の階調数、或いは当該画像の画素値の上限値であってよい。 Further, in the present embodiment, the case where the characteristics used for tone compression, the parameters of the neural network, and the characteristics used for tone expansion are controlled according to the setting of the image to be output has been described as an example. However, the properties used for tone compression, the parameters of the neural network, and the properties used for tone expansion may be controlled based on other information. For example, the characteristics used for tone compression may be controlled based on the settings of the image input to the neural network. The setting of the image input to the neural network may be, for example, the number of bits representing the pixel values of the image, the number of gradations of the image, or the upper limit of the pixel values of the image.

（第５の実施形態）
更に、実施形態５について説明する。実施形態５では、入力する画像データをクリップしたうえで、階調処理やニューラルネットワークによる画像処理を行う。なお、撮像装置１００の構成及び画像処理システムの機能構成例は、実施形態１と実質的に同一であってよい。従って、実質的に同一の構成及び処理については同一の参照符号を付して説明を省略し、相違点について重点的に説明する。 (Fifth embodiment)
Next, Embodiment 5 will be described. In Embodiment 5, the input image data is clipped before gradation processing and image processing by the neural network are performed. Note that the configuration of the imaging device 100 and the functional configuration example of the image processing system may be substantially the same as those in Embodiment 1. Therefore, substantially identical configurations and processes are denoted by the same reference numerals and their description is omitted, and the description focuses on the differences.

＜撮像装置１００における推論処理の動作＞
図１５Ａ及び図１５Ｂを参照して、撮像装置１００で行う推論処理について説明する。図１５Ａ及び図１５Ｂに示す一連の動作は、例えばプロセッサ１０６がＲＯＭ１０５に記憶されたプログラムを実行することにより、撮像装置１００の各部を制御して実現される。また、画像処理部１０４による動作は、プロセッサ１０６或いは不図示のＧＰＵなどの他のプロセッサがＲＯＭ１０５に記憶されたプログラムを実行することにより実現されてよい。 <Operation of Inference Processing in Imaging Apparatus 100>
The inference processing performed by the imaging device 100 will be described with reference to FIGS. 15A and 15B. The series of operations shown in FIGS. 15A and 15B is realized, for example, by the processor 106 executing a program stored in the ROM 105 and thereby controlling each unit of the imaging device 100. The operations of the image processing unit 104 may be realized by the processor 106, or by another processor such as a GPU (not shown), executing a program stored in the ROM 105.

まず、ステップＳ３００１からステップＳ３００５までの処理が上述の実施形態と同様に実行され、ニューラルネットワークのパラメータの設定や、画像の画素値からオフセットを引いた第２の画像の生成などが行われる。 First, the processing from step S3001 to step S3005 is executed in the same manner as in the above-described embodiment to set parameters of the neural network and generate a second image by subtracting the offset from the pixel values of the image.

次に、ステップＳ１５００１で、画像処理部１０４は、第２の画像の画素値を所定値でクリップした（例えば所定値以上の画素値を所定値とした）第３の画像を生成する。本実施形態では、画素値の上限値を１０ビットとして、１０２３でクリップした画像を生成する場合を例に説明しているが、この例に限らず撮像素子１０２から取得した画素値より小さいビット数であればどのビット数で画素値をクリップしてもよい。 Next, in step S15001, the image processing unit 104 generates a third image by clipping the pixel values of the second image at a predetermined value (for example, setting pixel values equal to or greater than the predetermined value to the predetermined value). This embodiment describes, as an example, the case where the upper limit of the pixel value is 10 bits and an image clipped at 1023 is generated. However, the clipping is not limited to this example: the pixel values may be clipped at any number of bits, as long as that number is smaller than the number of bits of the pixel values acquired from the image sensor 102.
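The clipping in step S15001 can be sketched as follows. This is a minimal illustration, assuming the second image is held as a flat list of integer pixel values; the function name `clip_pixels` and the sample values are hypothetical.

```python
def clip_pixels(pixels, upper=1023):
    """Set every pixel value above `upper` to `upper` (1023 is the 10-bit maximum)."""
    return [min(p, upper) for p in pixels]


# Hypothetical pixel values of the second image (after offset subtraction).
second_image = [0, 512, 1023, 4000, 16383]
third_image = clip_pixels(second_image)
print(third_image)  # [0, 512, 1023, 1023, 1023]
```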

ステップＳ１５００２で、画像処理部１０４は、第３の画像の各画素値を正規化した第４の画像を生成する。本ステップの正規化では、画像処理部１０４は、１０ビットの各画素値を０から１の範囲に正規化するために各画素値を２の１０乗で除算する。計算結果は小数点以下を含んだｆｌｏａｔ３２などの形式で取り扱われる。ステップＳ１５００３で、画像処理部１０４は、第４の画像の各画素値にガンマ補正をかけた第５の画像を生成する。本ステップのガンマ補正の処理は上述したステップＳ３００７と同様の処理となる。 In step S15002, the image processing unit 104 generates a fourth image by normalizing each pixel value of the third image. In the normalization of this step, the image processing unit 104 divides each pixel value by 2 to the 10th power in order to normalize each 10-bit pixel value to the range of 0 to 1. The calculation result is handled in a format that holds fractional values, such as float32. In step S15003, the image processing unit 104 generates a fifth image by applying gamma correction to each pixel value of the fourth image. The gamma correction in this step is the same processing as in step S3007 described above.
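Steps S15002 and S15003 can be sketched as follows. The division by 2 to the 10th power follows the text; the gamma exponent 1/2.2 is only an assumed example, because the characteristic used in step S3007 is not restated in this section, and the function names are hypothetical.

```python
def normalize_10bit(pixels):
    """Divide each 10-bit pixel value by 2**10 to map it into the range 0 to 1."""
    return [p / 2**10 for p in pixels]


def gamma_correct(values, gamma=1 / 2.2):
    """Apply a power-law curve; the exponent 1/2.2 is an assumed example value."""
    return [v ** gamma for v in values]


third_image = [0, 64, 512, 1023]
fourth_image = normalize_10bit(third_image)
fifth_image = gamma_correct(fourth_image)
print(fourth_image)
print(fifth_image)
```

Because the divisor is 2 to the 10th power rather than 1023, the largest 10-bit value maps to 1023/1024, slightly below 1.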

ステップＳ１５００４で、画像処理部１０４は、第５の画像の各画素値を８ビットに正規化解除した第６の画像を生成する。本ステップの正規化解除は、ステップＳ３００８と同様の処理である。ステップＳ１５００５で、画像処理部１０４は、第６の画像をニューラルネットワークに入力する。このニューラルネットワークは、ステップＳ１５００３においてガンマ補正された画像に対して最適に学習された、ノイズ除去を行うニューラルネットワークである。 In step S15004, the image processing unit 104 generates a sixth image by denormalizing each pixel value of the fifth image to 8 bits. The denormalization in this step is the same processing as in step S3008. In step S15005, the image processing unit 104 inputs the sixth image to the neural network. This neural network is a noise-removal neural network that has been trained optimally for images gamma-corrected as in step S15003.
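The effect of applying gamma correction before the 8-bit denormalization can be illustrated by counting how many distinct 8-bit codes the darkest 10-bit values receive. This is a sketch under assumptions: step S3008 is not restated here, so scaling by 2 to the 8th power with truncation and clamping to 255 is assumed, as is the exponent 1/2.2; all function names are hypothetical.

```python
def to_8bit(values):
    """Denormalize values in 0..1 to 8-bit codes (assumed: scale by 2**8, truncate, clamp)."""
    return [min(int(v * 2**8), 255) for v in values]


def count_dark_codes(gamma, dark_limit=64):
    """Count distinct 8-bit codes assigned to the 10-bit input values 0..dark_limit-1."""
    normalized = [p / 2**10 for p in range(dark_limit)]
    corrected = [v ** gamma for v in normalized]
    return len(set(to_8bit(corrected)))


linear_codes = count_dark_codes(gamma=1.0)      # no gamma correction
gamma_codes = count_dark_codes(gamma=1 / 2.2)   # assumed gamma characteristic
print(linear_codes, gamma_codes)
```

Without gamma correction the 64 darkest input values collapse into 16 codes; with the assumed curve they spread over considerably more codes, which is the gradation-preserving behavior the embodiment relies on.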

ステップＳ１５００６で、画像処理部１０４は、ニューラルネットワークから出力される第７の画像の各画素値を正規化した第８の画像を生成する。ここでの正規化はステップＳ３０１０と同様の処理である。ステップＳ１５００７で、画像処理部１０４は、第８の画像の各画素値にデガンマ補正をかけた第９の画像を生成する。本ステップのデガンマ補正の処理は、ステップＳ３０１１と同様の処理である。ステップＳ１５００８で、画像処理部１０４は、第９の画像の各画素値を１０ビットで正規化を解除した第１０の画像を生成する。本ステップの正規化解除の処理では、画像処理部１０４は、１０ビットへ正規化解除するので各画素値に２の１０乗を乗算する。計算結果は１０ビットで取り扱われる。 In step S15006, the image processing unit 104 generates an eighth image by normalizing each pixel value of the seventh image output from the neural network. The normalization here is the same processing as in step S3010. In step S15007, the image processing unit 104 generates a ninth image by applying degamma correction to each pixel value of the eighth image. The degamma correction process in this step is the same process as in step S3011. In step S15008, the image processing unit 104 generates a tenth image by denormalizing each pixel value of the ninth image to 10 bits. In the denormalization process of this step, the image processing unit 104 denormalizes to 10 bits, so each pixel value is multiplied by 2 to the 10th power. Calculation results are handled in 10 bits.
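Steps S15006 to S15008 can be sketched as the inverse of the compression path. The multiplication by 2 to the 10th power follows the text; the division by 2 to the 8th power, the degamma exponent 2.2, and the truncation and clamping to 1023 are assumptions, because steps S3010 and S3011 are not restated in this section. The function names are hypothetical.

```python
def normalize_8bit(pixels):
    """Map 8-bit network output to the range 0 to 1 (assumed divisor: 2**8)."""
    return [p / 2**8 for p in pixels]


def degamma(values, gamma=2.2):
    """Invert the assumed gamma curve by raising to the reciprocal exponent."""
    return [v ** gamma for v in values]


def denormalize_10bit(values):
    """Multiply by 2**10 and keep the result within 10 bits (truncation is assumed)."""
    return [min(int(v * 2**10), 1023) for v in values]


def expand(pixels_8bit):
    """Seventh image -> eighth -> ninth -> tenth image."""
    return denormalize_10bit(degamma(normalize_8bit(pixels_8bit)))


seventh_image = [0, 32, 128, 255]
tenth_image = expand(seventh_image)
print(tenth_image)
```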

ステップＳ１５００９で、画像処理部１０４は、ステップＳ１５００８で生成した第１０の画像と第２の画像をαブレンドした第１１の画像を生成する。画像処理部１０４は入力された画素値に応じてαブレンドを実施する。画像処理部１０４は、第１４の画像に係数（１―α）を乗算し、第１１の画像もしくは第１３の画像に係数αを乗算した後、乗算した結果を加算する。このときのα値は、画素値の大きさに応じて０～１の間でリニア変換される。ただし、このαブレンドは必ずしも行う必要はない。 In step S15009, the image processing unit 104 generates an eleventh image by α-blending the tenth image generated in step S15008 and the second image. The image processing unit 104 performs α-blending according to the input pixel values. The image processing unit 104 multiplies the 14th image by the coefficient (1-α), multiplies the 11th image or the 13th image by the coefficient α, and then adds the multiplication results. The α value at this time is linearly transformed between 0 and 1 according to the magnitude of the pixel value. However, it is not always necessary to perform this α-blending.
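The α-blend of step S15009 can be sketched as follows. This is only one possible reading: the text states that α varies linearly between 0 and 1 with the pixel value, but the ramp endpoints (`start`, `end`), the choice of the second image's pixel value as the ramp input, and the assignment of α to the pre-network image are assumptions made for illustration.

```python
def alpha_from_pixel(p, start=768, end=1023):
    """Linear ramp: 0 at or below `start`, 1 at or above `end` (both thresholds assumed)."""
    if p <= start:
        return 0.0
    if p >= end:
        return 1.0
    return (p - start) / (end - start)


def alpha_blend(processed, original, start=768, end=1023):
    """Blend the network result with the unclipped original, pixel by pixel."""
    blended = []
    for nn_value, org_value in zip(processed, original):
        a = alpha_from_pixel(org_value, start, end)
        blended.append((1.0 - a) * nn_value + a * org_value)
    return blended


tenth_image = [100, 1000]    # after the neural network (clipped range)
second_image = [110, 4000]   # before clipping
print(alpha_blend(tenth_image, second_image))  # [100.0, 4000.0]
```

With this weighting, dark pixels come entirely from the denoised image, while pixels at or above the clip level come from the original, which restores the bright-area values removed by clipping.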

ステップＳ１５０１０で、画像処理部１０４は、第１１の画像の各画素値にオフセットを足した第１２の画像を生成する。プロセッサ１０６は、第１２の画像を生成すると、本一連の処理を終了する。 In step S15010, the image processing unit 104 generates a twelfth image by adding an offset to each pixel value of the eleventh image. After generating the twelfth image, the processor 106 ends this series of processes.

以上、説明したように本実施形態では、ニューラルネットワークに入力するデータをクリップするようにした。このようにすることで、暗部の情報を多く残した画像を生成することができる。さらに、クリップされて失われた明部の情報にニューラルネットワークを用いた処理を行う前の画像をαブレンドするようにした。このようにすることでクリップされて失われた情報を復元することが可能となる。 As described above, in this embodiment the data input to the neural network is clipped. This makes it possible to generate an image that retains much of the information in dark areas. Furthermore, for the bright-area information lost through clipping, the image before processing by the neural network is α-blended in. This makes it possible to restore the information lost through clipping.

なお、上述の実施形態の機能を実現するソフトウェアのプログラムを実行する場合も本発明に含む。従って、本発明の機能処理をコンピュータで実現するために、該コンピュータに供給、インストールされるプログラムコード自体も本発明を実現するものである。つまり、本発明の機能処理を実現するためのコンピュータプログラム自体も本発明に含まれる。その場合、プログラムの機能を有していれば、オブジェクトコード、インタプリタにより実行されるプログラム、ＯＳに供給するスクリプトデータ等、プログラムの形態を問わない。 The present invention also includes the execution of a software program that implements the functions of the above-described embodiments. Therefore, in order to implement the functional processing of the present invention in a computer, the program code itself supplied and installed in the computer also implements the present invention. In other words, the present invention also includes the computer program itself for realizing the functional processing of the present invention. In that case, as long as it has the function of a program, the form of the program, such as an object code, a program executed by an interpreter, or script data supplied to the OS, does not matter.

プログラムを供給するための記録媒体としては、例えば、ハードディスク、磁気テープ等の磁気記録媒体、光／光磁気記憶媒体、不揮発性の半導体メモリでもよい。また、プログラムの供給方法としては、コンピュータネットワーク上のサーバに本発明を形成するコンピュータプログラムを記憶し、接続のあったクライアントコンピュータがコンピュータプログラムをダウンロードしてプログラムするような方法も考えられる。 The recording medium for supplying the program may be, for example, a hard disk, a magnetic recording medium such as a magnetic tape, an optical/magneto-optical storage medium, or a nonvolatile semiconductor memory. As a program supply method, a method of storing a computer program forming the present invention in a server on a computer network, and a connected client computer downloading and programming the computer program is also conceivable.

（その他の実施形態）
本発明は、上述の実施形態の１以上の機能を実現するプログラムを、ネットワーク又は記憶媒体を介してシステム又は装置に供給し、そのシステム又は装置のコンピュータにおける１つ以上のプロセッサがプログラムを読出し実行する処理でも実現可能である。また、１以上の機能を実現する回路（例えば、ＡＳＩＣ）によっても実現可能である。 (Other embodiments)
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

（本明細書の開示）
本明細書の開示は、以下の画像処理装置、画像処理方法、生成方法、及びプログラムを含む。 (Disclosure of this Specification)
The disclosure of this specification includes the following image processing apparatus, image processing method, generation method, and program.

（項目１）
第１の画像データの階調を圧縮する階調圧縮手段と、
前記階調圧縮手段によって階調が圧縮された画像データに対して、所定の画像処理を行うニューラルネットワークを適用することにより、前記所定の画像処理が行われた画像データを出力する処理手段と、
前記所定の画像処理が行われた画像データの階調を伸長する階調伸長手段と、を有し、
前記ニューラルネットワークの内部で画素値を表すビット数が前記第１の画像データの画素値を表すビット数よりも小さく、
前記階調圧縮手段は、明るさが低いほど多くの階調が割り当てられる特性を用いて階調を圧縮する、ことを特徴とする画像処理装置が開示される。 (Item 1)
An image processing apparatus is disclosed, comprising:
gradation compression means for compressing the gradation of first image data;
processing means for outputting image data on which predetermined image processing has been performed, by applying a neural network that performs the predetermined image processing to the image data whose gradation has been compressed by the gradation compression means; and
gradation expansion means for expanding the gradation of the image data on which the predetermined image processing has been performed,
wherein the number of bits representing pixel values inside the neural network is smaller than the number of bits representing pixel values of the first image data, and
the gradation compression means compresses the gradation using a characteristic in which more gradations are assigned as the brightness is lower.

（項目２）
前記第１の画像データの明るさを取得する取得手段を更に有し、
前記階調圧縮手段は、前記取得手段によって取得された明るさに応じて、明るさが低いほど多くの階調が割り当てられる複数の特性のうちの異なる特性を用いて、前記第１の画像データの階調を圧縮する、ことを特徴とする項目１に記載の画像処理装置が開示される。 (Item 2)
further comprising acquisition means for acquiring the brightness of the first image data;
The gradation compression means compresses the gradation of the first image data using, according to the brightness acquired by the acquisition means, a different characteristic among a plurality of characteristics in which more gradations are assigned as the brightness is lower. The image processing device according to item 1 is disclosed.
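Selecting a characteristic according to the acquired brightness can be sketched as follows. The list of gamma exponents, the use of the mean pixel value as the brightness, and the equal-width binning are assumptions for illustration; all exponents are below 1, so every characteristic assigns more gradations to darker values.

```python
# Hypothetical characteristics, ordered from the darkest scene to the brightest.
GAMMAS = [1 / 2.8, 1 / 2.4, 1 / 2.2, 1 / 1.8]


def mean_brightness(pixels, max_value=1023):
    """Average pixel value scaled to the range 0 to 1 (assumed brightness measure)."""
    return sum(pixels) / (len(pixels) * max_value)


def characteristic_index(brightness, count=len(GAMMAS)):
    """Map a brightness in 0..1 to one of `count` equal-width bins."""
    return min(int(brightness * count), count - 1)


def compress(pixels, gamma, max_value=1023):
    """Normalize and apply the selected power-law characteristic."""
    return [(p / max_value) ** gamma for p in pixels]


first_image = [0, 100, 200, 300]
index = characteristic_index(mean_brightness(first_image))
compressed = compress(first_image, GAMMAS[index])
print(index, compressed)
```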

（項目３）
前記処理手段は、前記取得手段によって取得された明るさに応じて、予め学習された複数の前記ニューラルネットワークのパラメータのうちの異なるパラメータを用いて、前記ニューラルネットワークを適用する、ことを特徴とする項目２に記載の画像処理装置が開示される。 (Item 3)
The processing means applies the neural network using different parameters among a plurality of pre-learned parameters of the neural network according to the brightness acquired by the acquisition means. An image processing device according to item 2 is disclosed.

（項目４）
前記階調伸長手段は、前記取得手段によって取得された明るさに応じて、階調を伸長する複数の特性のうちの異なる特性を用いて画像データの階調を伸長する、ことを特徴とする項目２又は３に記載の画像処理装置が開示される。 (Item 4)
The gradation expansion means expands the gradation of the image data using a different one of a plurality of gradation expansion characteristics according to the brightness acquired by the acquisition means. An image processing device according to item 2 or 3 is disclosed.

（項目５）
前記取得手段は、第１の時間に撮影された前記第１の画像データの明るさと、前記第１の時間の後である第２の時間に撮影された第２の画像データの明るさを取得し、
前記階調圧縮手段は、画像データの明るさに対応した段階的に異なる前記複数の特性のうち、前記第１の画像データの明るさに対応する第１の特性に隣接する第２の特性を用いて前記第２の画像データの階調を圧縮する、ことを特徴とする項目２に記載の画像処理装置が開示される。 (Item 5)
The acquisition means acquires the brightness of the first image data captured at a first time and the brightness of second image data captured at a second time after the first time,
and the gradation compression means compresses the gradation of the second image data using a second characteristic adjacent to a first characteristic corresponding to the brightness of the first image data, among the plurality of characteristics that differ in stages corresponding to the brightness of image data. The image processing apparatus according to item 2 is disclosed.
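One way to read the use of an adjacent characteristic for the later image is that the characteristic changes by at most one stage per frame, even when the brightness changes abruptly. The sketch below illustrates that reading only; the function `step_toward` and the index sequence are hypothetical.

```python
def step_toward(current_index, target_index):
    """Move at most one stage from the current characteristic toward the target."""
    if target_index > current_index:
        return current_index + 1
    if target_index < current_index:
        return current_index - 1
    return current_index


# Target stages implied by the brightness of successive frames (hypothetical).
targets = [0, 3, 3, 3, 1]
current = 0
used = []
for target in targets:
    current = step_toward(current, target)
    used.append(current)
print(used)  # [0, 1, 2, 3, 2]
```

Stepping through neighboring stages avoids an abrupt change in the compression characteristic, and in the matched network parameters, between consecutive frames.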

（項目６）
前記処理手段は、画像データの明るさに対応した段階的に異なる前記複数の特性に対応付けられた複数の前記ニューラルネットワークのパラメータのうち、前記第２の特性に対応付けられたパラメータを用いて前記ニューラルネットワークを適用する、ことを特徴とする項目５に記載の画像処理装置が開示される。 (Item 6)
The processing means uses a parameter associated with the second characteristic among the plurality of neural network parameters associated with the plurality of characteristics that differ in stages corresponding to the brightness of the image data. The image processing device according to item 5 is disclosed, wherein the neural network is applied.

（項目７）
前記階調伸長手段は、階調を伸長するための段階的に異なる複数の特性のうち、前記第２の特性に対応する特性を用いて画像データの階調を伸長する、ことを特徴とする項目５又は６に記載の画像処理装置が開示される。 (Item 7)
The gradation expansion means expands the gradation of the image data using a characteristic corresponding to the second characteristic among a plurality of characteristics that differ in stages for expanding the gradation. An image processing device according to item 5 or 6 is disclosed.

（項目８）
前記取得手段は、前記第１の画像データの複数の領域のうちの選択された領域の明るさを取得し、
前記階調圧縮手段は、前記複数の特性のうち、前記選択された領域の明るさに対応する第３の特性を用いて、前記第１の画像データの階調を圧縮する、ことを特徴とする項目２に記載の画像処理装置が開示される。 (Item 8)
the acquiring means acquires brightness of a selected area from among a plurality of areas of the first image data;
The gradation compression means compresses the gradation of the first image data using a third characteristic corresponding to the brightness of the selected area among the plurality of characteristics. An image processing device according to item 2 is disclosed.

（項目９）
前記処理手段は、前記複数の特性に対応付けられた複数の前記ニューラルネットワークのパラメータのうち、前記選択された領域の明るさに対する前記第３の特性に対応するパラメータを用いて、前記ニューラルネットワークを適用する、ことを特徴とする項目８に記載の画像処理装置が開示される。 (Item 9)
The processing means applies the neural network using a parameter corresponding to the third characteristic for the brightness of the selected area, among the plurality of parameters of the neural network associated with the plurality of characteristics. The image processing device according to item 8 is disclosed.

（項目１０）
前記選択された領域は、前記第１の画像データの複数の領域のうち、領域ごとの明るさが所定の閾値より低い領域である、ことを特徴とする項目８又は９に記載の画像処理装置が開示される。 (Item 10)
The selected area is an area, among the plurality of areas of the first image data, whose brightness is lower than a predetermined threshold. The image processing apparatus according to item 8 or 9 is disclosed.
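Selecting areas whose brightness is below a threshold can be sketched as follows. The equal-size grid division, the mean pixel value as the per-area brightness, and the threshold value are assumptions for illustration; the function names are hypothetical.

```python
def region_means(image, rows, cols):
    """Split a 2D image into rows x cols equal areas and return each area's mean value."""
    height, width = len(image), len(image[0])
    area_h, area_w = height // rows, width // cols
    means = {}
    for r in range(rows):
        for c in range(cols):
            block = [image[y][x]
                     for y in range(r * area_h, (r + 1) * area_h)
                     for x in range(c * area_w, (c + 1) * area_w)]
            means[(r, c)] = sum(block) / len(block)
    return means


def select_dark_regions(means, threshold):
    """Return the keys of the areas whose mean brightness is below the threshold."""
    return [key for key, mean in means.items() if mean < threshold]


image = [
    [10, 10, 900, 900],
    [10, 10, 900, 900],
    [500, 500, 20, 20],
    [500, 500, 20, 20],
]
means = region_means(image, rows=2, cols=2)
print(select_dark_regions(means, threshold=100))  # [(0, 0), (1, 1)]
```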

（項目１１）
前記選択された領域は、前記第１の画像データの複数の領域のうち、前日以前の同一時間帯における同じ領域の明るさとの差が所定値以下である領域である、ことを特徴とする項目８から１０のいずれか１項に記載の画像処理装置が開示される。 (Item 11)
The selected area is an area, among the plurality of areas of the first image data, whose difference in brightness from the same area in the same time period on the previous day or earlier is equal to or less than a predetermined value. The image processing apparatus according to any one of items 8 to 10 is disclosed.

（項目１２）
前記階調圧縮手段は、所定の設定に応じて、明るさが低いほど多くの階調が割り当てられる複数の特性のうちの異なる特性を用いて、前記第１の画像データの階調を圧縮する、ことを特徴とする項目１に記載の画像処理装置が開示される。 (Item 12)
The gradation compression means compresses the gradation of the first image data using, according to a predetermined setting, a different characteristic among a plurality of characteristics in which more gradations are assigned as the brightness is lower. The image processing apparatus according to item 1 is disclosed.

（項目１３）
前記階調圧縮手段は、前記所定の設定が、処理対象の画像データの画素値を表すビット数が大きい設定ほど、所定の低輝度領域においてより多くの階調が割り当てられる特性を用いる、ことを特徴とする項目１２に記載の画像処理装置が開示される。 (Item 13)
The gradation compression means uses a characteristic in which more gradations are assigned to a predetermined low-luminance region as the predetermined setting specifies a larger number of bits representing the pixel values of the image data to be processed. The image processing apparatus according to item 12 is disclosed.

（項目１４）
前記処理手段は、前記所定の設定に応じて、予め学習された複数の前記ニューラルネットワークのパラメータのうちの異なるパラメータを用いて、前記ニューラルネットワークを適用する、ことを特徴とする項目１２又は１３に記載の画像処理装置が開示される。 (Item 14)
The processing means applies the neural network using, according to the predetermined setting, different parameters among a plurality of pre-learned parameters of the neural network. The image processing apparatus according to item 12 or 13 is disclosed.

（項目１５）
前記階調伸長手段は、前記所定の設定に応じて、階調を伸長する複数の特性のうちの異なる特性を用いて画像データの階調を伸長する、ことを特徴とする項目１２から１４のいずれか１項に記載の画像処理装置が開示される。 (Item 15)
The gradation expansion means expands the gradation of the image data using, according to the predetermined setting, a different characteristic among a plurality of characteristics for expanding gradation. The image processing apparatus according to any one of items 12 to 14 is disclosed.

（項目１６）
前記所定の設定は、前記画像処理装置から出力される画像データについての設定である、ことを特徴とする項目１２から１５のいずれか１項に記載の画像処理装置が開示される。 (Item 16)
The image processing apparatus according to any one of items 12 to 15 is disclosed, wherein the predetermined settings are settings for image data output from the image processing apparatus.

（項目１７）
前記画像処理装置から出力される画像データについての設定は、階調圧縮に用いる特性、階調伸長に用いる特性、前記出力される画像データの階調数、及び前記出力される画像データの画素値を表すビット数、のいずれかを含む、ことを特徴とする項目１６に記載の画像処理装置が開示される。 (Item 17)
The settings for the image data output from the image processing apparatus include any of: the characteristic used for gradation compression, the characteristic used for gradation expansion, the number of gradations of the output image data, and the number of bits representing the pixel values of the output image data. The image processing apparatus according to item 16 is disclosed.

（項目１８）
前記所定の設定は、前記ニューラルネットワークに入力される画像データについての設定である、ことを特徴とする項目１２から１５のいずれか１項に記載の画像処理装置が開示される。 (Item 18)
The predetermined setting is a setting for the image data input to the neural network. The image processing apparatus according to any one of items 12 to 15 is disclosed.

（項目１９）
前記ニューラルネットワークに入力される画像データについての設定は、前記ニューラルネットワークに入力される画像データの画素値の上限値、当該画像データの階調数、及び、当該画像データの画素値を表すビット数のいずれかを含む、ことを特徴とする項目１８に記載の画像処理装置が開示される。 (Item 19)
The settings for the image data input to the neural network include any of: the upper limit of the pixel values of the image data input to the neural network, the number of gradations of that image data, and the number of bits representing the pixel values of that image data. The image processing apparatus according to item 18 is disclosed.

（項目２０）
前記階調伸長手段により伸長された画像データと、前記第１の画像データとを合成する合成手段を更に有する、ことを特徴とする項目１２から１９のいずれか１項に記載の画像処理装置が開示される。 (Item 20)
The image processing apparatus according to any one of items 12 to 19 is disclosed, further comprising synthesizing means for synthesizing the image data expanded by the gradation expansion means and the first image data.

（項目２１）
前記第１の画像データは、予め定めらた画素値の上限値を用いてクリップされた画像データを含む、ことを特徴とする項目２０に記載の画像処理装置が開示される。 (Item 21)
The first image data includes image data clipped using a predetermined upper limit of pixel values. The image processing apparatus according to item 20 is disclosed.

（項目２２）
ニューラルネットワークを学習させる画像処理装置であって、
訓練画像の画像データと正解画像の画像データの階調を圧縮する階調圧縮手段と、
前記訓練画像の画像データの階調を圧縮した画像データに対して、所定の画像処理を行う前記ニューラルネットワークを適用することにより、前記所定の画像処理が行われた画像データを出力する処理手段と、
前記所定の画像処理が行われた画像データと、前記正解画像の画像データの階調を圧縮した画像データとの誤差に基づいて、前記ニューラルネットワークのパラメータを変更する変更手段と、を有し、
前記ニューラルネットワークの内部で画素値を表すビット数が前記訓練画像の画像データの画素値を表すビット数よりも小さく、
前記階調圧縮手段は、明るさが低いほど多くの階調が割り当てられる特性を用いて階調を圧縮する、ことを特徴とする画像処理装置が開示される。 (Item 22)
An image processing apparatus that trains a neural network is disclosed, comprising:
gradation compression means for compressing the gradation of image data of a training image and image data of a correct (ground-truth) image;
processing means for outputting image data on which predetermined image processing has been performed, by applying the neural network that performs the predetermined image processing to image data obtained by compressing the gradation of the image data of the training image; and
changing means for changing parameters of the neural network based on an error between the image data on which the predetermined image processing has been performed and image data obtained by compressing the gradation of the image data of the correct image,
wherein the number of bits representing pixel values inside the neural network is smaller than the number of bits representing pixel values of the image data of the training image, and
the gradation compression means compresses the gradation using a characteristic in which more gradations are assigned as the brightness is lower.
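The training flow of this item (compress both images, apply the network to the compressed training image, and change the parameters based on the error against the compressed correct image) can be sketched with a toy model. Everything below is an assumption made for illustration: the "network" is a single scalar gain `w`, the error is the mean squared error, the update is plain gradient descent, and the gamma exponent is 1/2.2.

```python
def compress(values, gamma=1 / 2.2):
    """Gradation compression of values in 0..1 with an assumed power-law curve."""
    return [v ** gamma for v in values]


def network(values, w):
    """Toy stand-in for the neural network: one scalar gain."""
    return [w * v for v in values]


def mse(a, b):
    """Mean squared error between two equally long lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def train_step(training_image, correct_image, w, learning_rate=0.1):
    """One update of w from the error measured in the compressed domain."""
    x = compress(training_image)
    target = compress(correct_image)
    y = network(x, w)
    grad = sum(2 * (yi - ti) * xi for yi, ti, xi in zip(y, target, x)) / len(x)
    return w - learning_rate * grad, mse(y, target)


w = 0.5
losses = []
for _ in range(20):
    w, loss = train_step([0.1, 0.4, 0.9], [0.1, 0.4, 0.9], w)
    losses.append(loss)
print(w, losses[0], losses[-1])
```

Because the error is computed between compressed images, the parameters are fitted to the same gradation-compressed domain that the network sees at inference time.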

（項目２３）
第１の画像データの階調を圧縮する階調圧縮工程と、
前記階調圧縮工程において階調が圧縮された画像データに対して、所定の画像処理を行うニューラルネットワークを適用することにより、前記所定の画像処理が行われた画像データを出力する処理工程と、
前記所定の画像処理が行われた画像データの階調を伸長する階調伸長工程と、を有し、
前記ニューラルネットワークの内部で画素値を表すビット数が前記第１の画像データの画素値を表すビット数よりも小さく、
前記階調圧縮工程では、明るさが低いほど多くの階調が割り当てられる特性を用いて階調を圧縮する、ことを特徴とする画像処理方法が開示される。 (Item 23)
a gradation compression step of compressing the gradation of the first image data;
a processing step of applying a neural network that performs predetermined image processing to the image data whose gradation has been compressed in the gradation compression step, thereby outputting image data that has been subjected to the predetermined image processing;
a gradation decompression step of decompressing the gradation of the image data on which the predetermined image processing has been performed;
the number of bits representing pixel values inside the neural network is smaller than the number of bits representing pixel values of the first image data;
An image processing method is disclosed, wherein, in the gradation compression step, the gradation is compressed using a characteristic that more gradation is assigned as the brightness is lower.

（項目２４）
画像処理装置において各工程が実行される、学習済みのニューラルネットワークの生成方法であって、
訓練画像の画像データと正解画像の画像データの階調を圧縮する階調圧縮工程と、
前記訓練画像の画像データの階調を圧縮した画像データに対して、所定の画像処理を行う前記ニューラルネットワークを適用することにより、前記所定の画像処理が行われた画像データを出力する処理工程と、
前記所定の画像処理が行われた画像データと、前記正解画像の画像データの階調を圧縮した画像データとの誤差に基づいて、前記ニューラルネットワークのパラメータを変更する変更工程と、を含み、
前記ニューラルネットワークの内部で画素値を表すビット数が前記訓練画像の画像データの画素値を表すビット数よりも小さく、
前記階調圧縮工程では、明るさが低いほど多くの階調が割り当てられる特性を用いて階調を圧縮する、ことを特徴とする生成方法が開示される。 (Item 24)
A method for generating a trained neural network, each step of which is executed in an image processing apparatus, is disclosed, the method comprising:
a gradation compression step of compressing the gradation of image data of a training image and image data of a correct (ground-truth) image;
a processing step of outputting image data on which predetermined image processing has been performed, by applying the neural network that performs the predetermined image processing to image data obtained by compressing the gradation of the image data of the training image; and
a changing step of changing parameters of the neural network based on an error between the image data on which the predetermined image processing has been performed and image data obtained by compressing the gradation of the image data of the correct image,
wherein the number of bits representing pixel values inside the neural network is smaller than the number of bits representing pixel values of the image data of the training image, and
in the gradation compression step, the gradation is compressed using a characteristic in which more gradations are assigned as the brightness is lower.

（項目２５）
コンピュータを、項目１乃至２１のいずれか１項に記載の画像処理装置の各手段として機能させるためのプログラムが開示される。 (Item 25)
A program for causing a computer to function as each means of the image processing apparatus according to any one of items 1 to 21 is disclosed.

発明は上記実施形態に制限されるものではなく、発明の精神及び範囲から離脱することなく、様々な変更及び変形が可能である。従って、発明の範囲を公にするために請求項を添付する。 The invention is not limited to the embodiments described above, and various modifications and variations are possible without departing from the spirit and scope of the invention. Accordingly, the claims are appended to make public the scope of the invention.

１００…撮像装置、１０２…撮像素子、１０４…画像処理装置、１０５…ＲＯＭ、１０６…プロセッサ、１０７…ＲＡＭ DESCRIPTION OF SYMBOLS 100... Imaging device, 102... Imaging element, 104... Image processing apparatus, 105... ROM, 106... Processor, 107... RAM

Claims

An image processing apparatus comprising:
gradation compression means for compressing the gradation of first image data;
processing means for outputting image data on which predetermined image processing has been performed, by applying a neural network that performs the predetermined image processing to the image data whose gradation has been compressed by the gradation compression means; and
gradation expansion means for expanding the gradation of the image data on which the predetermined image processing has been performed,
wherein the number of bits representing pixel values inside the neural network is smaller than the number of bits representing pixel values of the first image data, and
the gradation compression means compresses the gradation using a characteristic in which more gradations are assigned as the brightness is lower.

The image processing apparatus according to claim 1, further comprising acquisition means for acquiring the brightness of the first image data,
wherein the gradation compression means compresses the gradation of the first image data using, according to the brightness acquired by the acquisition means, a different characteristic among a plurality of characteristics in which more gradations are assigned as the brightness is lower.

The processing means applies the neural network using different parameters among a plurality of pre-learned parameters of the neural network according to the brightness acquired by the acquisition means. The image processing apparatus according to claim 2.

The gradation expansion means expands the gradation of the image data using a different one of a plurality of gradation expansion characteristics according to the brightness acquired by the acquisition means. The image processing apparatus according to claim 2.

The image processing apparatus according to claim 2, wherein the acquisition means acquires the brightness of the first image data captured at a first time and the brightness of second image data captured at a second time after the first time, and
the gradation compression means compresses the gradation of the second image data using a second characteristic adjacent to a first characteristic corresponding to the brightness of the first image data, among the plurality of characteristics that differ in stages corresponding to the brightness of image data.

The image processing apparatus according to claim 5, wherein the processing means applies the neural network using a parameter associated with the second characteristic, among the plurality of neural network parameters associated with the plurality of characteristics that differ in stages corresponding to the brightness of image data.

The gradation expansion means expands the gradation of the image data using a characteristic corresponding to the second characteristic among a plurality of characteristics that differ in stages for expanding the gradation. The image processing apparatus according to claim 5.

The image processing apparatus according to claim 2, wherein the acquisition means acquires the brightness of a selected area from among a plurality of areas of the first image data, and
the gradation compression means compresses the gradation of the first image data using a third characteristic, among the plurality of characteristics, corresponding to the brightness of the selected area.

The image processing apparatus according to claim 8, wherein the processing means applies the neural network using a parameter corresponding to the third characteristic for the brightness of the selected area, among the plurality of parameters of the neural network associated with the plurality of characteristics.

The image processing apparatus according to claim 8, wherein the selected area is an area, among the plurality of areas of the first image data, whose brightness is lower than a predetermined threshold.

The image processing apparatus according to claim 8, wherein the selected area is an area, among the plurality of areas of the first image data, whose difference in brightness from the same area in the same time period on the previous day or earlier is equal to or less than a predetermined value.

The image processing apparatus according to claim 1, wherein the gradation compression means compresses the gradation of the first image data using, according to a predetermined setting, a different characteristic among a plurality of characteristics in which more gradations are assigned as the brightness is lower.

The image processing apparatus according to claim 12, wherein the gradation compression means uses a characteristic in which more gradations are assigned to a predetermined low-luminance region as the predetermined setting specifies a larger number of bits representing the pixel values of the image data to be processed.

The image processing apparatus according to claim 12, wherein the processing means applies the neural network using, according to the predetermined setting, different parameters among a plurality of pre-learned parameters of the neural network.

The image processing apparatus according to claim 12, wherein the gradation expansion means expands the gradation of the image data using, according to the predetermined setting, a different characteristic among a plurality of characteristics for expanding gradation.

The image processing apparatus according to claim 12, wherein the predetermined setting is a setting for image data output from the image processing apparatus.

The image processing apparatus according to claim 16, wherein the settings for the image data output from the image processing apparatus include any of: the characteristic used for gradation compression, the characteristic used for gradation expansion, the number of gradations of the output image data, and the number of bits representing the pixel values of the output image data.

The image processing apparatus according to claim 12, wherein the predetermined setting is a setting for image data input to the neural network.

The image processing apparatus according to claim 18, wherein the settings for the image data input to the neural network include any of: the upper limit of the pixel values of the image data input to the neural network, the number of gradations of that image data, and the number of bits representing the pixel values of that image data.

The image processing apparatus according to claim 12, further comprising synthesizing means for synthesizing the image data expanded by the gradation expansion means and the first image data.

The image processing apparatus according to claim 20, wherein the first image data includes image data clipped using a predetermined upper limit of pixel values.

An image processing apparatus that trains a neural network, comprising:
gradation compression means for compressing the gradation of image data of a training image and image data of a correct (ground-truth) image;
processing means for outputting image data on which predetermined image processing has been performed, by applying the neural network that performs the predetermined image processing to image data obtained by compressing the gradation of the image data of the training image; and
changing means for changing parameters of the neural network based on an error between the image data on which the predetermined image processing has been performed and image data obtained by compressing the gradation of the image data of the correct image,
wherein the number of bits representing pixel values inside the neural network is smaller than the number of bits representing pixel values of the image data of the training image, and
the gradation compression means compresses the gradation using a characteristic in which more gradations are assigned as the brightness is lower.

An image processing method comprising:
a gradation compression step of compressing the gradation of first image data;
a processing step of outputting image data on which predetermined image processing has been performed, by applying a neural network that performs the predetermined image processing to the image data whose gradation has been compressed in the gradation compression step; and
a gradation expansion step of expanding the gradation of the image data on which the predetermined image processing has been performed,
wherein the number of bits representing pixel values inside the neural network is smaller than the number of bits representing pixel values of the first image data, and
in the gradation compression step, the gradation is compressed using a characteristic in which more gradations are assigned as the brightness is lower.

A method for generating a trained neural network, each step of which is executed in an image processing apparatus, the method comprising:
a gradation compression step of compressing the gradation of image data of a training image and image data of a correct (ground-truth) image;
a processing step of outputting image data on which predetermined image processing has been performed, by applying the neural network that performs the predetermined image processing to image data obtained by compressing the gradation of the image data of the training image; and
a changing step of changing parameters of the neural network based on an error between the image data on which the predetermined image processing has been performed and image data obtained by compressing the gradation of the image data of the correct image,
wherein the number of bits representing pixel values inside the neural network is smaller than the number of bits representing pixel values of the image data of the training image, and
in the gradation compression step, the gradation is compressed using a characteristic in which more gradations are assigned as the brightness is lower.

A program for causing a computer to function as each means of the image processing apparatus according to any one of claims 1 to 21.