JP7009219B2

JP7009219B2 - Image processing method, image processing device, image pickup device, image processing program, and storage medium

Info

Publication number: JP7009219B2
Application number: JP2018001551A
Authority: JP
Inventors: 法人日浅
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2018-01-10
Filing date: 2018-01-10
Publication date: 2022-01-25
Anticipated expiration: 2038-01-10
Also published as: JP2019121972A

Description

The present invention relates to an image processing method that estimates, from captured images, an image corresponding to a divided pupil of an optical system.

Patent Document 1 discloses a method for an image sensor having two photoelectric conversion units in each pixel: the sum signal of the two photoelectric conversion units and the signal of one of them are read out, and the signal of the other photoelectric conversion unit is obtained from the difference between the two.

Patent Document 1: Japanese Patent No. 4691930

However, with the method disclosed in Patent Document 1, when the sum signal of the two photoelectric conversion units is luminance-saturated, the signal of the other photoelectric conversion unit cannot be obtained accurately from the difference between the sum signal and the signal of one photoelectric conversion unit. In that case, the pupil-divided image cannot be estimated with high accuracy.
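The failure described above can be illustrated with a small numeric sketch. The 10-bit full-scale value, the pixel values, and the assumption that each readout is clipped at the same level are all hypothetical and chosen only for illustration; they do not come from the patent.

```python
import numpy as np

# Hypothetical 10-bit full-scale value; the source does not give one.
FULL_SCALE = 1023

def b_by_subtraction(a_plus_b, a):
    """Estimate the B signal as (A+B) - A."""
    return a_plus_b.astype(np.int64) - a.astype(np.int64)

# Made-up true signals for three pixels; the third one overflows when summed.
true_a = np.array([100, 400, 700])
true_b = np.array([150, 500, 900])

# Assumed readout model: each output is clipped at FULL_SCALE.
read_a = np.minimum(true_a, FULL_SCALE)
read_sum = np.minimum(true_a + true_b, FULL_SCALE)

est_b = b_by_subtraction(read_sum, read_a)
error = est_b - true_b  # zero where unsaturated, large where the sum clipped
```

In this toy case the first two pixels are recovered exactly, while the third pixel, whose sum signal clipped, is badly underestimated.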

An object of the present invention is therefore to provide an image processing method, an image processing apparatus, an image pickup apparatus, a program, and a storage medium that can estimate a pupil-divided image with high accuracy even when luminance saturation occurs.

An image processing method according to one aspect of the present invention includes: a step of acquiring an input image based on a first image, obtained by imaging a subject space through a first pupil of an optical system, and a second image, obtained by imaging the subject space through a second pupil that is a part of the first pupil; and a step of generating, from the input image, a third image using a neural network having a plurality of intermediate layers between an input layer and an output layer, the third image corresponding to an image obtained by imaging the subject space through a third pupil of the optical system. The third pupil is a part of the first pupil and is different from the second pupil.

An image processing apparatus according to another aspect of the present invention includes: acquisition means for acquiring an input image based on a first image, obtained by imaging a subject space through a first pupil of an optical system, and a second image, obtained by imaging the subject space through a second pupil that is a part of the first pupil; and generation means for generating, from the input image, a third image using a neural network having a plurality of intermediate layers between an input layer and an output layer, the third image corresponding to an image obtained by imaging the subject space through a third pupil of the optical system. The third pupil is a part of the first pupil and is different from the second pupil.

An image pickup apparatus according to another aspect of the present invention includes an image sensor that photoelectrically converts an optical image formed by an optical system, and the image processing apparatus described above.

An image processing program according to another aspect of the present invention causes a computer to execute the image processing method described above.

A storage medium according to another aspect of the present invention stores the image processing program described above.

The objects and features of the present invention are described in the following examples.

According to the present invention, it is possible to provide an image processing method, an image processing apparatus, an image pickup apparatus, an image processing program, and a storage medium that can estimate a pupil-divided image with high accuracy even when luminance saturation occurs.

A diagram showing the network structure for image generation in each example.
A block diagram of the image pickup apparatus in Example 1.
An external view of the image pickup apparatus in Examples 1 and 3.
An explanatory diagram of the image pickup unit in Example 1.
A flowchart of the image estimation process in Examples 1 and 2.
An explanatory diagram of the image estimation process in Examples 1 and 3.
A diagram showing the relationship between the divided pupil, image height, and vignetting in Example 1.
A flowchart of the learning of coefficient data in each example.
An explanatory diagram of pupil division at each image height and azimuth in Example 1.
A block diagram of the image processing system in Example 2.
An external view of the image processing system in Example 2.
A configuration diagram of the image sensor in Example 2.
A flowchart of the image estimation process in Example 2.
A block diagram of the image pickup apparatus in Example 3.
An external view of the image pickup apparatus in Example 3.

Examples of the present invention are described in detail below with reference to the drawings. In each figure, the same members are given the same reference numerals, and duplicate descriptions are omitted.

Before the examples are described in detail, the gist of the present invention is summarized. In the present invention, deep learning is used to estimate an image (third image) captured through another pupil (third pupil) from an image (second image) captured through a certain pupil (second pupil) and an image (first image) captured through a pupil (first pupil) that combines that pupil and the other pupil. By using training data that contains luminance saturation when training the deep learning model, the image (third image) can be estimated with high accuracy even when luminance saturation occurs.

First, the image pickup apparatus in Example 1 of the present invention is described with reference to FIGS. 2 and 3. FIG. 2 is a block diagram of the image pickup apparatus 100. FIG. 3 is an external view of the image pickup apparatus 100. An overview of each part of the image pickup apparatus 100 is given first, and the details are described later.

As shown in FIG. 2, the image pickup apparatus 100 has an image pickup unit 101 that acquires an image of the subject space as a captured image (input image). The image pickup unit 101 has an imaging optical system 101a that collects incident light from the subject space, and an image sensor 101b having a plurality of pixels. The image sensor 101b is, for example, a CCD (Charge Coupled Device) sensor or a CMOS (Complementary Metal-Oxide Semiconductor) sensor.

FIG. 4 is an explanatory diagram of the image pickup unit 101. FIG. 4(A) is a cross-sectional view of the image pickup unit 101, in which the dash-dot line represents the on-axis light flux. FIG. 4(B) is a top view of the image sensor 101b. The image sensor 101b has a microlens array 122 and a plurality of pixels 121. The microlens array 122 is placed at a position conjugate with the subject plane 120 via the imaging optical system 101a. As shown in FIG. 4(B), the microlenses 122 that make up the microlens array 122 (only microlens 122a is labeled; 122b and onward are omitted) each correspond to one of the plurality of pixels 121 (only pixel 121a is labeled; 121b and onward are omitted). Here, a reference number alone designates a group of parts collectively, and a number followed by a letter such as "a" designates one member of the group.

Each of the plurality of pixels 121 has a first photoelectric conversion unit 123 and a second photoelectric conversion unit 124 that photoelectrically convert the optical image formed via the imaging optical system 101a. Thus, for example, light incident on the pixel 121a is separated according to its incident angle and received by the first photoelectric conversion unit 123a or the second photoelectric conversion unit 124a (the two units receive light incident at mutually different angles). The incident angle of the light is determined by the position in the pupil of the imaging optical system 101a through which the light passed. The pupil of the imaging optical system 101a is therefore divided into two partial pupils by the two photoelectric conversion units, and the two photoelectric conversion units in one pixel acquire information observing the subject space from mutually different viewpoints (pupil positions). In this example the pupil is divided in the horizontal direction, but the division is not limited to this and may be in another direction, such as vertical or diagonal.

The image sensor 101b outputs the signal acquired by the first photoelectric conversion unit 123 (second captured image, A image) and the sum signal (first captured image, A+B image) of this signal (A image) and the signal acquired by the second photoelectric conversion unit 124 (third captured image, B image). The A image and the A+B image are output to the image processing unit 102. The image processing unit (image processing apparatus) 102 has an information acquisition unit (acquisition means) 102a and an image generation unit (generation means) 102b, and executes the image processing method of this example. In doing so, the image processing unit 102 uses coefficient data stored in the storage unit (storage means) 103; the details of this processing are described later. The image processing unit 102 can thereby estimate the B image and acquire phase difference information from the A image and the estimated B image. The system controller 106 controls the in-focus position of the image pickup unit 101 based on the phase difference information acquired by the image processing unit 102.

When the user issues a release instruction, the image pickup unit 101 captures an image at the current in-focus position, and the resulting A image and A+B image are saved to the recording medium 105. When the user issues an instruction to display a captured image, the system controller 106 reads the data saved on the recording medium 105 and displays it on the display unit 104. At that time, the image processing unit 102 generates the image to be displayed on the display unit 104 according to the conditions specified by the user. When the same in-focus position as at capture is specified, the display unit 104 displays the A+B image as is. When an in-focus position different from that at capture is specified, the image processing unit 102 generates a refocused image. The refocused image is obtained by estimating the B image from the A+B image and the A image using the image processing method of this example, and then spatially shifting and combining the A image and the estimated B image. This series of controls is performed by the system controller 106.
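The shift-and-combine operation mentioned for refocusing can be sketched as follows. This is a minimal illustration only: the function name `refocus`, the integer pixel shift, and the use of a wrap-around shift are assumptions made for brevity, not details given in the patent.

```python
import numpy as np

def refocus(a_img, b_img, shift):
    """Shift the A and B images in opposite directions along the horizontal
    (pupil-division) axis and sum them.

    np.roll wraps pixels around the image border; a practical implementation
    would more likely pad or crop instead (assumption for this sketch).
    """
    a_shifted = np.roll(a_img, shift, axis=1)
    b_shifted = np.roll(b_img, -shift, axis=1)
    return a_shifted + b_shifted

# Toy example: a defocused point appears at column 3 in A and column 5 in B.
a_img = np.zeros((1, 8))
b_img = np.zeros((1, 8))
a_img[0, 3] = 1.0
b_img[0, 5] = 1.0

same_focus = refocus(a_img, b_img, 0)   # equals the A+B image
refocused = refocus(a_img, b_img, 1)    # the two copies now coincide at column 4
```

With a shift of zero the result is simply the A+B image, which matches the case where the in-focus position at capture is displayed unchanged.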

Next, the image estimation process (B image estimation process) executed by the image processing unit 102 is described with reference to FIGS. 5 and 6. In the B image estimation process, the image processing unit 102 uses coefficient data learned in advance; the details of this learning are described later. FIG. 5 is a flowchart of the B image estimation process. FIG. 6 is an explanatory diagram of the B image estimation process. Each step in FIG. 5 is executed by the image processing unit 102 based on commands from the system controller 106. In this example, the information acquisition unit 102a of the image processing unit 102 executes steps S101 to S104 of FIG. 5, and the image generation unit 102b executes steps S105 to S108.

First, in step S101, the image processing unit 102 acquires an A+B image (first captured image) 201 and an A image (second captured image) 202. The A image 202 is obtained by imaging the subject space based on the light flux passing through a partial pupil (second pupil) that is a part of the pupil of the imaging optical system 101a. The A+B image 201 is obtained by imaging the subject space based on the light flux passing through the pupil (first pupil) of the imaging optical system 101a. In this example, the second pupil is contained in the first pupil and is a part of the first pupil.

Next, in step S102, the image processing unit 102 divides each of the A+B image 201 and the A image 202 into two regions based on the luminance saturation of the A+B image 201. In this example, as shown in FIG. 6, the image processing unit 102 divides the A+B image 201 into a first region 204 and a second region 211, and divides the A image 202 into a first region 205 and a second region 212. In FIG. 6, the hatched portion of the A+B image 201 represents the luminance-saturated region 203 of the A+B image 201. The first region 204 of the A+B image 201 is set so as to include the luminance-saturated region 203. When the luminance-saturated areas are scattered (that is, when there are several luminance-saturated regions separated from one another), the first region 204 may likewise be set as scattered areas and need not be a single contiguous region. The second region 211 is set so as not to include the luminance-saturated region 203. The first region 205 and the second region 212 of the A image 202 are set to coincide with the first region 204 and the second region 211, respectively.
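One way to realize the region split of step S102 is sketched below. The tile-based grouping, the tile size, and the name `split_by_saturation` are assumptions for illustration; the patent only requires that the first region contain the saturated pixels and the second region contain none.

```python
import numpy as np

def split_by_saturation(a_plus_b, sat_level, tile=4):
    """Return a boolean mask over the image: True marks the first region
    (tiles containing at least one saturated pixel of the A+B image),
    False marks the second region (tiles with no saturated pixel).

    The same mask is meant to be applied to the A image so that the
    regions of both images coincide.
    """
    h, w = a_plus_b.shape
    mask = np.zeros((h, w), dtype=bool)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            block = a_plus_b[y:y + tile, x:x + tile]
            if np.any(block >= sat_level):
                mask[y:y + tile, x:x + tile] = True
    return mask

# Toy 8x8 A+B image with two separate saturated pixels (hypothetical values).
a_plus_b = np.zeros((8, 8), dtype=np.int64)
a_plus_b[1, 1] = 1023
a_plus_b[6, 6] = 1023
first_region = split_by_saturation(a_plus_b, sat_level=1023, tile=4)
```

Here the first region consists of two disjoint tiles, which corresponds to the scattered case described in the text.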

When the image sensor 101b is a color sensor with a Bayer array, the A+B image and the A image may be left in the Bayer array or may be rearranged into four channels: R, G1, G2, and B. With four-channel images, the saturated region differs from color to color, so each color may be processed individually. Alternatively, the first region may be set so as to include the luminance saturation of all colors, and the four-channel image may be processed in a single pass.
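The four-channel rearrangement and the "all colors at once" option can be sketched as below. The RGGB layout (R at even row and even column) and both function names are assumptions; the patent does not state the layout of the Bayer array.

```python
import numpy as np

def bayer_to_channels(raw):
    """Rearrange a Bayer mosaic of shape (H, W) into (4, H/2, W/2),
    ordered R, G1, G2, B. An RGGB layout is assumed here."""
    return np.stack([
        raw[0::2, 0::2],  # R
        raw[0::2, 1::2],  # G1
        raw[1::2, 0::2],  # G2
        raw[1::2, 1::2],  # B
    ])

def saturation_mask_all_colors(channels, sat_level):
    """Mark a position as saturated if any of the four colors is saturated,
    so that one first region covers the saturation of every color."""
    return np.any(channels >= sat_level, axis=0)

raw = np.arange(16).reshape(4, 4)
channels = bayer_to_channels(raw)
mask = saturation_mask_all_colors(channels, sat_level=15)
```

Processing each color separately would instead compute one mask per channel, for example with `channels[c] >= sat_level`.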

Next, in step S103, the image processing unit 102 extracts a first image 206 from the first region 204 of the A+B image 201, and extracts a second image 207 from the first region 205 of the A image 202. The first image 206 is a partial region of the A+B image 201, and the second image 207 is a partial region of the A image 202. The image processing unit 102 then sets the input image based on the first image 206 and the second image 207. In this example, the input image consists of the first image 206 and the second image 207. The example is not limited to this, however. For example, an image obtained by subtracting the second image 207 from the first image 206 may be used as the input image.

Next, in step S104, the image processing unit 102 selects and acquires the coefficient data corresponding to the input image. The coefficient data is used in the neural network 208, described later, for estimating the third image, which corresponds to a part of the B image. In this example, multiple types of coefficient data are stored in the storage unit 103, and the image processing unit 102 acquires the desired coefficient data from among them. Here, the image processing unit 102 selects the coefficient data based on the positions of the first image 206 and the second image 207 extracted in step S103 (that is, their positions within the first captured image 201 and the second captured image 202, respectively). The coefficient data is switched according to the positions of the first image 206 and the second image 207 because the vignetting and aberrations of the imaging optical system 101a vary with image height.

The effect of vignetting is described here with reference to FIG. 7. FIG. 7 is a diagram showing the relationship between the divided pupil, image height, and vignetting. FIG. 7(A) shows the pupil on the optical axis of the imaging optical system 101a. The broken line in FIG. 7 represents the line along which the two photoelectric conversion units divide the pupil. FIG. 7(B) shows the pupil at an image height different from that of FIG. 7(A). In FIG. 7(A) the two divided pupils receive equal amounts of light, whereas in FIG. 7(B) vignetting biases the light amount ratio between them. Comparing FIG. 7(A) with FIG. 7(B) therefore shows that it is difficult to estimate an accurate B image from a luminance-saturated A+B image and an A image using the same coefficient data. FIG. 7(C) shows a case with the same image height as FIG. 7(B) (a position at the same distance from the optical axis in the plane perpendicular to the optical axis) but a different azimuth (the angle, in the plane perpendicular to the optical axis, of the direction from the optical axis toward the periphery). The light amount ratio of the partial pupils changes in this case as well. Likewise for aberrations, the relationship between the two partial pupils changes with image height and azimuth. The coefficient data is therefore preferably selected (determined) based on the image height and azimuth of the first image 206 and the second image 207.
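A selection of coefficient data by image height and azimuth could look like the following sketch. The binning scheme, the bin counts, the assumption that the optical axis passes through the image center, and the dictionary `coefficient_store` are all hypothetical; the patent does not specify how the coefficient data is indexed.

```python
import math

def coefficient_key(cx, cy, width, height, n_height_bins=4, n_azimuth_bins=8):
    """Map a patch center (in pixels) to an (image-height bin, azimuth bin) key.

    Assumes the optical axis intersects the sensor at the image center.
    """
    dx = cx - (width - 1) / 2.0
    dy = cy - (height - 1) / 2.0
    max_r = math.hypot((width - 1) / 2.0, (height - 1) / 2.0)
    r = math.hypot(dx, dy) / max_r                # normalized image height, 0..1
    theta = math.atan2(dy, dx) % (2 * math.pi)    # azimuth, 0..2*pi
    h_bin = min(int(r * n_height_bins), n_height_bins - 1)
    a_bin = int(theta / (2 * math.pi) * n_azimuth_bins) % n_azimuth_bins
    return h_bin, a_bin

# Hypothetical store: one coefficient set per (height bin, azimuth bin).
coefficient_store = {(h, a): f"coeff_h{h}_a{a}" for h in range(4) for a in range(8)}

key = coefficient_key(90, 70, width=101, height=101)
selected = coefficient_store[key]
```

Two patches at the same image height but different azimuths, such as centers (90, 70) and (30, 90) in this toy geometry, map to different keys, mirroring the distinction drawn between FIG. 7(B) and FIG. 7(C).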

When the vignetting of the imaging optical system 101a is negligible (for example, with a telephoto lens) or when the change in aberration with image height is negligible (for example, at a small aperture), the same coefficient data may be used for the entire captured image. FIG. 7 shows the case where the pupil of the imaging optical system 101a is divided in two by the broken line and the divided pupils do not intersect, but this example is not limited to that case. The divided pupils may overlap each other in part (in the overlapping area, the divided pupils share the light). Coefficient data corresponding to each of several patterns of the light amount ratio between the A image and the B image (such as 10:1, 8:1, ..., 1:1, ..., 1:10) may also be stored in the storage unit 103. The image processing unit 102 can also be configured to acquire information on the vignetting of the imaging optical system 101a for the first image 206 and the second image 207, and to select the coefficient data for the corresponding light amount ratio based on that information.

Next, in step S105 of FIG. 5, the image processing unit 102 generates a third image 209 from the input image (the first image 206 and the second image 207). The third image 209 corresponds to a partial region of the image (B image) obtained by imaging the subject space based on the light flux passing through the third pupil of the imaging optical system 101a. The third pupil is a part of the first pupil and is different from the second pupil. In this example, the third pupil is the component obtained by removing the second pupil from the first pupil, so the first pupil is the sum of the second pupil and the third pupil. The image processing unit 102 generates the third image 209 using a neural network 208 having a plurality of intermediate layers between the input layer and the output layer. In this example, a convolutional neural network (CNN) is used. The example is not limited to this, however, and another neural network such as a GAN (Generative Adversarial Network) may be used.

The process of generating the third image 209 with the CNN is now described in detail with reference to FIG. 1. FIG. 1 is a diagram showing the network structure for image generation. In this example, the input image 221 is an image in which the first image 206 and the second image 207 are stacked in the channel direction. When the first image 206 and the second image 207 each have multiple color channels, the input image has twice that number of channels.
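The channel-direction stacking can be sketched as follows. The channels-first array layout (C, H, W) and the name `make_input` are assumptions chosen for this illustration.

```python
import numpy as np

def make_input(first_img, second_img):
    """Stack the first image (A+B patch) and the second image (A patch),
    each of shape (C, H, W), along the channel axis to get (2C, H, W)."""
    if first_img.shape != second_img.shape:
        raise ValueError("first and second images must have the same shape")
    return np.concatenate([first_img, second_img], axis=0)

# Toy four-channel patches (for example R, G1, G2, B) of size 8x8.
first_img = np.ones((4, 8, 8))
second_img = np.zeros((4, 8, 8))
input_img = make_input(first_img, second_img)
```

With four color channels per image, the stacked input has eight channels, which is the "twice the number of channels" case noted in the text.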

The CNN has a multi-layer structure, and each layer performs a linear transformation and a nonlinear transformation using the learned coefficient data. The linear transformation is expressed as the convolution of the input data with a filter, plus a bias ("bias" in FIG. 1). The filter and bias values in each layer are determined by the coefficient data. The nonlinear transformation is a transformation by a nonlinear function called an activation function ("AF" in FIG. 1). Examples of activation functions include the sigmoid function and the hyperbolic tangent function. In this example, the ReLU (Rectified Linear Unit) expressed by the following equation (1) is used as the activation function.

$$f(x) = \max(x, 0) \qquad (1)$$

In equation (1), max denotes the MAX function, which outputs the largest of its arguments.

The input image 221 fed to the input layer is convolved in the first convolution layer with each of a plurality of filters 222, and a bias is added. The number of channels of each filter 222 matches that of the input image 221; when the input image 221 has two or more channels, each filter is three-dimensional (the third dimension being the channel dimension). The result of the convolution and bias addition undergoes a nonlinear transformation by the activation function, and the first feature map 223 is output to the first intermediate layer.

Next, the first feature map 223 is fed to the second convolution layer, where, as before, it is convolved with each of a plurality of filters 224 and a bias is added. The result undergoes a nonlinear transformation, and the same procedure is repeated for each remaining convolution layer. Finally, the (N-1)-th feature map 231 of the (N-1)-th intermediate layer is fed to the N-th convolution layer to obtain the third image 209. Here, N is an integer of 3 or more. The number of filters 232 in the N-th layer matches the number of channels of the third image 209. The last convolution layer, which generates the third image 209, need not apply the nonlinear transformation. In FIG. 1, the third image 209 is smaller than the input image 221. This is because each convolution layer performs the convolution only where data of the input image 221 (or feature map) exists. The image size can be kept unchanged by padding the periphery of the input image 221 (or feature map) with zeros or the like, or by using a deconvolution layer.
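The layer-by-layer computation described above can be sketched in plain NumPy. This is an illustrative forward pass only: the filter sizes, channel counts, and random weights are placeholders rather than learned coefficient data, and the convolution is implemented as cross-correlation, as is common in deep learning libraries.

```python
import numpy as np

def conv_valid(x, filters, bias):
    """x: (C_in, H, W); filters: (C_out, C_in, k, k); bias: (C_out,).
    Computes only where the filter fits inside the data, so the output
    shrinks to (C_out, H - k + 1, W - k + 1)."""
    c_out, _, k, _ = filters.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.empty((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + k, j:j + k]
            # Contract channel and both spatial axes, then add the bias.
            out[:, i, j] = np.tensordot(filters, patch, axes=3) + bias
    return out

def relu(x):
    """Equation (1): f(x) = max(x, 0), applied element-wise."""
    return np.maximum(x, 0.0)

def forward(x, layers):
    """layers: list of (filters, bias). ReLU follows every layer except the last."""
    for idx, (filters, bias) in enumerate(layers):
        x = conv_valid(x, filters, bias)
        if idx < len(layers) - 1:
            x = relu(x)
    return x

# Placeholder weights for N = 3 layers: 2 -> 8 -> 8 -> 1 channels, 3x3 filters.
rng = np.random.default_rng(0)
layers = [
    (rng.standard_normal((8, 2, 3, 3)), np.zeros(8)),
    (rng.standard_normal((8, 8, 3, 3)), np.zeros(8)),
    (rng.standard_normal((1, 8, 3, 3)), np.zeros(1)),
]
third_image = forward(rng.standard_normal((2, 10, 10)), layers)
```

Each 3x3 layer without padding trims two pixels per dimension, so a 10x10 input yields a 4x4 output here, which illustrates why the third image in FIG. 1 is smaller than the input image.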

Next, in step S106 of FIG. 5, the image processing unit 102 determines whether the third image 209 has been generated for the whole of a predetermined region. If generation of the third image 209 is not complete, the process returns to step S103, and the image processing unit 102 extracts a new first image 206 and second image 207 from the predetermined region. If generation of the third image is complete, the process proceeds to step S107. When generating a refocused image or the like, a B image of the entire captured image is required, so the third image 209 must be generated for the entire luminance-saturated region 203. When the purpose is focus detection, the third image 209 need only be generated in the vicinity of the specified focus point. Once the third image 209 has been generated for the whole of the predetermined region, the first region 210 of the B image (estimated B image) is obtained, as shown in FIG. 6.

続いてステップＳ１０７において、画像処理部１０２は、Ａ＋Ｂ画像２０１の第２の領域２１１とＡ画像２０２の第２の領域２１２との差分に基づいて、第４の画像２１３を生成する。すなわち画像処理部１０２は、第２の領域２１１から第２の領域２１２を減算することにより、Ｂ画像における第２の領域に相当する第４の画像２１３を取得する。Ａ＋Ｂ画像２０１の第２の領域２１１には輝度飽和が存在しないため、画像処理部１０２は、減算処理によりＢ画像を求めることができる。輝度飽和が存在する領域のみをニューラルネットワーク２０８を用いて推定することにより、演算負荷を軽減することができる。なお、ステップＳ１０７は、ステップＳ１０２とステップＳ１０８との間であれば、いつ実行しても構わない。 Subsequently, in step S107, the image processing unit 102 generates a fourth image 213 based on the difference between the second region 211 of the A + B image 201 and the second region 212 of the A image 202. That is, the image processing unit 102 acquires the fourth image 213 corresponding to the second region in the B image by subtracting the second region 212 from the second region 211. Since there is no luminance saturation in the second region 211 of the A + B image 201, the image processing unit 102 can obtain the B image by subtraction processing. By estimating only the region where the luminance saturation exists by using the neural network 208, the calculation load can be reduced. Note that step S107 may be executed at any time between steps S102 and S108.

Subsequently, in step S108, the image processing unit 102 generates a fifth image 214 by combining the third image 209 (first region 210) and the fourth image 213 (second region). The fifth image 214 is the estimated B image.

なお、ステップＳ１０７では、第２の領域２１１、２１２に関わらず、Ａ＋Ｂ画像２０１およびＡ画像２０２の全体に渡って差分をとってもよい。輝度飽和している領域は正しくＢ画像が求まらないが、その領域の信号を第１の領域２１０で置換することにより推定Ｂ画像（第５の画像２１４）を取得することができる。 In step S107, the difference may be taken over the entire A + B image 201 and A image 202 regardless of the second regions 211 and 212. Although the B image cannot be obtained correctly in the region where the luminance is saturated, the estimated B image (fifth image 214) can be obtained by substituting the signal in that region with the first region 210.
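The whole-image variant of steps S107 and S108 can be sketched as follows. This is a minimal illustration, assuming that the network estimates have already been assembled into an array b_network of the same size as the captured image; the function name, the array names, and the 12-bit example values are hypothetical.

```python
import numpy as np

def compose_estimated_b(ab, a, b_network, sat_value):
    """Use (A+B) - A where A+B is unsaturated, and the network output elsewhere."""
    saturated = ab >= sat_value   # pixels where the difference is unreliable
    b_difference = ab - a         # difference taken over the whole image
    return np.where(saturated, b_network, b_difference)

# Hypothetical 12-bit example values.
ab = np.array([[1000.0, 4095.0], [2000.0, 4095.0]])
a = np.array([[400.0, 3000.0], [900.0, 2500.0]])
b_network = np.array([[0.0, 1800.0], [0.0, 2100.0]])  # only saturated entries are used

b_estimated = compose_estimated_b(ab, a, b_network, sat_value=4095.0)
print(b_estimated)
```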

以上の処理により、Ａ＋Ｂ画像に輝度飽和がある場合でも、高精度にＢ画像を推定することができる。Ａ画像と推定Ｂ画像とを用いることにより、位相差ＡＦによる焦点検出、視差によるデプスマップの推定、および、リフォーカスなどが実行可能となる。 By the above processing, the B image can be estimated with high accuracy even when the A + B image has luminance saturation. By using the A image and the estimated B image, it is possible to perform focus detection by phase difference AF, estimation of depth map by parallax, refocusing, and the like.

Next, the learning of the coefficient data will be described with reference to FIG. 8. FIG. 8 is a flowchart of the learning of the coefficient data. In this embodiment, the learning is executed in advance by an image processing apparatus other than the image pickup apparatus 100, and the result (a plurality of coefficient data) is stored in the storage unit 103. However, this embodiment is not limited to this, and the image pickup apparatus 100 may itself include a unit that executes the learning.

First, in step S201, the image processing apparatus acquires a plurality of learning pairs. A learning pair consists of a first image, a second image, and a third correct image extracted from a known A + B image, A image, and B image. The sizes of the first image and the second image are the same as in the third image generation process shown in FIGS. 5 and 6. The third correct image has the same size as the third image. The A + B image, the A image, and the B image may be photographs of an actual subject or images produced by CG (computer graphics). The learning pairs are preferably extracted from images captured with substantially the same configuration as the imaging system that captures the images on which the B image is actually estimated (the imaging unit 101 in this embodiment); the capture may also be a CG simulation. Further, it is preferable that all learning pairs used to calculate the same coefficient data have substantially the same vignetting and aberration. Because the B image is to be estimated when the A + B image is luminance-saturated, the learning pairs must include a luminance-saturated first image.

Subsequently, in step S202, the image processing apparatus generates coefficient data from the plurality of learning pairs. The learning uses the same network structure as the generation of the third image in step S105. In this embodiment, the first image and the second image are input to the network structure shown in FIG. 1, and the error between the output (the estimated third image) and the third correct image is calculated. The filter coefficients and biases (coefficient data) used in each layer are updated and optimized, for example by backpropagation, so that this error is minimized. The initial values of the filter coefficients and biases may be arbitrary and are determined, for example, from random numbers. Alternatively, pre-training such as an autoencoder, which learns the initial values layer by layer in advance, may be performed.

The method of inputting all the learning pairs into the network structure and updating the coefficient data using all of that information is called batch learning. However, the computational load of this method becomes enormous as the number of learning pairs increases. Conversely, a method that uses only one learning pair per update of the coefficient data, and a different learning pair for each update, is called online learning. This method has the advantage that the amount of calculation does not increase as the number of learning pairs grows, but it is strongly affected by the noise present in a single learning pair. It is therefore preferable to train with the mini-batch method, which lies between these two methods. The mini-batch method extracts a small number of learning pairs from all the learning pairs and uses them to update the coefficient data. The next update extracts and uses a different small number of learning pairs. Repeating this reduces the drawbacks of batch learning and online learning and tends to improve the depth estimation accuracy.
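The mini-batch update can be sketched with a deliberately small stand-in model. The example below fits a linear model by gradient descent rather than training the CNN of FIG. 1; the data, the learning rate, the batch size, and the number of steps are all assumptions for illustration. Setting batch_size to the full data set size would correspond to batch learning, and setting it to 1 to online learning.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for learning pairs: inputs x and correct targets t.
x = rng.random((1000, 4))
true_w = np.array([0.5, -1.0, 2.0, 0.25])
t = x @ true_w + 0.01 * rng.standard_normal(1000)

w = np.zeros(4)           # stand-in for the coefficient data being learned
lr, batch_size = 0.1, 32  # assumed hyperparameters

for step in range(2000):
    # Draw a different small subset of the learning pairs for every update.
    idx = rng.choice(len(x), size=batch_size, replace=False)
    err = x[idx] @ w - t[idx]
    grad = 2.0 * x[idx].T @ err / batch_size  # gradient of the mean squared error
    w -= lr * grad

print(w)
```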

続いてステップＳ２０３において、画像処理装置は、学習された係数データを出力する。様々な瞳の大きさ、またはヴィネッティングや収差に対して、同様の学習を繰り返すことにより、複数の係数データを取得することができる。本実施例において、係数データは記憶部１０３に記憶される。 Subsequently, in step S203, the image processing device outputs the learned coefficient data. By repeating the same learning for various pupil sizes, vignetting, and aberrations, a plurality of coefficient data can be acquired. In this embodiment, the coefficient data is stored in the storage unit 103.

Next, conditions that are preferable for enhancing the effect of the present invention will be described. In step S102 of FIG. 5, it is preferable to treat a region in which both the A + B image and the A image are entirely luminance-saturated as a third region and to process it differently. In step S105, the third image in the luminance saturation region of the first image is estimated using, as hints, the second image (a partial region of the A image) that is not luminance-saturated or the colors of the first image (a partial region of the A + B image) that are not luminance-saturated. Therefore, when the input image is entirely luminance-saturated, the region contains no hint for estimating the third image, and the network of FIG. 1 cannot estimate the third image with high accuracy. Moreover, even if the B image could be estimated, focus detection and the like cannot be performed because the A image is luminance-saturated. In this case, to reduce the calculation load, it is preferable to generate the third image in the third region from the luminance saturation value. Alternatively, the signal in the third region (which has a value exceeding the luminance saturation value) can be estimated from the surroundings of the luminance saturation by inpainting with a CNN or the like. In that case, a partial region is extracted from the A image so that it also includes areas other than the luminance saturation, and inpainting is performed. By inpainting the A + B image in the same manner, the B image can be estimated from the difference.
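The low-cost handling of the third region can be sketched as follows. This is a minimal illustration, assuming a patch-level test and a network callable that returns an output of the same size as its inputs; the function names and dummy_network, which merely stands in for the trained network, are hypothetical.

```python
import numpy as np

def is_third_region(ab_patch, a_patch, sat_value):
    """True when both the A+B patch and the A patch are entirely saturated."""
    return bool(np.all(ab_patch >= sat_value) and np.all(a_patch >= sat_value))

def estimate_third_image(ab_patch, a_patch, sat_value, network):
    """Skip the network in the third region and fill with the saturation value."""
    if is_third_region(ab_patch, a_patch, sat_value):
        return np.full(ab_patch.shape, sat_value, dtype=np.float64)
    return network(ab_patch, a_patch)

def dummy_network(ab_patch, a_patch):
    # Placeholder only; the real estimator is the trained CNN.
    return ab_patch - a_patch

fully_clipped = np.full((4, 4), 4095.0)
partly_clipped = fully_clipped.copy()
partly_clipped[0, 0] = 1000.0

out_third = estimate_third_image(fully_clipped, fully_clipped, 4095.0, dummy_network)
out_normal = estimate_third_image(fully_clipped, partly_clipped, 4095.0, dummy_network)
```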

Further, it is preferable to denoise the second image at the same time as estimating the third image. Because the A + B image is the sum of the A image and the B image, it has lower noise than the A image. The A image can therefore be denoised by referring to the A + B image. In this case, known low-noise A and B images are prepared for learning the coefficient data. Noise is added to them by simulation to generate a noisy A image and a noisy B image, and these are added to generate the A + B image. The data input to the neural network is obtained by extracting the second image and the first image from the noisy A image and the A + B image, respectively. The neural network outputs two images, the third image and the denoised second image, and the error is calculated by comparing them with the third correct image and the second correct image extracted from the low-noise B image and A image. Using coefficient data learned in this way realizes a network that generates a denoised second image at the same time as estimating the third image.
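The preparation of such training data can be sketched as below. The additive Gaussian noise model, the noise level, and the function name make_denoising_pair are assumptions; the specification only states that noise is added by simulation.

```python
import numpy as np

def make_denoising_pair(a_clean, b_clean, sigma, rng):
    """Build noisy network inputs and clean targets from low-noise A and B images."""
    a_noisy = a_clean + rng.normal(0.0, sigma, a_clean.shape)
    b_noisy = b_clean + rng.normal(0.0, sigma, b_clean.shape)
    ab_noisy = a_noisy + b_noisy   # A + B image formed from the noisy images
    inputs = (ab_noisy, a_noisy)   # sources of the first and second images
    targets = (b_clean, a_clean)   # sources of the third and second correct images
    return inputs, targets

rng = np.random.default_rng(0)
a_clean = np.full((64, 64), 0.3)   # hypothetical normalized low-noise A image
b_clean = np.full((64, 64), 0.2)   # hypothetical normalized low-noise B image
(ab_noisy, a_noisy), (b_target, a_target) = make_denoising_pair(a_clean, b_clean, 0.01, rng)
```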

より好ましくは、ノイズレベルにより係数データを変更する。これによって、より高精度なデノイジングを実現することができる。第１の画像および第２の画像のノイズレベル（ノイズに関する情報）は、撮像時のＩＳＯ感度などから見積もることができる。また、画像中の平坦部における信号の分散などから推定することもできる。ノイズに関する情報を取得し、複数のノイズレベルそれぞれに対して学習された複数の係数データから、該当する係数データを選択して使用する。 More preferably, the coefficient data is changed according to the noise level. This makes it possible to realize more accurate denoising. The noise level (information about noise) of the first image and the second image can be estimated from the ISO sensitivity at the time of imaging and the like. It can also be estimated from the dispersion of signals in the flat portion in the image. Information on noise is acquired, and the corresponding coefficient data is selected and used from a plurality of coefficient data learned for each of a plurality of noise levels.
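Selecting coefficient data by noise level can be sketched as follows. The dictionary layout, the nearest-level rule, and the use of the standard deviation of a flat patch as the noise estimate are assumptions made for illustration; the strings stand in for actual coefficient data.

```python
import numpy as np

def estimate_noise_sigma(flat_patch):
    """Estimate the noise level from the signal spread in a flat image area."""
    return float(np.std(flat_patch))

def select_coefficients(sigma, coefficient_sets):
    """Pick the coefficient data trained at the noise level closest to sigma."""
    nearest = min(coefficient_sets, key=lambda level: abs(level - sigma))
    return coefficient_sets[nearest]

# Hypothetical table: trained noise level -> coefficient data.
coefficient_sets = {0.01: "coeffs_low", 0.05: "coeffs_mid", 0.10: "coeffs_high"}

chosen = select_coefficients(0.04, coefficient_sets)
flat_sigma = estimate_noise_sigma(np.full((8, 8), 0.5))
```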

Further, in this embodiment, the amount of coefficient data can be reduced by dividing the captured image into two divided regions (a first divided region and a second divided region) and inverting one of them. This will be described with reference to FIG. 9. FIG. 9 is an explanatory diagram of the pupil division at each image height and azimuth. FIG. 9 shows the A + B image, and the divided pupil at the image height and azimuth of each x mark is drawn next to that mark. The broken lines in FIG. 9 are the dividing lines (dividing straight lines) of the pupils. As shown in FIG. 9, in this embodiment, when either the upper or the lower half of the A + B image is inverted about the dash-dot line, its pupil division coincides with that of the other half; that is, the division is line-symmetric. Therefore, if coefficient data is held for only one of the regions above and below the dash-dot line, the third image (a part of the B image) in the other region can be estimated by inverting the image.

In this embodiment, the pupil is divided in the horizontal direction, so the axis of symmetry is a horizontal straight line; if the pupil were divided in the vertical direction, the axis of symmetry would be a vertical straight line. More generally, this can be expressed as follows. The axis about which the relationship between the divided pupils is line-symmetric over the entire image passes through the optical axis of the imaging optical system 101a and is parallel to the common axis about which each divided pupil is line-symmetric on the optical axis (the axis of line symmetry common to the on-axis pupils of the A image and the B image). Using this axis of symmetry as a dividing line, the A + B image and the A image are each divided into two, and in one of the divided regions the extracted first image and second image are inverted about the dividing line to obtain the input image. The coefficient data used is that for the same image height with the sign of the azimuth inverted. Inverting the generated third image back gives the third image corresponding to the first image and the second image. This eliminates the need to hold coefficient data for all azimuths (-180° to 180°), and the data volume can be halved.
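The flip, estimate, and flip-back procedure can be sketched as below for the horizontal axis of symmetry of this embodiment, where the inversion is an up-down flip. The function name, the in_mirrored_half flag, and dummy_network are hypothetical; dummy_network is made row-dependent only so that the effect of the mirroring is visible.

```python
import numpy as np

def estimate_with_mirroring(ab_patch, a_patch, in_mirrored_half, network):
    """Reuse coefficient data of one half by mirroring patches from the other half."""
    if in_mirrored_half:
        ab_patch, a_patch = np.flipud(ab_patch), np.flipud(a_patch)
    b_patch = network(ab_patch, a_patch)
    # Flip the output back so that it lines up with the original patches.
    return np.flipud(b_patch) if in_mirrored_half else b_patch

row_weights = np.array([[1.0], [2.0], [3.0]])

def dummy_network(ab_patch, a_patch):
    # Placeholder with a row-dependent response; not the trained CNN.
    return (ab_patch - a_patch) * row_weights

ab = np.arange(9, dtype=np.float64).reshape(3, 3) + 10.0
a = np.ones((3, 3))

direct = estimate_with_mirroring(ab, a, False, dummy_network)
mirrored = estimate_with_mirroring(ab, a, True, dummy_network)
```

For a vertical axis of symmetry, np.fliplr would take the place of np.flipud.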

The images handled when learning the coefficient data and when generating the third image may be RAW images or developed images. When the A + B image and the A image are encoded, they are decoded before learning and generation. If the images used for learning and the input image for generation differ in whether gamma correction has been applied or in gamma value, it is preferable to process the input image to match the learning images. Further, it is preferable to normalize the signal values of the A + B image and the A image (and, for learning, the B image) before inputting them to the neural network. Without normalization, if the bit depth differs between learning and generation, the third image cannot be estimated correctly. In addition, because the scale changes with the bit depth, the convergence of the optimization during learning may be affected. The normalization uses the maximum value that the signal can actually take (the luminance saturation value). For example, even if the A + B image is stored in 16 bits, the luminance saturation value may correspond to 12 bits; in that case, the signal range does not become 0 to 1 unless the signal is normalized by the 12-bit maximum value (4095). It is also preferable to subtract the optical black value in the normalization. This brings the range of signals that the image can actually take closer to 0 to 1. Specifically, it is preferable to normalize according to the following formula (2).

In formula (2), s is the signal of the A + B image (or the A image or B image), s_OB is the signal value of optical black (the minimum signal value that the image can take), s_satu is the luminance saturation value of the signal, and s_nor is the normalized signal.
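Formula (2) itself is not reproduced in this text. From the symbol definitions and the stated goal of mapping the signal range to 0 to 1 after subtracting optical black, it presumably has the form s_nor = (s - s_OB) / (s_satu - s_OB); this reconstruction is an assumption. A minimal sketch under that assumption, with a hypothetical optical black level of 64 and a 12-bit saturation value:

```python
import numpy as np

def normalize_signal(s, s_ob, s_satu):
    """Assumed form of formula (2): map [s_OB, s_satu] linearly onto [0, 1]."""
    return (np.asarray(s, dtype=np.float64) - s_ob) / (s_satu - s_ob)

# Hypothetical values: 12-bit saturation (4095) stored in a 16-bit container.
raw = np.array([64, 2048, 4095], dtype=np.uint16)
s_nor = normalize_signal(raw, s_ob=64.0, s_satu=4095.0)
print(s_nor)
```

Converting to floating point before the subtraction avoids wrap-around in unsigned integer arithmetic for pixels that fall below the optical black level.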

本実施例によれば、輝度飽和が発生している場合でも高精度に瞳を分割した画像を推定することが可能な画像処理方法、画像処理装置、および、撮像装置を提供することができる。 According to this embodiment, it is possible to provide an image processing method, an image processing device, and an image pickup device capable of estimating an image in which the pupil is divided with high accuracy even when luminance saturation occurs.

次に、本発明の実施例２における画像処理システムについて説明する。本実施例では、第３の画像を推定する画像処理装置、撮像画像を取得する撮像装置、および、学習を行うサーバが個別に存在している。 Next, the image processing system according to the second embodiment of the present invention will be described. In this embodiment, there are individually an image processing device that estimates a third image, an image pickup device that acquires a captured image, and a server that performs learning.

図１０および図１１を参照して、本実施例における画像処理システムについて説明する。図１０は、画像処理システム３００のブロック図である。図１１は、画像処理システム３００の外観図である。図１０および図１１に示されるように、画像処理システム３００は、撮像装置３０１、画像処理装置３０２、サーバ３０６、表示装置３０９、記録媒体３１０、および、出力装置３１１を備えて構成される。 The image processing system in this embodiment will be described with reference to FIGS. 10 and 11. FIG. 10 is a block diagram of the image processing system 300. FIG. 11 is an external view of the image processing system 300. As shown in FIGS. 10 and 11, the image processing system 300 includes an image pickup device 301, an image processing device 302, a server 306, a display device 309, a recording medium 310, and an output device 311.

The basic configuration of the image pickup apparatus 301 is the same as that of the image pickup apparatus 100 shown in FIG. 2, except for the image processing unit that generates the third image and the image pickup unit. The image sensor of the image pickup apparatus 301 is configured as shown in FIG. 12. FIG. 12 is a configuration diagram of the image sensor in this embodiment. In FIG. 12, the broken lines indicate microlenses. Each of the pixels 320 (a, b, and so on are omitted) is provided with four photoelectric conversion units 321, 322, 323, and 324 (a, b, and so on are omitted), which divide the pupil of the imaging optical system into four (2 × 2). The images acquired by the photoelectric conversion units 321 to 324 are referred to, in order, as the A image, B image, C image, and D image, and their sum is referred to as the ABCD image. The image sensor outputs four images as captured images: the ABCD image, the A image, the C image, and the D image.

撮像装置３０１と画像処理装置３０２とが接続されると、撮像画像は記憶部３０３に記憶される。画像処理装置３０２は、画像生成部３０４にて撮像画像から推定Ｂ画像（第３の画像の集合）を生成する。この際、画像処理装置３０２は、ネットワーク３０５を介してサーバ３０６にアクセスし、生成に用いる係数データを読み出す。係数データは、学習部３０８で予め学習され、記憶部３０７に記憶されている。係数データは、複数のレンズ、焦点距離、Ｆ値などにより個別に学習されており、複数の係数データが存在する。 When the image pickup device 301 and the image processing device 302 are connected, the captured image is stored in the storage unit 303. The image processing device 302 generates an estimated B image (a set of third images) from the captured image by the image generation unit 304. At this time, the image processing device 302 accesses the server 306 via the network 305 and reads out the coefficient data used for generation. The coefficient data is learned in advance in the learning unit 308 and stored in the storage unit 307. The coefficient data is individually learned by a plurality of lenses, focal lengths, F-numbers, and the like, and there are a plurality of coefficient data.

画像処理装置３０２は、入力された撮像画像に合致する条件の係数データを選択して記憶部３０３に取得し、第３の画像を生成する。生成された推定Ｂ画像は、リフォーカス処理などに使用され、処理後の撮像画像が表示装置３０９、記録媒体３１０、および、出力装置３１１の少なくとも一つに出力される。表示装置３０９は、例えば液晶ディスプレイやプロジェクタなどである。ユーザは、表示装置３０９を介して、処理途中の画像を確認しながら作業を行うことができる。記録媒体３１０は、例えば半導体メモリ、ハードディスク、ネットワーク上のサーバなどである。出力装置３１１は、プリンタなどである。画像処理装置３０２は、必要に応じて現像処理やその他の画像処理を行う機能を有する。 The image processing device 302 selects coefficient data under conditions that match the input captured image, acquires it in the storage unit 303, and generates a third image. The generated estimated B image is used for refocus processing and the like, and the captured image after the processing is output to at least one of the display device 309, the recording medium 310, and the output device 311. The display device 309 is, for example, a liquid crystal display or a projector. The user can perform the work while confirming the image in the process of processing via the display device 309. The recording medium 310 is, for example, a semiconductor memory, a hard disk, a server on a network, or the like. The output device 311 is a printer or the like. The image processing device 302 has a function of performing development processing and other image processing as needed.

次に、図１３を参照して、画像処理装置３０２の画像生成部３０４により実行される画像推定処理（第３の画像（Ｂ画像）の生成処理）について説明する。図１３は、画像推定処理（Ｂ画像の推定処理）に関するフローチャートである。図１３の各ステップは、主に、画像処理装置３０２（画像生成部３０４）により実行される。 Next, with reference to FIG. 13, an image estimation process (a third image (B image) generation process) executed by the image generation unit 304 of the image processing device 302 will be described. FIG. 13 is a flowchart relating to the image estimation process (B image estimation process). Each step of FIG. 13 is mainly executed by the image processing device 302 (image generation unit 304).

まず、ステップＳ３０１において、画像処理装置３０２は、第１の撮像画像および第２の撮像画像を取得する。本実施例において、第１の撮像画像はＡＢＣＤ画像であり、第２の撮像画像はＡ画像、Ｃ画像、および、Ｄ画像の３枚の画像である。続いてステップＳ３０２において、画像処理装置３０２は、第１の画像および第２の画像に基づいて入力画像を取得する。本実施例において、第１の画像はＡＢＣＤ画像から抽出され、第２の画像はＡ画像、Ｃ画像、および、Ｄ画像のそれぞれから抽出される。このため、第２の画像は３枚の画像である。本実施例では、第１の画像および第２の画像をチャンネル方向へスタックした４チャンネル画像を入力画像とする。 First, in step S301, the image processing device 302 acquires the first captured image and the second captured image. In this embodiment, the first captured image is an ABCD image, and the second captured image is three images, an A image, a C image, and a D image. Subsequently, in step S302, the image processing device 302 acquires an input image based on the first image and the second image. In this embodiment, the first image is extracted from the ABCD image and the second image is extracted from each of the A image, the C image, and the D image. Therefore, the second image is three images. In this embodiment, a 4-channel image in which the first image and the second image are stacked in the channel direction is used as an input image.
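Building the 4-channel input image can be sketched as follows. The channel-first layout, the channel order, and the patch size are assumptions; the specification only states that the first image and the three second images are stacked in the channel direction.

```python
import numpy as np

def build_input_image(abcd_patch, a_patch, c_patch, d_patch):
    """Stack the first image and the three second images along a channel axis."""
    return np.stack([abcd_patch, a_patch, c_patch, d_patch], axis=0)

rng = np.random.default_rng(0)
a_patch, c_patch, d_patch = (rng.random((16, 16)) for _ in range(3))
b_patch = rng.random((16, 16))                      # unknown at estimation time
abcd_patch = a_patch + b_patch + c_patch + d_patch  # sum of the four pupil images

input_image = build_input_image(abcd_patch, a_patch, c_patch, d_patch)
print(input_image.shape)
```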

続いてステップＳ３０３において、画像処理装置３０２は、入力画像に対応する係数データを選択して取得する。続いてステップＳ３０４において、画像処理装置３０２は第３の画像を生成する。本実施例において、第３の画像の生成に用いるネットワークとしては、図１に示される畳み込みニューラルネットワークＣＮＮが用いられる。 Subsequently, in step S303, the image processing device 302 selects and acquires the coefficient data corresponding to the input image. Subsequently, in step S304, the image processing device 302 generates a third image. In this embodiment, the convolutional neural network CNN shown in FIG. 1 is used as the network used to generate the third image.

続いてステップＳ３０５において、画像処理装置３０２は、所定の領域に対して第３の画像を生成し終えたか否かを判定する。本実施例において、所定の領域は、撮像画像全体である。第３の画像の生成が完了していない場合、ステップＳ３０２へ戻り、画像処理装置３０２は新たな入力画像を取得する。一方、第３の画像の生成が完了している場合、ステップＳ３０６へ進む。ステップＳ３０６において、画像処理装置３０２は、生成された複数の第３の画像から推定Ｂ画像を生成する。 Subsequently, in step S305, the image processing device 302 determines whether or not the third image has been generated for the predetermined area. In this embodiment, the predetermined area is the entire captured image. If the generation of the third image is not completed, the process returns to step S302, and the image processing device 302 acquires a new input image. On the other hand, when the generation of the third image is completed, the process proceeds to step S306. In step S306, the image processing device 302 generates an estimated B image from the plurality of generated third images.

In this embodiment, the learning unit 308 learns the coefficient data according to the flowchart shown in FIG. 8, as in the first embodiment. Because aberration and vignetting differ depending on the lens (imaging optical system 101a), learning pairs are created and coefficient data is learned for each type of lens. If the changes in aberration and vignetting with the imaging conditions (focal length, F-number, and so on) or with the image height cannot be ignored, learning pairs are created and coefficient data is learned for each of a plurality of imaging conditions and image heights. Although this embodiment gives an example in which the second captured image consists of three images, the configuration may conversely be such that the second captured image is the single A image and the third image consists of three images, each a part of the B image, the C image, and the D image.

本実施例によれば、輝度飽和が発生している場合でも高精度に瞳を分割した画像を推定することが可能な画像処理システムを提供することができる。 According to this embodiment, it is possible to provide an image processing system capable of estimating an image in which the pupil is divided with high accuracy even when luminance saturation occurs.

次に、本発明の実施例３における撮像装置について説明する。本実施例の撮像装置は、多眼構成の撮像装置である。図１４は、撮像装置４００のブロック図である。図１５は、撮像装置４００の外観図である。 Next, the image pickup apparatus according to the third embodiment of the present invention will be described. The image pickup device of this embodiment is a multi-eye image pickup device. FIG. 14 is a block diagram of the image pickup apparatus 400. FIG. 15 is an external view of the image pickup apparatus 400.

The image pickup apparatus 400 has an image pickup unit 401, and the image pickup unit 401 has two imaging optical systems 401a and 401b. The images (subject images, optical images) formed by the two imaging optical systems 401a and 401b are received by a single image sensor 401c. The two images are received in different regions of the image sensor 401c. In this embodiment, the images captured through the imaging optical systems 401a and 401b are referred to as the A image and the B image, respectively. The image sensor 401c outputs the A + B image, obtained by adding the A image and the B image, and the A image. The pupil corresponding to the A + B image (first pupil) is the sum of the pupils of the imaging optical systems 401a and 401b. The other parts are the same as in the first embodiment. Although there is one image sensor 401c in this embodiment, two image sensors corresponding to the respective imaging optical systems 401a and 401b may be arranged instead. In that case, the image processing unit 402 acquires, as the first captured image, the sum of the output signals of the two image sensors. The generation of the third image (a part of the B image) and the learning of the coefficient data are the same as in the first embodiment.

撮像装置４００は本実施例の画像処理方法を実行する画像処理部（画像処理装置）４０２を有し、画像処理部４０２は情報取得部（取得手段）４０２ａおよび画像生成部（生成手段）４０２ｂを有する。また撮像装置４００は、記憶部４０３、表示部４０４、記録媒体４０５、および、システムコントローラ４０６を有する。 The image pickup apparatus 400 has an image processing unit (image processing apparatus) 402 that executes the image processing method of the present embodiment, and the image processing unit 402 includes an information acquisition unit (acquisition means) 402a and an image generation unit (generation means) 402b. Have. Further, the image pickup apparatus 400 has a storage unit 403, a display unit 404, a recording medium 405, and a system controller 406.

本実施例によれば、輝度飽和が発生している場合でも高精度に瞳を分割した画像を推定することが可能な撮像装置を提供することができる。 According to this embodiment, it is possible to provide an image pickup apparatus capable of estimating an image in which the pupil is divided with high accuracy even when luminance saturation occurs.

(Other embodiments)
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

各実施例によれば、輝度飽和が発生している場合でも高精度に瞳を分割した画像を推定することが可能な画像処理方法、画像処理装置、撮像装置、画像処理プログラム、および、記憶媒体を提供することができる。 According to each embodiment, an image processing method, an image processing device, an image pickup device, an image processing program, and a storage medium capable of estimating an image in which the pupil is divided with high accuracy even when luminance saturation occurs. Can be provided.

Although preferred embodiments of the present invention have been described above, the present invention is not limited to these embodiments, and various modifications and changes can be made within the scope of its gist.

102 Image processing unit
102a Information acquisition unit
102b Image generation unit

Claims

An image processing method comprising:
a step of acquiring an input image based on a first image obtained by imaging a subject space through a first pupil of an optical system and a second image obtained by imaging the subject space through a second pupil that is a part of the first pupil; and
a step of generating, from the input image, using a neural network having a plurality of intermediate layers between an input layer and an output layer, a third image corresponding to an image obtained by imaging the subject space through a third pupil of the optical system,
wherein the third pupil is a part of the first pupil and is different from the second pupil.
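
By way of illustration only, the following is a minimal sketch of a network with a plurality of intermediate layers between an input layer and an output layer, applied to a single pixel pair. It is not the network of the embodiments: the per-pixel fully connected structure, the names `dense`, `relu`, `LAYERS`, and `estimate_third_pixel`, and the coefficient values are all assumptions made for this example. The placeholder coefficients are untrained and were chosen so that the arithmetic is easy to follow; with them the output reduces to a clamped difference of the two inputs.

```python
def relu(values):
    """Rectified linear unit applied element by element."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """Fully connected layer: one output per row of weights."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

# Placeholder coefficient data (hypothetical, untrained).
LAYERS = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),   # intermediate layer 1
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # intermediate layer 2
    ([[1.0, 0.0]], [0.0]),                    # output layer
]

def estimate_third_pixel(first_value, second_value, layers=LAYERS):
    """Map one (first image, second image) pixel pair to a third-image value."""
    activations = [first_value, second_value]
    # Intermediate layers apply a linear map followed by a nonlinearity.
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    # The output layer is linear.
    weights, biases = layers[-1]
    return dense(activations, weights, biases)[0]

estimate = estimate_third_pixel(100.0, 40.0)
```

In practice the coefficient data would be obtained by training, and a network operating on image patches rather than isolated pixels would be expected.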

The image processing method according to claim 1, further comprising a step of selecting coefficient data to be used in the neural network from a plurality of coefficient data.

The image processing method according to claim 2, wherein the first image and the second image are partial regions of a first captured image and a second captured image, respectively, obtained by imaging the subject space through mutually different pupils, and
wherein, in the step of selecting the coefficient data, the coefficient data is selected based on the position of the first image in the first captured image and the position of the second image in the second captured image.
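
By way of illustration only, position-based selection of coefficient data could be sketched as a lookup keyed by a coarse grid cell. The grid partition, the function name `select_coefficients`, and the string labels standing in for coefficient data are assumptions made for this example. The sketch also assumes that the first image and the second image are extracted at the same position, so a single patch origin is used.

```python
def select_coefficients(patch_origin, image_size, grid, coefficient_sets):
    """Return the coefficient data registered for the grid cell containing the patch origin."""
    x, y = patch_origin
    width, height = image_size
    cols, rows = grid
    # Integer grid indices, clamped so the last row and column stay in range.
    col = min(x * cols // width, cols - 1)
    row = min(y * rows // height, rows - 1)
    return coefficient_sets[(col, row)]

# Hypothetical coefficient data for a 2 x 2 partition of a 4000 x 3000 image.
coefficient_sets = {
    (0, 0): "top-left",
    (1, 0): "top-right",
    (0, 1): "bottom-left",
    (1, 1): "bottom-right",
}

selected = select_coefficients((3000, 2000), (4000, 3000), (2, 2), coefficient_sets)
```

A finer grid gives coefficient data that track position-dependent effects such as vignetting more closely, at the cost of storing more coefficient sets.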

The first image and the second image are partial regions of the first captured image and the second captured image obtained by imaging the subject space through different pupils, respectively.
2. The image processing method described in.

The image processing method according to any one of claims 2 to 4, further comprising a step of acquiring information regarding vignetting of the optical system used for capturing the first image and the second image,
wherein, in the step of selecting the coefficient data, the coefficient data is selected based on the information regarding the vignetting.

The image processing method according to any one of claims 2 to 5, further comprising a step of acquiring information regarding noise of each of the first image and the second image,
wherein, in the step of selecting the coefficient data, the coefficient data is selected based on the information regarding the noise.

The image processing method according to any one of claims 1 to 6, wherein the noise contained in the first image is smaller than the noise contained in the second image, and
wherein, in the step of generating the third image, the neural network is used to generate, from the input image, the third image and a denoised second image.

The image processing method according to any one of claims 1 to 7, wherein the first pupil is the sum of the second pupil and the third pupil, and
wherein the input image includes an image obtained by subtracting the second image from the first image.
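
By way of illustration only, the following sketch shows why the difference image is a useful input but is not by itself a reliable estimate of the third image when the first image is luminance-saturated. The clipping model, the full-scale value `FULL_SCALE`, the function names, and the sample pixel values are assumptions made for this example.

```python
# Hypothetical full-scale value at which the added signal clips.
FULL_SCALE = 255

def difference_image(first, second):
    """Subtract the second image from the first, pixel by pixel."""
    return [[f - s for f, s in zip(row_f, row_s)]
            for row_f, row_s in zip(first, second)]

def saturation_mask(first, full_scale=FULL_SCALE):
    """Mark pixels where the first image has reached full scale."""
    return [[f >= full_scale for f in row] for row in first]

# Assumed true signals through the second and third pupils (one row, three pixels).
second_true = [[40, 120, 200]]
third_true = [[60, 100, 180]]

# The first image is modeled as the clipped sum of both signals.
first = [[min(s + t, FULL_SCALE) for s, t in zip(row_s, row_t)]
         for row_s, row_t in zip(second_true, third_true)]

diff = difference_image(first, second_true)
mask = saturation_mask(first)
```

In this example the difference equals the true third-pupil signal at the two unsaturated pixels, while at the saturated pixel it yields 55 instead of 180.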

The image processing method according to any one of claims 1 to 7, further comprising:
a step of extracting the first image and the second image from a first captured image and a second captured image, respectively, obtained by imaging the subject space through mutually different pupils;
a step of subtracting the second captured image from the first captured image to generate a fourth image; and
a step of synthesizing the third image and the fourth image to generate a fifth image,
wherein the first pupil is the sum of the second pupil and the third pupil, and
wherein, in the step of extracting the first image and the second image, the first image is extracted so as to include a region of the first captured image in which the luminance is saturated.
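
By way of illustration only, one possible form of the synthesizing step is sketched below. The claim does not specify how the third image and the fourth image are combined; the per-pixel replacement rule, the function name `synthesize`, and the sample values (including the stand-in for the network output) are assumptions made for this example. For simplicity the third image is assumed to be aligned with, and the same size as, the fourth image.

```python
def synthesize(third, fourth, saturated):
    """Use the estimated third image where saturated and the difference image elsewhere."""
    return [[t if m else d for t, d, m in zip(row_t, row_d, row_m)]
            for row_t, row_d, row_m in zip(third, fourth, saturated)]

# Fourth image: first captured image minus second captured image (hypothetical values).
fourth = [[60, 100, 55]]
# Stand-in for a third image estimated by the neural network (hypothetical values).
third_estimate = [[61, 99, 178]]
# Mask of pixels where the first captured image is luminance-saturated.
saturated = [[False, False, True]]

fifth = synthesize(third_estimate, fourth, saturated)
```

Under this rule the difference image is kept wherever it is trustworthy, and the network estimate is used only in the saturated region.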

The image processing method according to any one of claims 1 to 9, further comprising:
a step of dividing each of a first captured image and a second captured image, obtained by imaging the subject space through mutually different pupils, into a first divided region and a second divided region by a dividing straight line; and
a step of extracting the first image and the second image from the first captured image and the second captured image, respectively,
wherein the dividing straight line passes through the optical axis of the optical system and is parallel to an axis of line symmetry common to the respective on-axis pupils of the first captured image and the second captured image, and
wherein, in the step of acquiring the input image, the first image and the second image extracted from the first divided region or the second divided region are inverted to acquire the input image.
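
By way of illustration only, the inversion about the dividing straight line could be sketched as follows. The sketch assumes that the dividing straight line is horizontal and passes through the image center, so that inversion is a vertical flip; the function name `normalize_patch` and the sample patch are likewise assumptions made for this example.

```python
def normalize_patch(patch, patch_top_row, image_height):
    """Flip a patch vertically if it lies below the assumed horizontal dividing line.

    Returns the (possibly flipped) patch and a flag recording whether it was flipped.
    """
    dividing_row = image_height / 2
    patch_center_row = patch_top_row + len(patch) / 2
    in_second_region = patch_center_row > dividing_row
    # Reversing the row order mirrors the patch about a horizontal line.
    flipped = patch[::-1] if in_second_region else patch
    return flipped, in_second_region

patch = [[1, 2], [3, 4]]
upper, upper_flipped = normalize_patch(patch, patch_top_row=0, image_height=100)
lower, lower_flipped = normalize_patch(patch, patch_top_row=80, image_height=100)
```

A caller would presumably apply the same flip to the network output for flipped patches, so that the generated image returns to its original orientation; that step is also an assumption.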

An image processing apparatus comprising:
an acquisition means for acquiring an input image based on a first image obtained by imaging a subject space through a first pupil of an optical system and a second image obtained by imaging the subject space through a second pupil that is a part of the first pupil; and
a generation means for generating, from the input image, using a neural network having a plurality of intermediate layers between an input layer and an output layer, a third image corresponding to an image obtained by imaging the subject space through a third pupil of the optical system,
wherein the third pupil is a part of the first pupil and is different from the second pupil.

The image processing apparatus according to claim 11, further comprising a storage means for storing coefficient data used in the neural network.

An image pickup apparatus comprising:
an image pickup element that photoelectrically converts an optical image formed by an optical system; and
the image processing apparatus according to claim 11 or 12.

The image pickup apparatus according to claim 13, wherein the image pickup element has a plurality of pixels,
each of the plurality of pixels includes first and second photoelectric conversion units that receive light incident at mutually different angles of incidence and generate first and second signals, respectively,
the image pickup element outputs a first captured image corresponding to an addition signal obtained by adding the first and second signals, and a second captured image corresponding to one of the first and second signals, and
the first image is a partial region of the first captured image, and the second image is a partial region of the second captured image.

An image processing program that causes a computer to execute the image processing method according to any one of claims 1 to 10.

A storage medium for storing the image processing program according to claim 15.