JP2023160153A

JP2023160153A - Imaging apparatus, method for controlling imaging apparatus, and program

Info

Publication number: JP2023160153A
Application number: JP2022070301A
Authority: JP
Inventors: 浩靖形川; Hiroyasu Katagawa
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2022-04-21
Filing date: 2022-04-21
Publication date: 2023-11-02

Abstract

To provide an imaging apparatus that can improve the accuracy of a score map output by using a hierarchical neural network. SOLUTION: An imaging apparatus 101 comprises a visible light image pick-up device 104 that receives visible light and a non-visible light image pick-up device 105 that receives non-visible light. Using a visible light image obtained by the visible light image pick-up device 104 as an input image, the imaging apparatus 101 creates, with a neural network 201, a likelihood score map representing the attributes of areas of the input image. The imaging apparatus 101 then corrects the likelihood score map based on a non-visible light image obtained by the non-visible light image pick-up device 105. SELECTED DRAWING: Figure 1

Description

The present invention relates to an imaging apparatus, a method for controlling the imaging apparatus, and a program.

Techniques are known that extract feature values from image data and use a discriminator to identify a subject in the image data. One such technique is the convolutional neural network (hereinafter "CNN"), a type of neural network. A CNN performs local convolution operations sequentially over multiple stages. Related techniques include those of Non-Patent Document 1 and Patent Document 1.
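To make "local convolution performed sequentially over multiple stages" concrete, here is a minimal pure-Python sketch. It assumes single-channel images, "valid" (no-padding) convolution, and a ReLU after each stage; the kernel values are hypothetical, whereas a real CNN learns them and uses many channels per layer.

```python
from typing import List

Image = List[List[float]]


def conv2d_valid(img: Image, kernel: Image) -> Image:
    """Slide a small kernel over the image without padding.

    Like most CNN libraries, this computes cross-correlation (the kernel is
    not flipped); each output value depends only on a local k x k window.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [
        [
            sum(img[y + dy][x + dx] * kernel[dy][dx]
                for dy in range(kh) for dx in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def relu(img: Image) -> Image:
    return [[max(0.0, v) for v in row] for row in img]


def run_stages(img: Image, kernels: List[Image]) -> List[Image]:
    """Apply convolution + ReLU stage by stage and keep every stage's output."""
    outputs: List[Image] = []
    x = img
    for kernel in kernels:
        x = relu(conv2d_valid(x, kernel))
        outputs.append(x)
    return outputs


if __name__ == "__main__":
    ramp = [[float(r + c) for c in range(6)] for r in range(6)]
    box = [[1.0 / 9.0] * 3 for _ in range(3)]
    stages = run_stages(ramp, [box, box])
    print([len(s) for s in stages])  # each 3x3 stage trims 2 rows: [4, 2]
```

Keeping the per-stage outputs, rather than only the last one, is what approaches that combine several layers (such as the one in Patent Document 1) rely on.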

In the technique of Non-Patent Document 1, an image is processed by a CNN, the final-layer CNN features are aggregated for each region of interest, and it is determined whether the region contains an object. This process is performed for all regions of interest.

In the technique of Patent Document 1, the outputs of multiple layers of a hierarchical neural network are connected to generate connected hierarchical features, and these features are used to generate score maps representing attributes such as sky, buildings, grass/turf, and skin. The score maps are used, for example, for white balance control and exposure control during shooting.

Patent Document 1: Japanese Patent Laid-Open No. 2019-32773

Non-Patent Document 1: Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", NIPS 2015

However, with the technique of Patent Document 1 described above, an accurate score map may not be calculated under white balance and exposure conditions that were not covered by machine learning (training).

Accordingly, an object of the present invention is to provide an imaging apparatus, a method for controlling the imaging apparatus, and a program that can improve the accuracy of a score map output using a hierarchical neural network.

To achieve the above object, an imaging apparatus according to the present invention includes visible light imaging means for receiving visible light and non-visible light imaging means for receiving non-visible light, and is characterized by comprising: generation means for generating, with a visible light image obtained by the visible light imaging means as an input image, a score map representing the attributes of regions of the input image by using a hierarchical neural network; and correction means for correcting the score map based on a non-visible light image obtained by the non-visible light imaging means.

According to the present invention, the accuracy of a score map output using a hierarchical neural network can be improved.

FIG. 1 is a block diagram schematically showing the configuration of an imaging apparatus according to the present embodiment.
FIG. 2 is a block diagram schematically showing the configuration of the machine learning processing unit in FIG. 1.
FIG. 3 is a flowchart showing the procedure of score map calculation processing performed by the machine learning processing unit of FIG. 1.
FIG. 4 is a diagram showing an example of a visible light image serving as an input image of the machine learning processing unit of FIG. 1.
FIG. 5 is a diagram for explaining generation of likelihood score maps by the attribute determination unit of FIG. 2.
FIG. 6 is a diagram showing an example of images used by the imaging apparatus of FIG. 1.
FIG. 7 is a diagram for explaining correction of the sky likelihood score map performed by the score map calculation unit of FIG. 1.
FIG. 8 is a diagram for explaining correction of the grass/turf likelihood score map performed by the score map calculation unit of FIG. 1.
FIG. 9 is a diagram for explaining correction of the skin likelihood score map performed by the score map calculation unit of FIG. 1.
FIG. 10 is a flowchart showing the procedure of control processing executed by the imaging apparatus of FIG. 1.
FIG. 11 is a diagram for explaining correction of the sky likelihood score map performed by the score map calculation unit of FIG. 1.
FIG. 12 is a diagram for explaining correction of the grass/turf likelihood score map performed by the score map calculation unit of FIG. 1.
FIG. 13 is a diagram for explaining correction of the skin likelihood score map performed by the score map calculation unit of FIG. 1.

Preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings. The same components are given the same reference numerals throughout the drawings, and their descriptions may be simplified or omitted. The embodiment described below is merely an example, and the present invention is not limited to the configuration described in the embodiment.

FIG. 1 is a block diagram schematically showing the configuration of an imaging apparatus 101 according to the present embodiment. In this embodiment, the imaging apparatus 101 is, for example, a digital still camera or a digital video camera that captures images of a subject. The imaging apparatus 101 is not limited to these and may be a mobile terminal with a shooting function, such as a smartphone or a tablet.

The imaging apparatus 101 includes a single optical system made up of an imaging optical system 102 and a light separation unit 103, a visible light image sensor 104, and a non-visible light image sensor 105. The imaging apparatus 101 further includes a control unit 106, an image processing unit 107, a machine learning processing unit 108, a score map calculation unit 109, a memory 110, and a display unit 111.

The visible light image sensor 104 and the non-visible light image sensor 105 share this optical system. The visible light image sensor 104 receives visible light that has passed through the optical system and generates an image signal, from which a visible light image is generated. The non-visible light image sensor 105 receives infrared light (non-visible light) that has passed through the optical system and generates an image signal, from which a non-visible light image is generated. Each of the two sensors is, for example, a CMOS sensor or a CCD sensor, and each converts the subject image formed on its imaging surface into an electrical signal and outputs that signal to the image processing unit 107 as an image signal.

The imaging optical system 102 consists of a single lens or a plurality of lens groups, and may include at least one control mechanism such as zoom, focus, aperture, or image stabilization. The light separation unit 103 is a wavelength selection prism: light with wavelengths shorter than a specific wavelength (visible light) is transmitted through the prism, and light with wavelengths longer than the specific wavelength (infrared light) is reflected by it. Here, "transmitted" and "reflected" mean that 80% or more of the light is transmitted or reflected, respectively. The visible light component transmitted through the prism is photoelectrically converted into an image by the visible light image sensor 104, which is arranged behind the light separation unit 103. The infrared component reflected by the prism is photoelectrically converted into an image by the non-visible light image sensor 105, which is arranged on the optical axis. The specific wavelength is, for example, between 600 nm and 750 nm, in which case the boundary between visible light and infrared light is defined as lying in that range. Infrared light corresponds, for example, to light with wavelengths from the specific wavelength up to 2500 nm.
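The routing rule above can be written as a small function. This is a sketch of an idealised prism, assuming a hypothetical cutoff of 700 nm (one value inside the example 600 nm to 750 nm range) and treating anything beyond 2500 nm as outside the band considered; a real dichroic coating has a gradual transition rather than a step.

```python
from dataclasses import dataclass

VISIBLE = "visible"          # transmitted to the visible light image sensor 104
INFRARED = "infrared"        # reflected to the non-visible light image sensor 105
OUT_OF_BAND = "out_of_band"  # beyond the infrared band described in the text


@dataclass(frozen=True)
class IdealPrism:
    cutoff_nm: float = 700.0     # hypothetical "specific wavelength"
    ir_upper_nm: float = 2500.0  # upper end of the infrared band in the text

    def __post_init__(self) -> None:
        if not 600.0 <= self.cutoff_nm <= 750.0:
            raise ValueError("cutoff is outside the example 600-750 nm range")

    def route(self, wavelength_nm: float) -> str:
        """Say which sensor an idealised ray reaches.

        A ray exactly at the cutoff is assigned to the infrared side here,
        which is an arbitrary choice for this sketch.
        """
        if wavelength_nm < self.cutoff_nm:
            return VISIBLE
        if wavelength_nm <= self.ir_upper_nm:
            return INFRARED
        return OUT_OF_BAND


def meets_80_percent_rule(fraction: float) -> bool:
    """'Transmits'/'reflects' in the text means at least 80% of the light."""
    return fraction >= 0.8
```

For example, `IdealPrism().route(550.0)` returns `"visible"` and `IdealPrism().route(900.0)` returns `"infrared"`.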

Each pixel of the visible light image sensor 104 has an on-chip color filter in an RGB Bayer arrangement, so the RGB visible light image it outputs contains color information in addition to luminance information. The non-visible light image output from the non-visible light image sensor 105, by contrast, contains only luminance information. The visible light image sensor 104 need only be sensitive mainly to visible light and may also have some sensitivity outside the visible range; likewise, the non-visible light image sensor 105 need only be sensitive mainly to infrared light and may also have some sensitivity to other light. The control unit 106 controls the driving of both sensors and the readout of their image signals.

Although this embodiment describes a configuration in which the light separation unit 103 guides light with different spectral characteristics to the visible light image sensor 104 and the non-visible light image sensor 105, the present invention is not limited to this configuration. For example, a so-called two-lens configuration may be used, in which the two sensors have separate, independent optical systems. In such a configuration as well, the two sensors capture images in synchronization with each other.

The control unit 106 includes, for example, a CPU, an MPU, or other dedicated arithmetic circuits, and controls the imaging apparatus 101 as a whole. The image processing unit 107 performs image processing on the image signals obtained from the visible light image sensor 104 and the non-visible light image sensor 105 and generates captured image data for each sensor. This image processing includes, for example, pixel interpolation, color conversion, various corrections such as pixel defect correction and lens correction, detection processing for adjusting black level, focus, exposure, and the like, white balance processing, gamma correction, edge enhancement, and noise suppression. It also includes demosaicing: when the visible light image read out in RGB format is demosaiced, it is converted into a YUV image, and the non-visible light image is likewise converted into a YUV image by demosaicing. The YUV image converted from the non-visible light image has no color information, so its U and V values are zero.
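The color-space step of this conversion can be sketched as follows. The text does not say which RGB-to-YUV matrix is used, so this sketch assumes the BT.601 analog YUV coefficients; demosaicing itself (interpolating the Bayer mosaic into full RGB) is not shown. The non-visible light image carries luminance only, so it maps to Y with U = V = 0, as stated above.

```python
from typing import List, Tuple

YUV = Tuple[float, float, float]


def rgb_to_yuv(r: float, g: float, b: float) -> YUV:
    """BT.601 analog YUV (assumed coefficients, not taken from the text)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v


def nir_to_yuv(level: float) -> YUV:
    """A luminance-only NIR pixel: Y is the measured level, U and V are zero."""
    return float(level), 0.0, 0.0


def image_to_yuv(rgb_image: List[List[Tuple[float, float, float]]]) -> List[List[YUV]]:
    """Convert an already-demosaiced RGB image pixel by pixel."""
    return [[rgb_to_yuv(*px) for px in row] for row in rgb_image]
```

A neutral gray RGB pixel yields U and V of (numerically) zero, which is why a gray-only NIR image carries no chroma.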

Using the visible light image as an input image, the machine learning processing unit 108 outputs a score map for each attribute, such as "sky", "grass/turf", and "skin", by means of the neural network 201 of FIG. 2 described later. The score map calculation unit 109 uses the non-visible light image to correct the score map of each attribute obtained by the machine learning processing unit 108.

Using the score maps corrected by the score map calculation unit 109, the imaging apparatus 101 performs white balance control in the image processing unit 107 and exposure control of the visible light image sensor 104 via the control unit 106.

Specifically, in white balance control, the score map representing the "sky" attribute is used to keep the ground from taking on too strong a reddish cast. In snow scenes without sky, a bluish tint tends to remain, so the "sky" score map is also used to distinguish shade from sky. In addition, the score map representing the "grass/turf" attribute is used to distinguish a scene lit by mercury lamps from a grass/turf scene, so that appropriate white balance control is performed for each.

In exposure control, the face region is used so that the face is properly exposed; however, if the face region contains hair, a mask, or the like, proper exposure cannot be achieved. The score map representing the "skin" attribute is therefore used to identify only the skin area within the face region, and exposure is controlled so that this skin area is properly exposed.
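One way the skin score map could restrict metering to skin inside the face region is sketched below. The skin threshold, the target level, and the log2 exposure-value formula are all hypothetical choices for illustration; the text states only that the skin score map is used to find the skin area and expose it properly.

```python
import math
from typing import List, Optional, Tuple

Map = List[List[int]]


def skin_exposure_correction_ev(
    luma: Map,
    skin_score: Map,
    face_box: Tuple[int, int, int, int],  # (top, left, bottom, right), end-exclusive
    target_level: float = 118.0,          # hypothetical "properly exposed" level
    skin_threshold: int = 128,            # hypothetical cut on the 0-255 skin score
) -> Optional[float]:
    """Return an EV correction (positive = brighten) metered on skin only.

    Pixels in the face box whose skin score is below the threshold (hair,
    mask, background) are ignored. Returns None if no skin pixel is found.
    """
    top, left, bottom, right = face_box
    values = [
        luma[y][x]
        for y in range(top, bottom)
        for x in range(left, right)
        if skin_score[y][x] >= skin_threshold
    ]
    if not values:
        return None
    mean = sum(values) / len(values)
    return math.log2(target_level / max(mean, 1e-6))
```

With luma `[[59, 200], [59, 20]]` and skin scores `[[255, 0], [255, 0]]` over the whole 2x2 box, the bright hair pixel (200) and the dark mask pixel (20) are excluded, so the result is +1.0 EV, based only on the two skin pixels.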

The memory 110 consists of a nonvolatile memory, a RAM, and the like. The nonvolatile memory stores the processing procedures (control program) of the control unit 106 and various parameters. The RAM is used as a work area for the control unit 106 and also as a storage area for image processing.

In this embodiment, the control unit 106 may include the image processing unit 107 and a compression/decompression unit (not shown). The processing functions of these blocks can be realized, for example, by the CPU executing a program stored in the memory 110, or by a dedicated arithmetic circuit in the control unit 106.

The control unit 106 may further generate compressed images in the compression/decompression unit (not shown), which performs still image and moving image compression. The compression scheme is, for example, one based on a standard such as H.264, H.265, MPEG, or JPEG. The compression/decompression unit may also generate images in any data format, such as mp4 or avi. Compressed images generated by the compression/decompression unit are recorded in the memory 110 or on a recording medium (not shown) attached to the imaging apparatus 101. Compressed images are also decompressed by the compression/decompression unit, and the resulting images are displayed on the display unit 111. In addition to displaying images, the display unit 111 presents a user interface (UI) to the user (operator).

FIG. 2 is a block diagram schematically showing the configuration of the machine learning processing unit 108 in FIG. 1. As shown in FIG. 2, the machine learning processing unit 108 includes a neural network 201, a connected feature generation unit 202, and an attribute determination unit 203.

The neural network 201 is a hierarchical neural network having first to n-th layers (n is a natural number of 2 or more) and processes the visible light image serving as the input image. The connected feature generation unit 202 generates connected hierarchical features by connecting the outputs (feature maps) of predetermined layers of the neural network 201. The attribute determination unit 203 includes three likelihood determination units 203a to 203c and generates score maps representing the attributes of regions of the input image. Each of the likelihood determination units 203a to 203c uses the connected hierarchical features generated by the connected feature generation unit 202 to generate the score map for its attribute. The likelihood determination unit 203a generates a sky likelihood score map, which maps the likelihood score of the "sky" attribute (hereinafter "sky likelihood score"). The likelihood determination unit 203b generates a grass/turf likelihood score map, which maps the likelihood score of the "grass/turf" attribute (hereinafter "grass/turf likelihood score"). The likelihood determination unit 203c generates a skin likelihood score map, which maps the likelihood score of the "skin" attribute (hereinafter "skin likelihood score").
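The data flow through units 202 and 203 can be sketched as below. Several details are assumptions for illustration and do not come from the text: nearest-neighbor upsampling to a common resolution before channel concatenation, and a per-attribute linear head followed by a sigmoid and 8-bit quantization. The head weights are hypothetical stand-ins for what training would produce.

```python
import math
from typing import Dict, List

FeatureMap = List[List[List[float]]]  # [channel][row][col]
ScoreMap = List[List[int]]            # 8-bit likelihood scores


def upsample_nearest(fm: FeatureMap, out_h: int, out_w: int) -> FeatureMap:
    h, w = len(fm[0]), len(fm[0][0])
    return [
        [[ch[y * h // out_h][x * w // out_w] for x in range(out_w)]
         for y in range(out_h)]
        for ch in fm
    ]


def connect_layers(layer_outputs: List[FeatureMap]) -> FeatureMap:
    """Unit 202 (sketch): resize each chosen layer output and stack channels."""
    out_h = max(len(fm[0]) for fm in layer_outputs)
    out_w = max(len(fm[0][0]) for fm in layer_outputs)
    connected: FeatureMap = []
    for fm in layer_outputs:
        connected.extend(upsample_nearest(fm, out_h, out_w))
    return connected


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def likelihood_maps(connected: FeatureMap,
                    heads: Dict[str, List[float]],
                    biases: Dict[str, float]) -> Dict[str, ScoreMap]:
    """Units 203a-203c (sketch): one linear head per attribute, per pixel."""
    h, w = len(connected[0]), len(connected[0][0])
    maps: Dict[str, ScoreMap] = {}
    for name, weights in heads.items():
        if len(weights) != len(connected):
            raise ValueError(f"head {name!r} needs one weight per channel")
        score_map: ScoreMap = []
        for y in range(h):
            row = []
            for x in range(w):
                z = biases.get(name, 0.0) + sum(
                    wt * ch[y][x] for wt, ch in zip(weights, connected))
                row.append(round(255 * _sigmoid(z)))  # likelihood -> 0..255
            score_map.append(row)
        maps[name] = score_map
    return maps
```

Upsampling the coarser layers (rather than downsampling the finer ones) keeps one score per pixel of the finest chosen layer; this is a design choice of the sketch.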

FIG. 3 is a flowchart showing the procedure of the score map calculation processing performed by the machine learning processing unit 108 of FIG. 1.

In FIG. 3, first, in step S301, the machine learning processing unit 108 inputs a visible light image to the neural network 201. In this embodiment, the case where the visible light image 401 shown in FIG. 4 is input is described as an example. The visible light image 401 contains a tree region and a person region, and, as background regions, a mountain region covered with grass and turf and a sky region. The neural network 201 processes the visible light image input in step S301.

Next, in step S302, the connected feature generation unit 202 extracts the processing results of the neural network 201 as features; specifically, it extracts the outputs (feature maps) of predetermined layers of the neural network 201. In step S303, the connected feature generation unit 202 generates connected hierarchical features by connecting the layer outputs extracted in step S302. In step S304, the attribute determination unit 203 generates a score map for each attribute using the connected hierarchical features. Specifically, as shown in FIG. 5, the likelihood determination unit 203a generates a sky likelihood score map of the sky likelihood scores, the likelihood determination unit 203b generates a grass/turf likelihood score map of the grass/turf likelihood scores, and the likelihood determination unit 203c generates a skin likelihood score map of the skin likelihood scores. In FIG. 5, for each attribute, the region block with the highest likelihood score is shown in white, the region block with the lowest likelihood score in black, and the remaining blocks in shades of gray according to their likelihood scores. The processing then ends.
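The gray-level rendering described for FIG. 5 amounts to a per-map normalization. A minimal sketch, assuming a linear mapping between the lowest and highest score in each map (the text does not specify the curve):

```python
from typing import List


def to_gray_levels(scores: List[List[float]]) -> List[List[int]]:
    """Map block scores linearly: lowest block -> black (0), highest -> white (255)."""
    flat = [s for row in scores for s in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # a flat map has no distinct highest block; render it all black
        return [[0 for _ in row] for row in scores]
    scale = 255.0 / (hi - lo)
    return [[round((s - lo) * scale) for s in row] for row in scores]
```

Because the mapping is per map, a gray level is only comparable within one attribute's map, not across the sky, grass/turf, and skin maps.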

However, the machine learning processing unit 108 may fail to calculate an accurate score map under white balance and exposure conditions that were not covered by its training.

To address this, in the present embodiment the likelihood score maps are corrected based on the non-visible light image obtained by the non-visible light image sensor 105.

FIG. 6 is a diagram showing an example of the images used by the imaging apparatus 101 of FIG. 1. FIG. 6(a) shows an example of a visible light image, and FIG. 6(b) shows an example of a non-visible light image. In this embodiment, the non-visible light image is a near-infrared image with wavelengths from 750 nm to 2500 nm. In the near infrared, the sky contains little infrared radiation, whereas grass and turf reflect a large amount of it. As a result, in the non-visible light image the sky region appears dark (black) and the grass/turf region appears bright (white). Clothing materials such as cotton, nylon, and wool also reflect a large amount of infrared radiation, so clothing made of these materials appears somewhat bright in the non-visible light image. Human skin, on the other hand, reflects less infrared radiation than grass, turf, cotton, nylon, or wool, but still reflects some. The skin region therefore appears at an intermediate level between black and white (gray) in the non-visible light image.

Next, the correction of the likelihood score maps using the non-visible light image, performed by the score map calculation unit 109, is described.

FIG. 7 is a diagram for explaining the correction of the sky likelihood score map performed by the score map calculation unit 109 of FIG. 1. In this embodiment, the non-visible light image and the sky likelihood score map generated by the likelihood determination unit 203a are input to the score map calculation unit 109. The sky likelihood score map and the non-visible light image are assumed to have the same number of pixels, with 8 bits (levels 0 to 255) per pixel. The sky likelihood score map generated by the likelihood determination unit 203a is not accurate: as shown in FIG. 7, the sky likelihood score (pixel value) is high in some regions other than the sky region. For example, the sky likelihood score is very high in the clothing region (pixel value 192) and slightly elevated in the grass/turf region (pixel value 64).

First, upon acquiring the sky likelihood score map and the non-visible light image, the score map calculation unit 109 inverts the sky likelihood score map. Specifically, it converts pixel value 0 to 255, pixel value 255 to 0, and so on.

Next, the score map calculation unit 109 compares the values of corresponding pixels in the inverted sky likelihood score map and the non-visible light image and outputs the larger value (maximum selection). For example, the clothing region, whose sky likelihood score (pixel value) is very high in the sky likelihood score map generated by the likelihood determination unit 203a, has very small pixel values in the inverted sky likelihood score map. In the clothing region, therefore, the non-visible light image has larger pixel values than the inverted sky likelihood score map, and the pixel values of the non-visible light image are output. Likewise, in the grass/turf region the non-visible light image has larger pixel values than the inverted sky likelihood score map, so the pixel values of the non-visible light image are output.

The score map calculation unit 109 then inverts the pixel values output by the maximum selection and outputs the result, performing this processing for every pixel of the sky likelihood score map. As a result, the sky likelihood scores of the grass/turf and clothing regions, which were high in the sky likelihood score map generated by the likelihood determination unit 203a even though those regions lie outside the sky region, are corrected downward, improving the accuracy of the sky likelihood score map.
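The three steps (invert, maximum selection, invert) can be sketched on plain 8-bit arrays. This assumes inversion means 255 - v, which matches the 0 to 255 and 255 to 0 examples above. Algebraically the chain equals min(s, 255 - n) per pixel (s = sky score, n = NIR level), so a pixel keeps its sky score only where the NIR image is dark. In the usage lines, the sky scores 192 (clothing) and 64 (grass/turf) are the values quoted for FIG. 7; the sky pixel score and all NIR levels are hypothetical.

```python
from typing import List

Map8 = List[List[int]]
MAX_LEVEL = 255  # both inputs are 8-bit (0-255) per the text


def invert(m: Map8) -> Map8:
    """Inversion assumed to be 255 - v (0 -> 255, 255 -> 0)."""
    return [[MAX_LEVEL - v for v in row] for row in m]


def pixelwise_max(a: Map8, b: Map8) -> Map8:
    """Maximum selection: keep the larger of each pair of corresponding pixels."""
    return [[max(p, q) for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def same_shape(a: Map8, b: Map8) -> bool:
    return len(a) == len(b) and all(len(ra) == len(rb) for ra, rb in zip(a, b))


def correct_sky(sky_score: Map8, nir: Map8) -> Map8:
    """Invert the sky map, take the max with the NIR image, invert the result."""
    if not same_shape(sky_score, nir):
        raise ValueError("score map and NIR image must have the same pixel count")
    return invert(pixelwise_max(invert(sky_score), nir))


if __name__ == "__main__":
    # Columns: sky, clothing, grass/turf. NIR levels are made-up examples:
    # dark sky, fairly bright clothing, bright grass/turf.
    sky = [[255, 192, 64]]
    nir = [[10, 170, 240]]
    print(correct_sky(sky, nir))  # [[245, 85, 15]]
```

In this toy run the clothing score drops from 192 to 85 and the grass/turf score from 64 to 15, while the sky pixel stays high (245).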

FIG. 8 is a diagram for explaining the correction of the grass/turf likelihood score map performed by the score map calculation unit 109 of FIG. 1. In this embodiment, the non-visible light image and the grass/turf likelihood score map generated by the likelihood determination unit 203b are input to the score map calculation unit 109. The grass/turf likelihood score map and the non-visible light image are assumed to have the same number of pixels, with 8 bits (levels 0 to 255) per pixel. The grass/turf likelihood score map generated by the likelihood determination unit 203b is not accurate: as shown in FIG. 8, the grass/turf likelihood score is high in some regions other than the grass/turf region. For example, the grass/turf likelihood score is very high in the region of the person's face (pixel value 255) and slightly elevated in the sky region (pixel value 32).

First, upon acquiring the grass likelihood score map and the non-visible light image, the score map calculation unit 109 performs the inversion process described above on each of them. Next, the score map calculation unit 109 performs a maximum-value selection process. This process compares the pixel values of corresponding pixels in the inverted grass likelihood score map and the inverted non-visible light image and outputs the larger of the two.

For example, consider the region of a person's face, whose grass likelihood score (pixel value) is very large in the grass likelihood score map generated by the likelihood determination unit 203b. In the inverted grass likelihood score map, the inversion process makes the pixel values of this region very small (pixel value 0). In the face region, the inverted non-visible light image therefore has larger pixel values than the inverted grass likelihood score map, and the pixel values of the inverted non-visible light image are output. Likewise, in the sky region the inverted non-visible light image has larger pixel values than the inverted grass likelihood score map, so its pixel values are output.

Next, the score map calculation unit 109 inverts the pixel values output by the maximum-value selection process and outputs the result. The score map calculation unit 109 performs this processing on every pixel of the grass likelihood score map. In the grass likelihood score map generated by the likelihood determination unit 203b, the face region and the sky region had large grass likelihood scores even though they are not grass. This processing corrects those scores downward and so improves the accuracy of the grass likelihood score map.
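A minimal Python sketch of the grass correction follows. Here both the score map and the non-visible light image are inverted before the maximum-value selection. The function names, list-of-rows format, and sample pixel values are hypothetical, and the code is an illustration rather than the device's implementation.

```python
MAX_LEVEL = 255  # 8-bit gradation (0 to 255), as assumed in the description


def invert(value):
    """Invert an 8-bit pixel value, e.g. 0 -> 255 and 255 -> 0."""
    return MAX_LEVEL - value


def correct_grass_map(score_map, ir_image):
    """Correct a grass likelihood score map using a non-visible (IR) image.

    Per pixel: invert both the score and the IR value, keep the larger
    (maximum-value selection), then invert the selected value.
    """
    return [
        [invert(max(invert(s), invert(ir))) for s, ir in zip(score_row, ir_row)]
        for score_row, ir_row in zip(score_map, ir_image)
    ]


# Hypothetical pixels: face (score 255 as in FIG. 8), sky (score 32), grass.
grass_scores = [[255, 32, 230]]
ir_pixels = [[80, 10, 240]]
print(correct_grass_map(grass_scores, ir_pixels))  # [[80, 10, 230]]
```

With both inputs inverted, the per-pixel result reduces to min(S, IR). The grass score therefore survives only where the non-visible light image is also bright.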

FIG. 9 is a diagram explaining the correction of the skin likelihood score map performed by the score map calculation unit 109 of FIG. 1. In the present embodiment, the non-visible light image and the skin likelihood score map generated by the likelihood determination unit 203c are input to the score map calculation unit 109. The skin likelihood score map and the non-visible light image are assumed to have the same number of pixels, with each pixel having an 8-bit gradation (0 to 255). The skin likelihood score map generated by the likelihood determination unit 203c is not an accurate likelihood score map. As shown in FIG. 9, the skin likelihood score is large in some regions other than the skin region. For example, the skin likelihood score is very large in the clothing region (pixel value 128) and slightly large in the sky region and the grass region (pixel value 32).

First, upon acquiring the skin likelihood score map and the non-visible light image, the score map calculation unit 109 performs the inversion process described above on the skin likelihood score map.

The score map calculation unit 109 also performs a subtraction process on each pixel of the non-visible light image. This process subtracts the pixel value from a predetermined value, for example, 128. The predetermined value is chosen because skin regions appear roughly midway between black and white (gray) in the non-visible light image, so it corresponds to approximately the middle of the pixel value range. If the result of the subtraction is negative, a clipping process replaces it with 0. In the present embodiment, these processes therefore yield 0 for regions whose pixel values in the non-visible light image are equal to or greater than the predetermined value, such as the grass region and the clothing region. The score map calculation unit 109 then applies the inversion process described above to these results to generate a subtraction-processed non-visible light image.

Next, the score map calculation unit 109 performs a maximum-value selection process. This process compares the pixel values of corresponding pixels in the inverted skin likelihood score map and the subtraction-processed non-visible light image and outputs the larger of the two.

For example, consider the clothing region, whose skin likelihood score (pixel value) is very large in the skin likelihood score map generated by the likelihood determination unit 203c. In the inverted skin likelihood score map, the inversion process gives this region a pixel value of 127. In the subtraction-processed non-visible light image, the 0 obtained by the subtraction and clipping processes is inverted, so the pixel value of this region becomes 255. In the clothing region, the subtraction-processed non-visible light image therefore has larger pixel values than the inverted skin likelihood score map, and its pixel values are output. Likewise, in the grass region the subtraction-processed non-visible light image has larger pixel values than the inverted skin likelihood score map, so its pixel values are output.

Next, the score map calculation unit 109 inverts the pixel values output by the maximum-value selection process and outputs the result. The score map calculation unit 109 performs this processing on every pixel of the skin likelihood score map. In the skin likelihood score map generated by the likelihood determination unit 203c, regions other than skin, such as the grass region and the clothing region, had large skin likelihood scores. This processing corrects those scores downward and so improves the accuracy of the skin likelihood score map.
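The skin correction can be sketched in Python as shown below. It adds the subtraction-and-clipping step ahead of the maximum-value selection. The reference value 128 is the example given above. The function names, list-of-rows format, and sample pixel values are hypothetical, and this is an illustration, not the device's implementation.

```python
MAX_LEVEL = 255       # 8-bit gradation (0 to 255), as assumed in the description
SKIN_REFERENCE = 128  # example predetermined value from the description


def invert(value):
    """Invert an 8-bit pixel value, e.g. 0 -> 255 and 255 -> 0."""
    return MAX_LEVEL - value


def subtraction_processed(ir, reference=SKIN_REFERENCE):
    """Subtract the IR value from the reference, clip negatives to 0, invert."""
    return invert(max(reference - ir, 0))


def correct_skin_map(score_map, ir_image, reference=SKIN_REFERENCE):
    """Correct a skin likelihood score map using a non-visible (IR) image.

    Per pixel: take the larger of the inverted score and the
    subtraction-processed IR value, then invert the selected value.
    """
    return [
        [
            invert(max(invert(s), subtraction_processed(ir, reference)))
            for s, ir in zip(score_row, ir_row)
        ]
        for score_row, ir_row in zip(score_map, ir_image)
    ]


# Hypothetical pixels: clothing (score 128 as in FIG. 9), grass (32), sky (32).
skin_scores = [[128, 32, 32]]
ir_pixels = [[200, 240, 10]]
print(correct_skin_map(skin_scores, ir_pixels))  # [[0, 0, 32]]
```

In closed form, the result is min(S, max(128 - IR, 0)). Any pixel whose non-visible light value is at or above the reference is therefore set to 0, which matches the clothing and grass cases described above.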

FIG. 10 is a flowchart showing the procedure of the control processing executed by the imaging apparatus 101 of FIG. 1. The control processing of FIG. 10 can be implemented by the control unit 106 executing a program stored in the memory 110.

In FIG. 10, first, in step S1001, the control unit 106 drives the visible light image sensor 104. The visible light image sensor 104 then receives visible light and generates an image signal, from which a visible light image is generated. Next, in step S1002, the control unit 106 instructs the image processing unit 107 to execute image processing on the visible light image. Upon receiving this instruction, the image processing unit 107 applies various kinds of image processing to the visible light image.

Next, in step S1003, the control unit 106 determines whether the visible light image is an image of a sampling frame for white balance. For example, when the EVF (electronic viewfinder) operates at a frame rate of 120 fps, switching the white balance every frame causes the screen to flicker and degrades visibility. In the present embodiment, sampling for white balance is therefore performed at a lower frame rate, such as 30 fps.

If it is determined in step S1003 that the visible light image is an image of a white balance sampling frame, the process proceeds to step S1004. In step S1004, the control unit 106 drives the non-visible light image sensor 105. The non-visible light image sensor 105 then receives infrared light, which is non-visible light, and generates an image signal, from which a non-visible light image is generated.

Next, in step S1005, the control unit 106 instructs the image processing unit 107 to execute image processing on the non-visible light image. Upon receiving this instruction, the image processing unit 107 applies various kinds of image processing to the non-visible light image.

Next, in step S1006, the control unit 106 instructs the machine learning processing unit 108 to generate the likelihood score map described above. Upon receiving this instruction, the machine learning processing unit 108 generates the likelihood score map with the visible light image as the input image.

Next, in step S1007, the control unit 106 instructs the score map calculation unit 109 to correct the likelihood score map generated by the machine learning processing unit 108. Upon receiving this instruction, the score map calculation unit 109 corrects that likelihood score map using the non-visible light image, as described above.

Next, in step S1008, the control unit 106 determines whether to end imaging. For example, when the control unit 106 receives a predetermined user operation instructing the end of imaging, it determines that imaging is to end. When no such operation is received, it determines that imaging is not to end.

If it is determined in step S1008 that imaging is not to end, the process returns to step S1001. If it is determined in step S1008 that imaging is to end, this processing ends.

If it is determined in step S1003 that the visible light image is not an image of a white balance sampling frame, the non-visible light image sensor 105 is not driven, and the process proceeds to step S1009. The present embodiment thus drives the non-visible light image sensor 105 only for frames in which the accuracy of the likelihood score map is to be improved. This reduces the power consumption of the imaging apparatus 101.

In step S1009, the control unit 106 instructs the machine learning processing unit 108 to generate a score map (not shown) for determining the position of a face. Upon receiving this instruction, the machine learning processing unit 108 generates this score map based on the acquired visible light image. The face-position score map is used to track the subject during image capture. The process then proceeds to step S1008 described above.
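The frame gating in FIG. 10 can be sketched as a pure function that lists the steps executed for each EVF frame. The 120 fps and 30 fps figures come from the example above. The rule that every fourth frame is a white balance sampling frame, the constant names, and the function names are assumptions made for illustration.

```python
EVF_FPS = 120  # example EVF frame rate from the description
WB_FPS = 30    # example white balance sampling rate from the description


def is_wb_sampling_frame(frame_index, evf_fps=EVF_FPS, wb_fps=WB_FPS):
    """Assumed rule: every (evf_fps // wb_fps)-th frame samples white balance."""
    return frame_index % (evf_fps // wb_fps) == 0


def steps_for_frame(frame_index):
    """Return the FIG. 10 step labels executed for one frame, in order."""
    steps = ["S1001", "S1002", "S1003"]  # capture, process, WB-frame check
    if is_wb_sampling_frame(frame_index):
        # Drive the IR sensor, process its image, generate and correct score maps.
        steps += ["S1004", "S1005", "S1006", "S1007"]
    else:
        # IR sensor stays off; only the face-position score map is generated.
        steps.append("S1009")
    steps.append("S1008")  # check for an end-of-imaging instruction
    return steps


ir_frames = [i for i in range(8) if "S1004" in steps_for_frame(i)]
print(ir_frames)  # [0, 4]
```

Under these assumptions, the non-visible light image sensor runs on 30 of every 120 frames.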

According to the embodiment described above, the likelihood score map is corrected based on the non-visible light image obtained by the non-visible light image sensor 105. This improves the accuracy of the likelihood score map output using the neural network 201.

In the embodiment described above, the non-visible light image sensor 105 is driven when the visible light image serving as the input image is an image of a white balance sampling frame. The present invention is not limited to this configuration. For example, the non-visible light image sensor 105 may be driven when the input visible light image is an image of a frame used for other predetermined control that uses the likelihood score map, such as exposure control or scene recognition control (landscape, night scene, etc.). Controlling the sensor in this way keeps the power consumption of the imaging apparatus 101 to a minimum while improving the accuracy of the likelihood score map used for exposure control and scene recognition control.

In the embodiment described above, the score map calculation unit 109 may also correct the likelihood score map by a method different from those shown in FIGS. 7 to 9.

FIG. 11 is a diagram explaining the correction of the sky likelihood score map performed by the score map calculation unit 109 of FIG. 1. As described above, the sky likelihood score map and the non-visible light image are assumed to have the same number of pixels, with each pixel having an 8-bit gradation (0 to 255). In FIG. 11, as in FIG. 7, the sky likelihood score map generated by the likelihood determination unit 203a is not an accurate likelihood score map, and the sky likelihood score is large in some regions other than the sky region. For example, the sky likelihood score is very large in the clothing region (pixel value 192) and slightly large in the grass region (pixel value 64).

First, the score map calculation unit 109 compares the non-visible light image with a predetermined first threshold, for example, 64. The first threshold is larger than the pixel values expected for the sky region in the non-visible light image and smaller than those expected for regions other than the sky. The sky region is dark (black) in the non-visible light image, so its pixel values are close to the minimum value (0). A region whose pixel values are smaller than the first threshold can therefore be regarded as highly reliable as a sky region.

Accordingly, where a region of the non-visible light image has pixel values smaller than the first threshold, the corresponding pixels in the sky likelihood score map keep their values unchanged. Where a region of the non-visible light image has pixel values equal to or greater than the first threshold, it is judged to have low reliability as a sky region, and the corresponding pixels in the sky likelihood score map are set to 0. In the sky likelihood score map generated by the likelihood determination unit 203a, the grass region and the clothing region had large sky likelihood scores. This correction sets those scores to 0 and so improves the accuracy of the sky likelihood score map.
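The threshold-based sky correction reduces to a per-pixel comparison, sketched below. The threshold 64 is the example first threshold from the text. The function name, list-of-rows format, and sample pixel values are hypothetical.

```python
FIRST_THRESHOLD = 64  # example first threshold for the sky attribute


def threshold_correct_sky(score_map, ir_image, threshold=FIRST_THRESHOLD):
    """Keep sky scores where IR < threshold; set them to 0 elsewhere."""
    return [
        [s if ir < threshold else 0 for s, ir in zip(score_row, ir_row)]
        for score_row, ir_row in zip(score_map, ir_image)
    ]


# Hypothetical pixels: sky, clothing (score 192 as in FIG. 11), grass (score 64).
print(threshold_correct_sky([[240, 192, 64]], [[8, 200, 250]]))  # [[240, 0, 0]]
```

Unlike the inversion-based method, this variant zeroes a pixel outright rather than capping its score.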

FIG. 12 is a diagram explaining the correction of the grass likelihood score map performed by the score map calculation unit 109 of FIG. 1. As described above, the grass likelihood score map and the non-visible light image are assumed to have the same number of pixels, with each pixel having an 8-bit gradation (0 to 255). In FIG. 12, as in FIG. 8, the grass likelihood score map generated by the likelihood determination unit 203b is not an accurate likelihood score map, and the grass likelihood score is large in some regions other than the grass region. For example, the grass likelihood score is very large in the region of a person's face (pixel value 255) and slightly large in the sky region (pixel value 32).

First, the score map calculation unit 109 compares the non-visible light image with a predetermined second threshold, for example, 96. The second threshold is smaller than the pixel values expected for the grass region in the non-visible light image and larger than those expected for regions other than grass. The grass region is bright (white) in the non-visible light image, so its pixel values are close to the maximum value (255). A region whose pixel values are larger than the second threshold can therefore be regarded as highly reliable as a grass region.

Accordingly, where a region of the non-visible light image has pixel values larger than the second threshold, the corresponding pixels in the grass likelihood score map keep their values unchanged. Where a region of the non-visible light image has pixel values equal to or less than the second threshold, it is judged to have low reliability as a grass region, and the corresponding pixels in the grass likelihood score map are set to 0. In the grass likelihood score map generated by the likelihood determination unit 203b, the face region and the sky region had large grass likelihood scores. This correction sets those scores to 0 and so improves the accuracy of the grass likelihood score map.
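For grass the comparison runs the other way: scores survive only where the non-visible light image is brighter than the threshold. The threshold 96 is the example second threshold from the text. The function name, list-of-rows format, and sample pixel values are hypothetical.

```python
SECOND_THRESHOLD = 96  # example second threshold for the grass attribute


def threshold_correct_grass(score_map, ir_image, threshold=SECOND_THRESHOLD):
    """Keep grass scores where IR > threshold; set them to 0 elsewhere."""
    return [
        [s if ir > threshold else 0 for s, ir in zip(score_row, ir_row)]
        for score_row, ir_row in zip(score_map, ir_image)
    ]


# Hypothetical pixels: face (score 255 as in FIG. 12), sky (score 32), grass.
print(threshold_correct_grass([[255, 32, 230]], [[80, 10, 240]]))  # [[0, 0, 230]]
```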

FIG. 13 is a diagram explaining the correction of the skin likelihood score map performed by the score map calculation unit 109 of FIG. 1. As described above, the skin likelihood score map and the non-visible light image are assumed to have the same number of pixels, with each pixel having an 8-bit gradation (0 to 255). In FIG. 13, as in FIG. 9, the skin likelihood score map generated by the likelihood determination unit 203c is not an accurate likelihood score map, and the skin likelihood score is large in some regions other than the skin region. For example, the skin likelihood score is very large in the clothing region (pixel value 128) and slightly large in the sky region and the grass region (pixel value 32).

First, the score map calculation unit 109 compares the non-visible light image with a predetermined third threshold, for example, 128. The third threshold is larger than the pixel values expected for the skin region in the non-visible light image. It is smaller than those expected for certain regions other than skin, such as the clothing region and the grass region. The skin region is somewhat dark (gray) in the non-visible light image, so its pixel values lie around the middle of the range. A region whose pixel values are smaller than the third threshold can therefore be regarded as highly reliable as a skin region.

Accordingly, where a region of the non-visible light image has pixel values smaller than the third threshold, the corresponding pixels in the skin likelihood score map keep their values unchanged. Where a region of the non-visible light image has pixel values equal to or greater than the third threshold, it is judged to have low reliability as a skin region, and the corresponding pixels in the skin likelihood score map are set to 0. In the skin likelihood score map generated by the likelihood determination unit 203c, regions such as the grass region and the clothing region had large skin likelihood scores. This correction sets those scores to 0 and so improves the accuracy of the skin likelihood score map.
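The skin version mirrors the sky version with the example third threshold of 128. The function name, list-of-rows format, and sample pixel values are hypothetical. As the text implies, a sky pixel (dark in the non-visible light image) passes this test and keeps its score.

```python
THIRD_THRESHOLD = 128  # example third threshold for the skin attribute


def threshold_correct_skin(score_map, ir_image, threshold=THIRD_THRESHOLD):
    """Keep skin scores where IR < threshold; set them to 0 elsewhere."""
    return [
        [s if ir < threshold else 0 for s, ir in zip(score_row, ir_row)]
        for score_row, ir_row in zip(score_map, ir_image)
    ]


# Hypothetical pixels: skin, clothing (score 128 as in FIG. 13), grass, sky.
print(threshold_correct_skin([[250, 128, 32, 32]], [[80, 200, 240, 10]]))
# [[250, 0, 0, 32]]
```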

As described above, in this embodiment, the pixel value of each pixel of the non-visible light image is compared with a predetermined threshold (the first, second, or third threshold). Based on the result, the unit decides whether to correct the pixel value of the corresponding pixel in the likelihood score map. This improves the accuracy of the likelihood score map.

In the embodiment described above, the threshold also differs for each attribute represented by the likelihood score map, so each likelihood score map can be corrected appropriately for its attribute.
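The threshold and the direction of the comparison both depend on the attribute. The three threshold corrections can therefore be driven from a single table, as in the hypothetical sketch below. The table layout and names are assumptions; the threshold values are the examples given for FIGS. 11 to 13.

```python
import operator

# Hypothetical rule table: attribute -> (comparison that keeps the score, threshold)
THRESHOLD_RULES = {
    "sky": (operator.lt, 64),     # keep where IR < 64
    "grass": (operator.gt, 96),   # keep where IR > 96
    "skin": (operator.lt, 128),   # keep where IR < 128
}


def correct_all(score_maps, ir_image, rules=THRESHOLD_RULES):
    """Apply the per-attribute threshold rule to every score map."""
    corrected = {}
    for attribute, score_map in score_maps.items():
        keep, threshold = rules[attribute]
        corrected[attribute] = [
            [s if keep(ir, threshold) else 0 for s, ir in zip(s_row, ir_row)]
            for s_row, ir_row in zip(score_map, ir_image)
        ]
    return corrected


ir_pixels = [[10, 240, 80]]  # hypothetical sky, grass, face pixels
maps = {"sky": [[240, 64, 20]], "grass": [[32, 230, 255]], "skin": [[32, 32, 250]]}
print(correct_all(maps, ir_pixels))
# {'sky': [[240, 0, 0]], 'grass': [[0, 230, 0]], 'skin': [[32, 0, 250]]}
```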

(Other Embodiments)
The present invention can also be realized by supplying a program that implements one or more functions of the embodiments described above to a system or apparatus via a network or a storage medium. One or more processors in a computer of the system or apparatus then read and execute the program. The invention can also be realized by a circuit (for example, an ASIC) that implements one or more functions. The functions may also be divided between those executed by a processor reading a program and those executed by a circuit, and the two may be combined.

Although preferred embodiments of the present invention have been described, the present invention is not limited to these embodiments, and various modifications and changes are possible within the scope of its gist.

The disclosure of the present embodiment includes the following configurations and method.

（構成１）可視光を受光する可視光撮像手段と、非可視光を受光する非可視光撮像手段とを備える撮像装置であって、前記可視光撮像手段によって得られた可視光画像を入力画像として、階層型ニューラルネットワークを用いて前記入力画像の領域の属性を表すスコアマップを生成する生成手段と、前記非可視光撮像手段によって得られた非可視光画像に基づいて前記スコアマップを補正する補正手段とを有することを特徴とする撮像装置。
（構成２）前記補正手段は、前記スコアマップを構成する全ての画素の画素値を反転させた反転処理済みのスコアマップを生成し、前記反転処理済みのスコアマップと前記非可視光画像とで対応する画素の画素値を比較して大きい方の画素値を出力する大値選択処理を行い、前記大値選択処理によって出力された画素値を反転させた画素値を用いて、前記スコアマップを補正することを特徴とする構成１に記載の撮像装置。
（構成３）前記スコアマップは、前記入力画像の領域の空の属性を表す尤度スコアをマップ化した空尤度スコアマップであることを特徴とする構成２に記載の撮像装置。
（構成４）前記補正手段は、前記スコアマップを構成する全ての画素の画素値を反転させた反転処理済みのスコアマップを生成し、前記非可視光画像を構成する全ての画素の画素値を反転させた反転処理済みの非可視光画像を生成し、前記反転処理済みのスコアマップと前記反転処理済みの非可視光画像とで対応する画素の画素値を比較して大きい方の画素値を出力する大値選択処理を行い、前記大値選択処理によって出力された画素値を反転させた画素値を用いて、前記スコアマップを補正することを特徴とする構成１に記載の撮像装置。
（構成５）前記スコアマップは、前記入力画像の領域の草芝の属性を表す尤度スコアをマップ化した草芝尤度スコアマップであることを特徴とする構成４に記載の撮像装置。
（構成６）前記補正手段は、前記スコアマップを構成する全ての画素の画素値を反転させた反転処理済みのスコアマップを生成し、前記非可視光画像を構成する全ての画素の画素値に所定の演算処理を行って得られた値を反転させて演算処理済みの非可視光画像を生成し、前記反転処理済みのスコアマップと前記演算処理済みの非可視光画像とで対応する画素の画素値を比較して大きい方の画素値を出力する大値選択処理を行い、前記大値選択処理によって出力された画素値を反転させた画素値を用いて、前記スコアマップを補正することを特徴とする構成１に記載の撮像装置。
（構成７）前記所定の演算処理は、前記非可視光画像を構成する画素において、予め決められた所定値から当該画素の画素値を減算し、減算して得られた値が正の値である場合には当該値を反転させた値を用い、減算して得られた値が負の値である場合には０を反転させた値を用いて、演算処理済みの非可視光画像を生成する処理であることを特徴とする構成６に記載の撮像装置。
（構成８）前記スコアマップは、前記入力画像の領域の肌の属性を表す尤度スコアをマップ化した肌尤度スコアマップであることを特徴とする構成６又は７に記載の撮像装置。
（構成９）前記補正手段は、前記非可視光画像を構成する画素の画素値と予め決められた閾値とを比較した結果に基づいて、前記スコアマップにおいて当該画素に対応する画素の画素値を補正するか否かを決定することを特徴とする構成１に記載の撮像装置。
（構成１０）前記閾値は、前記スコアマップが表す属性毎に異なることを特徴とする構成９に記載の撮像装置。
（構成１１）前記非可視光撮像手段の駆動を制御する制御手段を更に備え、前記制御手段は、前記入力画像が撮影に関する所定の制御に用いられるフレームの画像である場合に前記非可視光撮像手段を駆動させ、前記入力画像が前記所定の制御に用いられるフレームの画像でない場合に前記非可視光撮像手段を駆動させないように制御することを特徴とする構成１乃至１０の何れか１項に記載の撮像装置。
（構成１２）前記所定の制御は、ホワイトバランス制御、露出制御、又はシーン認識制御であることを特徴とする構成１１に記載の撮像装置。
（構成１３）可視光を受光する可視光撮像手段と、非可視光を受光する非可視光撮像手段とを備える撮像装置の制御方法であって、前記可視光撮像手段によって得られた可視光画像を入力画像として、階層型ニューラルネットワークを用いて前記入力画像の領域の属性を表すスコアマップを生成する生成工程と、前記非可視光撮像手段によって得られた非可視光画像に基づいて前記スコアマップを補正する補正工程とを有することを特徴とする撮像装置の制御方法。 (Structure 1) An imaging device comprising a visible light imaging means for receiving visible light and an invisible light imaging means for receiving non-visible light, wherein the visible light image obtained by the visible light imaging means is used as an input image. generating means for generating a score map representing attributes of a region of the input image using a hierarchical neural network; and correcting the score map based on the invisible light image obtained by the invisible light imaging means. An imaging device comprising: a correction means.
(Structure 2) The correction means generates an inverted score map in which the pixel values of all pixels constituting the score map are inverted, and combines the inverted score map and the invisible light image. A large value selection process is performed to compare the pixel values of corresponding pixels and output the larger pixel value, and the score map is created using a pixel value obtained by inverting the pixel value output by the large value selection process. The imaging device according to configuration 1, wherein the imaging device performs correction.
(Structure 3) The imaging device according to Structure 2, wherein the score map is a sky likelihood score map that maps likelihood scores representing sky attributes of the region of the input image.
(Structure 4) The correction means generates an inverted score map in which the pixel values of all pixels constituting the score map are inverted, and inverts the pixel values of all pixels constituting the invisible light image. Generate an inverted invisible light image that has been inverted, compare the pixel values of corresponding pixels in the inverted score map and the inverted invisible light image, and select the larger pixel value. The imaging device according to configuration 1, wherein a large value selection process to be output is performed, and the score map is corrected using a pixel value obtained by inverting the pixel value outputted by the large value selection process.
(Structure 5) The imaging device according to Structure 4, wherein the score map is a grass likelihood score map that is a map of likelihood scores representing attributes of grass in the area of the input image.
(Structure 6) The correction means generates an inverted score map in which the pixel values of all pixels constituting the score map are inverted, and the correction means generates an inverted score map in which the pixel values of all pixels constituting the invisible light image are inverted. The values obtained by performing predetermined arithmetic processing are inverted to generate an arithmetic-processed invisible light image, and the corresponding pixels are Performing a large value selection process that compares pixel values and outputs the larger pixel value, and correcting the score map using a pixel value obtained by inverting the pixel value output by the large value selection process. The imaging device according to feature 1.
(Configuration 7) The imaging device according to Configuration 6, wherein the predetermined arithmetic processing generates the arithmetic-processed invisible light image by subtracting, for each pixel constituting the invisible light image, the pixel value of that pixel from a predetermined value, and using the inverted value of the difference if the difference is positive, or the inverted value of 0 if the difference is negative.
(Configuration 8) The imaging device according to Configuration 6 or 7, wherein the score map is a skin likelihood score map, which maps likelihood scores representing the skin attribute of the region of the input image.
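For the skin score map of Configurations 6 to 8, the arithmetic processing of Configuration 7 and the subsequent correction can be sketched as below. The text does not give the predetermined value, so `offset` is a hypothetical parameter, assumed to lie between 0 and 255. The sketch also assumes pixel-aligned 8-bit single-channel images stored as lists of rows, with inversion defined as 255 - v; the function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # assumed 8-bit single-channel image, list of rows
MAX_VALUE = 255  # assumption: "inversion" means v -> 255 - v


def invert(img: Image) -> Image:
    """Invert every pixel value (v -> MAX_VALUE - v)."""
    return [[MAX_VALUE - v for v in row] for row in img]


def large_value_select(a: Image, b: Image) -> Image:
    """Keep the larger of each pair of corresponding pixels."""
    return [[max(x, y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def arithmetic_processed_ir(ir: Image, offset: int) -> Image:
    """Configuration 7: subtract each pixel value from `offset` (hypothetical
    predetermined value); invert the difference if it is positive, otherwise
    invert 0. Clamping with max(..., 0) covers both cases."""
    return [[MAX_VALUE - max(offset - v, 0) for v in row] for row in ir]


def correct_skin_score(score: Image, ir: Image, offset: int) -> Image:
    """Configuration 6: pixel-wise maximum of the inverted score map and the
    arithmetic-processed invisible light image, then invert the result."""
    processed = arithmetic_processed_ir(ir, offset)
    return invert(large_value_select(invert(score), processed))


print(correct_skin_score([[200, 200, 200]], [[20, 150, 230]], offset=200))
# [[180, 50, 0]]
```

Under these assumptions the result reduces to min(s, max(offset - v, 0)), so any pixel whose invisible-light value reaches `offset` receives a skin score of 0.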
(Configuration 9) The imaging device according to Configuration 1, wherein the correction means determines whether to correct the pixel value of the pixel in the score map corresponding to a pixel constituting the invisible light image, based on the result of comparing the pixel value of that pixel with a predetermined threshold value.
(Configuration 10) The imaging device according to Configuration 9, wherein the threshold value differs for each attribute represented by the score map.
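Configurations 9 and 10 leave open both the threshold values and the direction of the comparison. The sketch below therefore treats them as hypothetical per-attribute tables and applies a caller-supplied per-pixel correction rule only where the comparison selects the pixel. All names and numbers are illustrative assumptions, and the images are assumed to be pixel-aligned 8-bit lists of rows.

```python
from typing import Callable, Dict, List

Image = List[List[int]]  # assumed 8-bit single-channel image, list of rows
Mask = List[List[bool]]

# Hypothetical per-attribute thresholds: Configuration 10 states only that the
# threshold differs by attribute, not what the values are.
THRESHOLDS: Dict[str, int] = {"sky": 128, "grass": 64, "skin": 200}

# Hypothetical comparison direction: True means "correct where the
# invisible-light value exceeds the threshold", False means "where it is below".
CORRECT_IF_ABOVE: Dict[str, bool] = {"sky": True, "grass": False, "skin": True}


def correction_mask(ir: Image, attribute: str) -> Mask:
    """Decide, per pixel, whether the corresponding score-map pixel is corrected."""
    threshold = THRESHOLDS[attribute]
    above = CORRECT_IF_ABOVE[attribute]
    return [[(v > threshold) if above else (v < threshold) for v in row] for row in ir]


def apply_gated_correction(
    score: Image,
    ir: Image,
    attribute: str,
    correct_pixel: Callable[[int, int], int],
) -> Image:
    """Apply correct_pixel(score_value, ir_value) only where the mask is True;
    all other score-map pixels are left unchanged."""
    mask = correction_mask(ir, attribute)
    return [
        [correct_pixel(s, v) if m else s for s, v, m in zip(row_s, row_v, row_m)]
        for row_s, row_v, row_m in zip(score, ir, mask)
    ]


def sky_rule(s: int, v: int) -> int:
    """Example rule: cap the score at the inverted invisible-light value."""
    return min(s, 255 - v)


# The pixel with v = 100 is below the hypothetical sky threshold, so it is kept.
print(apply_gated_correction([[200, 200]], [[100, 180]], "sky", sky_rule))
# [[200, 75]]
```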
(Configuration 11) The imaging device according to any one of Configurations 1 to 10, further comprising control means for controlling driving of the invisible light imaging means, wherein the control means drives the invisible light imaging means when the input image is an image of a frame used for predetermined control relating to photography, and does not drive the invisible light imaging means when the input image is not an image of a frame used for the predetermined control.
(Configuration 12) The imaging device according to Configuration 11, wherein the predetermined control is white balance control, exposure control, or scene recognition control.
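The frame-dependent driving of Configurations 11 and 12 can be sketched as a per-frame decision. How a frame is marked as being used for white balance, exposure, or scene recognition control is not described here, so the `Frame.controls` tag and all function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import FrozenSet, List


class Control(Enum):
    """Predetermined controls named in Configuration 12."""
    WHITE_BALANCE = auto()
    EXPOSURE = auto()
    SCENE_RECOGNITION = auto()


@dataclass(frozen=True)
class Frame:
    index: int
    # Hypothetical tag: which predetermined controls will use this frame.
    controls: FrozenSet[Control]


def should_drive_ir_sensor(frame: Frame) -> bool:
    """Drive the invisible light sensor only for frames used by a predetermined control."""
    return len(frame.controls) > 0


def frames_to_drive(frames: List[Frame]) -> List[int]:
    """Indices of the frames for which the invisible light sensor is driven."""
    return [f.index for f in frames if should_drive_ir_sensor(f)]


frames = [
    Frame(0, frozenset({Control.WHITE_BALANCE})),
    Frame(1, frozenset()),  # not used for any predetermined control
    Frame(2, frozenset({Control.EXPOSURE, Control.SCENE_RECOGNITION})),
]
print(frames_to_drive(frames))  # [0, 2]
```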
(Configuration 13) A method for controlling an imaging device comprising visible light imaging means for receiving visible light and invisible light imaging means for receiving non-visible light, the method comprising: a generation step of using a visible light image obtained by the visible light imaging means as an input image and generating, with a hierarchical neural network, a score map representing attributes of a region of the input image; and a correction step of correcting the score map based on an invisible light image obtained by the invisible light imaging means.
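The generation and correction steps of the method in Configuration 13 can be tied together as in the following sketch. The neural network is replaced by a hypothetical stub, and the correction rules use closed forms that follow algebraically from the invert, large value selection, and invert procedures of Configurations 2, 4, and 6 to 7, assuming pixel-aligned 8-bit images with inversion defined as 255 - v (for example, 255 - max(255 - s, v) = min(s, 255 - v)). `SKIN_OFFSET` is a hypothetical stand-in for the unspecified predetermined value.

```python
from typing import Callable, Dict, List

Image = List[List[int]]  # assumed 8-bit single-channel image, list of rows
MAX_VALUE = 255
SKIN_OFFSET = 200  # hypothetical "predetermined value" of Configuration 7

# Per-pixel closed forms of the invert / large-value-select / invert procedures.
CORRECTIONS: Dict[str, Callable[[int, int], int]] = {
    "sky": lambda s, v: min(s, MAX_VALUE - v),             # Configuration 2
    "grass": lambda s, v: min(s, v),                       # Configuration 4
    "skin": lambda s, v: min(s, max(SKIN_OFFSET - v, 0)),  # Configurations 6-7
}


def control_method(
    visible: Image,
    ir: Image,
    generate_score_maps: Callable[[Image], Dict[str, Image]],
) -> Dict[str, Image]:
    # Generation step: score maps per attribute from the visible light image.
    score_maps = generate_score_maps(visible)
    corrected: Dict[str, Image] = {}
    # Correction step: apply the attribute's rule pixel by pixel; attributes
    # without a rule are passed through unchanged.
    for attribute, smap in score_maps.items():
        rule = CORRECTIONS.get(attribute)
        if rule is None:
            corrected[attribute] = smap
        else:
            corrected[attribute] = [
                [rule(s, v) for s, v in zip(row_s, row_v)]
                for row_s, row_v in zip(smap, ir)
            ]
    return corrected


def stub_network(visible: Image) -> Dict[str, Image]:
    """Hypothetical stand-in for neural network 201; returns fixed score maps."""
    return {"sky": [[200, 200]], "grass": [[200, 200]]}


print(control_method([[0, 0]], [[10, 180]], stub_network))
# {'sky': [[200, 75]], 'grass': [[10, 180]]}
```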

101 Imaging device
104 Visible light image sensor
105 Invisible light image sensor
106 Control unit
108 Machine learning processing unit
109 Score map calculation unit
201 Neural network

Claims

An imaging device comprising visible light imaging means for receiving visible light and invisible light imaging means for receiving non-visible light, the imaging device further comprising:
generation means for using a visible light image obtained by the visible light imaging means as an input image and generating, with a hierarchical neural network, a score map representing attributes of a region of the input image; and
correction means for correcting the score map based on an invisible light image obtained by the invisible light imaging means.

The imaging device according to claim 1, wherein the correction means generates an inverted score map by inverting the pixel values of all pixels constituting the score map, performs a large value selection process that compares the pixel values of corresponding pixels in the inverted score map and the invisible light image and outputs the larger pixel value, and corrects the score map using pixel values obtained by inverting the pixel values output by the large value selection process.

The imaging device according to claim 2, wherein the score map is a sky likelihood score map that is a map of likelihood scores representing sky attributes of the region of the input image.

The imaging device according to claim 1, wherein the correction means generates an inverted score map by inverting the pixel values of all pixels constituting the score map, generates an inverted invisible light image by inverting the pixel values of all pixels constituting the invisible light image, performs a large value selection process that compares the pixel values of corresponding pixels in the inverted score map and the inverted invisible light image and outputs the larger pixel value, and corrects the score map using pixel values obtained by inverting the pixel values output by the large value selection process.

The imaging device according to claim 4, wherein the score map is a grass likelihood score map that is a map of likelihood scores representing attributes of grass in the area of the input image.

The imaging device according to claim 1, wherein the correction means generates an inverted score map by inverting the pixel values of all pixels constituting the score map, generates an arithmetic-processed invisible light image by inverting the values obtained by applying predetermined arithmetic processing to the pixel values of all pixels constituting the invisible light image, performs a large value selection process that compares the pixel values of corresponding pixels in the inverted score map and the arithmetic-processed invisible light image and outputs the larger pixel value, and corrects the score map using pixel values obtained by inverting the pixel values output by the large value selection process.

The imaging device according to claim 6, wherein the predetermined arithmetic processing generates the arithmetic-processed invisible light image by subtracting, for each pixel constituting the invisible light image, the pixel value of that pixel from a predetermined value, and using the inverted value of the difference if the difference is positive, or the inverted value of 0 if the difference is negative.

The imaging device according to claim 6 or 7, wherein the score map is a skin likelihood score map that is a map of likelihood scores representing skin attributes of the region of the input image.

The imaging device according to claim 1, wherein the correction means determines whether to correct the pixel value of the pixel in the score map corresponding to a pixel constituting the invisible light image, based on the result of comparing the pixel value of that pixel with a predetermined threshold value.

The imaging device according to claim 9, wherein the threshold value differs for each attribute represented by the score map.

The imaging device according to any one of claims 1 to 10, further comprising control means for controlling driving of the invisible light imaging means,
wherein the control means drives the invisible light imaging means when the input image is an image of a frame used for predetermined control relating to photography, and does not drive the invisible light imaging means when the input image is not an image of a frame used for the predetermined control.

The imaging device according to claim 11, wherein the predetermined control is white balance control, exposure control, or scene recognition control.

A method for controlling an imaging device comprising a visible light imaging means for receiving visible light and an invisible light imaging means for receiving non-visible light, the method comprising:
a generation step of using a visible light image obtained by the visible light imaging means as an input image and generating, with a hierarchical neural network, a score map representing attributes of a region of the input image; and
a correction step of correcting the score map based on an invisible light image obtained by the invisible light imaging means.

A program that causes a computer to execute a method for controlling an imaging device comprising visible light imaging means for receiving visible light and invisible light imaging means for receiving non-visible light,
wherein the method for controlling the imaging device comprises:
a generation step of using a visible light image obtained by the visible light imaging means as an input image and generating, with a hierarchical neural network, a score map representing attributes of a region of the input image; and
a correction step of correcting the score map based on an invisible light image obtained by the invisible light imaging means.