JP2019186911A

JP2019186911A - Image processing apparatus, image processing method, and imaging apparatus

Info

Publication number: JP2019186911A
Application number: JP2019023778A
Authority: JP
Inventors: 保彦岩本; Yasuhiko Iwamoto; 良介辻; Ryosuke Tsuji
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2018-04-10
Filing date: 2019-02-13
Publication date: 2019-10-24
Anticipated expiration: 2039-02-13
Also published as: JP6931369B2

Abstract

To provide an image processing apparatus, an image processing method, and an imaging apparatus that, in subject detection using machine learning, can detect subjects in a manner suited to the amount of blur and the sharpness of the main subject.
SOLUTION: Subject detection processing is applied to image data using parameters generated on the basis of machine learning. The subject detection processing uses a parameter selected from a plurality of parameters stored in advance, and different parameters are used depending on the amount of blur and the sharpness of the subject.
SELECTED DRAWING: Figure 2

Description

The present invention relates to an image processing apparatus, an image processing method, and an imaging apparatus, and more particularly to a subject detection technique.

Subject detection techniques that automatically detect a specific subject pattern in an image are highly useful. Patent Document 1 discloses an imaging apparatus that detects, in a captured image, a region corresponding to a specific subject pattern such as a human face, and optimizes focus and exposure for the detected region.

It is also known to learn and recognize subjects in images using a technique called deep learning (Non-Patent Document 1). The convolutional neural network (CNN) is a representative deep learning technique. A CNN generally has a multilayer structure that combines convolution layers, which spatially integrate local image features; pooling (or sub-sampling) layers, which compress features in the spatial direction; and fully connected layers, an output layer, and so on. Because a CNN acquires complex feature representations through stepwise feature transformation across its layers, it can use those representations to perform subject category recognition and subject detection in images with high accuracy.
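As a rough illustration of how these layer types fit together, the following minimal pure-Python sketch runs one single-channel forward pass: convolution, a ReLU non-linearity, 2×2 max pooling, and a fully connected output. The function names, the ReLU choice, and all kernel and weight values are illustrative assumptions, not details of the network in Non-Patent Document 1 or of this application; a trained CNN learns its kernels and weights and stacks many multi-channel layers.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D convolution (implemented as cross-correlation, as CNN libraries do)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(x: Matrix) -> Matrix:
    """Element-wise non-linearity applied after the convolution."""
    return [[max(0.0, v) for v in row] for row in x]


def max_pool(x: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling: compresses the feature map spatially."""
    return [[max(x[i + u][j + v] for u in range(size) for v in range(size))
             for j in range(0, len(x[0]) - size + 1, size)]
            for i in range(0, len(x) - size + 1, size)]


def fully_connected(x: Matrix, weights: List[float], bias: float) -> float:
    """Flatten the pooled features and compute a single output score."""
    flat = [v for row in x for v in row]
    return sum(w * v for w, v in zip(weights, flat)) + bias


def forward(image: Matrix, kernel: Matrix, fc_weights: List[float], fc_bias: float) -> float:
    features = max_pool(relu(conv2d(image, kernel)))
    return fully_connected(features, fc_weights, fc_bias)


# Illustrative values only; a trained CNN learns its kernels and weights.
score = forward([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, 1]], [0.5], 1.0)
print(score)  # 8.0
```

Each pooling step halves the spatial size of the feature map, which is the spatial compression described above.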

Patent Document 1: JP 2005-318554 A

Non-Patent Document 1: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, "ImageNet classification with deep convolutional neural networks", NIPS'12 Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, pp. 1097-1105

In an imaging apparatus that focuses on a subject region detected in a captured image, as in Patent Document 1, the requirements for subject detection differ before and after the main subject (the subject to be focused on) is determined. Before the main subject is determined, it is preferable to detect every target subject without omission, regardless of how blurred it is. After the main subject is determined, on the other hand, strongly blurred subjects need not be detected.

Conventionally, however, no method has been proposed for achieving, in machine-learning-based subject detection, both detection suited to the period before the main subject is determined and detection suited to the period after it is determined. In other words, no method has been proposed for performing subject detection suited to the amount of blur (or sharpness) of the main subject. The blur amount and sharpness of the main subject depend not only on the degree of focus but also on the degree of subject motion blur and camera shake, the magnitude of image noise, and so on.

The present invention has been made in view of these problems of the prior art. An object of the present invention is to provide an image processing apparatus, an image processing method, and an imaging apparatus capable of subject detection suited to the blur amount and sharpness of the main subject, in subject detection using machine learning.

The above object is achieved by an image processing apparatus comprising: subject detection means for applying subject detection processing to an image using parameters generated on the basis of machine learning; storage means for storing a plurality of parameters used for the subject detection processing; and selection means for selecting, from the parameters stored in the storage means, the parameter to be used by the subject detection means, wherein the selection means selects different parameters depending on whether or not a main subject has been determined. The object is also achieved by an image processing apparatus of the same configuration in which the selection means selects different parameters depending on whether or not shake is inferred to be large, whether or not the change in position and orientation is large, or whether or not the gain applied to the image is large.

According to the present invention, it is possible to provide an image processing apparatus, an image processing method, and an imaging apparatus that, in subject detection using machine learning, achieve both subject detection suited to the period before the main subject is determined and subject detection suited to the period after it is determined. It is also possible to provide an image processing apparatus, an image processing method, and an imaging apparatus capable of subject detection suited to the motion of the main subject, the motion of the imaging apparatus, and the shooting conditions.

FIG. 1 is a schematic vertical sectional view of a digital single-lens reflex camera as an example of an image processing apparatus according to a first embodiment.
FIG. 2 is a block diagram showing an example functional configuration of the digital single-lens reflex camera according to the first embodiment.
FIG. 3 is a flowchart outlining the shooting operation according to the first embodiment.
FIG. 4 is a flowchart of subject detection processing according to the first embodiment.
FIG. 5 is a schematic diagram showing an example configuration of the CNN used by the subject detection unit according to the first embodiment.
FIG. 6 is a schematic diagram showing the configuration of part of the CNN in FIG. 5.
FIG. 7 is a block diagram showing an example functional configuration of a digital single-lens reflex camera according to the second to fourth embodiments.
FIG. 8 is a schematic diagram showing an example of motion vector calculation processing according to the second embodiment.
FIG. 9 is a flowchart of subject detection processing according to the second embodiment.
FIG. 10 is a flowchart of subject detection processing according to the third embodiment.
FIG. 11 is a flowchart of subject detection processing according to the fourth embodiment.

Exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. The following embodiments do not limit the invention according to the claims. Although the embodiments describe a plurality of features, not all of them are necessarily essential to the invention, and the features may be combined arbitrarily. In the accompanying drawings, the same or similar components are given the same reference numerals, and duplicate descriptions are omitted.

The following embodiments describe a case in which the present invention is implemented in a digital single-lens reflex camera (DSLR). However, the present invention can be implemented in any electronic device capable of handling image data, and the digital single-lens reflex camera is merely one example of the image processing apparatus according to the present invention. Electronic devices that can implement the present invention include, but are not limited to, personal computers, smartphones, tablet terminals, game consoles, and robots.

<< First Embodiment >>
● (Configuration of imaging apparatus)
FIG. 1 is a vertical sectional view of a digital single-lens reflex camera (DSLR) 100 according to this embodiment, and FIG. 2 is a block diagram showing an example functional configuration of the DSLR 100. Like reference numerals refer to like elements throughout the drawings.

The DSLR 100 includes a main body 101 and a photographing lens 102 (interchangeable lens) that is detachable from the main body 101. The mounts of the main body 101 and the photographing lens 102 each have a mount contact group 115. When the photographing lens 102 is attached to the main body 101, the mount contact groups 115 come into contact, establishing an electrical connection between the photographing lens 102 and the main body 101.

The system control unit 201 includes one or more programmable processors, a ROM 2011, and a RAM 2012, and controls the operations of the main body 101 and the photographing lens 102 by loading programs stored in the ROM 2011 into the RAM 2012 and executing them. In addition to the programs executed by the system control unit 201, the ROM 2011 stores various setting values, GUI data, and the like.

The photographing lens 102 includes a focus lens 113 that adjusts the focusing distance and a diaphragm 114 that adjusts the amount of light entering the main body 101, together with the motors, actuators, and so on that drive them. The main body 101 controls the driving of the focus lens 113 and the diaphragm 114 through the mount contact group 115.

The main mirror 103 and the sub mirror 104 form a quick-return mirror. Part of the main mirror 103 has a controlled reflectance (transmittance) so as to split the light beam entering from the photographing lens 102 into a beam directed toward the viewfinder optical system (upward in the drawing) and a beam directed toward the sub mirror 104.

FIG. 1 shows the state when the optical viewfinder is in use (not shooting), with the main mirror 103 positioned in the optical path of the light beam entering from the photographing lens 102. In this state, light reflected by the main mirror 103 enters the viewfinder optical system, and the beam, folded by the pentaprism 107, exits through the eyepiece 109. The user can therefore view the optical subject image by looking through the eyepiece 109.

Light transmitted through the main mirror 103 is reflected by the sub mirror 104 and enters the AF sensor 105. The AF sensor 105 forms a secondary image of the photographing lens 102 on line sensors and generates a pair of image signals (focus detection signals) usable for phase-difference focus detection. The focus detection signals are sent to the system control unit 201, which uses them to obtain the defocus amount of the focus lens 113 and controls the driving direction and amount of the focus lens 113 based on that defocus amount.
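Phase-difference focus detection amounts to finding how far one of the paired signals is displaced relative to the other. The sketch below assumes a simple approach that the source does not specify: an exhaustive search for the integer shift with the smallest mean absolute difference, converted to a defocus amount with a hypothetical optics-dependent coefficient `conversion_coeff`. Real implementations typically add sub-pixel interpolation and reliability checks.

```python
from typing import Sequence


def image_shift(a: Sequence[float], b: Sequence[float], max_shift: int) -> int:
    """Integer shift s that best aligns a[i] with b[i + s] (minimum mean absolute difference)."""
    n = len(a)
    best_shift, best_cost = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        pairs = [(a[i], b[i + s]) for i in range(n) if 0 <= i + s < len(b)]
        if not pairs:
            continue
        cost = sum(abs(x - y) for x, y in pairs) / len(pairs)
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift


def defocus_from_shift(shift: int, conversion_coeff: float) -> float:
    """Convert image shift to defocus; the sign gives the drive direction.

    conversion_coeff is a hypothetical, optics-dependent constant (e.g. mm per pixel of shift).
    """
    return shift * conversion_coeff


# Hypothetical line-sensor outputs: signal B is signal A displaced by two pixels.
sig_a = [0, 0, 1, 5, 9, 5, 1, 0, 0, 0]
sig_b = [0, 0, 0, 0, 1, 5, 9, 5, 1, 0]
shift = image_shift(sig_a, sig_b, max_shift=4)
print(shift, defocus_from_shift(shift, 0.05))
```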

The focusing screen 106 is located at the intended image plane of the photographing lens 102 within the viewfinder optical system. A user looking through the eyepiece 109 observes the optical image formed on the focusing screen 106. In addition to the optical image, shooting information such as the shutter speed and aperture value can also be presented.

The photometric sensor 108 generates an image signal (exposure control signal) from the incident light beam and sends it to the system control unit 201. The system control unit 201 uses the received exposure control signal to perform automatic exposure control and to control subject detection by the subject detection unit 204, described later. The photometric sensor 108 is an image sensor in which pixels, each having a photoelectric conversion unit, are arranged two-dimensionally. In this embodiment the subject detection unit 204 applies subject detection processing to the image signal output from the photometric sensor 108, but it may instead apply subject detection processing to the image signal output from the image sensor 111.

When the image sensor 111 is exposed, the main mirror 103 and the sub mirror 104 move out of the optical path of the light beam entering from the photographing lens 102, and the focal plane shutter 110 (hereinafter simply "the shutter") opens.

The image sensor 111 has pixels, each with a photoelectric conversion unit, arranged two-dimensionally. Each pixel photoelectrically converts the optical subject image formed by the photographing lens 102, and the resulting image signal is sent to the system control unit 201. The system control unit 201 generates image data from the received image signal, stores it in the image storage unit 202, and displays it on the display unit 112, such as an LCD. The image data generated from the output of the image sensor 111 may also be supplied to the subject detection unit 204 for subject detection. The system control unit 201 may also perform contrast-based focus detection using the image data.
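Contrast-based focus detection is only mentioned here, so the following is a minimal sketch under assumed details: a sum-of-squared-differences sharpness metric is evaluated on images captured at several focus lens positions, and the sharpest position is chosen. The metric, the lens positions, and the frame data are all hypothetical; practical systems usually climb the contrast curve step by step rather than sweeping every position.

```python
from typing import Dict, List

Image = List[List[float]]


def contrast_score(image: Image) -> float:
    """Sum of squared differences between horizontally adjacent pixels (higher = sharper)."""
    return sum((row[j + 1] - row[j]) ** 2 for row in image for j in range(len(row) - 1))


def best_focus_position(images_by_position: Dict[int, Image]) -> int:
    """Return the focus lens position whose image has the highest contrast score."""
    return max(images_by_position, key=lambda pos: contrast_score(images_by_position[pos]))


# Hypothetical one-row crops captured at three focus lens positions.
frames = {
    10: [[3, 6, 4, 6]],    # defocused: low contrast
    20: [[0, 10, 0, 10]],  # in focus: high contrast
    30: [[4, 5, 5, 6]],    # defocused in the other direction
}
print(best_focus_position(frames))  # 20
```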

The operation unit 203 is a collective term for the group of user-operable input devices provided on the main body 101 and the photographing lens 102. Specific examples of input devices included in the operation unit 203 are a release button, a power switch, direction keys, an enter button, a menu button, and an operation-mode selection dial, although the operation unit is not limited to these. The system control unit 201 detects operations on the operation unit 203. The release button has a switch SW1 that turns on when the button is pressed halfway and a switch SW2 that turns on when it is pressed fully.

For example, when a half press of the release button (SW1 on) is detected, the system control unit 201 starts a still-image shooting preparation operation, such as automatic focus detection (AF) and automatic exposure control (AE). When a full press of the release button (SW2 on) is detected, the system control unit 201 executes still-image shooting and recording. The system control unit 201 displays the captured image on the display unit 112 for a fixed period.

During moving-image shooting (in the shooting standby state or while recording a moving image), the system control unit 201 displays the captured moving image on the display unit 112 in real time, causing the display unit 112 to function as an electronic viewfinder (EVF). The moving image, and its frames, displayed while the display unit 112 functions as an EVF are called a live view image or through image. Whether to shoot still images or moving images can be selected through the operation unit 203, and the system control unit 201 switches the control of the main body 101 and the photographing lens 102 between still-image shooting and moving-image shooting.

The subject detection unit 204 is implemented with a GPU (Graphics Processing Unit). Although the GPU was originally a processor for image processing, it has many multiply-accumulate units and excels at matrix computation, so it is often used as a processor for learning-related processing, and GPUs are commonly used for deep learning as well. For example, the NVIDIA Jetson TX2 module can be used as the subject detection unit 204. An FPGA (Field-Programmable Gate Array), an ASIC (Application Specific Integrated Circuit), or the like may also be used as the subject detection unit 204.

The subject detection unit 204 applies subject detection processing to the supplied image data using one learning model, selected by the system control unit 201, from among the plurality of learning models stored in the learning model storage unit 205. The subject detection processing is described in detail later. The learning model storage unit 205 may be, for example, a rewritable nonvolatile memory, or may be part of the ROM 2011. In this embodiment, the learning model storage unit 205 stores detection learning models 206 and 207 corresponding to the characteristics required of subject detection. The system control unit 201 determines the main subject region from among the subject regions detected by the subject detection unit 204 and optimizes focus and exposure for that region. For example, it can determine the exposure conditions so that the main subject region is properly exposed, or adjust focus so that the main subject region is in focus, although the optimization is not limited to these.

(Dictionary switching for subject detection)
The DSLR 100 (system control unit 201) of this embodiment uses the subject regions detected by the subject detection unit 204 for determining exposure conditions and for focus detection. More specifically, the system control unit 201 determines a main subject from among the subjects detected by the subject detection unit 204 and uses the main subject region for determining exposure conditions and for focus detection. The characteristics required of subject detection therefore differ before and after the main subject is determined. The main subject may be determined by the system control unit 201 according to predetermined conditions (for example, the position and size of the subject region), or according to a user selection.

An image captured before the main subject is determined is, for example, an image focused on the center of the frame, so the subject the user wants as the main subject may not be in focus. Subject detection performed before the main subject is determined must therefore be able to detect target subjects even when they are strongly blurred. In other words, it should prioritize suppressing missed detections over suppressing false detections.

Once the main subject has been determined, on the other hand, focus detection is performed so that the main subject region comes into focus, so the main subject is only slightly blurred in the images obtained thereafter. After the main subject is determined, detecting strongly blurred subjects therefore has less value, even if they are target subjects. In other words, subject detection should prioritize suppressing false detections over suppressing missed detections.

The more strongly blurred the subjects that detection tries to capture, the higher the probability tends to be of falsely detecting regions that do not contain the target subject. This is because, as the blur of a subject increases, the variety of image patterns that resemble the subject grows while the subject's distinctive features are lost. Thus, before the main subject is determined, it is important to detect subjects ranging from slightly to strongly blurred; after the main subject is determined, it is important to suppress false detections and to detect slightly blurred subjects accurately.

The subject detection unit 204 of this embodiment applies subject detection to the image using processing parameters based on a learning model generated in advance through machine learning. In this embodiment, the learning model storage unit 205 stores a detection learning model 206 for subject detection before the main subject is determined and a detection learning model 207 for subject detection after the main subject is determined. The detection learning model 206 is generated by machine learning using images of subjects with blur amounts ranging from small to large, whereas the detection learning model 207 is generated by machine learning that emphasizes subject images with small blur amounts.

The system control unit 201 switches the subject detection characteristics of the subject detection unit 204 by switching the learning model it uses before and after the main subject is determined: before the main subject is determined, the subject detection unit 204 uses the detection learning model 206, and after it is determined, the detection learning model 207.
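The switching rule itself is simple enough to sketch. In the minimal example below, a dictionary keyed by the reference numerals 206 and 207 stands in for the learning model storage unit 205, and `DetectionModel` is a hypothetical placeholder for a stored set of CNN parameters; only the selection condition (whether a main subject has been determined) comes from the text above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class DetectionModel:
    """Hypothetical placeholder for a stored set of CNN parameters (contents omitted)."""
    ref: int
    description: str


# Hypothetical stand-in for learning model storage unit 205.
LEARNING_MODEL_STORAGE: Dict[int, DetectionModel] = {
    206: DetectionModel(206, "trained on small-to-large blur: avoids missed detections"),
    207: DetectionModel(207, "trained mainly on small blur: avoids false detections"),
}

Region = Tuple[int, int, int, int]  # x, y, width, height of the main subject region


def select_detection_model(main_subject: Optional[Region]) -> DetectionModel:
    """Model 206 until a main subject is determined, model 207 afterwards."""
    key = 207 if main_subject is not None else 206
    return LEARNING_MODEL_STORAGE[key]


print(select_detection_model(None).ref)              # 206
print(select_detection_model((40, 30, 20, 20)).ref)  # 207
```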

(Imaging operation)
Next, the shooting operation of the DSLR 100 of this embodiment will be described with reference to FIG. 3 and FIG. 4.
FIG. 3 is a flowchart outlining the shooting operation. The processing of each step is implemented by the programmable processor of the system control unit 201 executing a program loaded from the ROM 2011 into the RAM 2012. Here, it is assumed that the main body 101 is powered on and in the shooting standby state. In the shooting standby state, the system control unit 201 performs a predetermined operation, for example shooting a moving image and displaying it on the display unit 112 so that the display unit functions as an EVF.

In S301, the system control unit 201 detects the states of the switches SW1 and SW2 of the release button. If either SW1 or SW2 is on, the process proceeds to S302; if both are off, the system control unit 201 repeats S301.

In S302, the system control unit 201 performs exposure processing (charge accumulation) of the photometric sensor 108. This is realized with a so-called electronic shutter: the system control unit 201 controls the photometric sensor 108 to accumulate charge for a predetermined time and then reads out the image signal (exposure control signal) from it. The system control unit 201 also performs exposure processing (charge accumulation) for the AF sensor 105 and reads out its image signals (focus detection signals).

In S303, the system control unit 201 supplies the exposure control signal to the subject detection unit 204, obtains the subject detection result from it, and determines the main subject based on that result. The main subject can be determined by any method: the system control unit 201 may determine it automatically, or it may present the detected subject regions to the user and determine the subject in the region the user selects as the main subject. When the system control unit 201 determines the main subject automatically, it can do so based on, for example, the position and size of the subject regions, such as by choosing the subject closest to the image center or the subject with the largest region. The main subject can also be determined by considering other conditions, such as the reliability of subject detection or past detection results, or by combining several conditions. The subject detection processing is described in detail later with reference to FIG. 4.
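The two automatic rules mentioned in S303 (closest to the image center, or largest region) can be sketched as follows. The `Detection` box format (top-left corner plus width and height, in pixels) and the function names are assumptions for illustration; the other criteria mentioned, such as detection reliability and past results, are not modeled.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Detection:
    x: float  # top-left corner, pixels
    y: float
    w: float  # width and height, pixels
    h: float

    @property
    def center(self) -> Tuple[float, float]:
        return (self.x + self.w / 2, self.y + self.h / 2)

    @property
    def area(self) -> float:
        return self.w * self.h


def pick_main_subject(detections: List[Detection], image_w: int, image_h: int,
                      rule: str = "center") -> Optional[Detection]:
    """Pick the main subject using one of the two example rules from S303."""
    if not detections:
        return None  # no subject detected
    if rule == "center":
        cx, cy = image_w / 2, image_h / 2
        return min(detections, key=lambda d: math.hypot(d.center[0] - cx, d.center[1] - cy))
    if rule == "largest":
        return max(detections, key=lambda d: d.area)
    raise ValueError(f"unknown rule: {rule}")


faces = [Detection(40, 40, 20, 20), Detection(0, 0, 50, 50)]
print(pick_main_subject(faces, 100, 100, "center"))   # the 20x20 box at the center
print(pick_main_subject(faces, 100, 100, "largest"))  # the 50x50 box
```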

In S304, the system control unit 201 selects, from among the selectable focus detection regions, the one closest to the position of the main subject determined in S303. It then detects the focus state (defocus amount and direction) of the selected focus detection region from the focus detection signals for that region among those read out in S302.

If no subject is detected in S303, the system control unit 201 obtains the focus state (defocus amount and direction) of every selectable focus detection region from the focus detection signals, and selects the focus detection region in which the subject is at the closest distance.
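The focus-region choice in S304, including the fallback when no subject is detected, can be sketched as below. This assumes each region already carries a center position and a subject distance derived from its defocus; the `FocusRegion` type and the distance field are hypothetical stand-ins for whatever the camera actually computes.

```python
import math
from typing import NamedTuple, Optional, Sequence, Tuple


class FocusRegion(NamedTuple):
    name: str
    cx: float                # region center in image coordinates (pixels)
    cy: float
    subject_distance: float  # distance estimated from this region's defocus (hypothetical, metres)


def select_focus_region(regions: Sequence[FocusRegion],
                        main_subject_xy: Optional[Tuple[float, float]] = None) -> FocusRegion:
    if main_subject_xy is not None:
        # S304: the region whose center is closest to the main subject position.
        mx, my = main_subject_xy
        return min(regions, key=lambda r: math.hypot(r.cx - mx, r.cy - my))
    # No subject detected in S303: the region in which the subject is nearest the camera.
    return min(regions, key=lambda r: r.subject_distance)
```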

In step S305, the system control unit 201 adjusts the in-focus distance of the photographing lens 102 by controlling the position of the focus lens 113 based on the focus state of the focus detection region selected in step S304.

In step S306, the system control unit 201 determines the shooting conditions (aperture value (Av), shutter speed (Tv), and ISO sensitivity (ISO value)) using the exposure control signal read in step S302. The method of determining the shooting conditions is not particularly limited; here, the shooting conditions corresponding to the luminance (Bv value) obtained from the exposure control signal are determined by referring to a program chart stored in advance. The shooting conditions may instead be determined using the luminance of the subject region detected by the subject detection process.
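The program-chart lookup in S306 can be illustrated with the standard APEX relation Ev = Bv + Sv = Av + Tv. This is a minimal sketch under loud assumptions: the chart rows below are invented for illustration, and ISO is held fixed for simplicity, whereas a real program chart may also vary ISO.

```python
import math

# Hypothetical program chart: (minimum Ev, Av) rows, scanned top to bottom.
PROGRAM_CHART = [(14.0, 8.0), (11.0, 6.0), (8.0, 4.0), (-math.inf, 3.0)]


def shooting_conditions(bv: float, iso: float = 100.0):
    """Return (f-number, shutter time in seconds, ISO) for a measured Bv."""
    sv = math.log2(iso / 3.125)   # APEX speed value: ISO 100 -> Sv = 5
    ev = bv + sv                  # APEX exposure value: Ev = Bv + Sv = Av + Tv
    av = next(a for min_ev, a in PROGRAM_CHART if ev >= min_ev)
    tv = ev - av                  # the remaining exposure goes to the shutter
    f_number = math.sqrt(2.0 ** av)   # Av = 2 * log2(N)
    shutter_s = 2.0 ** (-tv)          # Tv = -log2(t)
    return f_number, shutter_s, iso
```

For example, Bv = 7 at ISO 100 gives Ev = 12, which this invented chart maps to f/8 at 1/64 s.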

In step S307, the system control unit 201 detects the state of the switch SW2. If the switch SW2 is on, the process proceeds to step S308; if it is off, the process returns to step S301.

In step S308, the system control unit 201 performs still image shooting. It moves the main mirror 103 and the sub mirror 104 out of the path of the light flux from the photographing lens 102, and drives the shutter 110 according to the shutter speed determined in S306. The image sensor 111 is thereby exposed to the optical image formed by the photographing lens 102, and generates an image signal by converting the charge accumulated in each pixel during the exposure period into a voltage. The system control unit 201 reads the image signal from the image sensor and generates image data by applying predetermined image processing such as A/D conversion, noise reduction, white balance adjustment, and color interpolation. It saves the generated image data as an image data file in the image storage unit 202, or generates a display image signal based on the image data and displays it on the display unit 112. The system control unit 201 then returns the process to S301.

(Subject detection process flow)
Next, details of the subject detection process in S303 of FIG. 3 will be described with reference to the flowchart shown in FIG. 4.
In step S401, the system control unit 201 determines whether the main subject has already been determined. If it has not, the process proceeds to step S402; if it has, the process proceeds to step S403.

In step S402, the system control unit 201 (selection unit) selects, from the plurality of learning models stored in the learning model storage unit 205, the detection learning model 206 for subject detection before the main subject is determined, and sets it in the subject detection unit 204 as the parameter for subject detection processing. The detection learning model 206 is a learning model for detecting subjects over a wide range of blur amounts; it is obtained by machine learning using images of the detection target with blur amounts ranging from small to large. Using the detection learning model 206 makes it possible to detect the target subject even when its blur amount is large.

In step S403, the system control unit 201 (selection unit) selects, from the plurality of learning models stored in the learning model storage unit 205, the detection learning model 207 for subject detection after the main subject is determined, and sets it in the subject detection unit 204 as the parameter for subject detection processing. The detection learning model 207 is a learning model for detecting subjects with a small amount of blur; it is obtained by machine learning that emphasizes images of the detection target with small blur amounts. Using the detection learning model 207 makes it possible to detect target subjects with small blur amounts with high accuracy.

In steps S402 and S403, the system control unit 201 also supplies the subject detection unit 204 with image data generated by applying A/D conversion, noise reduction, and similar processing to the exposure control signal read in step S302. The system control unit 201 then advances the process to S404.

In step S404, the subject detection unit 204 applies the subject detection process to the image data based on the exposure control signal, using the detection learning model set by the system control unit 201 in S402 or S403. The subject detection unit 204 supplies information representing the detection result to the system control unit 201. This information may include whether any subject was detected (the number of detections) and information on each detected subject region (for example, its position and size). The subject detection unit 204 uses a CNN for subject detection; details of the CNN are given later.

In step S405, if one or more subjects have been detected, the system control unit 201 determines the main subject from among the detected subjects. The method for determining the main subject is as described above.

The description above assumed that, once the main subject is determined, subject detection uses the detection learning model 207 obtained by machine learning that emphasizes images of subjects with a small amount of blur. However, even after the main subject is determined, if the main subject is not detected continuously for a predetermined period, the configuration may switch to the detection learning model 206, the learning model for detecting subjects over a wide range of blur amounts. This is because, when the main subject has not been detected (has been lost) for a predetermined period, its blur amount may have become large.

Likewise, even after the main subject is determined, the detection learning model 206 may be used during any period in which the main subject is judged not to be in focus, because before the main subject is brought into focus its blur amount may be large. Alternatively, the detection learning model 206 may be used when the blur amount of the main subject is greater than or equal to a threshold. Subject detection may also be performed with the detection learning model 206 before the switch SW1 is turned on.
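The model selection of S401 to S403, together with the optional fallbacks just listed, can be condensed into one decision function. The patent presents the fallbacks as alternative variants; this sketch simply combines them all. The parameter names, the frame-count notion of "predetermined period", and the default limits are hypothetical.

```python
def select_detection_model(main_subject_decided: bool,
                           sw1_on: bool = True,
                           frames_since_main_seen: int = 0,
                           main_in_focus: bool = True,
                           main_blur: float = 0.0,
                           lost_limit: int = 10,
                           blur_threshold: float = 1.0) -> int:
    """Return 206 (wide-blur model) or 207 (small-blur model)."""
    WIDE_BLUR, SMALL_BLUR = 206, 207
    if not sw1_on or not main_subject_decided:
        return WIDE_BLUR   # before SW1, or before the main subject is decided (S402)
    if frames_since_main_seen >= lost_limit:
        return WIDE_BLUR   # main subject lost for the predetermined period
    if not main_in_focus or main_blur >= blur_threshold:
        return WIDE_BLUR   # main subject not yet in focus, or blur at/above threshold
    return SMALL_BLUR      # S403
```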

When the detection learning model 206 is used, subject detection gives priority to avoiding missed detections over avoiding false detections. Therefore, when the detection learning model 206 is used, the detection reliability threshold required for determining the main subject in S405 can be set stricter than when the detection learning model 207 is used. In other words, the detection reliability required for a subject region to become the main subject can be set higher for regions detected with the detection learning model 206 than for regions detected with the detection learning model 207. This prevents a falsely detected subject from being chosen as the main subject, which would cause focus adjustment and exposure control to be optimized for an inappropriate region.
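The model-dependent reliability threshold for S405 amounts to a per-model lookup before main-subject selection. The threshold values and the dictionary-based detection records below are hypothetical; only the ordering (stricter for model 206) comes from the text.

```python
from typing import Dict, List

# Hypothetical thresholds: stricter for detections made with the wide-blur model 206.
MAIN_SUBJECT_MIN_RELIABILITY: Dict[int, float] = {206: 0.8, 207: 0.5}


def main_subject_candidates(detections: List[dict], model_id: int) -> List[dict]:
    """Keep only detections reliable enough to become the main subject in S405."""
    threshold = MAIN_SUBJECT_MIN_RELIABILITY[model_id]
    return [d for d in detections if d["reliability"] >= threshold]
```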

(Details of subject detection unit)
Next, the subject detection unit 204 will be described. In this embodiment, the subject detection unit 204 is implemented as a neocognitron, a type of CNN (convolutional neural network). The basic configuration of the subject detection unit 204 will be described with reference to FIGS. 5 and 6. FIG. 5 shows the basic configuration of a CNN that detects a subject from input two-dimensional image data. Processing starts from the input at the left end and proceeds to the right. The CNN is built hierarchically from repeated sets of two layers: a feature detection layer (S layer) and a feature integration layer (C layer). The S layer corresponds to the convolution layer described in the background art, and the C layer to the pooling or subsampling layer.

In the CNN, the S layer first detects the next features on the basis of the features detected at the preceding level. The C layer then integrates the features detected in the S layer and passes them to the next level as the detection result of that level.
Each S layer consists of feature detection cell planes, each of which detects a different feature. Each C layer consists of feature integration cell planes, which pool or subsample the detection results of the feature detection cell planes in the immediately preceding S layer. Hereinafter, feature detection cell planes and feature integration cell planes are collectively called feature planes when there is no need to distinguish them. In this embodiment, the output layer, which is the final (n-th) level, consists of an S layer only, without a C layer.

The feature detection process in a feature detection cell plane and the feature integration process in a feature integration cell plane will be described in detail with reference to FIG. 6. One feature detection cell plane consists of a plurality of feature detection neurons, each connected with a predetermined structure to the C layer of the preceding level. One feature integration cell plane consists of a plurality of feature integration neurons, each connected with a predetermined structure to the S layer of the same level.

As shown in FIG. 6, the output value of the feature detection neuron at position $(\xi, \zeta)$ in the $M$-th cell plane of the S layer at the $L$-th level is written $y_M^{LS}(\xi, \zeta)$, and the output value of the feature integration neuron at position $(\xi, \zeta)$ in the $M$-th cell plane of the C layer at the $L$-th level is written $y_M^{LC}(\xi, \zeta)$. Writing the coupling coefficients of these neurons as $w_M^{LS}(n, u, v)$ and $w_M^{LC}(u, v)$, the output values can be expressed as follows.

$$y_M^{LS}(\xi, \zeta) = f\left(u_M^{LS}(\xi, \zeta)\right) = f\left(\sum_{n, u, v} w_M^{LS}(n, u, v)\, y_n^{L-1,C}(\xi + u, \zeta + v)\right) \qquad \text{[Formula 1]}$$

$$y_M^{LC}(\xi, \zeta) = u_M^{LC}(\xi, \zeta) = \sum_{u, v} w_M^{LC}(u, v)\, y_M^{LS}(\xi + u, \zeta + v) \qquad \text{[Formula 2]}$$

Here, $f$ in Formula 1 is an activation function, for example a sigmoid function such as the logistic function or the hyperbolic tangent. $u_M^{LS}(\xi, \zeta)$ represents the internal state of the feature detection neuron at position $(\xi, \zeta)$ in the $M$-th cell plane of the S layer at the $L$-th level. Formula 2 uses no activation function and is a simple linear sum.

When no activation function is used, as in Formula 2, the internal state $u_M^{LC}(\xi, \zeta)$ of the neuron equals its output value $y_M^{LC}(\xi, \zeta)$. The term $y_n^{L-1,C}(\xi + u, \zeta + v)$ in Formula 1 is called the connection-destination output value of the feature detection neuron, and $y_M^{LS}(\xi + u, \zeta + v)$ in Formula 2 is called the connection-destination output value of the feature integration neuron.
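Formulas 1 and 2 are both shifted product-sums over a receptive field, so one helper can compute either. The sketch below is an illustrative NumPy implementation, not the patent's hardware or software. Assumptions: planes are indexed `[n, xi, zeta]`, the receptive field is square with odd side `K = 2r + 1`, values outside a plane are treated as 0, and the C plane is computed at full resolution with no stride, exactly as Formula 2 is written.

```python
import numpy as np


def _shifted_sum(planes: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Sum over n, u, v of weights[n, u, v] * planes[n, xi + u, zeta + v].

    planes:  (N, X, Z) input cell planes, indexed [n, xi, zeta]
    weights: (N, K, K) coefficients for u, v in [-r, r], K = 2r + 1
    Positions outside the planes are treated as 0 (an assumption).
    """
    _, size_x, size_z = planes.shape
    r = weights.shape[1] // 2
    padded = np.pad(planes, ((0, 0), (r, r), (r, r)))   # zero padding
    out = np.zeros((size_x, size_z))
    for u in range(-r, r + 1):
        for v in range(-r, r + 1):
            window = padded[:, r + u:r + u + size_x, r + v:r + v + size_z]
            # Contract over n: sum_n w[n, u, v] * y_n(xi + u, zeta + v)
            out += np.tensordot(weights[:, u + r, v + r], window, axes=1)
    return out


def s_plane(prev_c_planes: np.ndarray, w_s: np.ndarray, f=np.tanh) -> np.ndarray:
    """Formula 1: y^{LS}_M = f(u^{LS}_M), summing over every plane n of the level L-1 C layer."""
    return f(_shifted_sum(prev_c_planes, w_s))


def c_plane(s_plane_m: np.ndarray, w_c: np.ndarray) -> np.ndarray:
    """Formula 2: y^{LC}_M, a linear sum over the same plane M (no activation)."""
    return _shifted_sum(s_plane_m[np.newaxis], w_c[np.newaxis])
```

With a 3×3 receptive field of all-ones weights and an identity activation, an all-ones 3×3 input yields 9 at the center and 4 at each corner, which makes the zero-padding assumption easy to see.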

Here, $\xi$, $\zeta$, $u$, $v$, and $n$ in Formulas 1 and 2 are explained. The position $(\xi, \zeta)$ corresponds to position coordinates in the input image. For example, a large output value $y_M^{LS}(\xi, \zeta)$ means that the feature detected by the $M$-th cell plane of the S layer at the $L$-th level is likely to be present at pixel position $(\xi, \zeta)$ of the input image. In Formula 1, $n$ denotes the $n$-th cell plane of the C layer at the $(L-1)$-th level and is called the integration-destination feature number; in principle, the product-sum operation is performed over all cell planes in the C layer at the $(L-1)$-th level. $(u, v)$ are the relative position coordinates of the coupling coefficient, and the product-sum operation is performed over a finite range of $(u, v)$ determined by the size of the feature to be detected. This finite range of $(u, v)$ is called the receptive field. Its size, hereinafter called the receptive field size, is expressed as the number of horizontal pixels × the number of vertical pixels in the connected range.

In Formula 1, when $L = 1$, that is, in the S layer of the first level, $y_n^{L-1,C}(\xi + u, \zeta + v)$ is the input image $y^{\mathrm{in}}(\xi + u, \zeta + v)$. Because neurons and pixels are distributed discretely and the connection-destination feature numbers are also discrete, $\xi$, $\zeta$, $u$, $v$, and $n$ take discrete values. Here, $\xi$ and $\zeta$ are non-negative integers, $n$ is a natural number, and $u$ and $v$ are integers, each within a finite range.

$w_M^{LS}(n, u, v)$ in Formula 1 is the coupling coefficient for detecting a given feature; adjusting it to appropriate values makes that feature detectable. This adjustment of the coupling coefficients is what learning means here: in building the CNN, the coupling coefficients are corrected repeatedly and gradually, using a variety of test patterns, until $y_M^{LS}(\xi, \zeta)$ takes appropriate output values.

Next, $w_M^{LC}(u, v)$ in Formula 2 uses a two-dimensional Gaussian function and can be expressed as Formula 3 below.

$$w_M^{LC}(u, v) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{u^2 + v^2}{2\sigma^2}\right) \qquad \text{[Formula 3]}$$

Here too, $(u, v)$ has a finite range; as with the feature detection neurons, the range is called the receptive field and its size the receptive field size. The receptive field size can be set appropriately according to the size of the $M$-th feature of the S layer at the $L$-th level. $\sigma$ in Formula 3 is a feature size factor, a constant that can be set appropriately according to the receptive field size. For example, $\sigma$ can be set so that the values at the outermost edge of the receptive field are small enough to be regarded as zero. In this way, the subject detection unit 204 of this embodiment is a CNN that performs the above computation at each level and detects subjects in the S layer of the final ($n$-th) level.
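Formula 3 and the "outermost value is almost zero" rule for choosing σ can be sketched as follows. Solving exp(-r²/(2σ²)) = ε gives σ = r / sqrt(2 ln(1/ε)). The cut-off ratio `edge_ratio` (ε = 0.01 by default) is a hypothetical choice standing in for "almost zero", and it is measured at the edge midpoint (|u| = r, v = 0); corners fall off further.

```python
import numpy as np


def gaussian_integration_weights(size: int, sigma: float) -> np.ndarray:
    """Formula 3 sampled on a size x size receptive field, indexed [u + r, v + r]."""
    r = size // 2
    offsets = np.arange(-r, r + 1)
    u, v = np.meshgrid(offsets, offsets, indexing="ij")
    return np.exp(-(u ** 2 + v ** 2) / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)


def sigma_for_field(size: int, edge_ratio: float = 0.01) -> float:
    """Pick sigma so the value at |u| = r, v = 0 is edge_ratio times the peak value."""
    r = size // 2
    return r / np.sqrt(2.0 * np.log(1.0 / edge_ratio))
```

The resulting array can be passed as `w_c` to a C-layer product-sum such as the one described by Formula 2.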

(Subject detection learning method)
A specific method of adjusting (learning) the coupling coefficients $w_M^{LS}(n, u, v)$ will now be described. Learning may be performed directly by the subject detection unit 204, or on a cloud or an external edge computer that has CNN functionality equivalent to that of the subject detection unit 204. When learning is performed on a cloud or edge computer, the DSLR 100 acquires the generated learning model via wireless communication, wired communication, or a removable storage medium and stores it in the learning model storage unit 205. Learning consists of correcting the coupling coefficients $w_M^{LS}(n, u, v)$ based on the relationship between the neuron output values obtained by giving the CNN a specific input image (test pattern) and the teacher signal (the output values those neurons should produce). In the learning of this embodiment, the coupling coefficients of the feature detection layer S at the final ($n$-th) level are corrected using the least squares method, and those of the feature detection layers S at the other levels (first to $(n-1)$-th) are corrected using error backpropagation. Known techniques, such as those described in Non-Patent Document 1, can be used to correct coupling coefficients by least squares or backpropagation, so a detailed description is omitted.

A large number of patterns that should be detected and patterns that should not be detected are prepared as test patterns for learning. Each test pattern consists of image data and a corresponding teacher signal. For image data containing a pattern that should be detected, the teacher signal is set so that, in the feature detection cell plane of the final level, the neurons corresponding to the region where that pattern exists output 1. For image data containing a pattern that should not be detected, the teacher signal is set so that the neurons corresponding to the region where that pattern exists output -1.
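The teacher signal (+1 for patterns to detect, -1 for patterns to reject) and the least-squares correction of the final level can be sketched together. Assumptions: boxes are row/column rectangles, positions with teacher value 0 are simply excluded from the fit (the patent does not say what other positions receive), and the fit targets the neuron's internal state, treating the final layer as linear plus a bias. The backpropagation used for the earlier levels is not shown.

```python
import numpy as np


def teacher_map(shape, positive_boxes, negative_boxes):
    """Teacher signal for the final-level feature detection plane.

    Boxes are (row0, col0, row1, col1), end-exclusive: +1 where a pattern to be
    detected exists, -1 where a pattern that should not be detected exists, and
    0 elsewhere (0 meaning "not used for fitting" is an assumption of this sketch).
    """
    t = np.zeros(shape)
    for r0, c0, r1, c1 in positive_boxes:
        t[r0:r1, c0:c1] = 1.0
    for r0, c0, r1, c1 in negative_boxes:
        t[r0:r1, c0:c1] = -1.0
    return t


def fit_final_layer(features, teacher):
    """Least-squares fit of final-level coefficients plus a bias.

    features: (H, W, D) inputs of the final S-layer neuron at each position,
              e.g. its receptive field in the level n-1 C planes, flattened.
    teacher:  (H, W) target map; only nonzero entries are used as samples.
    """
    mask = teacher != 0
    x = features[mask]                                # (num_samples, D)
    x = np.hstack([x, np.ones((x.shape[0], 1))])      # bias column
    coef, *_ = np.linalg.lstsq(x, teacher[mask], rcond=None)
    return coef[:-1], coef[-1]
```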

In this embodiment, noting that the desirable characteristics of subject detection differ depending on when subject detection is performed, the parameters used for subject detection (learning models generated by machine learning) are switched according to that timing. A more appropriate subject detection result can therefore be obtained than with a configuration that uses a single learning model. Specifically, the detection learning model 206 for subject detection before the main subject is determined is prepared by machine learning with test patterns that use image data of the detection target (for example, a human face) at various blur amounts, from small to large. The detection learning model 207 for subject detection after the main subject is determined is prepared by machine learning with test patterns that mainly use image data of the detection target with small blur amounts.
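One way to realize the two training sets is to synthesize blurred variants of each target image and draw the blur levels from different distributions. This is an assumption: the patent only says the images cover "various" or mainly "small" blur amounts and does not say they are synthesized. The uniform and exponential distributions and the `max_blur` limit below are hypothetical choices.

```python
import numpy as np


def sample_blur_levels(n: int, model_id: int, rng: np.random.Generator,
                       max_blur: float = 8.0) -> np.ndarray:
    """Blur levels (e.g. Gaussian sigma in pixels) for synthesizing training images.

    206: spread evenly from no blur to large blur.
    207: concentrated on small blur, with a thin tail toward larger values.
    """
    if model_id == 206:
        return rng.uniform(0.0, max_blur, size=n)
    if model_id == 207:
        return np.clip(rng.exponential(scale=max_blur / 8.0, size=n), 0.0, max_blur)
    raise ValueError(f"unknown model id: {model_id}")
```

Each sampled level could then be applied to a sharp source image, for example with `scipy.ndimage.gaussian_filter`, before the result is paired with its teacher signal.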

Using the detection learning model 206 before the main subject is determined and the detection learning model 207 afterwards achieves subject detection suited to each phase. Processing that uses the region of a specific subject among the detected subjects can therefore be executed appropriately.

<< Second Embodiment >>
Next, a second embodiment of the present invention will be described. FIG. 7 is a block diagram showing an example functional configuration of the DSLR 100 according to this embodiment; components that are the same as in the first embodiment are given the same reference numerals as in FIG. 2. The photometric sensor 108, however, differs from the first embodiment in that, when generating an image signal, it applies gain to the signal so that its luminance values fall within a predetermined range. The photometric sensor 108 notifies the system control unit 201 of the magnitude of the gain it applied. The learning model storage unit 205 also differs from the first embodiment in the types of detection learning models it stores.

The position/orientation change acquisition unit 701 includes a position/orientation sensor such as a gyroscope, an acceleration sensor, or an electronic compass, and outputs a signal representing changes in the position and orientation of the DSLR 100. The system control unit 201 stores the signal output by the position/orientation change acquisition unit 701 in the RAM 2012.

The motion vector detection unit 702 detects motion vectors using two frames of image data and outputs information on the detected motion vectors to the system control unit 201, which stores it in the RAM 2012. Details of the motion vector detection process are given below. The other components are the same as in the first embodiment, so their description is omitted.

(Details of motion vector calculation processing)
The motion vector calculation process in the motion vector detection unit 702 will now be described in detail. FIG. 8 schematically shows the two frames of images supplied to the motion vector detection unit 702. The two images are captured from the same viewpoint at different times. Let the current frame be frame_t and the preceding frame be frame_t-1. Matching against frame_t is performed for a plurality of positions in frame_t-1, and for each matched point $i$ a vector $v_i$ is calculated that starts at the point $(x_i^{t-1}, y_i^{t-1})$ in frame_t-1 and ends at the corresponding point $(x_i^{t}, y_i^{t})$ in frame_t:

$$v_i = \begin{pmatrix} x_i^{t} - x_i^{t-1} \\ y_i^{t} - y_i^{t-1} \end{pmatrix}$$

Here, the first component of the vector is the horizontal direction of the image, and the second component is the vertical direction.
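The patent does not specify the matching method, so the sketch below assumes plain block matching by minimum sum of absolute differences (SAD) over a square search window, then forms v_i as end point minus start point (horizontal first, vertical second). The block half-size and search radius are hypothetical, and start points are assumed to lie at least `half` pixels inside frame_t-1.

```python
import numpy as np


def match_point(prev, curr, x, y, half=4, search=8):
    """Find the point in curr whose (2*half+1)^2 block best matches the block
    around (x, y) in prev, by minimum sum of absolute differences (SAD)."""
    template = prev[y - half:y + half + 1, x - half:x + half + 1].astype(np.float64)
    best_sad, best_xy = np.inf, (x, y)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if (yy - half < 0 or xx - half < 0 or
                    yy + half + 1 > curr.shape[0] or xx + half + 1 > curr.shape[1]):
                continue  # candidate block would leave the frame
            block = curr[yy - half:yy + half + 1, xx - half:xx + half + 1]
            sad = np.abs(block.astype(np.float64) - template).sum()
            if sad < best_sad:
                best_sad, best_xy = sad, (xx, yy)
    return best_xy


def motion_vectors(prev, curr, start_points, **kwargs):
    """v_i = (x_i^t - x_i^{t-1}, y_i^t - y_i^{t-1}) for each start point (x, y) in prev."""
    vectors = []
    for x, y in start_points:
        xe, ye = match_point(prev, curr, x, y, **kwargs)
        vectors.append((xe - x, ye - y))
    return vectors
```

For a frame shifted 3 pixels right and 2 pixels down, every interior start point yields v_i = (3, 2).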

(Dictionary switching in subject detection)
When the subject region in the current frame is detected in a situation with large inter-frame motion vectors, the probability of false detection increases as the motion vector grows. This is because a subject with a large motion vector shows large motion blur in the image: as with a large defocus blur, the variety of images that resemble the subject increases while the subject's distinctive features are lost.

The subject detection unit 204 applies to an image a subject detection process that uses processing parameters based on a learning model generated in advance through machine learning. Specifically, the subject detection unit 204 refers to two of the learning models stored in the learning model storage unit 205: the detection learning model 703 for small main-subject motion vectors and the detection learning model 704 for large main-subject motion vectors. The detection learning model 703 is generated by machine learning that emphasizes subject images with a small amount of motion blur, and the detection learning model 704 by machine learning that emphasizes subject images with a large amount of motion blur.

The system control unit 201 then switches which of the detection learning models 703 and 704 the subject detection unit 204 uses according to the magnitude of the main subject's motion vector. By switching the learning model in this way, the system control unit 201 switches the characteristics of the subject detection process in the subject detection unit 204. More specifically, the system control unit 201 uses the detection learning model 703 unless the main subject's motion vector is judged to be large, and the detection learning model 704 when it is. The threshold for judging the magnitude of the motion vector can, for example, be determined experimentally in advance.

(Subject detection process flow)
Next, the subject detection process performed by the DSLR 100 of this embodiment during the shooting operation will be described in detail with reference to the flowchart shown in FIG. 9. Because the shooting operation of the DSLR 100 of this embodiment is the same as in the first embodiment, the subject detection process described below is performed in S303 of FIG. 3. It is assumed here that the main subject has already been determined, and that the motion vector detection unit 702 has detected one or more motion vectors starting at points within the main subject region in the previous frame, using, for example, frame images of the moving image captured in the shooting standby state.

In step S901, the system control unit 201 refers to the RAM 2012 and calculates the magnitude of the main subject's motion vector. It compares the calculated magnitude with a predetermined threshold and judges the motion vector to be large if the magnitude exceeds the threshold. If the main subject's motion vector is judged to be large, the process proceeds to S903; otherwise it proceeds to S902.

In S902, the system control unit 201 selects the detection learning model 703 for small motion vectors from among the plurality of learning models stored in the learning model storage unit 205, and sets it in the subject detection unit 204 as the parameter for subject detection processing. Using the detection learning model 703 enables subject detection that accurately detects a detection target subject whose subject blur is small.

In S903, the system control unit 201 selects the detection learning model 704 for large motion vectors from among the plurality of learning models stored in the learning model storage unit 205, and sets it in the subject detection unit 204 as the parameter for subject detection processing. Using the detection learning model 704 enables subject detection that can detect the detection target subject even when subject blur is large.

In S902 and S903, the system control unit 201 also supplies the subject detection unit 204 with image data generated by applying A/D conversion, noise reduction processing, and the like to the exposure control signal read out in S302. The system control unit 201 then advances the process to S904.

In S904, the subject detection unit 204 applies subject detection processing to the image data based on the exposure control signal, using the detection learning model set by the system control unit 201 in S902 or S903. The subject detection unit 204 supplies information representing the detection result to the system control unit 201. The information representing the detection result and the specific processing of CNN-based subject detection may be the same as in the first embodiment.
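Because the claims state that the parameter may be the coupling coefficients of a CNN, "setting a detection learning model" can be pictured as loading a different set of weights into a network whose structure stays fixed. The toy sketch below illustrates only that idea: a single convolution layer with ReLU stands in for the detector of the first embodiment (which is not reproduced here), and the class name `TinyDetector` and its methods are hypothetical. It assumes NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np


class TinyDetector:
    """One fixed architecture; only the coupling coefficients (weights) change."""

    def __init__(self) -> None:
        self.kernel = None
        self.bias = 0.0

    def set_parameters(self, kernel: np.ndarray, bias: float) -> None:
        # Corresponds to S902/S903: load the selected model's coefficients.
        self.kernel = np.asarray(kernel, dtype=float)
        self.bias = float(bias)

    def response_map(self, image: np.ndarray) -> np.ndarray:
        # Corresponds to S904: run the layer with whatever parameters are set.
        if self.kernel is None:
            raise RuntimeError("no learning model has been set")
        windows = np.lib.stride_tricks.sliding_window_view(
            np.asarray(image, dtype=float), self.kernel.shape)
        # "Valid" cross-correlation, which is what CNN convolution layers compute.
        conv = np.einsum("ijkl,kl->ij", windows, self.kernel) + self.bias
        return np.maximum(conv, 0.0)  # ReLU activation
```

Switching between the models 703 and 704 then amounts to calling `set_parameters` with a different kernel and bias, without rebuilding the detector.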

In S905, if one or more subjects have been detected as a result of subject detection, the system control unit 201 determines the main subject from among the detected subjects. The main subject may be determined in the same way as in the first embodiment.

In S906, the motion vector detection unit 702 calculates the motion vector of the main subject determined in S905, using the method described above.
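The steps S901 to S906 form a per-frame loop in which the motion vector computed at the end of one frame drives the model choice for the next. The sketch below strings them together under explicit assumptions: `detect` is a caller-supplied stand-in for the subject detection unit 204, choosing the largest box is only a placeholder for the first embodiment's main-subject determination, and the motion vector is approximated by the displacement of the main-subject box centre rather than by the point-based method of the motion vector detection unit 702. All names are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height); hypothetical format
Vector = Tuple[float, float]             # (dx, dy) in pixels


@dataclass
class FlowState:
    main_subject: Optional[Box] = None
    motion_vectors: List[Vector] = field(default_factory=list)


def center(box: Box) -> Vector:
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def run_s901_to_s906(image: object, state: FlowState, models: Dict[str, object],
                     detect: Callable[[object, object], List[Box]],
                     threshold_px: float) -> FlowState:
    # S901: judge whether the main subject's motion vector is large
    # (mean magnitude of the stored vectors; the aggregation is an assumption).
    mags = [math.hypot(dx, dy) for dx, dy in state.motion_vectors]
    large = bool(mags) and sum(mags) / len(mags) > threshold_px

    # S902 / S903: choose the detection learning model (703 or 704).
    model = models["large"] if large else models["small"]

    # S904: run subject detection with the chosen model.
    boxes = detect(image, model)

    # S905: determine the main subject (largest box is a placeholder rule).
    new_main = max(boxes, key=lambda b: b[2] * b[3]) if boxes else None

    # S906: main-subject motion, approximated by the box-centre displacement.
    vectors: List[Vector] = []
    if state.main_subject is not None and new_main is not None:
        (x0, y0), (x1, y1) = center(state.main_subject), center(new_main)
        vectors.append((x1 - x0, y1 - y0))
    return FlowState(main_subject=new_main, motion_vectors=vectors)
```

Feeding the returned state into the next call reproduces the frame-to-frame dependency described above.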

The present embodiment focuses on the fact that the desirable subject detection characteristics differ depending on the magnitude of the main subject's motion vector, and switches the parameters used for subject detection (learning models generated by machine learning) according to that magnitude. A more appropriate subject detection result can therefore be obtained than with a configuration that uses a single learning model. Specifically, for the detection target subject (for example, a human face), the detection learning model 703 for small motion vectors is prepared by machine learning using test patterns weighted toward image data with a small amount of subject blur. Likewise, the detection learning model 704 for large motion vectors is prepared by machine learning using test patterns weighted toward image data with a large amount of subject blur.
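One way to read "test patterns weighted toward" small or large blur is as biased sampling of a training set. The sketch below assumes exactly that: each sample carries a known blur amount, samples are split at a hypothetical blur threshold, and roughly a fraction `bias` of each draw comes from the preferred group. The record format, the 0.8 default, and the function name are assumptions for illustration, not the training procedure of the embodiment.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[str, float]  # (image id, blur amount in pixels); hypothetical record


def build_training_set(samples: Sequence[Sample], blur_split_px: float,
                       prefer_large_blur: bool, n: int,
                       bias: float = 0.8, seed: int = 0) -> List[Sample]:
    """Draw n samples, about `bias` of them from the preferred blur group."""
    small = [s for s in samples if s[1] <= blur_split_px]
    large = [s for s in samples if s[1] > blur_split_px]
    preferred, other = (large, small) if prefer_large_blur else (small, large)
    if not preferred and not other:
        raise ValueError("no samples to draw from")
    rng = random.Random(seed)
    out: List[Sample] = []
    for _ in range(n):
        # Fall back to whichever group is non-empty.
        use_preferred = bool(preferred) and (not other or rng.random() < bias)
        out.append(rng.choice(preferred if use_preferred else other))
    return out
```

Calling it once with `prefer_large_blur=False` and once with `True` yields the two differently weighted sets from which models like 703 and 704 could be trained.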

The detection learning model is then switched so that subject detection processing is executed using the detection learning model 704 when the motion vector of the main subject is determined to be large, and using the detection learning model 703 when it is not. This achieves subject detection suited to the magnitude of the main subject's motion vector (that is, the amount of blur in the subject image). Consequently, processing that uses the region of a specific subject among the detected subjects can be executed appropriately.

<< Third Embodiment >>
Next, a third embodiment of the present invention will be described. This embodiment can be implemented in the same manner as the second embodiment except for the method of switching the learning model used for subject detection. The following description therefore focuses on the learning model switching method.

(Dictionary switching for subject detection)
In the present embodiment, the learning model used for subject detection processing is switched according to the magnitude of the change in position and orientation of the imaging apparatus (DSLR 100). When this change is large, camera shake is large, and the subject region is more likely to be falsely detected. This is because, as in the case where the amount of blur is large, a large change in the position and orientation of the imaging apparatus increases the variety of images that resemble the subject while causing the features of the subject to be lost.

The subject detection unit 204 of the present embodiment applies, to an image, subject detection that uses processing parameters based on a learning model generated in advance through machine learning. In the present embodiment, the detection learning model 705 for small position/orientation changes and the detection learning model 706 for large position/orientation changes are referenced from the learning model storage unit 205. The detection learning model 705 is generated by machine learning that emphasizes subject images with a small amount of camera shake, and the detection learning model 706 is generated by machine learning that emphasizes subject images with a large amount of camera shake.

The system control unit 201 switches the subject detection characteristics of the subject detection unit 204 by switching the learning model that the subject detection unit 204 uses according to the magnitude of the position/orientation change. More specifically, the detection learning model 705 for small position/orientation changes is used when the change is not determined to be large, and the detection learning model 706 for large position/orientation changes is used when the change is determined to be large. The threshold for judging the magnitude of the position/orientation change can be determined experimentally in advance, for example.
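The description does not say how the position/orientation change is quantified. A minimal sketch, assuming the position/orientation change acquisition unit 701 delivers gyro angular-velocity samples (rad/s) at a fixed interval, is to accumulate |ω|·Δt over a window and compare that rotation path length with a threshold. Translational movement is ignored here, and the function names and model labels are hypothetical.

```python
import math
from typing import Sequence, Tuple

MODEL_SMALL_POSE = "model_705_small_pose_change"  # hypothetical label
MODEL_LARGE_POSE = "model_706_large_pose_change"  # hypothetical label


def accumulated_rotation_rad(gyro_rad_s: Sequence[Tuple[float, float, float]],
                             dt_s: float) -> float:
    """Sum |omega| * dt over the gyro samples (rotation path length in radians)."""
    return sum(math.sqrt(wx * wx + wy * wy + wz * wz)
               for wx, wy, wz in gyro_rad_s) * dt_s


def select_model_by_pose(gyro_rad_s: Sequence[Tuple[float, float, float]],
                         dt_s: float, threshold_rad: float) -> str:
    # S1001: the change is "large" only when it exceeds the threshold.
    large = accumulated_rotation_rad(gyro_rad_s, dt_s) > threshold_rad
    return MODEL_LARGE_POSE if large else MODEL_SMALL_POSE
```

Summing magnitudes rather than integrating the signed rates means that back-and-forth shake still counts as a large change, which matches the intent of detecting camera shake rather than a deliberate net rotation.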

(Subject detection process flow)
Next, details of the subject detection processing performed by the DSLR 100 of the present embodiment during the shooting operation will be described with reference to the flowchart shown in FIG. 10. Since the shooting operation of the DSLR 100 of this embodiment is the same as in the first embodiment, the subject detection processing described below is performed in S303 of FIG. 3.

In S1001, the system control unit 201 refers to the RAM 2012, compares the position/orientation change obtained from the position/orientation change acquisition unit 701 with a predetermined threshold, and determines that the change is large if it exceeds the threshold. If the position/orientation change of the DSLR 100 is determined to be large, the system control unit 201 advances the process to S1003; otherwise, it advances the process to S1002.

In S1002, the system control unit 201 selects the detection learning model 705 for small position/orientation changes from among the plurality of learning models stored in the learning model storage unit 205, and sets it in the subject detection unit 204 as the parameter for subject detection processing. Using the detection learning model 705 enables subject detection that accurately detects a detection target subject when camera shake is small.

In S1003, the system control unit 201 selects the detection learning model 706 for large position/orientation changes from among the plurality of learning models stored in the learning model storage unit 205, and sets it in the subject detection unit 204 as the parameter for subject detection processing. Using the detection learning model 706 enables subject detection that can detect the detection target subject even when camera shake is large.

In S1002 and S1003, the system control unit 201 also supplies the subject detection unit 204 with image data generated by applying A/D conversion, noise reduction processing, and the like to the exposure control signal read out in S302. The system control unit 201 then advances the process to S1004.

In S1004, the subject detection unit 204 applies subject detection processing to the image data based on the exposure control signal, using the detection learning model set by the system control unit 201 in S1002 or S1003. The subject detection unit 204 supplies information representing the detection result to the system control unit 201. The information representing the detection result and the specific processing of CNN-based subject detection may be the same as in the first embodiment.

The present embodiment focuses on the fact that the desirable subject detection characteristics differ depending on the magnitude of the position/orientation change of the imaging apparatus, and switches the parameters used for subject detection (learning models generated by machine learning) according to that magnitude. A more appropriate subject detection result can therefore be obtained than with a configuration that uses a single learning model. Specifically, for the detection target subject (for example, a human face), the detection learning model 705 for small position/orientation changes is prepared by machine learning using test patterns weighted toward image data captured with a small amount of camera shake. Likewise, the detection learning model 706 for large position/orientation changes is prepared by machine learning using test patterns weighted toward image data captured with a large amount of camera shake.

The detection learning model is then switched so that subject detection processing is executed using the detection learning model 706 when the position/orientation change is determined to be large, and using the detection learning model 705 when it is not. This achieves subject detection suited to the magnitude of the position/orientation change (that is, the amount of blur in the subject image). Consequently, processing that uses the region of a specific subject among the detected subjects can be executed appropriately.

<< Fourth Embodiment >>
Next, a fourth embodiment of the present invention will be described. This embodiment can be implemented in the same manner as the second embodiment except for the method of switching the learning model used for subject detection. The following description therefore focuses on the learning model switching method.

(Dictionary switching for subject detection)
In the present embodiment, the learning model used for subject detection processing is switched according to the amount of noise in the image data used for subject detection. More specifically, the learning model is switched according to the magnitude of the gain (signal amplification factor) applied to the image data supplied to the subject detection unit 204 (the image signal generated by the photometric sensor 108).

When subject detection processing is applied to image data to which a large gain has been applied, the subject region is more likely to be falsely detected. This is because applying gain to the image data also amplifies the noise and lowers the S/N ratio of the image, which increases the variety of images resembling the subject while causing the features of the subject to be lost.
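A simple simulation illustrates why a large gain tends to coincide with a noisy image. It assumes a shot-noise-only model (Poisson photon counts, no read noise) and a flat patch: the darker the scene, the larger the gain needed to reach the same output level, and the gain scales the noise up together with the weak signal, so at matched brightness the SNR is roughly the square root of the photon count. The function name and the numbers are illustrative assumptions, not sensor data from this description.

```python
import numpy as np


def brightness_matched_snr(mean_photons: float, target_level: float,
                           n_pixels: int = 200_000, seed: int = 0) -> float:
    """SNR (mean / std) of a flat patch after gain brings it to `target_level`.

    Photon arrival is modelled as Poisson (shot noise only); read noise and
    other sensor noise sources are ignored in this sketch.
    """
    rng = np.random.default_rng(seed)
    photons = rng.poisson(mean_photons, size=n_pixels).astype(float)
    gain = target_level / mean_photons  # darker scene -> larger gain
    out = photons * gain                # gain scales signal and noise together
    return float(out.mean() / out.std())
```

With these assumptions, a patch collecting about 1000 photons gives an SNR near 31.6, while one collecting about 10 photons and amplified 100 times to the same level gives an SNR near 3.2.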

The subject detection unit 204 of the present embodiment applies, to an image, subject detection that uses processing parameters based on a learning model generated in advance through machine learning. In the present embodiment, the detection learning model 707 for small gain and the detection learning model 708 for large gain are referenced from the learning model storage unit 205. The detection learning model 707 is generated by machine learning that emphasizes subject images with little noise, and the detection learning model 708 is generated by machine learning that emphasizes subject images with much noise.

The system control unit 201 switches the subject detection characteristics of the subject detection unit 204 by switching the learning model that the subject detection unit 204 uses according to the magnitude of the gain applied to the image data supplied to it. More specifically, the detection learning model 708 for large gain is used when the gain applied to the image data is determined to be large, and the detection learning model 707 for small gain is used when it is not. The threshold for judging the magnitude of the gain can be determined experimentally in advance, for example. When subject detection is performed on image data obtained by the image sensor 111, the system control unit 201 can judge, for example, the magnitude of the gain corresponding to the shooting sensitivity.
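For the case where detection runs on image-sensor data and the gain is inferred from the shooting sensitivity, a minimal sketch might express the gain as the ratio of the set sensitivity to a base sensitivity and compare it with a threshold. The base ISO of 100, the threshold, the function names, and the model labels are hypothetical; real cameras map sensitivity to analog and digital gain in model-specific ways.

```python
MODEL_LOW_GAIN = "model_707_low_gain"    # hypothetical label
MODEL_HIGH_GAIN = "model_708_high_gain"  # hypothetical label


def gain_from_iso(iso: float, base_iso: float = 100.0) -> float:
    """Linear gain relative to an assumed base sensitivity (hypothetical ISO 100)."""
    if iso <= 0 or base_iso <= 0:
        raise ValueError("sensitivities must be positive")
    return iso / base_iso


def select_model_by_gain(gain: float, threshold: float) -> str:
    # S1101: the gain is "large" only when it exceeds the threshold.
    return MODEL_HIGH_GAIN if gain > threshold else MODEL_LOW_GAIN
```

For example, ISO 3200 corresponds to a gain of 32 under this assumption and would select the large-gain model for any threshold below 32.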

(Subject detection process flow)
Next, details of the subject detection processing performed by the DSLR 100 of the present embodiment during the shooting operation will be described with reference to the flowchart shown in FIG. 11. Since the shooting operation of the DSLR 100 of this embodiment is the same as in the first embodiment, the subject detection processing described below is performed in S303 of FIG. 3.

In S1101, the system control unit 201 compares the magnitude of the gain that the photometric sensor 108 applied to the image signal with a predetermined threshold, and determines that the gain is large if it exceeds the threshold. If the gain is determined to be large, the system control unit 201 advances the process to S1103; otherwise, it advances the process to S1102.

In S1102, the system control unit 201 selects the detection learning model 707 for small gain from among the plurality of learning models stored in the learning model storage unit 205, and sets it in the subject detection unit 204 as the parameter for subject detection processing. Using the detection learning model 707 enables subject detection that accurately detects the detection target subject when noise is low.

In S1103, the system control unit 201 selects the detection learning model 708 for large gain from among the plurality of learning models stored in the learning model storage unit 205, and sets it in the subject detection unit 204 as the parameter for subject detection processing. Using the detection learning model 708 enables subject detection that can detect the detection target subject even when noise is high.

In S1102 and S1103, the system control unit 201 also supplies the subject detection unit 204 with image data generated by applying A/D conversion, noise reduction processing, and the like to the exposure control signal read out in S302. The system control unit 201 then advances the process to S1104.

In S1104, the subject detection unit 204 applies subject detection processing to the image data based on the exposure control signal, using the detection learning model set by the system control unit 201 in S1102 or S1103. The subject detection unit 204 supplies information representing the detection result to the system control unit 201. The information representing the detection result and the specific processing of CNN-based subject detection may be the same as in the first embodiment.

The present embodiment focuses on the fact that the desirable subject detection characteristics differ depending on the amount of noise in the image data, and switches the parameters used for subject detection (learning models generated by machine learning) according to the magnitude of the gain applied to the image data. A more appropriate subject detection result can therefore be obtained than with a configuration that uses a single learning model. Specifically, for the detection target subject (for example, a human face), the detection learning model 707 for small gain is prepared by machine learning using test patterns weighted toward image data with little noise. Likewise, the detection learning model 708 for large gain is prepared by machine learning using test patterns weighted toward image data with much noise.

The detection learning model is then switched so that subject detection processing is executed using the detection learning model 708 when the gain applied to the image data subjected to subject detection is determined to be large, and using the detection learning model 707 when it is not. This achieves both subject detection suited to image data with little noise and subject detection suited to image data with much noise. Consequently, processing that uses the region of a specific subject among the detected subjects can be executed appropriately.

(Other embodiments)
The above embodiments describe examples in which the present invention is applied to an imaging apparatus that optimizes exposure and focus adjustment for the main subject region. However, the use of the main subject region is not limited to shooting. It can be used for any other purpose, for example, when image processing is applied to the main subject region (or to regions other than the main subject region). Therefore, a configuration relating to shooting is not essential to the present invention. Also, although the above embodiments describe a configuration that switches between two types of learning models, the configuration may switch among three or more types of learning models.

Embodiments have been described above that focus on the degree of focus, the magnitude of subject motion, the magnitude of the position/orientation change of the imaging apparatus, and the amount of noise in the image data, as examples of situations in which the blur amount and sharpness of the subject targeted by subject detection processing can vary. However, the detection learning model may also be switched according to other conditions.

The present invention can also be realized by supplying a program that implements one or more functions of the above embodiments to a system or apparatus via a network or a computer-readable storage medium, and having one or more processors of a computer in the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more of the functions.

The above embodiments are merely specific examples intended to aid understanding of the present invention, and are not intended to limit the present invention to them in any way. All embodiments that fall within the scope defined by the claims are encompassed by the present invention.

DESCRIPTION OF SYMBOLS
100 ... Digital single-lens reflex camera, 101 ... Main body, 102 ... Lens, 108 ... Photometric sensor, 111 ... Image sensor, 204 ... Subject detection unit, 206 ... Detection learning model for subject detection processing before the main subject is determined, 207 ... Detection learning model for subject detection processing after the main subject is determined

Claims

An image processing apparatus comprising:
subject detection means for applying subject detection processing to image data using parameters generated based on machine learning;
storage means for storing a plurality of parameters used for the subject detection processing; and
selection means for selecting, from the parameters stored in the storage means, a parameter to be used by the subject detection means,
wherein the selection means selects different parameters according to a blur amount or a sharpness of a subject.

The image processing apparatus according to claim 1, wherein the selection means selects different parameters depending on whether or not a main subject has been determined.

The image processing apparatus according to claim 2, wherein the selection means selects a first parameter when the main subject has been determined and selects a second parameter when the main subject has not been determined, and the second parameter is a parameter more suitable than the first parameter for detecting a subject with a larger amount of blur.

The image processing apparatus according to claim 2, wherein the selection means selects a first parameter when the main subject has been determined and selects a second parameter when the main subject has not been determined,
the first parameter is a parameter for performing subject detection processing that gives priority to suppressing false detection of a subject over suppressing missed detection of a subject, and
the second parameter is a parameter for performing subject detection processing that gives priority to suppressing missed detection of a subject over suppressing false detection of a subject.

The image processing apparatus according to claim 4, wherein the selection means selects the second parameter when the main subject has not been detected continuously for a predetermined period of time, even if the main subject has been determined.

The image processing apparatus according to claim 5, wherein the selection means selects the second parameter when the blur amount of the main subject is equal to or greater than a threshold value, even if the main subject has been determined.

The image processing apparatus according to claim 3, wherein the subject detection accuracy required for a subject region to be determined as the main subject region is set higher for a subject region detected using the second parameter than for a subject region detected using the first parameter.

The image processing apparatus according to claim 1, wherein the selection means selects different parameters depending on whether or not the movement of the subject is determined to be large.

The image processing apparatus according to claim 1, wherein the selection means selects a first parameter when the movement of the subject is determined to be large and selects a second parameter when the movement of the subject is not determined to be large, and the first parameter is a parameter more suitable than the second parameter for detecting a subject with larger movement.

The image processing apparatus according to claim 1, wherein the selection means selects different parameters depending on whether or not the position and orientation change of the image processing apparatus is determined to be large.

The image processing apparatus according to claim 1, wherein the selection means selects a first parameter when the position and orientation change of the image processing apparatus is determined to be large and selects a second parameter when it is not determined to be large, and the first parameter is a parameter more suitable than the second parameter for detecting a subject when the position and orientation change of the image processing apparatus is large.

The image processing apparatus according to claim 1, wherein the selection means selects different parameters depending on whether or not the noise amount of the image data is determined to be large.

The image processing apparatus according to claim 1, wherein the selection means selects a first parameter when the noise amount of the image data is determined to be large and selects a second parameter when the noise amount of the image data is not determined to be large, and the first parameter is a parameter more suitable than the second parameter for detecting a subject in image data with a larger amount of noise.

The image processing apparatus according to claim 1, wherein the parameter is a coupling coefficient used in a convolutional neural network (CNN).

The image processing apparatus according to claim 1, further comprising acquisition means for acquiring the parameter from outside.

The image processing apparatus according to claim 15, wherein the acquisition means acquires the parameter from outside through wireless communication, wired communication, or a removable storage medium, and the storage means stores the parameter acquired by the acquisition means.

An imaging apparatus comprising:
an image sensor; and
the image processing apparatus according to claim 1,
wherein the subject detection means of the image processing apparatus applies subject detection processing to an image obtained by the image sensor.

The imaging apparatus according to claim 17, further comprising at least one of:
focus adjusting means for adjusting the focus so that the region of the determined main subject is in focus; and
determining means for determining an exposure condition such that the region of the main subject is appropriately exposed.

The imaging apparatus according to claim 17 or 18, wherein the imaging element is an imaging element for acquiring an image for exposure control.

An image processing method executed by an image processing apparatus, the method comprising:
a subject detection step of applying subject detection processing to an image using parameters generated based on machine learning; and
a selection step of selecting, from a storage unit that stores a plurality of parameters used for the subject detection processing, a parameter to be used in the subject detection step,
wherein, in the selection step, different parameters are selected according to a blur amount or a sharpness of a subject.

A program for causing a computer to function as each means of the image processing apparatus according to any one of claims 1 to 16.

A program for causing a computer included in an imaging apparatus to function as the image processing apparatus included in the imaging apparatus according to any one of claims 17 to 19.