JP7286330B2

JP7286330B2 - Image processing device and its control method, program, storage medium

Info

Publication number: JP7286330B2
Application number: JP2019018256A
Authority: JP
Inventors: 保彦岩本
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-02-04
Filing date: 2019-02-04
Publication date: 2023-06-05
Anticipated expiration: 2039-02-04
Also published as: JP2020126434A

Description

The present invention relates to a technique for determining a main subject region in an image processing apparatus.

Conventionally, imaging apparatuses and image processing apparatuses have offered a function in which the apparatus identifies an image and automatically selects a main subject without any user operation. For this function, the developer prepares a group of anticipated images in advance, assigns to each image the correct answer for what should be the main subject, and tunes the parameters of the determiner on that basis. For example, Patent Literature 1 discloses a method of determining an object with a neural network trained on a training data group prepared in advance.

In addition to a pre-trained determiner, methods have also been proposed that perform additional learning using information obtained from the determined region at the time of determination. For example, Patent Literature 2 discloses a method of determining an object using a pre-trained fixed classifier together with a learning classifier whose dictionary data is augmented with information obtained from the fixed classifier's decision region.

Patent Literature 1: JP 2016-157219 A
Patent Literature 2: JP 2010-170201 A

Non-Patent Literature 1: S. Haykin, "Neural Networks: A Comprehensive Foundation," 2nd ed., Prentice Hall, pp. 156-255, July 1998.

The images a user shoots frequently generally differ from user to user. Moreover, even for the same image, which subject is the correct main subject varies by user. The function that automatically selects the main subject should therefore be adapted to each user's preferences.

However, Patent Literature 1 merely learns from a training data group prepared in advance, and Patent Literature 2 merely adds information obtained from the fixed classifier's decision region to the dictionary data. Neither technique, therefore, can be said to adapt to the user's preferences.

The present invention has been made in view of the above problem, and its object is to provide an image processing apparatus that can appropriately select the subject desired by the user when selecting a main subject from an image.

An image processing apparatus according to the present invention comprises a selection unit that, based on user input, selects images to be used as data that gives a negative reward in learning for a determination unit, choosing from among images whose main subject has been determined by that determination unit, which determines a main subject from an acquired image. When a first main subject, determined by the determination unit at the time of a user input indicating a shooting instruction, and a second main subject, determined by the determination unit before that user input, are different subjects, the selection unit selects the image in which the second main subject was determined as data that gives the negative reward.
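The selection rule in the preceding paragraph can be sketched as below. This is a minimal illustration under stated assumptions, not the patent's implementation: the `DataSet` record, the use of region centers, the hypothetical helper `same_subject` (a center-distance test modeled on the distance criterion described later for S603), and taking the latest buffered data set as the "first" main subject are all assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class DataSet:
    """One buffered live-view frame (hypothetical record)."""
    time: float                    # display time in seconds
    center: Optional[Point]        # center of the determined main subject, None if none
    label: Optional[str] = None    # set to "negative" when selected


def same_subject(a: Point, b: Point, min_gap: float) -> bool:
    # Assumption: regions whose centers are closer than min_gap pixels
    # are treated as the same subject.
    return math.dist(a, b) < min_gap


def select_negative_on_capture(buffer: List[DataSet], capture_time: float,
                               min_gap: float) -> List[DataSet]:
    """Label frames whose main subject differs from the one at the shooting instruction."""
    candidates = [d for d in buffer
                  if d.time <= capture_time and d.center is not None]
    if not candidates:
        return []
    # "First" main subject: the one determined at the shooting instruction
    # (assumed to be the latest buffered data set at or before capture_time).
    first = max(candidates, key=lambda d: d.time)
    negatives = []
    for d in candidates:
        if d is not first and not same_subject(d.center, first.center, min_gap):
            d.label = "negative"   # a differing "second" main subject: negative reward
            negatives.append(d)
    return negatives
```

With a 640-pixel-wide frame, `min_gap=64` would mirror the 10% example given for S603.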

According to the present invention, it is possible to provide an image processing apparatus that can appropriately select the subject desired by the user when selecting a main subject from an image.

FIG. 1 is a block diagram showing the configuration of an imaging device according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the overall operation of the imaging device of the embodiment.
FIGS. 3A to 3C are schematic diagrams showing examples of main subject determination results.
FIG. 4 is a schematic diagram showing an example of the overall configuration of a CNN.
FIG. 5 is a schematic diagram showing an example of a partial configuration of a CNN.
FIG. 6 is a flowchart showing a data set selection procedure in the embodiment.
FIG. 7 is a flowchart showing another data set selection procedure in the embodiment.
FIG. 8 is a diagram showing a system consisting of a smartphone and a server.

Embodiments are described in detail below with reference to the accompanying drawings. The following embodiments do not limit the invention according to the claims. Although multiple features are described in the embodiments, not all of them are essential to the invention, and they may be combined arbitrarily. In the accompanying drawings, identical or similar components are given the same reference numerals, and redundant description is omitted.

(Configuration of imaging device)
FIG. 1 is a block diagram showing the configuration of an imaging device 100 as an image processing apparatus according to an embodiment of the present invention.

In FIG. 1, the imaging device 100 is a digital still camera, video camera, or the like that can photograph a subject and record moving-image and still-image data on various media such as tape, solid-state memory, optical disks, and magnetic disks. The present invention is not limited to these, however, and is also applicable to other devices with a shooting function, such as camera-equipped mobile phones and tablet terminals. The units in the imaging device 100 are connected via a bus 160 and are controlled by a CPU 151 (central processing unit).

The lens unit 101 comprises a fixed first-group lens 102, a zoom lens 111, an aperture 103, a fixed third-group lens 121, and a focus lens 131. The diaphragm control circuit 105 drives the aperture 103 via a diaphragm motor 104 (AM) in accordance with commands from the CPU 151, adjusting the aperture diameter to control the amount of light during shooting. The zoom control circuit 113 changes the focal length by driving the zoom lens 111 via a zoom motor 112 (ZM). The focus control circuit 133 determines how far to drive the focus motor 132 (FM) based on the defocus amount of the lens unit 101, and controls the focus state by driving the focus lens 131 via the focus motor 132 (FM). AF control is realized by this movement control of the focus lens 131 by the focus control circuit 133 and the focus motor 132. The focus lens 131 is a lens for focus adjustment; although it is shown in simplified form as a single lens in FIG. 1, it usually consists of multiple lenses.

A subject image formed on the image sensor 141 through the lens unit 101 is converted into an electrical signal by the image sensor 141, a photoelectric conversion element that converts a subject image (optical image) into an electrical signal. The image sensor 141 has light-receiving elements arranged m pixels horizontally by n pixels vertically. The image formed on and photoelectrically converted by the image sensor 141 is shaped into an image signal (image data) by the imaging signal processing circuit 142.

Image data output from the imaging signal processing circuit 142 is sent to the imaging control circuit 143 and temporarily stored in a RAM (random access memory) 154. The image data stored in the RAM 154 is compressed by the image compression/decompression circuit 153 and then recorded on the image recording medium 157. In parallel, the image data stored in the RAM 154 is sent to the image processing circuit 152, which processes the image signal, scales the image data to an optimum size, computes the similarity between pieces of image data, and so on. By sending image data scaled to the optimum size to the monitor display 150 as appropriate, a preview image or a through image can be displayed. The main subject determination result of the main subject determination circuit 162 can also be superimposed on the image data. By using the RAM 154 as a ring buffer, multiple pieces of image data captured within a predetermined period can be buffered together with the determination result of the main subject determination circuit 162 for each piece. Likewise, the image data used for learning by the main subject determination circuit 162 can be buffered together with the corresponding main subject determination results.

The operation switch 156 is an input interface including a touch panel, buttons, and the like; various operations can be performed by, for example, selecting function icons displayed on the monitor display 150. For example, while viewing the through image on the monitor display 150, the user can manually specify the position of the main subject or cancel a main subject that has already been specified.

Based on a user instruction input from the operation switch 156, or on the magnitude of the pixel signals of the image data temporarily stored in the RAM 154, the CPU 151 determines the accumulation time of the image sensor 141, the gain setting applied when signals are output from the image sensor 141 to the imaging signal processing circuit 142, and other settings. The imaging control circuit 143 receives the accumulation time and gain settings from the CPU 151 and controls the image sensor 141 accordingly.

The main subject determination circuit 162 uses the image signal to determine the region in which the main subject exists. The main subject determination processing in the main subject determination circuit 162 is realized by feature extraction using a CNN (Convolutional Neural Network). The main subject determination circuit 162 is implemented with a GPU (Graphics Processing Unit). Although GPUs were originally processors for image processing, they have many multiply-accumulate units and excel at matrix computation, so they are often used as processors for learning as well. GPUs are also commonly used for deep learning, but an FPGA (field-programmable gate array), an ASIC (application-specific integrated circuit), or the like may be used instead.

The focus control circuit 133 performs AF control on a specific subject region. The diaphragm control circuit 105 performs exposure control using the luminance value of a specific subject region. The image processing circuit 152 performs gamma correction, white balance processing, and the like based on the subject region. The monitor display 150 displays images and shows main subject determination results as rectangles or the like. The battery 159 is managed by the power management circuit 158 and supplies stable power to the entire imaging device 100.

The flash memory 155 stores the control programs necessary for operating the imaging device 100, parameters used by each unit, and so on. When the imaging device 100 is started by a user operation (transitions from the power-OFF state to the power-ON state), the control programs and parameters stored in the flash memory 155 are loaded into part of the RAM 154. The CPU 151 controls the operation of the imaging device 100 according to the control programs and constants loaded into the RAM 154.

(Overall processing flow)
FIG. 2 is a flowchart showing the overall operation of the imaging device 100 of this embodiment. While the imaging device is performing live view display, the flowchart in FIG. 2 is repeated, for example, once per live view frame period.

When the imaging device 100 starts live view display, first, in S201, the imaging control circuit 143 supplies an input image acquired using the lens unit 101 and the image sensor 141 to each unit of the imaging device 100.

In S202, the main subject determination circuit 162 performs main subject determination on the input image. Details of the main subject determination processing are described later. The main subject determination circuit 162 is assumed to have already undergone the necessary learning; details of the learning processing are also described later.

In S203, the CPU 151 determines whether the main subject determination circuit 162 output a main subject determination result in S202. If so, the process proceeds to S204; otherwise, it proceeds to S205.

In S204, the monitor display 150 displays the input image with the main subject determination result superimposed. In S205, the monitor display 150 displays only the input image.

In S206, the CPU 151 buffers the input image displayed in live view in S204, the main subject determination result, and the display time in the RAM 154 as one data set. In S207, the CPU 151 selects, from the data sets buffered in the RAM 154, input images to be used for positive learning or negative learning. This processing is described later.
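The S206 buffering, combined with the ring-buffer use of the RAM 154 described for FIG. 1, can be sketched as follows. The class and field names, the region layout, and the fixed capacity are illustrative assumptions; the text does not specify the buffer's size or record format.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, List, Optional, Tuple

Region = Tuple[int, int, int, int]  # (x, y, width, height): assumed layout


@dataclass
class DataSet:
    image: Any                      # live-view frame (opaque here)
    main_subject: Optional[Region]  # None when S202 produced no result
    display_time: float             # seconds


class DataSetRingBuffer:
    """Fixed-capacity buffer: the oldest data set is dropped when full."""

    def __init__(self, capacity: int) -> None:
        self._buf: Deque[DataSet] = deque(maxlen=capacity)

    def push(self, image: Any, main_subject: Optional[Region],
             display_time: float) -> None:
        # S206: store one (image, determination result, display time) data set
        self._buf.append(DataSet(image, main_subject, display_time))

    def within(self, t_end: float, period: float) -> List[DataSet]:
        """Data sets displayed in the window [t_end - period, t_end]."""
        return [d for d in self._buf if t_end - period <= d.display_time <= t_end]

    def __len__(self) -> int:
        return len(self._buf)
```

`deque(maxlen=...)` discards from the opposite end on overflow, which gives ring-buffer behavior without index arithmetic.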

In S208, the CPU 151 determines whether the user has given a shooting instruction. If so, the process proceeds to S209; otherwise, it proceeds to S211. A shooting instruction here is an instruction to start shooting a still image for recording or to start shooting a moving image for recording. The user can give a shooting instruction by, for example, fully pressing the release button or operating the touch panel.

In S209, the CPU 151 performs still image or moving image shooting processing; when shooting is finished, the process proceeds to S210.

In S210, the CPU 151 selects, from the data sets buffered in the RAM 154, data sets to be used for positive learning or negative learning. This processing is described later.

In S211, the CPU 151 determines whether the number of input images selected in S207 and S210 is equal to or greater than a threshold. If so, the process proceeds to S212; otherwise, it proceeds to S213.

In S212, the main subject determination circuit 162 performs additional learning processing. Details of the additional learning processing are described later.

In S213, the CPU 151 determines whether an end instruction has been given via the operation switch 156. If so, the processing ends; otherwise, the process returns to S201 and the series of steps is repeated.
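The per-frame loop of FIG. 2 (S201 to S213) can be summarized as the sketch below. Every callable is a hypothetical stand-in for a hardware block or for a selection procedure described elsewhere in this document, the frame index stands in for the display time, and resetting the selection after each round of additional learning is an assumption the flowchart does not state.

```python
from typing import Any, Callable, List, Optional

Region = Any  # main subject region; representation left open


def run_live_view(frames: List[Any],
                  determine: Callable[[Any], Optional[Region]],
                  display: Callable[[Any, Optional[Region]], None],
                  select_live: Callable[[list], list],
                  shooting_requested: Callable[[int], bool],
                  select_after_capture: Callable[[list], list],
                  learn: Callable[[list], None],
                  end_requested: Callable[[int], bool],
                  threshold: int) -> None:
    buffer: list = []
    selected: list = []
    for t, image in enumerate(frames):               # S201: next input image
        result = determine(image)                    # S202
        display(image, result)                       # S203-S205: overlay only if result is not None
        buffer.append((image, result, t))            # S206: (image, result, time) data set
        selected += select_live(buffer)              # S207
        if shooting_requested(t):                    # S208
            # S209: the shooting itself is outside this sketch
            selected += select_after_capture(buffer) # S210
        if len(selected) >= threshold:               # S211
            learn(selected)                          # S212: additional learning
            selected = []                            # assumption: reset after learning
        if end_requested(t):                         # S213
            break
```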

(Example of main subject determination results)
FIG. 3 shows examples of the main subject determination result displayed in S204 of FIG. 2. FIG. 3(a) shows an example of the result when the additional learning in S207 has not been performed. FIG. 3(b) shows an example of the result after additional learning in S207 driven by the operations of a user who likes flowers and photographs them frequently. FIG. 3(c) shows an example of the result after additional learning in S207 driven by the operations of a user who likes birds and photographs them frequently.

(Description of main subject determination circuit)
In this embodiment, the main subject determination circuit 162 is configured as a CNN (Convolutional Neural Network). The basic configuration of the CNN is described with reference to FIGS. 4 and 5.

FIG. 4 shows the basic configuration of a CNN that determines a main subject from input two-dimensional image data and a position map. Processing starts from the input at the left end and proceeds to the right. The CNN is built hierarchically from sets of two layers: a feature detection layer (S layer) and a feature integration layer (C layer).

In the CNN, each S layer first detects the next features based on the features detected in the preceding level. The features detected in the S layer are then integrated in the C layer and passed to the next level as the detection result of that level. Various kinds of information can be input to this CNN, for example an RGB image, a per-pixel captured image signal before development (RAW image), image depth information, a map of object detection scores from an object detector, or a contrast map obtained from variance values in local regions of the image.

The S layer consists of feature detection cell planes, each of which detects a different feature. The C layer consists of feature integration cell planes, which pool the detection results of the feature detection cell planes in the preceding stage. Hereinafter, feature detection cell planes and feature integration cell planes are collectively called feature planes when they need not be distinguished. In this embodiment, the output layer, which is the final level, consists only of an S layer, without a C layer.

FIG. 5 illustrates feature detection processing in a feature detection cell plane and feature integration processing in a feature integration cell plane.

A feature detection cell plane consists of multiple feature detection neurons, which are connected with a predetermined structure to the C layer of the preceding level. A feature integration cell plane consists of multiple feature integration neurons, which are connected with a predetermined structure to the S layer of the same level. As shown in FIG. 5, let $y_M^{LS}(\xi,\zeta)$ denote the output value of the feature detection neuron at position $(\xi,\zeta)$ in the M-th cell plane of the S layer at level L, and $y_M^{LC}(\xi,\zeta)$ the output value of the feature integration neuron at position $(\xi,\zeta)$ in the M-th cell plane of the C layer at level L. With the coupling coefficients of these neurons written as $w_M^{LS}(n,u,v)$ and $w_M^{LC}(u,v)$, the output values can be expressed as follows.

$$y_M^{LS}(\xi,\zeta) \equiv f\left(u_M^{LS}(\xi,\zeta)\right) \equiv f\left(\sum_{n,u,v} w_M^{LS}(n,u,v)\, y_n^{(L-1)C}(\xi+u,\zeta+v)\right) \tag{1}$$

$$y_M^{LC}(\xi,\zeta) \equiv u_M^{LC}(\xi,\zeta) \equiv \sum_{u,v} w_M^{LC}(u,v)\, y_M^{LS}(\xi+u,\zeta+v) \tag{2}$$

In Equation (1), f is an activation function; any sigmoid function, such as the logistic function or the hyperbolic tangent, may be used, and it may be realized, for example, by tanh. $u_M^{LS}(\xi,\zeta)$ is the internal state of the feature detection neuron at position $(\xi,\zeta)$ in the M-th cell plane of the S layer at level L. Equation (2) takes a simple linear sum without an activation function; in that case the neuron's internal state $u_M^{LC}(\xi,\zeta)$ equals its output value $y_M^{LC}(\xi,\zeta)$. The terms $y_n^{(L-1)C}(\xi+u,\zeta+v)$ in Equation (1) and $y_M^{LS}(\xi+u,\zeta+v)$ in Equation (2) are called the connection-destination output values of the feature detection neuron and the feature integration neuron, respectively.
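Equations (1) and (2) can be sketched in plain Python as below, for a single neuron at position (ξ, ζ). The sketch assumes square receptive fields of odd size, stores planes as `plane[xi][zeta]`, and treats positions outside the plane as zero (a padding choice the text does not specify); `s_neuron`, `c_neuron`, and `apply_plane` are illustrative names. An implementation on the GPU would vectorize these loops.

```python
import math
from typing import Callable, List

Plane = List[List[float]]  # plane[xi][zeta]


def s_neuron(prev_planes: List[Plane], w: List[Plane], xi: int, zeta: int,
             f: Callable[[float], float] = math.tanh) -> float:
    """Equation (1): y = f(sum over n, u, v of w[n](u, v) * y_n(xi+u, zeta+v))."""
    r = len(w[0]) // 2                       # receptive field is (2r+1) x (2r+1)
    state = 0.0                              # internal state u_M^LS(xi, zeta)
    for n, plane in enumerate(prev_planes):  # all C-layer planes at level L-1
        for u in range(-r, r + 1):
            for v in range(-r, r + 1):
                x, y = xi + u, zeta + v
                if 0 <= x < len(plane) and 0 <= y < len(plane[0]):  # assumption: zero padding
                    state += w[n][u + r][v + r] * plane[x][y]
    return f(state)


def c_neuron(s_plane: Plane, w: Plane, xi: int, zeta: int) -> float:
    """Equation (2): plain weighted sum, no activation, so output == internal state."""
    r = len(w) // 2
    total = 0.0
    for u in range(-r, r + 1):
        for v in range(-r, r + 1):
            x, y = xi + u, zeta + v
            if 0 <= x < len(s_plane) and 0 <= y < len(s_plane[0]):
                total += w[u + r][v + r] * s_plane[x][y]
    return total


def apply_plane(neuron: Callable[[int, int], float], width: int, height: int) -> Plane:
    """Evaluate a neuron at every position (xi, zeta) to form a whole feature plane."""
    return [[neuron(xi, zeta) for zeta in range(height)] for xi in range(width)]
```

For L = 1 the `prev_planes` list would hold the input image and position map planes instead of C-layer outputs.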

Here, ξ, ζ, u, v, and n in Equations (1) and (2) are explained. The position $(\xi,\zeta)$ corresponds to position coordinates in the input image; for example, a high output value $y_M^{LS}(\xi,\zeta)$ means that the feature detected by the M-th cell plane of the S layer at level L is likely to be present at pixel position $(\xi,\zeta)$ of the input image. In Equation (1), n denotes the n-th cell plane of the C layer at level L−1 and is called the connection-destination feature number. Basically, the multiply-accumulate operation is performed over all cell planes in the C layer at level L−1. $(u,v)$ are the relative position coordinates of the coupling coefficients, and the multiply-accumulate operation is performed over a finite range of $(u,v)$ according to the size of the feature to be detected. This finite range of $(u,v)$ is called the receptive field. Its size, hereinafter called the receptive field size, is expressed as the number of horizontal pixels × the number of vertical pixels in the connected range.

In Equation (1), when L = 1, that is, in the first S layer, $y_n^{(L-1)C}(\xi+u,\zeta+v)$ is the input image $y^{\mathrm{in\_image}}(\xi+u,\zeta+v)$ or the input position map $y^{\mathrm{in\_posi\_map}}(\xi+u,\zeta+v)$. Since neurons and pixels are distributed discretely and the connection-destination feature numbers are also discrete, ξ, ζ, u, v, and n take discrete rather than continuous values. Here, ξ and ζ are non-negative integers, n is a natural number, and u and v are integers, all within finite ranges.

$w_M^{LS}(n,u,v)$ in Equation (1) is the coupling coefficient distribution for detecting a given feature; adjusting it to appropriate values makes it possible to detect that feature. This adjustment of the coupling coefficient distribution is learning: in constructing the CNN, various test patterns are presented, and the coupling coefficients are corrected gradually and repeatedly so that $y_M^{LS}(\xi,\zeta)$ takes appropriate output values.

Next, $w_M^{LC}(u,v)$ in Equation (2) is a two-dimensional Gaussian function and can be expressed as Equation (3):

$$w_M^{LC}(u,v) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{u^2+v^2}{2\sigma^2}\right) \tag{3}$$

Here too, $(u,v)$ is restricted to a finite range; as with the feature detection neurons, this range is called the receptive field and its size the receptive field size. The receptive field size here may be set to an appropriate value according to the size of the M-th feature of the S layer at level L. In Equation (3), σ is a feature size factor and may be set to an appropriate constant according to the receptive field size. Specifically, σ is preferably set so that the values at the outermost edge of the receptive field can be regarded as approximately zero.
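As a sketch of Equation (3) and of the rule that the outermost receptive-field values should be nearly zero, the code below samples the Gaussian over a square receptive field and derives σ from the radius. Interpreting "approximately zero" as 1% of the center weight along the axes (`edge_ratio=0.01`) is an assumption; the text gives no number.

```python
import math
from typing import List


def gaussian_weights(radius: int, sigma: float) -> List[List[float]]:
    """Equation (3) sampled on the receptive field u, v in [-radius, radius]."""
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    return [[norm * math.exp(-(u * u + v * v) / (2.0 * sigma ** 2))
             for v in range(-radius, radius + 1)]
            for u in range(-radius, radius + 1)]


def sigma_for_receptive_field(radius: int, edge_ratio: float = 0.01) -> float:
    """Pick sigma so the weight at (radius, 0) is edge_ratio times the center weight.

    exp(-radius^2 / (2 sigma^2)) = edge_ratio  =>  sigma = radius / sqrt(2 ln(1/edge_ratio))
    """
    return radius / math.sqrt(2.0 * math.log(1.0 / edge_ratio))
```

With this choice the corners of the receptive field fall to `edge_ratio ** 2` of the center weight, so the whole border is close to zero.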

The CNN in this embodiment is configured so that, by performing the above computations at each level, the main subject is determined in the S layer of the final level.

(Flow of additional learning processing)
FIG. 6 is a flowchart showing the image selection processing in S207 of FIG. 2. Hereinafter, learning a given region of an input image as a test pattern that should be determined to be the main subject is called positive learning, and learning a given region as a test pattern that should not be determined to be the main subject is called negative learning. Details of each type of learning are described later.

First, in S601, the CPU 151 determines whether a manual instruction has been given via the operation switch 156. If so, the process proceeds to S602; otherwise, it proceeds to S605. A manual instruction here is an operation (selection instruction) in which the user manually operates the operation switch 156 to specify the main subject region of the image. A manual instruction in S601 means that the main subject determined in S202 of FIG. 2 differs from the main subject the user wants, so the process proceeds to S602 and the subsequent steps.

In S602, the subject the user manually specified in S601 is considered to be the correct main subject. The CPU 151 therefore selects the input image acquired in S201 of FIG. 2 and the main subject region manually specified in S601 as a data set for positive learning.

In S603, the CPU 151 determines whether the main subject region determined in S202 of FIG. 2 and the main subject region manually specified in S601 correspond to different subjects. For example, if the two regions are separated by a predetermined distance or more, they are judged to be different subjects. If they are separated by the predetermined distance or more, the process proceeds to S604; otherwise, this flowchart ends. The predetermined distance may be any sufficiently large value; for example, it may be set as a proportion of the angle of view of the input image. More specifically, for a 640 × 480 pixel input image, the predetermined distance may be 64 pixels, which is 10% of the horizontal size of 640 pixels. If the results of face detection, human body detection, object detection, or the like make it clear that the region determined in S202 and the region manually specified in S601 are different subjects, the distance between the two regions need not be evaluated.
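The distance test in S603 might look like the sketch below. Measuring the gap between region centers, the (x, y, width, height) region layout, and the `detector_verdict` shortcut argument are assumptions; the text only says the two regions are "separated by a predetermined distance or more" and gives 10% of the horizontal size as an example.

```python
import math
from typing import Optional, Tuple

Region = Tuple[float, float, float, float]  # (x, y, width, height): assumed layout


def center(region: Region) -> Tuple[float, float]:
    x, y, w, h = region
    return (x + w / 2.0, y + h / 2.0)


def is_different_subject(auto_region: Region, manual_region: Region,
                         image_width: int, ratio: float = 0.10,
                         detector_verdict: Optional[bool] = None) -> bool:
    """S603 sketch: regions at least ratio * image_width apart count as different subjects."""
    if detector_verdict is not None:
        # e.g. face / human-body / object detection has already settled the question
        return detector_verdict
    min_gap = ratio * image_width  # 640-pixel width -> 64 pixels
    return math.dist(center(auto_region), center(manual_region)) >= min_gap
```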

In S604, the CPU 151 selects the input image acquired in S201 of FIG. 2, together with the main subject area determined in S202, as a data set for negative learning. This flowchart then ends.

In S605, the CPU 151 determines whether a cancel instruction has been given via the operation switch 156. If so, the process proceeds to S606; otherwise, this flowchart ends. Here, a cancel instruction is an operation that deletes the main subject determination result output in S204 of FIG. 2 and returns the imaging apparatus 100 to a state in which no main subject has been determined. For example, if the apparatus provides a function that lets the user instruct it to redo the determination of the main subject area, that instruction corresponds to a cancel instruction. Alternatively, when the user redoes the half-press of the release button, which is the operation that fixes the subject to be focused on, the redone half-press corresponds to a cancel instruction. A cancel instruction in S605 means that the main subject determined in S202 of FIG. 2 differs from the main subject the user wants, so the process proceeds to S606.

In S606, the CPU 151 selects the input image acquired in S201 of FIG. 2, together with the main subject area determined in S202, as a data set for negative learning. This flowchart then ends.
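The branching of S601 to S606 can be summarised in a short sketch. It assumes hypothetical `Sample` and `LearningSets` containers, an `(x, y, width, height)` box for each area, and that the result of the S603 check is passed in as a boolean; none of these names come from the source.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (x, y, width, height) of a main subject area


@dataclass
class Sample:
    image_id: str  # stands in for the input image acquired in S201
    region: Box    # main subject area attached to that image


@dataclass
class LearningSets:
    positive: List[Sample] = field(default_factory=list)
    negative: List[Sample] = field(default_factory=list)


def sort_by_user_operation(
    image_id: str,
    auto_region: Box,              # area determined automatically in S202
    manual_region: Optional[Box],  # area designated by the user, or None (S601)
    cancelled: bool,               # whether a cancel instruction was given (S605)
    different_subjects: bool,      # result of the S603 check
    sets: LearningSets,
) -> LearningSets:
    if manual_region is not None:                                # S601: manual instruction
        sets.positive.append(Sample(image_id, manual_region))    # S602
        if different_subjects:                                   # S603
            sets.negative.append(Sample(image_id, auto_region))  # S604
    elif cancelled:                                              # S605: cancel instruction
        sets.negative.append(Sample(image_id, auto_region))      # S606
    return sets
```

A manual designation always yields a positive sample; the automatically determined area becomes a negative sample only when it is judged to be a different subject, or when the user cancels without designating anything.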

FIG. 7 is a flowchart showing the flow of the image selection processing in S210 of FIG. 2.

In S701, the CPU 151 determines, for each data set buffered in the RAM 154, whether the main subject determined in that data set is the same subject as the one determined when the shooting instruction was given. If it is, the process proceeds to S702; otherwise, it proceeds to S703. If, before the shooting instruction, a subject other than the one determined as the main subject at the time of the shooting instruction had been determined as the main subject, that earlier determination is likely to have been wrong. A data set in which a different subject was determined as the main subject is therefore useful for negative learning.
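A minimal sketch of the S701 split follows. It assumes, hypothetically, that each buffered data set records a tracking ID (`subject_id`) identifying which subject was chosen as the main subject, so that "same subject" reduces to an ID comparison; the source does not specify how subject identity is established.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BufferedDataset:
    subject_id: int      # hypothetical tracking ID of the subject chosen as main subject
    display_time: float  # time (s) at which the determination result was displayed


def partition_by_subject(
    buffer: List[BufferedDataset], subject_at_shot: int
) -> Tuple[List[BufferedDataset], List[BufferedDataset]]:
    """S701: split the buffer into positive and negative candidates."""
    same = [d for d in buffer if d.subject_id == subject_at_shot]       # go to S702
    different = [d for d in buffer if d.subject_id != subject_at_shot]  # go to S703
    return same, different
```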

In S702, the CPU 151 selects, from the data sets in which the same subject as at the time of the shooting instruction was determined as the main subject, those that satisfy a positive learning condition. Here, the positive learning condition is that the difference between the display time stored in the data set and the execution time of this step is less than a predetermined time. This predetermined time may be set based on the time lag between the user checking the main subject determination result output to the monitor display 150 in S204 of FIG. 2 and the user giving the shooting instruction with the operation switch 156; for example, it may be 2 seconds. When several data sets satisfy the condition at once, the similarity between the input images they contain is calculated, and if the similarity is equal to or greater than a predetermined value, one of the similar data sets is excluded from selection. In this exclusion, data sets with a larger difference between the stored display time and the execution time of this step may be excluded preferentially. A SAD (sum of absolute differences) value, a histogram difference, or the like can be used as the similarity measure; note that with SAD, a smaller value means a higher similarity.
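The S702 selection (time window plus similarity-based exclusion) can be sketched as follows. Assumptions: images are hypothetical flattened grayscale pixel lists of equal length, SAD is the similarity measure, and `sad_threshold` is an illustrative value; "similar" means SAD at or below that threshold. Visiting the most recent data set first means that, of two similar data sets, the one with the larger time difference is dropped, as the text suggests.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BufferedDataset:
    display_time: float  # time (s) at which the determination result was displayed
    image: List[int]     # hypothetical flattened grayscale pixels


def sad(a: List[int], b: List[int]) -> int:
    """Sum of absolute differences: smaller means more similar."""
    return sum(abs(x - y) for x, y in zip(a, b))


def select_positive(
    candidates: List[BufferedDataset],
    now: float,
    window: float = 2.0,
    sad_threshold: int = 10,
) -> List[BufferedDataset]:
    # Positive learning condition: displayed less than `window` seconds ago.
    recent = [d for d in candidates if now - d.display_time < window]
    # Visit the most recent first, so that of two similar data sets the one with
    # the larger time difference is the one excluded.
    recent.sort(key=lambda d: now - d.display_time)
    kept: List[BufferedDataset] = []
    for d in recent:
        # Keep only if it is dissimilar (SAD above threshold) to everything kept.
        if all(sad(d.image, k.image) > sad_threshold for k in kept):
            kept.append(d)
    return kept
```

The same exclusion loop could be reused for the negative candidates in S703; only the admission condition differs.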

In S703, the CPU 151 selects, from the data sets in which a subject different from the one at the time of the shooting instruction was determined as the main subject, those that satisfy a negative learning condition. When several data sets satisfy the condition at once, the same exclusion processing as in S702 is performed. After at least one of S702 and S703 has been performed, the CPU 151 ends this flowchart.

Then, in S212 of FIG. 2, the main subject determination circuit 162 performs reinforcement learning using the data sets selected in the processing described with reference to FIGS. 6 and 7.

In S212, the CPU 151 also deletes all the data sets buffered in the RAM 154 in S206 of FIG. 2.

As described above, performing additional learning based on user operations makes it possible to adjust the parameters of the main subject determination circuit 162 to the user's preferences. In addition, because highly similar data sets are excluded during data set selection, overfitting to specific scenes can be suppressed.

(Learning method)
Next, a specific learning method will be described. In this embodiment, the coupling coefficients are adjusted by supervised learning. In supervised learning, a test pattern is presented, the actual output value of each neuron is obtained, and the coupling coefficient w_M^LS(n, u, v) is corrected based on the relationship between that output value and the teacher signal (the desired output value the neuron should produce). In the learning of this embodiment, the coupling coefficients of the final feature detection layer are corrected by the least squares method, and those of the intermediate feature detection layers by error backpropagation (see Non-Patent Document 1 for details of methods for correcting coupling coefficients, such as least squares and error backpropagation).
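For reference, the textbook squared-error form of such an error-driven correction, for a single tanh output neuron and a single pattern, is shown below. This is a generic sketch, not the patent's own formula: the indices (n, u, v) of w_M^LS are dropped, and d (teacher signal), y (neuron output), s (pre-activation), x_i (inputs), and η (learning rate) are symbols introduced here.

```latex
E = \frac{1}{2}\left(d - y\right)^{2},
\qquad y = \tanh(s), \quad s = \sum_{i} w_i x_i
\\
\frac{\partial E}{\partial w_i} = -\left(d - y\right)\left(1 - y^{2}\right)x_i
\quad\Rightarrow\quad
\Delta w_i = -\eta\,\frac{\partial E}{\partial w_i} = \eta\,(d - y)\,(1 - y^{2})\,x_i
```

Summing E over all test patterns gives the batch objective; the factor (1 - y^2) is the derivative of tanh.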

In this embodiment, for learning in advance, a large number of specific patterns that should be detected and patterns that should not be detected are prepared as learning test patterns; for additional learning, the test patterns to be learned are selected from the buffer by the method described above. Each test pattern consists of one set of an image and a teacher signal. Data sets selected for positive learning are used as specific patterns to be detected, and data sets selected for negative learning are used as patterns not to be detected.

When the tanh function is used as the activation function and a specific pattern to be detected is presented, a teacher signal is given to the neurons of the final-layer feature detection cell plane in the area where the specific pattern exists so that their output becomes 1; that is, a positive reward is given. Conversely, when a pattern that should not be detected is presented, a teacher signal is given to the neurons in the area of that pattern so that their output becomes -1; that is, a negative reward is given.
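How the ±1 teacher signals act as positive and negative rewards can be illustrated with a single tanh neuron trained by per-sample gradient descent on squared error. This is a toy stand-in for the final feature detection layer, not the CNN of the embodiment; the feature vectors, `lr`, and `epochs` are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (feature vector, teacher signal: +1 or -1)


def tanh_output(w: List[float], b: float, x: Sequence[float]) -> float:
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_tanh_neuron(
    samples: List[Sample], lr: float = 0.1, epochs: int = 200
) -> Tuple[List[float], float]:
    """Per-sample gradient descent on E = 0.5 * (d - y)^2 for one tanh neuron."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, d in samples:
            y = tanh_output(w, b, x)
            g = (d - y) * (1.0 - y * y)  # equals -dE/ds, where s is the pre-activation
            w = [wi + lr * g * xi for wi, xi in zip(w, x)]
            b += lr * g
    return w, b
```

A pattern to be detected is paired with +1 (positive reward) and a pattern not to be detected with -1 (negative reward); after training, the neuron's output moves toward the corresponding sign.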

A CNN for determining the main subject from a two-dimensional image is constructed by the above method. In the actual determination, computation is performed using the coupling coefficients w_M^LS(n, u, v) obtained by learning, and if the output of a neuron on the final-layer feature detection cell plane is equal to or greater than a predetermined value, the main subject is determined to exist at that position.
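The final thresholding step can be sketched as a scan over the final-layer output map. The map layout (rows indexed by v, columns by u), the threshold of 0.5, and ordering hits by response strength are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def find_main_subject(
    output_map: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Return (u, v) positions whose final-layer output is >= threshold, strongest first."""
    hits = [
        (u, v, value)
        for v, row in enumerate(output_map)
        for u, value in enumerate(row)
        if value >= threshold
    ]
    # Sorting by response strength is an added convenience, not from the source.
    hits.sort(key=lambda h: h[2], reverse=True)
    return [(u, v) for u, v, _ in hits]
```

For the map `[[-0.9, 0.2], [0.7, 0.95]]`, this returns `[(1, 1), (0, 1)]`.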

In the present embodiment, a configuration in which the imaging apparatus 100 has the main subject determination circuit 162 and also performs the selection of data sets for learning and the reinforcement learning has been described as an example, but the present invention is not limited to this.

For example, the present invention can also be applied to a system in which a smartphone or tablet terminal with a camera function is connected by wireless communication to an external device such as a server or an edge computer.

FIG. 8 shows a system consisting of a smartphone and a server. The smartphone 801 transmits the data sets buffered in its internal RAM to the server 802, and the server 802 performs reinforcement learning to obtain the coupling coefficients of the main subject determination circuit inside the smartphone 801. The server 802 then transmits the obtained coupling coefficients to the smartphone 801, which sets them in its main subject determination circuit. The processing of selecting data sets for positive learning and negative learning from the data sets buffered in the RAM of the smartphone 801 may be performed by either the smartphone 801 or the server 802. If the smartphone 801 selects the data sets, it transmits the selected data sets to the server 802. If the server 802 selects them, the smartphone 801 transmits to the server 802 all the data sets buffered in the RAM, together with the user operation information stored in association with those data sets. The images included in the data sets and the user operation information are assumed to be associated with each other on the time axis. With this configuration, the server 802 can select the data sets because the smartphone 801 sends it both the data sets and the user operation information.
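When the server performs the selection, it must associate each user operation with a data set on the time axis. One possible sketch is shown below. It assumes a hypothetical JSON payload with `datasets` (`id`, `display_time`) and `operations` (`time`, `type`) fields, and associates each operation with the data set displayed most recently at or before it. The message format and matching rule are illustrative assumptions, not the patent's protocol.

```python
import bisect
import json
from typing import Dict, List, Tuple


def build_payload(datasets: List[Dict], operations: List[Dict]) -> str:
    """Smartphone side: bundle buffered data sets and the user-operation log."""
    return json.dumps({"datasets": datasets, "operations": operations})


def align_operations(payload: str) -> List[Tuple[str, str]]:
    """Server side: attach each operation to the data set displayed most recently before it."""
    data = json.loads(payload)
    datasets = sorted(data["datasets"], key=lambda d: d["display_time"])
    times = [d["display_time"] for d in datasets]
    pairs = []
    for op in data["operations"]:
        i = bisect.bisect_right(times, op["time"]) - 1
        if i >= 0:  # ignore operations that precede every buffered data set
            pairs.append((datasets[i]["id"], op["type"]))
    return pairs
```

The resulting (data set, operation) pairs could then feed the same positive/negative sorting that the imaging apparatus would otherwise perform locally.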

Alternatively, the smartphone 801 may lack a main subject determination circuit, with the server 802 having one instead. In that case, the smartphone 801 transmits input images to the server 802 in real time, and the server 802 determines the main subject and transmits the result to the smartphone 801. The server 802 then performs reinforcement learning and updates the coupling coefficients of its own main subject determination circuit. Here too, the processing of selecting data sets for positive learning and negative learning from the data sets buffered in the RAM of the smartphone 801 may be performed by either the smartphone 801 or the server 802.

Furthermore, for positive learning, an input image obtained by shooting may be used instead of a data set buffered in the RAM 154 before the shooting instruction, because the main subject is likely to have been selected correctly at the time of shooting. In this case, the data sets for positive learning contain images recorded by shooting, whereas the data sets for negative learning are those buffered before the shooting instruction.

As described above, according to the present embodiment, main subject determination that reflects the user's preferences becomes possible simply through repeated ordinary camera operation. Furthermore, by examining the relationship between the main subject determination result before the shooting instruction and the user's operation at that time, determination results that the user did not intend can be identified, so that data for negative learning can be selected efficiently.

(Other embodiments)
The present invention can also be implemented by processing in which a program that realizes one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of that system or apparatus read and execute the program. It can also be implemented by a circuit (for example, an ASIC) that realizes one or more functions.

The invention is not limited to the above embodiments, and various changes and modifications are possible without departing from the spirit and scope of the invention. Accordingly, the following claims are appended to make the scope of the invention public.

100: imaging apparatus, 101: lens unit, 141: image sensor, 142: imaging signal processing circuit, 143: imaging control circuit, 151: CPU, 152: image processing circuit, 162: main subject determination circuit

Claims

1. An image processing apparatus comprising selection means for selecting, based on a user's input, an image to be used as data that gives a negative reward in learning for determination means that determines a main subject from acquired images, from among images in which the main subject has been determined using the determination means,
wherein, when a first main subject determined using the determination means at the time of the user's input indicating a shooting instruction and a second main subject determined using the determination means before the user's input indicating the shooting instruction are different subjects, the selection means selects an image in which the second main subject was determined as the data that gives the negative reward.

2. The image processing apparatus according to claim 1, wherein the selection means selects, based on the user's input, an image to be used as data that gives a positive reward in learning for the determination means from among the images in which the main subject has been determined using the determination means.

3. An image processing apparatus comprising selection means for selecting, based on a user's input, an image to be used as data that gives a negative reward in learning for determination means that determines a main subject from acquired images, from among images in which the main subject has been determined using the determination means,
wherein the selection means also selects, based on the user's input, an image to be used as data that gives a positive reward in learning for the determination means from among the images in which the main subject has been determined using the determination means, and
wherein, when a first main subject determined using the determination means at the time of the user's input indicating a shooting instruction and a second main subject determined using the determination means before the user's input indicating the shooting instruction are different subjects, the selection means selects an image in which the second main subject was determined as the data that gives the negative reward, and when the first main subject and the second main subject are the same subject, the selection means selects an image in which the second main subject was determined as the data that gives the positive reward.

4. The image processing apparatus according to any one of claims 1 to 3, wherein, when a first main subject designated based on the user's input and a second main subject determined using the determination means are different subjects, the selection means selects an image in which the second main subject was determined as the data that gives the negative reward.

5. The image processing apparatus according to claim 2, wherein, when a first main subject is designated based on the user's input, the selection means selects an image in which the first main subject was determined as the data that gives the positive reward.

6. The image processing apparatus according to any one of claims 1 to 5, wherein, when the user's input for determining a subject to be focused on is redone, the selection means selects an image in which the main subject was determined using the determination means before the redo as the data that gives the negative reward.

7. The image processing apparatus according to any one of claims 1 to 6, further comprising calculation means for calculating a similarity among a plurality of images,
wherein, from a plurality of images whose similarity calculated by the calculation means is equal to or higher than a predetermined value, the selection means selects only one image as data that gives the negative reward.

8. The image processing apparatus according to claim 1, further comprising the determination means.

9. The image processing apparatus according to any one of claims 1 to 8, further comprising learning means for performing learning for the determination means,
wherein the learning means performs learning using the data that gives the negative reward selected by the selection means.

10. The image processing apparatus according to claim 8, wherein the determination means uses a learning result obtained by wireless communication from learning means of an external device that performs learning for the determination means, and
wherein the learning means performs learning using the data that gives the negative reward selected by the selection means.

11. The image processing apparatus according to any one of claims 1 to 7, wherein an external device capable of wireless communication with the image processing apparatus has the determination means and learning means for performing learning for the determination means, and
wherein the learning means performs learning using the data that gives the negative reward selected by the selection means.

12. The image processing apparatus according to any one of claims 1 to 11, further comprising imaging means for capturing the image in which the main subject is determined.

13. A control method for an image processing apparatus, comprising a selection step of selecting, based on a user's input, an image to be used as data that gives a negative reward in learning for determination means that determines a main subject from acquired images, from among images in which the main subject has been determined using the determination means,
wherein, in the selection step, when a first main subject determined using the determination means at the time of the user's input indicating a shooting instruction and a second main subject determined using the determination means before the user's input indicating the shooting instruction are different subjects, an image in which the second main subject was determined is selected as the data that gives the negative reward.

14. A control method for an image processing apparatus, comprising a selection step of selecting, based on a user's input, an image to be used as data that gives a negative reward in learning for determination means that determines a main subject from acquired images, from among images in which the main subject has been determined using the determination means,
wherein, in the selection step, an image to be used as data that gives a positive reward in learning for the determination means is also selected, based on the user's input, from among the images in which the main subject has been determined using the determination means, and
wherein, in the selection step, when a first main subject determined using the determination means at the time of the user's input indicating a shooting instruction and a second main subject determined using the determination means before the user's input indicating the shooting instruction are different subjects, an image in which the second main subject was determined is selected as the data that gives the negative reward, and when the first main subject and the second main subject are the same subject, an image in which the second main subject was determined is selected as the data that gives the positive reward.

15. A program for causing a computer to execute each step of the control method according to claim 13 or 14.

16. A computer-readable storage medium storing a program for causing a computer to execute each step of the control method according to claim 13 or 14.