JP2019118098A

JP2019118098A - Imaging device, control method therefor, program, and storage medium

Info

Publication number: JP2019118098A
Application number: JP2018203255A
Authority: JP
Inventors: 英貴門井; Hideki Kadoi
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2017-12-26
Filing date: 2018-10-29
Publication date: 2019-07-18
Anticipated expiration: 2038-10-29
Also published as: JP6641447B2; GB201919386D0; GB2582197A; GB2582197B; JP6799660B2; JP2020061761A

Abstract

To provide an imaging device capable of acquiring video suited to a user without the user performing any special operation. SOLUTION: An imaging device includes an acquisition unit that acquires data relating to an image captured by an imaging unit; a learning unit that learns the conditions of images preferred by the user on the basis of teacher data acquired by the acquisition unit; a control circuit that makes automatic-shooting determinations for the imaging unit on the basis of the conditions learned by the learning unit; and a registration unit that registers, as the teacher data, the data acquired by the acquisition unit for learning images captured in succession to shooting performed at the user's instruction. SELECTED DRAWING: Figure 9

Description

The present invention relates to an automatic shooting technique for an imaging device.

In still image and moving image shooting with an imaging device such as a camera, the user normally decides on a subject through a viewfinder or the like, checks the shooting conditions personally, and adjusts the framing before capturing the image. Such imaging devices have functions that detect a user's operation error and notify the user, or that sense the external environment and notify the user when conditions are unsuitable for shooting. Mechanisms that control the camera so that it enters a state suitable for shooting have also long existed.

In contrast to such imaging devices, which shoot in response to user operation, there are life log cameras that shoot periodically and continuously without the user giving a shooting instruction (Patent Document 1). A life log camera is worn on the user's body with a strap or the like and records the scenes the user sees in daily life as images at fixed time intervals. Because a life log camera shoots at fixed intervals rather than at a moment chosen by the user, such as when the user releases the shutter, it can capture unexpected moments that would not normally be photographed.

Patent Document 1: Published Japanese Translation of PCT Application No. 2016-536868
Patent Document 2: Japanese Patent Application Laid-Open No. 2004-354251

However, when a life log camera worn by the user shoots automatically at regular intervals, it may acquire images the user does not like and fail to capture the moments the user really wants.

Even if a life log camera had a learning function and could learn the moments the user really wants to capture and shoot them automatically, that learning would require a large amount of teacher data. Patent Document 2 discloses a defect inspection apparatus that uses a neural network to inspect a test object for defects, in which artificial defect images of the test object are created by image processing to compensate for a shortage of learning patterns. However, unlike defect inspection, where the types of test object are limited, a life log camera faces an unlimited variety of subjects, so it is difficult to make up for a shortage of learning patterns by image processing.

The present invention has been made in view of the problems described above, and its object is to provide an imaging device capable of acquiring video suited to the user without the user performing any special operation.

An imaging device according to the present invention comprises: acquisition means for acquiring data relating to a captured image captured by imaging means; learning means for learning the conditions of images preferred by a user on the basis of teacher data acquired by the acquisition means; control means for making automatic-shooting determinations for the imaging means on the basis of the conditions learned by the learning means; and registration means for registering, as the teacher data, the data acquired by the acquisition means for learning images captured in succession to shooting performed at the user's instruction.

According to the present invention, it is possible to provide an imaging device capable of acquiring video suited to the user without the user performing any special operation.

A diagram schematically showing the imaging device.
A diagram showing the configuration of the imaging device.
A diagram showing the configuration of the imaging device and an external device.
A diagram showing the configuration of the external device.
A diagram showing the configuration of the imaging device and an external device.
A diagram showing the configuration of the external device.
A flowchart explaining the first control circuit.
A flowchart explaining the second control circuit.
A flowchart explaining shooting mode processing.
A diagram explaining a neural network.
A diagram for explaining area division within a captured image.
A flowchart explaining learning mode determination.
A flowchart explaining learning processing.
A diagram explaining display processing according to the present embodiment.

First Embodiment
<Configuration of Imaging Device>
FIG. 1 is a view schematically showing the imaging device of the first embodiment.

The imaging device 101 shown in FIG. 1(a) is provided with, among other things, an operation member for operating the power switch (hereinafter called the power button, although the operation may instead be a tap, flick, or swipe on a touch panel). A lens barrel 102, a housing that contains the imaging lens group and the image sensor, is attached to the imaging device 101, and a rotation mechanism is provided that can rotationally drive the lens barrel 102 relative to a fixed portion 103. The tilt rotation unit 104 is a motor drive mechanism that can rotate the lens barrel 102 in the pitch direction shown in FIG. 1(b), and the pan rotation unit 105 is a motor drive mechanism that can rotate the lens barrel 102 in the yaw direction. The lens barrel 102 can therefore rotate about one or more axes. FIG. 1(b) shows the axis definitions at the position of the fixed portion 103. The angular velocity meter 106 and the accelerometer 107 are both mounted on the fixed portion 103 of the imaging device 101. Vibration of the imaging device 101 is detected from the outputs of the angular velocity meter 106 and the accelerometer 107, and the tilt rotation unit 104 and the pan rotation unit 105 are rotationally driven according to the detected shake angle. In this way, shake and tilt of the lens barrel 102, which is the movable portion, are corrected.
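The shake correction described above can be pictured with a minimal Python sketch. It assumes the detected shake is already available as pitch and yaw angles in degrees and that each axis has a symmetric mechanical limit. The names `compensation_command`, `tilt_limit_deg`, and `pan_limit_deg`, and the 30-degree defaults, are illustrative assumptions and do not come from this description.

```python
from dataclasses import dataclass


@dataclass
class ShakeAngles:
    """Detected shake of the fixed portion, in degrees (hypothetical container)."""
    pitch_deg: float
    yaw_deg: float


def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def compensation_command(shake, tilt_limit_deg=30.0, pan_limit_deg=30.0):
    """Return (tilt, pan) drive angles that counter-rotate the lens barrel.

    The barrel is driven opposite to the detected shake, limited to the
    assumed mechanical range of each rotation unit.
    """
    tilt_cmd = clamp(-shake.pitch_deg, -tilt_limit_deg, tilt_limit_deg)
    pan_cmd = clamp(-shake.yaw_deg, -pan_limit_deg, pan_limit_deg)
    return tilt_cmd, pan_cmd
```

A real device would run this inside a closed control loop with motor dynamics; the sketch only shows the sign convention and the clamping.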

FIG. 2 is a block diagram showing the configuration of the imaging device of the present embodiment.

In FIG. 2, the first control circuit 223 comprises a processor (for example, a CPU, GPU, microprocessor, or MPU) and memory (for example, DRAM or SRAM). Together they execute various processes to control each block of the imaging device 101 and to control data transfer between the blocks. The nonvolatile memory (EEPROM) 216 is an electrically erasable and recordable memory that stores constants, programs, and the like for the operation of the first control circuit 223.

In FIG. 2, the zoom unit 201 includes a zoom lens that changes magnification. The zoom drive control circuit 202 drives and controls the zoom unit 201. The focus unit 203 includes a lens that adjusts focus. The focus drive control circuit 204 drives and controls the focus unit 203.

The imaging unit 206 includes an image sensor and an A/D converter. The image sensor receives light incident through each lens group and outputs charge information corresponding to the amount of light to the image processing circuit 207 as analog image data. The image processing circuit 207 is an arithmetic circuit equipped with a plurality of ALUs (Arithmetic and Logic Units). It applies image processing such as distortion correction, white balance adjustment, and color interpolation to the digital image data output by A/D conversion, and outputs the processed digital image data. The digital image data output from the image processing circuit 207 is converted by the image recording circuit 208 into a recording format such as JPEG and is sent to the memory 215 or to the video output circuit 217 described later.

The lens barrel rotation drive circuit 205 drives the tilt rotation unit 104 and the pan rotation unit 105 to move the lens barrel 102 in the tilt and pan directions.

The device shake detection circuit 209 includes, for example, the angular velocity meter (gyro sensor) 106, which detects the angular velocity of the imaging device 101 about three axes, and the accelerometer (acceleration sensor) 107, which detects the acceleration of the device along three axes. From the detected signals, the device shake detection circuit 209 calculates the rotation angle of the device, the shift amount of the device, and the like.
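As a rough illustration of how a rotation angle can be derived from angular velocity, the following Python sketch integrates gyro samples over time for a single axis. The description does not state which calculation the circuit uses; simple rectangular integration, the function name `integrate_gyro`, and the fixed sampling interval are all assumptions made for this example.

```python
def integrate_gyro(rates_dps, dt_s, initial_deg=0.0):
    """Accumulate angular-velocity samples (deg/s) into rotation angles (deg).

    rates_dps   : angular-velocity samples for one axis
    dt_s        : assumed constant sampling interval in seconds
    initial_deg : angle at the start of integration
    Returns the angle after each sample.
    """
    angle = initial_deg
    angles = []
    for rate in rates_dps:
        angle += rate * dt_s  # rectangular (Euler) integration step
        angles.append(angle)
    return angles
```

In practice gyro bias causes the integrated angle to drift, which is one reason an accelerometer is commonly combined with the gyro.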

The audio input circuit 213 acquires audio signals from around the imaging device 101 through a microphone provided on the imaging device 101, performs analog-to-digital conversion, and sends the result to the audio processing circuit 214. The audio processing circuit 214 performs audio-related processing such as optimization of the input digital audio signal. The audio signal processed by the audio processing circuit 214 is then sent to the memory 215 by the first control circuit 223. The memory 215 temporarily stores the image signals and audio signals obtained by the image processing circuit 207 and the audio processing circuit 214.

The image processing circuit 207 and the audio processing circuit 214 read the image signals and audio signals temporarily stored in the memory 215, encode them, and generate a compressed image signal and a compressed audio signal. The first control circuit 223 sends the compressed image signal and the compressed audio signal to the recording and reproducing circuit 220.

The recording and reproducing circuit 220 records on the recording medium 221 the compressed image signal and compressed audio signal generated by the image processing circuit 207 and the audio processing circuit 214, together with other control data related to shooting. When the audio signal is not compression-encoded, the first control circuit 223 sends the audio signal generated by the audio processing circuit 214 and the compressed image signal generated by the image processing circuit 207 to the recording and reproducing circuit 220, which records them on the recording medium 221.

The recording medium 221 may be built into the imaging device 101 or may be removable. It can record various data generated by the imaging device 101, such as compressed image signals, compressed audio signals, and audio signals, and a medium with a larger capacity than the nonvolatile memory 216 is generally used. The recording medium 221 may be of any type, for example a hard disk, optical disc, magneto-optical disc, CD-R, DVD-R, magnetic tape, nonvolatile semiconductor memory, or flash memory.

The recording and reproducing circuit 220 reads (reproduces) the compressed image signals, compressed audio signals, audio signals, various data, and programs recorded on the recording medium 221. The first control circuit 223 then sends the read compressed image signal and compressed audio signal to the image processing circuit 207 and the audio processing circuit 214. The image processing circuit 207 and the audio processing circuit 214 temporarily store the compressed image signal and compressed audio signal in the memory 215, decode them by a predetermined procedure, and send the decoded signals to the video output circuit 217 and the audio output circuit 218.

For the audio input circuit 213, a plurality of microphones are mounted on the imaging device 101, and the audio processing circuit 214 can detect the direction of a sound in the plane on which the microphones are installed; this direction is used for the search and automatic shooting described later. The audio processing circuit 214 also detects specific voice commands. In addition to several commands registered in advance, the device may be configured so that the user can register a specific voice with the imaging device. The circuit also performs sound scene recognition, in which a sound scene is determined by a network trained in advance by machine learning on a large amount of audio data. For example, networks for detecting specific scenes such as "cheering," "applause," and "someone speaking" are set in the audio processing circuit 214. When a specific sound scene or a specific voice command is detected, the circuit outputs a detection trigger signal to the first control circuit 223 and the second control circuit 211.
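The description does not say how the sound direction is computed. One common approach, shown here purely as an assumed illustration, estimates the arrival angle from the time difference between two microphones. The function name `direction_from_delay`, the two-microphone geometry, and the 343 m/s speed of sound are assumptions for this sketch.

```python
import math


def direction_from_delay(delay_s, mic_spacing_m, speed_of_sound_mps=343.0):
    """Estimate the arrival angle (degrees) of a far-field sound source.

    delay_s       : arrival-time difference between the two microphones
    mic_spacing_m : distance between the microphones
    The angle is measured from the broadside direction (perpendicular to
    the line joining the microphones); 0 means the sound arrives head-on.
    """
    ratio = speed_of_sound_mps * delay_s / mic_spacing_m
    # Measurement noise can push the ratio slightly outside [-1, 1].
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

With more than two microphones in the same plane, pairwise estimates like this can be combined to resolve a direction over the full 360 degrees.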

A second control circuit 211, provided separately from the first control circuit 223 that controls the entire main system of the imaging device 101, controls the power supplied to the first control circuit 223.

The first power supply circuit 210 and the second power supply circuit 212 supply power for operating the first control circuit 223 and the second control circuit 211, respectively. When the power button on the imaging device 101 is pressed, power is first supplied to both the first control circuit 223 and the second control circuit 211; however, as described later, the first control circuit 223 controls the first power supply circuit 210 so as to turn OFF its own power supply. The second control circuit 211 keeps operating even while the first control circuit 223 is not, and receives information from the device shake detection circuit 209 and the audio processing circuit 214. Based on this input information, the second control circuit 211 determines whether to start the first control circuit 223 and, when it decides to do so, instructs the first power supply circuit 210 to supply power.
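The power hand-off between the two control circuits can be sketched as a small state model in Python. The wake-up criteria used here (a sound trigger or a shake magnitude above a threshold) are assumptions chosen for illustration; the description only says that the decision is based on various input information. The class name `PowerController` and its methods are hypothetical.

```python
class PowerController:
    """Toy model of the main (first) and sub (second) control circuits."""

    def __init__(self, shake_threshold=1.0):
        self.main_on = False
        self.sub_on = False
        self.shake_threshold = shake_threshold  # assumed wake-up threshold

    def press_power_button(self):
        """Pressing the power button powers both control circuits."""
        self.main_on = True
        self.sub_on = True

    def main_requests_power_off(self):
        """The main circuit turns its own supply off; the sub circuit stays on."""
        self.main_on = False

    def on_sensor_input(self, shake_magnitude, sound_trigger):
        """Sub circuit decides whether to restart the main circuit."""
        if not self.sub_on or self.main_on:
            return
        if sound_trigger or shake_magnitude >= self.shake_threshold:
            self.main_on = True  # instruct the first power supply to supply power
```

Keeping only the low-power sub circuit awake lets the device stay responsive to sound and motion while the main system is off.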

The audio output circuit 218 outputs a preset audio pattern from a speaker built into the imaging device 101, for example at the time of shooting.

The LED control circuit 224 controls an LED provided on the imaging device 101 according to a preset lighting and blinking pattern, for example at the time of shooting.

The video output circuit 217 comprises, for example, a video output terminal and sends an image signal so that video can be displayed on a connected external display or the like. The audio output circuit 218 and the video output circuit 217 may also be a single combined terminal, for example an HDMI (registered trademark) (High-Definition Multimedia Interface) terminal.

The communication circuit 222 handles communication between the imaging device 101 and external devices, and sends and receives data such as audio signals, image signals, compressed audio signals, and compressed image signals. It also receives shooting-related control signals, such as shooting start and end commands and pan, tilt, and zoom drive commands, so that the imaging device 101 is driven according to instructions from an external device capable of two-way communication with it. In addition, it sends and receives information such as the various learning-related parameters processed by the learning processing circuit 219, described later, between the imaging device 101 and external devices. The communication circuit 222 is a wireless communication module such as an infrared communication module, a Bluetooth (registered trademark) communication module, a wireless LAN communication module, Wireless USB, or a GPS receiver.

<Configuration with External Communication Device>
FIG. 3 is a diagram showing a configuration example of a wireless communication system comprising the imaging device 101 and an external device 301. The imaging device 101 is a digital camera having a shooting function, and the external device 301 is a smart device that includes a Bluetooth communication module and a wireless LAN communication module.

The imaging device 101 and the smart device 301 can communicate through communication 302 over a wireless LAN conforming to, for example, the IEEE 802.11 standard series, and through communication 303 that has a master-slave relationship, such as that between a control station and a dependent station, for example Bluetooth Low Energy (hereinafter "BLE"). Wireless LAN and BLE are only examples of communication methods. Each communication device has two or more communication functions, and other communication methods may be used as long as one communication function, for example one that communicates within a control station and dependent station relationship, can control the other communication function. Without loss of generality, however, the first communication, such as wireless LAN, is assumed to be faster than the second communication, such as BLE, and the second communication is assumed to consume less power than the first, to have a shorter communication range, or both.
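The division of labor between the two links can be illustrated with a short Python sketch: small messages go over the low-power link, and large payloads go over the fast link, which is first enabled through the low-power link if it is not already active. The 512-byte threshold, the function name `plan_transfer`, and the step labels are hypothetical and are not taken from this description or from any BLE or wireless LAN API.

```python
def plan_transfer(payload_bytes, wlan_active, ble_max_bytes=512):
    """Return the ordered steps for sending a payload of the given size.

    payload_bytes : size of the data to send
    wlan_active   : whether the fast (wireless LAN) link is already up
    ble_max_bytes : assumed size limit for using the low-power link
    """
    if payload_bytes <= ble_max_bytes:
        return ["send_over_ble"]
    steps = []
    if not wlan_active:
        # The low-power link is used to control (enable) the fast link.
        steps.append("enable_wlan_via_ble")
    steps.append("send_over_wlan")
    return steps
```

For example, a short control command would be planned as a single BLE step, while a multi-megabyte image transfer would first bring up the wireless LAN.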

The configuration of the smart device 301 will be described with reference to FIG. 4.

The smart device 301 has, for example, a wireless LAN control circuit 401 for wireless LAN and a BLE control circuit 402 for BLE, as well as a public line control circuit 406 for public wireless communication. The smart device 301 further includes a packet transmission/reception circuit 403. The wireless LAN control circuit 401 performs wireless LAN RF control and communication processing, and runs the driver that performs various controls of wireless LAN communication conforming to the IEEE 802.11 standard series, along with protocol processing for wireless LAN communication. The BLE control circuit 402 performs BLE RF control and communication processing, and runs the driver that performs various controls of BLE communication, along with protocol processing for BLE communication. The public line control circuit 406 performs RF control and communication processing for public wireless communication, and runs the driver that performs various controls of public wireless communication, along with the related protocol processing. The public wireless communication conforms to, for example, the IMT (International Multimedia Telecommunications) standard or the LTE (Long Term Evolution) standard. The packet transmission/reception circuit 403 performs processing for at least one of transmitting and receiving packets for wireless LAN communication, BLE communication, and public wireless communication. In this example the smart device 301 is described as transmitting and/or receiving packets, but communication formats other than packet switching, such as circuit switching, may also be used.

The smart device 301 further includes, for example, a control circuit 411, a storage circuit 404, a GPS receiver 405, a display device 407, an operation member 408, an audio input and audio processing circuit 409, and a power supply circuit 410. The control circuit 411 controls the entire smart device 301, for example by executing a control program stored in the storage circuit 404. The storage circuit 404 stores, for example, the control program executed by the control circuit 411 and various information such as parameters required for communication. The various operations described later are realized by the control circuit 411 executing the control program stored in the storage circuit 404.

The power supply circuit 410 supplies power to the smart device 301. The display device 407 has a function for outputting visually recognizable information, as with an LCD or LED, or for outputting sound, as with a speaker, and displays various kinds of information. The operation member 408 is, for example, a button that accepts the user's operation of the smart device 301. The display device 407 and the operation member 408 may be formed by a common member such as a touch panel.

The audio input and audio processing circuit 409 may be configured to acquire the user's voice from, for example, a general-purpose microphone built into the smart device 301 and to obtain the user's operation command by voice recognition processing.

A voice command can also be acquired from the user's utterance through a dedicated application in the smart device. The command can then be registered, via the wireless LAN communication 302, as a specific voice command to be recognized by the audio processing circuit 214 of the imaging device 101.

The GPS (Global Positioning System) receiver 405 receives GPS signals sent from satellites, analyzes them, and estimates the current position (longitude and latitude) of the smart device 301. Alternatively, the current position of the smart device 301 may be estimated from information on surrounding wireless networks using WPS (Wi-Fi Positioning System) or the like. When the acquired current GPS position falls within a preset position range (within a predetermined radius), movement information is notified to the imaging device 101 via the BLE control circuit 402 and used as a parameter for the automatic shooting and automatic editing described later. Likewise, when the GPS position changes by a predetermined amount or more, movement information is notified to the imaging device 101 via the BLE control circuit 402 and used as a parameter for the automatic shooting and automatic editing described later.
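The two position checks described above (being inside a preset radius, and having moved by at least a given distance) can be sketched in Python with the haversine great-circle distance. The description does not specify how distance is computed, so the haversine formula, the spherical Earth radius, and the function names are assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def inside_preset_area(current, center, radius_m):
    """True when the current (lat, lon) lies within radius_m of the preset center."""
    return haversine_m(*current, *center) <= radius_m


def moved_significantly(previous, current, min_move_m):
    """True when the position changed by at least min_move_m."""
    return haversine_m(*previous, *current) >= min_move_m
```

Either check returning true would correspond to a case in which the smart device notifies the imaging device over BLE.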

As described above, the imaging device 101 and the smart device 301 exchange data through communication using the wireless LAN control circuit 401 and the BLE control circuit 402. For example, they send and receive data such as audio signals, image signals, compressed audio signals, and compressed image signals. The smart device also issues operation instructions to the imaging device 101, such as shooting, sends voice command registration data, and sends notifications of predetermined-position detection and of location movement based on GPS position information. Learning data is also sent and received through a dedicated application in the smart device.

<Configuration of Accessories>
FIG. 5 is a diagram showing a configuration example involving an external device 501 that can communicate with the imaging device 101. The imaging device 101 is a digital camera having a shooting function, and the external device 501 is a wearable device that includes various sensing units and can communicate with the imaging device 101 through, for example, a Bluetooth communication module.

The wearable device 501 is configured to be worn on, for example, the user's arm. It carries sensors that detect biological information such as the user's pulse, heart rate, and blood flow at a predetermined cycle, an acceleration sensor that can detect the user's state of motion, and the like.

The biological information detection circuit 502 includes, for example, a pulse sensor that detects the pulse, a heart rate sensor that detects the heartbeat, a blood flow sensor that detects blood flow, and a sensor that detects a change in electric potential through skin contact with a conductive polymer. In the present embodiment, a heart rate sensor is used as the biological information detection circuit 502. The heart rate sensor irradiates the skin with infrared light using, for example, an LED, detects the infrared light transmitted through body tissue with a light receiving sensor, and processes the signal to detect the user's heart rate. The biological information detection circuit 502 outputs the detected biological information as a signal to a control circuit 607 described later.

The shake detection circuit 503, which detects the user's motion state, includes, for example, an acceleration sensor and a gyro sensor. Based on acceleration information, it can detect motion such as whether the user is moving or is performing an action by swinging an arm.

In addition, an operation member 505 that receives user operations on the wearable device 501 and a display device 504 that outputs visually recognizable information, such as an LCD or LEDs, are mounted.

The configuration of the wearable device 501 will be described with reference to FIG. 6.

The wearable device 501 includes, for example, a control circuit 607, a communication circuit 601, a biological information detection circuit 502, a shake detection circuit 503, a display device 504, an operation member 505, a power supply circuit 606, and a storage circuit 608.

The control circuit 607 controls the entire wearable device 501 by, for example, executing a control program stored in the storage circuit 608. The storage circuit 608 stores, for example, the control program executed by the control circuit 607 and various kinds of information such as parameters required for communication. The various operations described later are realized, for example, by the control circuit 607 executing the control program stored in the storage circuit 608.

The power supply circuit 606 supplies power to the wearable device 501. The display device 504 has a function of outputting visually recognizable information, as with an LCD or LEDs, or of outputting sound, as with a speaker, and presents various kinds of information. The operation member 505 is, for example, a button that receives user operations on the wearable device 501. The display device 504 and the operation member 505 may be formed by a common member such as a touch panel.

The operation member may also be configured to acquire the voice uttered by the user from, for example, a general-purpose microphone built into the wearable device 501, and to obtain the user's operation command through voice processing and voice recognition processing.

The various kinds of detection information from the biological information detection circuit 502 and the shake detection circuit 503, after being processed by the control circuit 607, are transmitted to the imaging apparatus 101 by the communication circuit 601.

For example, detection information is transmitted to the imaging apparatus 101 at the timing when a change in the user's heart rate is detected, or at the timing when the movement state changes among walking, running, and standing still. Detection information is also transmitted, for example, at the timing when a preset arm-swing motion is detected, or at the timing when movement over a preset distance is detected.

<Sequence of imaging operation>
FIG. 7 is a flowchart illustrating an example of the operation handled by the first control circuit 223 of the imaging apparatus 101 according to the present embodiment.

When the user operates the power button provided on the imaging apparatus 101, the first power supply circuit 210 causes the power supply unit to supply power to the first control circuit 223 and to each block of the imaging apparatus 101.

Similarly, for the second control circuit 211, the second power supply circuit 212 causes the power supply unit to supply power to the second control circuit. The operation of the second control circuit will be described in detail later with reference to the flowchart of FIG. 8.

When power is supplied, the process of FIG. 7 starts. In step S701 (hereinafter, "step" is abbreviated to "S"), the start-up condition is read. In the present embodiment, the start-up conditions are as follows.
(1) Power-on by manually pressing the power button
(2) Power-on by an instruction received through external communication (for example, BLE communication) from an external device (for example, 301)
(3) Power-on from the Sub processor (second control circuit 211)
In case (3), power-on from the Sub processor, the start-up condition calculated in the Sub processor is read. Details will be described later with reference to FIG. 8.

The start-up condition read here is also used as one parameter element during subject search and automatic shooting, as described later. When reading of the start-up condition is completed, the process proceeds to S702.

In S702, the various sensors are read. The sensors read here include vibration-detecting sensors such as the gyro sensor and the acceleration sensor of the apparatus shake detection circuit 209. The rotational positions of the tilt rotation unit 104 and the pan rotation unit 105 are also read, as are the audio level detected by the audio processing circuit 214, the detection trigger of specific voice recognition, and the detected sound direction.

Although not shown in FIGS. 1 to 6, information is also acquired from sensors that detect environmental information.

For example, there are a temperature sensor that detects the temperature around the imaging apparatus 101 at a predetermined cycle and an atmospheric pressure sensor that detects changes in the atmospheric pressure around the imaging apparatus 101. An illuminance sensor that detects the brightness around the imaging apparatus 101, a humidity sensor that detects the humidity around the imaging apparatus 101, a UV sensor that detects the amount of ultraviolet light around the imaging apparatus 101, and the like may also be provided. In addition to the detected temperature, atmospheric pressure, brightness, humidity, and UV information, the amounts of change in temperature, atmospheric pressure, brightness, humidity, and ultraviolet light, obtained by calculating the rate of change over a predetermined time interval from the detected information, are used for determinations such as the automatic shooting described later.
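The change amount over a predetermined time interval mentioned above could be derived from periodically sampled sensor values roughly as follows. This is a minimal sketch; the sample format (a time-ordered list of timestamp and value pairs) and the function name are assumptions made for illustration.

```python
def change_over_interval(samples, interval_s):
    """Change of a sensor value over the most recent `interval_s` seconds.

    `samples` is a list of (timestamp_s, value) pairs in ascending time order.
    The change is the latest value minus the oldest value still inside the window.
    """
    if not samples:
        return 0.0
    t_now, v_now = samples[-1]
    for t, v in samples:
        if t >= t_now - interval_s:
            return v_now - v
    return 0.0
```

The same helper would apply unchanged to temperature, atmospheric pressure, brightness, humidity, or UV readings, since only the sampled values differ.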

When the various sensors have been read in S702, the process proceeds to S703.

In S703, it is detected whether communication from an external device has been instructed, and if so, communication with the external device is performed.

For example, remote operation from the smart device 301 via wireless LAN or BLE is received, and data such as audio signals, image signals, compressed audio signals, and compressed image signals is transmitted and received. It is also read whether the smart device 301 has issued an operation instruction such as shooting to the imaging apparatus 101, transmitted voice command registration data, issued a predetermined-position detection notification or location movement notification based on GPS position information, or instructed transmission or reception of learning data.

Also, for example, when the wearable device 501 has updated the user's motion information, arm action information, or biological information such as heart rate, that information is read via BLE. The various sensors that detect the environmental information described above may be mounted on the imaging apparatus 101, or may instead be mounted on the smart device 301 or the wearable device 501, in which case the environmental information is also read via BLE. When the communication from the external devices has been read in S703, the process proceeds to S704.

In S704, the mode setting is determined. The mode set in S704 is determined and selected from the following.

(1) Automatic shooting mode
[Mode determination conditions]
The automatic shooting mode is set when it is determined that automatic shooting should be performed, based on the detection information (image, sound, time, vibration, location, change in the body, environmental change) set by the learning described later, the elapsed time since the transition to the automatic shooting mode, past shooting information, and the like.

[In-mode processing]
In the automatic shooting mode processing (S710), pan, tilt, and zoom are driven to search for a subject automatically based on the detection information (image, sound, time, vibration, location, change in the body, environmental change). When it is determined that the timing allows a shot that matches the user's preference, a shooting method is selected from among various methods such as single still image shooting, continuous still image shooting, moving image shooting, panoramic shooting, and time-lapse shooting, and shooting is performed automatically.

(2) Learning mode
[Mode determination conditions]
The automatic learning mode is set when it is determined that automatic learning should be performed, based on the elapsed time since the previous learning process, the information associated with images that can be used for learning, the number of pieces of learning data, and the like. This mode is also set when an instruction to set learning data is received via communication from the smart device 301.

[In-mode processing]
In the automatic learning mode processing (S712), learning is performed to match the user's preference. Using a neural network, learning that matches the user's preference is performed based on information such as the operations performed on the smart device 301 and the learning information notified from the smart device 301. Information on the operations performed on the smart device 301 includes, for example, information on images acquired from the imaging apparatus, information on manual editing instructions given via the dedicated application, and determination values that the user entered for images in the imaging apparatus.

The automatic shooting mode processing and the learning mode processing will be described in detail later.

In S705, it is determined whether the mode set by the mode setting determination in S704 is the low power consumption mode. The low power consumption mode is selected when the determination conditions of neither the "automatic shooting mode" nor the "learning mode" are satisfied. When the determination processing has been performed, the process proceeds to S705.

If it is determined in S705 that the low power consumption mode condition is satisfied, the process proceeds to S706.

In S706, the Sub processor (second control circuit 211) is notified of the various parameters related to the start-up factors determined in the Sub processor (shake detection determination parameters, sound detection parameters, and time lapse detection parameters). The values of these parameters change as they are learned in the learning process described later. When S706 is completed, the process proceeds to S707, where the Main processor (first control circuit 223) is powered OFF, and the process ends.

On the other hand, if it is determined in S705 that the mode is not the low power consumption mode, the process proceeds to S709, where it is determined whether the mode setting is the automatic shooting mode. If it is, the process proceeds to S710 and the automatic shooting mode processing is performed. When that processing ends, the process returns to S702 and is repeated. If it is determined in S709 that the mode is not the automatic shooting mode, the process proceeds to S711.

In S711, it is determined whether the mode setting is the learning mode. If it is, the process proceeds to S712 and the learning mode processing is performed. When that processing ends, the process returns to S702 and is repeated. If it is determined in S711 that the mode is not the learning mode, the process returns to S702 and is repeated.
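The control flow of FIG. 7 described above can be summarized by the following sketch, which only traces the step labels visited. The boolean inputs stand in for the learned mode determination conditions, and the order in which the two conditions are checked inside `decide_mode` is an assumption, since the embodiment does not state a priority between them.

```python
from enum import Enum, auto


class Mode(Enum):
    AUTO_SHOOTING = auto()
    LEARNING = auto()
    LOW_POWER = auto()


def decide_mode(should_shoot, should_learn):
    """S704: choose a mode. Checking shooting before learning is an assumption."""
    if should_shoot:
        return Mode.AUTO_SHOOTING
    if should_learn:
        return Mode.LEARNING
    # Neither determination condition holds: low power consumption mode.
    return Mode.LOW_POWER


def run_main_loop(decisions):
    """Trace the steps of FIG. 7 for a sequence of (should_shoot, should_learn) pairs."""
    trace = ["S701"]  # read the start-up condition once
    for should_shoot, should_learn in decisions:
        trace += ["S702", "S703", "S704", "S705"]
        mode = decide_mode(should_shoot, should_learn)
        if mode is Mode.LOW_POWER:
            # Notify parameters to the Sub processor, then power off the Main processor.
            trace += ["S706", "S707"]
            break
        trace.append("S709")
        if mode is Mode.AUTO_SHOOTING:
            trace.append("S710")
        else:
            trace += ["S711", "S712"]
    return trace
```

For example, one pass in automatic shooting mode followed by a pass in which neither condition holds ends with S706 and S707, after which only the Sub processor keeps running.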

FIG. 8 is a flowchart illustrating an example of the operation handled by the second control circuit 211 of the imaging apparatus 101 according to the present embodiment.

When the user operates the power button provided on the imaging apparatus 101, power is supplied from the power supply unit to the second control circuit 211 by the second power supply circuit 212, in the same manner as power is supplied from the power supply unit to the first control circuit 223 by the first power supply circuit 210. When power is supplied, the Sub processor (second control circuit 211) starts up, and the process of FIG. 8 starts.

In S801, it is determined whether a predetermined period, which serves as the sampling cycle, has elapsed. For example, when the period is set to 10 msec, the process proceeds to S802 every 10 msec. If it is determined that the predetermined period has not elapsed, the Sub processor performs no processing, returns to S801, and waits for the predetermined period to elapse.

In S802, a shake detection value is acquired. The shake detection value is the output value of a vibration-detecting sensor, such as the gyro sensor or the acceleration sensor of the apparatus shake detection circuit 209.

When the shake detection value has been acquired in S802, the process proceeds to S803, where preset shake state detection processing is performed. Some examples are described below.

(1) Tap detection
A state in which the user taps the imaging apparatus 101 with, for example, a fingertip (tap state) can be detected from the output value of the acceleration sensor attached to the imaging apparatus 101. By sampling the output of the three-axis acceleration sensor at a predetermined cycle and passing it through a band-pass filter (BPF) set to a specific frequency range, the signal region of the acceleration change caused by a tap can be extracted. Tap detection is performed based on whether the number of times the band-pass-filtered acceleration signal exceeds a predetermined threshold ThreshA within a predetermined time TimeA equals a predetermined count CountA. CountA is set to 2 for a double tap and to 3 for a triple tap.
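The counting rule for tap detection can be sketched as below. The sketch assumes that the input has already passed through the BPF, that "exceeding the threshold" means an upward crossing of the signal magnitude, and that the window TimeA covers the most recent samples; these interpretations and all names are illustrative assumptions.

```python
def count_upward_crossings(signal, thresh):
    """Count how many times |signal| rises from at-or-below to above `thresh`."""
    count = 0
    above = False
    for x in signal:
        now_above = abs(x) > thresh
        if now_above and not above:
            count += 1
        above = now_above
    return count


def detect_tap(bpf_signal, sample_period_s, time_a_s, thresh_a, count_a):
    """True if the last TimeA seconds contain exactly CountA threshold crossings."""
    n = max(1, int(round(time_a_s / sample_period_s)))
    window = bpf_signal[-n:]
    return count_upward_crossings(window, thresh_a) == count_a
```

With `count_a=2` the function reports a double tap and with `count_a=3` a triple tap, matching the settings of CountA described above.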

(2) Detection of shake state
The shake state of the imaging apparatus 101 can be detected from the output values of the gyro sensor and the acceleration sensor attached to the imaging apparatus 101. The low-frequency components of the gyro sensor or acceleration sensor output are cut with a high-pass filter (HPF) and the high-frequency components are cut with a low-pass filter (LPF), after which the absolute value is taken. Vibration detection is performed based on whether the number of times the calculated absolute value exceeds a predetermined threshold ThreshB within a predetermined time TimeB is equal to or greater than a predetermined count CountB. This makes it possible to determine, for example, whether the apparatus is in a state of small shake, such as when the imaging apparatus 101 is placed on a desk, or in a state of large shake, such as when the user is walking with the imaging apparatus 101 worn as a wearable. By providing multiple conditions for the determination threshold and the determination count, the shake state can also be detected in finer steps according to the shake level.
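A minimal sketch of this shake state detection is given below. First-order filters are used only for illustration; the filter type, the coefficient names, and the treatment of the input list as the TimeB window are assumptions, not details given in the embodiment.

```python
def low_pass(signal, alpha):
    """First-order low-pass filter; the state starts at the first sample."""
    out = []
    y = signal[0] if signal else 0.0
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out


def high_pass(signal, alpha):
    """First-order high-pass filter: the input minus its slowly varying part."""
    return [x - s for x, s in zip(signal, low_pass(signal, alpha))]


def detect_shake(window, hpf_alpha, lpf_alpha, thresh_b, count_b):
    """True if |filtered signal| exceeds ThreshB at least CountB times in the window."""
    band = low_pass(high_pass(window, hpf_alpha), lpf_alpha)
    exceed = sum(1 for v in band if abs(v) > thresh_b)
    return exceed >= count_b
```

Calling `detect_shake` several times with different `thresh_b` and `count_b` pairs would give the finer, level-dependent classification mentioned above.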

When the specific shake state detection processing has been performed in S803, the process proceeds to S804, where preset specific sound detection processing is performed. Some examples are described below.

(1) Specific voice command detection
A specific voice command is detected. In addition to several commands registered in advance, the user can register a specific voice in the imaging apparatus as a voice command.

(2) Specific sound scene recognition
A sound scene is determined by a network trained in advance by machine learning on a large amount of audio data. For example, specific scenes such as "cheering", "clapping", and "someone speaking" are detected.

(3) Sound level determination
Detection by sound level determination is performed by a method such as accumulating, within a predetermined time, the time during which the sound level exceeds a predetermined level.
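The accumulation described for the sound level determination might look like the following sketch. The parameter names and the final comparison against a minimum accumulated time are assumptions added for illustration; the embodiment only states that the time above the level is accumulated.

```python
def sound_level_detected(levels, sample_period_s, level_thresh, min_time_above_s):
    """Accumulate the time the level is above `level_thresh` within the window.

    `levels` holds the sound level samples of the predetermined time window,
    taken every `sample_period_s` seconds.
    """
    samples_above = sum(1 for lv in levels if lv > level_thresh)
    time_above_s = samples_above * sample_period_s
    return time_above_s >= min_time_above_s
```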

(4) Sound direction determination
The direction of a sound in the plane in which the multiple microphones are installed can be detected, and the direction is detected for sounds of a predetermined level.

The determination processing described above is performed in the audio processing circuit 214, and whether a specific sound has been detected is determined in S804.

When the specific sound detection processing has been performed in S804, the process proceeds to S805. In S805, it is determined whether the Main processor (first control circuit 223) is in the OFF state. If it is, the process proceeds to S806, where preset time lapse detection processing is performed. The time elapsed since the Main processor transitioned from ON to OFF is measured. If the elapsed time is equal to or greater than the parameter TimeC, it is determined that the time has elapsed; if it is less than TimeC, it is not.

When the time lapse detection processing has been performed in S806, the process proceeds to S807, where it is determined whether the low power consumption mode is to be released. Release of the low power consumption mode is determined by the following conditions.
(1) Determination condition for specific shake detection
(2) Determination condition for specific sound detection
(3) Determination condition for time lapse
Whether the determination condition for specific shake detection has been met can be determined by the specific shake state detection processing in S803. Whether the determination condition for specific sound detection has been met can be determined by the specific sound detection processing in S804. Whether the determination condition for time lapse has been met can be determined by the time lapse detection processing in S806. Accordingly, if any one or more of these conditions is met, it is determined that the low power consumption mode is to be released.
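The release determination of S806 and S807 reduces to a logical OR over the three conditions, with the time condition derived from TimeC. A sketch under that reading follows; the function names and the string labels for the reported reasons are hypothetical.

```python
def wake_reasons(shake_detected, sound_detected, elapsed_since_off_s, time_c_s):
    """Collect the release conditions that are currently met."""
    reasons = []
    if shake_detected:
        reasons.append("shake")
    if sound_detected:
        reasons.append("sound")
    # S806: time lapse is detected when the elapsed time is TimeC or more.
    if elapsed_since_off_s >= time_c_s:
        reasons.append("time")
    return reasons


def should_release_low_power(reasons):
    """S807: release the low power consumption mode if any condition is met."""
    return len(reasons) > 0
```

Returning the list of reasons rather than a single flag keeps the information needed when the condition that caused the release is reported to the Main processor.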

When it is determined in S807 that a release condition is met, the process proceeds to S808, where the Main processor is powered ON. In S809, the Main processor is notified of the condition (shake, sound, or time) that caused the release of the low power consumption mode, and the process returns to S801 and loops.

If none of the release conditions is met in S807 and it is determined that the low power consumption mode is not to be released, the process returns to S801 and loops.

If it is determined in S805 that the Main processor is in the ON state, the information acquired in S802 to S805 is notified to the Main processor, and the process returns to S801 and loops.

In the present embodiment, shake detection and specific sound detection are performed by the Sub processor even when the Main processor is in the ON state, and the detection results are notified to the Main processor. However, when the Main processor is ON, the processing of S802 to S805 may be omitted, and shake detection and specific sound detection may instead be performed in the processing of the Main processor (S702 in FIG. 7).

The methods of releasing the low power consumption mode by shake detection, sound detection, and time lapse have been described in detail above, but the low power consumption mode may also be released based on environmental information. For environmental information, the determination can be made based on whether the absolute value or the amount of change of the temperature, atmospheric pressure, brightness, humidity, or ultraviolet light amount exceeds a predetermined threshold.

<Automatic shooting mode processing>
The automatic shooting mode processing will be described in detail with reference to FIG. 9. As described above, the following processing is controlled by the first control circuit 223 of the imaging apparatus 101 in the present embodiment.

In S901, the image processing circuit 207 is caused to process the signal captured by the imaging unit 206 and generate an image for subject recognition.

Subject recognition, such as person recognition and object recognition, is performed on the generated image.

When a person is to be recognized, the face or body of the subject is detected. In the face detection processing, a pattern for identifying a person's face is defined in advance, and a portion of the captured image that matches the pattern can be detected as the person's face image.

A reliability indicating the likelihood that the detected portion is the subject's face is calculated at the same time. The reliability is calculated from, for example, the size of the face region in the image and the degree of matching with the face pattern.

Similarly, in object recognition, an object that matches a pattern registered in advance can be recognized.

There is also a method of extracting a characteristic subject by using histograms of hue, saturation, and the like in the captured image. In this case, for the image of the subject captured within the shooting angle of view, the distribution derived from the histograms of hue, saturation, and the like is divided into multiple sections, and the captured image is classified section by section.

For example, histograms of multiple color components are created for the captured image and divided along their mountain-shaped distribution ranges. The captured image is then classified into regions belonging to the same combination of sections, and the image region of each subject is recognized.

By calculating an evaluation value for each recognized subject image region, the image region of the subject with the highest evaluation value can be determined as the main subject region.
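The histogram-based classification and the selection of the main subject region can be illustrated with the simplified sketch below. It uses a single hue channel, splits the histogram at empty bins rather than at true valleys between peaks, ignores hue wrap-around, and uses the region size as a stand-in evaluation value. All of these are simplifying assumptions; the embodiment does not specify the evaluation formula.

```python
def hue_segments(hues, bins=12, max_hue=360):
    """Build a hue histogram and group runs of non-empty bins into segments."""
    width = max_hue / bins
    hist = [0] * bins
    for h in hues:
        hist[min(int(h // width), bins - 1)] += 1
    segments, start = [], None
    for i, c in enumerate(hist):
        if c > 0 and start is None:
            start = i
        elif c == 0 and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, bins - 1))
    return hist, segments


def classify(hues, segments, bins=12, max_hue=360):
    """Label each pixel with the index of the segment its hue falls into."""
    width = max_hue / bins
    labels = []
    for h in hues:
        b = min(int(h // width), bins - 1)
        labels.append(next(k for k, (lo, hi) in enumerate(segments) if lo <= b <= hi))
    return labels


def main_subject(labels):
    """Pick the label with the highest evaluation value (here: the pixel count)."""
    return max(set(labels), key=labels.count)
```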

With the above methods, information on each subject can be obtained from the imaging information.

In S902, the image shake correction amount is calculated. Specifically, the absolute angle of the imaging apparatus is first calculated based on the angular velocity and acceleration information acquired by the apparatus shake detection circuit 209. Then, the stabilization angle by which the tilt rotation unit 104 and the pan rotation unit 105 are to be moved in the angular direction that cancels out the absolute angle is obtained and used as the image shake correction amount. The calculation method of this image shake correction amount calculation processing can be changed by the learning processing described later.
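The embodiment does not state how the absolute angle is derived from the angular velocity and acceleration. The following sketch substitutes a complementary filter, a common approach, for the tilt axis only, and then negates the angle to obtain a correction amount. The filter, its weight `k`, the choice of accelerometer axes, and the function names are all assumptions.

```python
import math


def absolute_tilt_angle(prev_angle_deg, gyro_dps, accel_y, accel_z, dt, k=0.98):
    """Complementary filter (assumed method): blend the gyro-integrated angle
    with the gravity-based angle estimated from the accelerometer."""
    gyro_angle = prev_angle_deg + gyro_dps * dt
    accel_angle = math.degrees(math.atan2(accel_y, accel_z))
    return k * gyro_angle + (1.0 - k) * accel_angle


def shake_correction_deg(angle_deg, reference_deg=0.0):
    """Stabilization angle: rotate by the opposite of the deviation from the reference."""
    return -(angle_deg - reference_deg)
```

The pan axis cannot be referenced to gravity in the same way, so an actual implementation would need a different treatment there; that part is left out of this sketch.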

In S903, the state of the imaging apparatus is determined. The current vibration and motion state of the imaging apparatus is determined from the angles, amounts of movement, and the like detected from angular velocity information, acceleration information, GPS position information, and so on.

For example, when shooting with the imaging apparatus 101 mounted on a car, subject information such as the surrounding scenery changes greatly depending on the distance traveled.

Therefore, it is determined whether the imaging device is in a "vehicle moving state", in which it is mounted on a car or the like and moving at high speed, and the result can be used for the automatic subject search described later.

It is also determined whether the change in angle is large, and thereby whether the imaging device 101 is in a "placement shooting state", in which it has been set down and there is almost no shake angle.

In the "placement shooting state", it can be assumed that the angle of the imaging device 101 itself does not change, so a subject search suited to placement shooting can be performed.

When the change in angle is relatively large, the state is determined to be a "hand-held state", and a subject search suited to hand-held shooting can be performed.
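The state determination in S903 can be sketched as a simple classifier over the three states described above. The threshold values, the input units, and the state labels are hypothetical; the description does not give concrete values.

```python
def classify_device_state(speed_m_s, angle_change_deg,
                          vehicle_speed_threshold=5.0,
                          placement_angle_threshold=0.5):
    """Classify the device state from movement speed and angle change.

    Both thresholds are illustrative assumptions.
    """
    if speed_m_s >= vehicle_speed_threshold:
        return "vehicle_moving"        # mounted on a car or the like, moving fast
    if angle_change_deg < placement_angle_threshold:
        return "placement_shooting"    # set down, almost no shake angle
    return "handheld"                  # relatively large angle change
```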

In S904, subject search processing is performed. The subject search consists of the following processes.

(1) Area division
Area division will be described with reference to FIG. 11. As shown in FIG. 11(a), area division is performed over the entire circumference centered on the position of the imaging device (the origin O is the imaging device position). In the example of FIG. 11(a), the space is divided every 22.5 degrees in both the tilt direction and the pan direction. When the space is divided as in FIG. 11(a), the horizontal circumference becomes smaller as the tilt angle moves away from 0 degrees, and each area becomes smaller. Therefore, as shown in FIG. 11(b), when the tilt angle is 45 degrees or more, the horizontal range of each area is set larger than 22.5 degrees. FIGS. 11(c) and 11(d) show an example of area division within the shooting angle of view. The axis 1101 is the direction of the imaging device 101 at initialization, and area division is performed with this direction as the reference position. Reference numeral 1102 denotes the angle-of-view area of the image being captured, and an example of the image at that time is shown in FIG. 11(d). Within the image captured at that angle of view, the image is divided as shown by 1103 to 1118 in FIG. 11(d) based on the area division.
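The area division can be sketched as a mapping from a direction to an area index. The 22.5 degree step and the 45 degree tilt boundary come from the description; the widened pan step of 45 degrees is an assumption (the description only says "larger than 22.5 degrees"), as are the function name and the index layout.

```python
def area_index(pan_deg, tilt_deg, base_step=22.5, wide_step=45.0, wide_tilt=45.0):
    """Return (tilt_band, pan_sector) for a direction relative to the reference axis.

    tilt_deg is expected in [-90, 90]; pan_deg may be any angle and is wrapped to [0, 360).
    """
    n_tilt_bands = int(180.0 // base_step)
    tilt_band = min(int((tilt_deg + 90.0) // base_step), n_tilt_bands - 1)
    # Near the poles the horizontal circumference shrinks, so use wider pan sectors.
    pan_step = wide_step if abs(tilt_deg) >= wide_tilt else base_step
    pan_sector = int((pan_deg % 360.0) // pan_step)
    return tilt_band, pan_sector
```

For example, `area_index(30, 0)` falls in the second 22.5 degree pan sector of the horizontal band, while `area_index(30, 50)` falls in the first 45 degree sector of a higher tilt band.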

(2) Calculation of the importance level for each area
For each area divided as described above, an importance level indicating the search priority is calculated according to the subjects present in the area and the scene situation of the area. The importance level based on the subject situation is calculated from, for example, the number of persons in the area, the size of each person's face, the face orientation, the certainty of face detection, the person's facial expression, and the person's personal authentication result. The importance level based on the scene situation is calculated from, for example, the general object recognition result, the scene discrimination result (blue sky, backlight, evening scene, etc.), the level of sound coming from the direction of the area and the voice recognition result, and motion detection information within the area. In addition, since the vibration state of the imaging device is detected in the state determination of the imaging device (S903), the importance level can also be made to change according to the vibration state. For example, when the "placement shooting state" is determined, the importance level is raised when the face authentication of a specific person is detected, so that the subject search centers on a subject with high priority among those registered for face authentication (for example, the user of the imaging device). The automatic shooting described later also gives priority to that face. Consequently, even if the user of the imaging device spends most of the time wearing and carrying the device while shooting, many images showing the user can also be kept simply by taking the device off and placing it on a desk or the like. Because the search can be performed by panning and tilting, images showing the user, group photos showing many faces, and the like can be kept just by setting the device down casually, without considering the angle at which it is placed. Under the above conditions alone, the area with the highest importance level would stay the same as long as nothing changes in each area, and as a result the searched area would never change. The importance level is therefore varied according to past shooting information. Specifically, the importance level of an area that has been continuously designated as the search area for a predetermined time may be lowered, and the importance level of an area in which shooting was performed in S910, described later, may be lowered for a predetermined time.
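The importance level calculation can be illustrated as follows. The description lists the inputs and states that the level is lowered for areas that were recently shot or have been the search target for too long; the weighted-sum form, every weight, and the time constants below are assumptions chosen only to make the sketch concrete.

```python
def importance_level(num_faces, has_registered_face, sound_level, motion_detected,
                     seconds_since_shot=None, seconds_as_target=0.0,
                     placement_state=False, cooldown_s=30.0, max_target_s=20.0):
    """Compute an illustrative importance level for one area (weights are hypothetical)."""
    level = 1.0 * num_faces + 0.5 * sound_level + (1.0 if motion_detected else 0.0)
    if has_registered_face:
        # A registered person counts more when the device has been set down.
        level += 4.0 if placement_state else 2.0
    if seconds_since_shot is not None and seconds_since_shot < cooldown_s:
        level *= 0.5   # shot recently: lower the level for a while
    if seconds_as_target >= max_target_s:
        level *= 0.5   # searched continuously for too long: lower the level
    return level
```

Lowering the level with a multiplicative penalty is one simple way to keep the search from staying on the same area indefinitely.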

(3) Determination of the search target area
Once the importance level of each area has been calculated as described above, an area with a high importance level is determined to be the search target area. Then, the pan/tilt search target angles required to bring the search target area into the angle of view are calculated.
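A minimal sketch of this step, assuming that the area with the single highest importance level is chosen and that the target angles are simply the center direction of that area. Both choices, and the dictionary layout, are assumptions for illustration.

```python
def choose_search_target(levels, area_centers):
    """Pick the area with the highest importance level and return its target angles.

    levels: dict mapping area id -> importance level.
    area_centers: dict mapping area id -> (pan_deg, tilt_deg) of the area center.
    """
    target = max(levels, key=levels.get)
    return target, area_centers[target]
```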

In S905, pan/tilt driving is performed. Specifically, the pan/tilt drive amount is calculated by adding the image shake correction amount and the drive angle per control sampling period based on the pan/tilt search target angle, and the lens barrel rotation drive circuit 205 drives and controls the tilt rotation unit 104 and the pan rotation unit 105, respectively.
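The addition performed in S905 can be sketched for one axis as follows. Limiting the per-sample movement toward the target with `max_step_deg` is an assumption; the description only states that a per-sample drive angle based on the search target angle is added to the image shake correction amount.

```python
def drive_amount(current_deg, target_deg, shake_correction_deg, max_step_deg=1.0):
    """Return the drive amount for one control sample on one axis (pan or tilt)."""
    error = target_deg - current_deg
    # Move toward the search target by at most max_step_deg per control sample.
    search_step = max(-max_step_deg, min(max_step_deg, error))
    return search_step + shake_correction_deg
```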

In S906, the zoom unit 201 is controlled to perform zoom driving. Specifically, the zoom is driven according to the state of the search target subject determined in S904. For example, when the search target subject is a person's face, a face that is too small in the image may fall below the minimum detectable size, so that it cannot be detected and is lost. In such a case, the zoom is driven toward the telephoto side so that the face becomes larger in the image. Conversely, when the face is too large in the image, the subject easily leaves the angle of view because of movement of the subject or of the imaging device itself. In such a case, the zoom is driven toward the wide-angle side so that the face becomes smaller in the image. Performing zoom control in this way maintains a state suitable for tracking the subject.
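The zoom decision in S906 can be sketched as a comparison of the face size against two bounds. The pixel bounds and the returned labels are hypothetical values chosen for illustration.

```python
def zoom_command(face_px, min_px=40, max_px=200):
    """Decide the zoom direction from the face size in the image, in pixels."""
    if face_px < min_px:
        return "tele"   # face too small: risk of dropping below the detectable size
    if face_px > max_px:
        return "wide"   # face too large: subject easily leaves the angle of view
    return "hold"
```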

S904 to S906 describe a method of searching for a subject by pan/tilt and zoom driving, but the subject search may instead be performed with an imaging system that captures all directions at once using a plurality of wide-angle lenses. With an omnidirectional camera, an enormous amount of processing is required if image processing such as subject detection is performed on all of the signals obtained by imaging as the input image. Therefore, a part of the image is cut out, and the subject search processing is performed within the cut-out image range. As in the method described above, the importance level of each area is calculated, the cut-out position is changed based on the importance level, and the automatic shooting determination described later is performed. This reduces the power consumed by image processing and enables a high-speed subject search.

In S907, it is determined whether there has been a (manual) shooting instruction from the user; if there has, the process proceeds to S910. The (manual) shooting instruction from the user may be given by pressing the shutter button, by lightly tapping the housing of the imaging device with a finger or the like, by voice command input, by an instruction from an external device, and so on. A shooting instruction by tap operation is a method in which, when the user taps the housing of the imaging device, the device shake detection circuit 209 detects continuous high-frequency acceleration over a short period and uses it as a shooting trigger. Voice command input is a method in which, when the user utters a predetermined keyword that instructs shooting (for example, "take a picture"), the voice processing circuit 214 recognizes the voice and uses it as a shooting trigger. An instruction from an external device is a method in which a shutter instruction signal, sent through a dedicated application from, for example, a smartphone connected to the imaging device via Bluetooth, is used as the trigger.

When there is a shooting instruction from the user in S907, the process also proceeds to S914. The processing of S914 and the subsequent S915 will be described in detail later.

If there is no shooting instruction in S907, the process proceeds to S908, where the automatic shooting determination is performed. The automatic shooting determination decides whether to perform automatic shooting and which shooting method to use (single still image shooting, continuous still image shooting (burst shooting), moving image shooting, panoramic shooting, time-lapse shooting, and so on).

(1) Determination of whether to perform automatic shooting
Whether to perform automatic shooting is determined based on the following two determinations. The first is based on the importance level of each area obtained in S904: when the importance level exceeds a predetermined value, it is determined that automatic shooting is to be performed. The second is a determination based on a neural network, which is one form of machine learning. As an example of a neural network, FIG. 10 shows a network based on a multilayer perceptron. A neural network is used to predict an output value from input values; by learning in advance input values and the model output values for those inputs, it can estimate, for new input values, an output value that follows the learned models. The learning method will be described later. In FIG. 10, 1001 and the circles arranged vertically below it are neurons of the input layer, 1003 and the circles arranged vertically below it are neurons of the intermediate layer, and 1004 is the neuron of the output layer. Arrows such as 1002 indicate the connections between neurons. In the determination based on the neural network, feature amounts based on the subject in the current angle of view, the scene, and the state of the imaging device are given as inputs to the neurons of the input layer, and the value output from the output layer is obtained through computation based on the forward propagation rule of the multilayer perceptron. If the output value is equal to or greater than a threshold, it is determined that automatic shooting is to be performed. The subject features used include the current zoom magnification, the general object recognition result at the current angle of view, the face detection result, the number of faces in the current angle of view, the degree of smile and degree of eye closure of each face, the face angle, the face authentication ID number, the gaze angle of the subject person, the scene discrimination result, and the detection result of a specific composition. The elapsed time since the previous shooting, the current time, the GPS position information and the amount of change from the previous shooting position, the current sound level, the person who is speaking, and whether there is applause or cheering may also be used. Vibration information (acceleration information, state of the imaging device), environmental information (temperature, atmospheric pressure, illuminance, humidity, amount of ultraviolet light), and the like may also be used. Furthermore, when information is notified from the wearable device 501, the notified information (the user's motion information, arm action information, biological information such as heart rate, etc.) may also be used as features. These features are converted into numerical values within a predetermined range and given to the neurons of the input layer as feature amounts. The input layer therefore needs as many neurons as the number of feature amounts used.
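The neural network determination can be illustrated with a small multilayer perceptron written in plain Python. The description specifies an input layer, an intermediate layer, a single output neuron, forward propagation, and a threshold comparison; the sigmoid activation, the [0, 1] normalization range, the threshold of 0.5, and all names below are assumptions.

```python
import math

def normalize(value, lo, hi):
    """Map a raw feature into [0, 1], clamping values outside [lo, hi]."""
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mlp_forward(features, w_hidden, b_hidden, w_out, b_out):
    """Forward propagation: input layer -> one intermediate layer -> one output neuron."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

def should_auto_shoot(features, w_hidden, b_hidden, w_out, b_out, threshold=0.5):
    """Decide to shoot when the network output is at or above the threshold."""
    return mlp_forward(features, w_hidden, b_hidden, w_out, b_out) >= threshold
```

Each entry of `features` corresponds to one input neuron, so the length of each row of `w_hidden` must equal the number of feature amounts used.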

In this neural network based determination, the output value changes when the connection weights between neurons are changed by the learning processing described later, so that the determination result can be adapted to the learning result.

The automatic shooting determination also changes depending on the activation condition of the Main processor read in S702 of FIG. 7. For example, when the device was activated by tap detection or by a specific voice command, it is very likely that the user performed the operation because the user wants a picture taken now. In that case, the shooting frequency is set higher.

(2) Determination of the shooting method
In the determination of the shooting method, which of still image shooting, moving image shooting, burst shooting, panoramic shooting, and so on is to be executed is determined based on the state of the imaging device and the state of the surrounding subjects detected in S901 to S904. For example, still image shooting is executed when the subject (person) is stationary, and moving image shooting or burst shooting is executed when the subject is moving. When a plurality of subjects surround the imaging device, or when the location is judged to be a scenic spot based on the GPS information described above, panoramic shooting processing may be executed, in which images captured sequentially while panning and tilting are combined to generate a panoramic image.
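The examples given for the shooting method determination can be sketched as a small rule set. The order in which the rules are checked and the returned labels are assumptions; the description gives the rules only as examples and does not rank them.

```python
def choose_shooting_method(subject_moving, subjects_surround_device, at_scenic_spot):
    """Pick a shooting method from the detected state (rule precedence is assumed)."""
    if subjects_surround_device or at_scenic_spot:
        return "panorama"        # combine images captured while panning and tilting
    if subject_moving:
        return "video_or_burst"  # moving subject: moving image or burst shooting
    return "still"               # stationary subject: single still image
```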

In S909, if the automatic shooting determination in S908 resulted in a decision to shoot, the process proceeds to S910; otherwise, the shooting mode processing ends.

In S910, shooting is started. For manual shooting, a still image is shot or shooting is performed with the shooting method manually set by the user; for automatic shooting, shooting is started with the shooting method determined in S908. At this time, autofocus control is performed by the focus drive control circuit 204. In addition, exposure control is performed using an aperture control circuit, a sensor gain control circuit, and a shutter control circuit (none of which are shown) so that the subject has appropriate brightness. After shooting, the image processing circuit 207 performs various kinds of image processing, such as auto white balance processing, noise reduction processing, and gamma correction processing, to generate an image.

During automatic shooting, when a predetermined condition is satisfied, the imaging device may notify the person to be shot that shooting is about to take place and then shoot. The notification may use, for example, sound from the audio output circuit 218 or LED light from the LED control circuit 224, or a motion that visually guides the subject's line of sight by driving the pan/tilt mechanism. The predetermined condition is, for example, the number of faces within the angle of view, the degree of smile and degree of eye closure of each face, the gaze angle or face angle of the subject person, the face authentication ID number, or the number of persons registered for personal authentication. Other examples are the general object recognition result at the time of shooting, the scene discrimination result, the elapsed time since the previous shooting, the shooting time, whether the current position based on GPS information is a scenic spot, the sound level at the time of shooting, whether someone is speaking, and whether there is applause or cheering. Further examples are vibration information (acceleration information, state of the imaging device) and environmental information (temperature, atmospheric pressure, illuminance, humidity, amount of ultraviolet light). By shooting with notification based on these conditions, preferable images in which the subject is looking at the camera can be kept in scenes of high importance.

A plurality of predetermined conditions may also be provided, and the sound, the LED lighting method (color, blinking time, etc.), or the pan/tilt motion method (manner of movement, drive speed) may be changed according to each condition.

In S911, editing processing is performed, such as processing the image generated in S910 or adding it to a moving image. Image processing specifically includes trimming based on a person's face or the in-focus position, image rotation, an HDR (high dynamic range) effect, a blur effect, a color conversion filter effect, and the like. A plurality of processed images may be generated from the image generated in S910 by combining the above processes, and stored separately from the image generated in S910. As for moving image processing, the captured moving image or still image may be added to an already generated edited moving image while applying special effects such as slide, zoom, and fade. For the editing in S911 as well, the image processing method can be determined by a neural network based determination using information on the captured image or various kinds of information detected before shooting, and the determination conditions of this determination processing can be changed by the learning processing described later.

In S912, learning information generation processing is performed for the captured image. Here, the learning information to be used in the learning processing described later is generated and recorded. Specifically, for the image just captured, the information includes the zoom magnification at the time of shooting, the general object recognition result at the time of shooting, the face detection result, the number of faces in the captured image, the degree of smile and degree of eye closure of each face, the face angle, the face authentication ID number, and the gaze angle of the subject person. It also includes the scene discrimination result, the elapsed time since the previous shooting, the shooting time, the GPS position information and the amount of change from the previous shooting position, the sound level at the time of shooting, the person who is speaking, and whether there is applause or cheering. It further includes vibration information (acceleration information, state of the imaging device), environmental information (temperature, atmospheric pressure, illuminance, humidity, amount of ultraviolet light), the moving image shooting time, and whether the image was shot by a manual shooting instruction. In addition, a score, which is the output of a learning model that quantifies the user's image preferences, is also calculated.

This information is generated and recorded as tag information in the captured image file. Alternatively, it may be written to the non-volatile memory 216, or stored in the recording medium 221 as so-called catalog data, in a format that lists the information for each captured image.

In S913, the past shooting information is updated. Specifically, for the number of shots per area mentioned in the description of S908, the number of shots per person registered for personal authentication, the number of shots per subject recognized by general object recognition, and the number of shots per scene in scene discrimination, the counts that correspond to the image just captured are each incremented by one.
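The counter update in S913 can be sketched with one counter per category. The class name, field names, and the form of the identifiers are hypothetical; the description only states which counts are incremented.

```python
from collections import Counter

class PastShootingInfo:
    """Shot counters per area, person, recognized object, and scene (illustrative layout)."""

    def __init__(self):
        self.by_area = Counter()
        self.by_person = Counter()
        self.by_object = Counter()
        self.by_scene = Counter()

    def record_shot(self, area_id, person_ids=(), object_labels=(), scene=None):
        """Increment by one every count that the captured image corresponds to."""
        self.by_area[area_id] += 1
        self.by_person.update(person_ids)
        self.by_object.update(object_labels)
        if scene is not None:
            self.by_scene[scene] += 1
```

Counts kept this way could then feed the importance level adjustment that depends on past shooting information.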

<Learning mode processing>
Next, learning according to the user's preferences in the present embodiment will be described.

In the present embodiment, a neural network such as that shown in FIG. 10 and a machine learning algorithm are used, and the learning processing circuit 219 performs learning according to the user's preferences to generate a learning model. The learning processing circuit 219 uses, for example, a Jetson TX2 from NVIDIA. A neural network is used to predict an output value from input values; by learning actual input values and actual output values in advance, it can estimate an output value for new input values. By using a neural network, learning according to the user's preferences is performed for the automatic shooting and subject search described above.

In addition, subject registration (face authentication, general object recognition, etc.), which also serves as feature data input to the neural network, is performed.

In the present embodiment, the elements learned by the learning processing are as follows.

(1) Automatic shooting
Learning for automatic shooting will be described. For automatic shooting, learning is performed so that images matching the user's preferences are shot automatically. As described above with reference to the flow of FIG. 9, learning information generation processing is performed after shooting (S912). The images to be learned are selected by a method described later, and learning is performed by changing the weights of the neural network based on the learning information contained in those images. Learning is performed by changing the neural network that determines the automatic shooting timing and by changing the neural network that determines the shooting method (still image shooting, moving image shooting, burst shooting, panoramic shooting, etc.).

(2) Automatic editing
Learning for automatic editing will be described. For automatic editing, learning is performed for the editing carried out immediately after shooting in S911 of FIG. 9. The editing immediately after shooting is described here. The images to be learned are selected by a method described later, and learning is performed by changing the weights of the neural network based on the learning information contained in those images. Various kinds of detection information obtained at or immediately before shooting are input to the neural network, and the editing method (trimming, image rotation, HDR (high dynamic range) effect, blur effect, color conversion filter effect, etc.) is determined.

(3) Subject search
Learning for subject search will be described. For subject search, learning is performed so that subjects matching the user's preferences are searched for automatically. As described above with reference to the flow of FIG. 9, in the subject search processing (S904), the importance level of each area is calculated, and pan/tilt and zoom are driven to search for a subject. Learning is performed from captured images and from detection information obtained during the search, by changing the weights of the neural network. Various kinds of detection information obtained during the search operation are input to the neural network, the importance level is calculated, and the pan/tilt angles are set based on the importance level, so that the subject search reflects the learning. In addition to setting the pan/tilt angles based on the importance level, the pan/tilt driving itself (speed, acceleration, frequency of movement) is also learned, for example.

(4) Subject registration
Learning for subject registration will be described. For subject registration, learning is performed so that subjects matching the user's preferences are registered and ranked automatically. As learning, for example, face authentication registration, general object recognition registration, and registration of gestures, voice recognition, and sound-based scene recognition are performed. Authentication registration is performed for persons and objects, and ranks are set from the number of times and frequency with which images are acquired, the number of times and frequency of manual shooting, and the frequency with which the subject appears during a search. The registered information is registered as input to the determinations that use the respective neural networks.
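The rank setting for registered subjects can be illustrated with a weighted score over the three quantities named in the description. The weighted-sum form, the weights, and the field names are assumptions; the description does not state how the counts are combined.

```python
def rank_subjects(stats, w_acquired=1.0, w_manual=3.0, w_seen=0.5):
    """Return subject ids ordered from highest to lowest rank.

    stats: dict mapping subject id -> {"acquired": int, "manual": int, "seen": int},
    i.e. image acquisitions, manual shots, and appearances during search.
    """
    scores = {
        sid: w_acquired * s["acquired"] + w_manual * s["manual"] + w_seen * s["seen"]
        for sid, s in stats.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Weighting manual shots more heavily reflects the idea that manually shot subjects are likely to match the user's preferences.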

Next, the learning methods are described.

The learning methods are "learning within the imaging device" and "learning in cooperation with a communication device".

Learning within the imaging device is described first. In the present embodiment it uses the following methods.

(1) Learning from detection information at the time of a shooting instruction by the user
As described for S907 to S913 of FIG. 9, the imaging device 101 in the present embodiment can perform two kinds of shooting: manual shooting and automatic shooting. When a manual shooting instruction is given in S907 (determined from the three criteria described above), information indicating that the image was captured manually is attached to the captured image in S912. When an image is captured because automatic shooting was determined to be ON in S909, information indicating that the image was captured automatically is attached in S912.

An image captured manually was very likely shot because of the user's preferred subject, preferred scene, or preferred place and time interval. Learning is therefore performed based on the feature data obtained at the time of manual shooting and on the learning information of the captured image.

From the detection information at the time of manual shooting, learning is performed for the extraction of feature amounts in the captured image, the registration of personal authentication, the registration of each individual's facial expressions, and the registration of combinations of people. From the detection information during subject search, learning is performed that, for example, changes the importance of nearby people and objects according to the facial expression of a personally registered subject.

(2) Learning from detection information during subject search
During the subject search operation, the device determines which people, objects, and scenes appear together with a subject registered for personal authentication, and calculates the proportion of time for which they are within the angle of view at the same time.

For example, if the proportion of time for which person A, a subject registered for personal authentication, appears together with person B, another registered subject, is higher than a predetermined threshold, the importance can be judged to be high. Therefore, when person A and person B are both within the angle of view, the various detection information is saved as learning data so that the automatic shooting determination score becomes higher, and is learned in the learning mode process 716.

As another example, if the proportion of time for which person A, a subject registered for personal authentication, appears together with the subject "cat" identified by general object recognition is higher than a predetermined threshold, the importance can be judged to be high. Therefore, when person A and the "cat" are both within the angle of view, the various detection information is saved as learning data so that the automatic shooting determination score becomes higher. It is then learned in the learning mode process 716.

In this way, raising the automatic shooting determination score when a subject appears frequently during the search also raises the importance of people and objects near a subject registered for personal authentication.
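As a rough illustration of the co-occurrence calculation described above, the following Python sketch counts, over a sequence of search frames, how often other detected labels share the angle of view with a registered subject. The frame representation (one set of labels per frame), the function names, and the threshold value of 0.5 are assumptions made for illustration and are not taken from the embodiment.

```python
from collections import defaultdict


def cooccurrence_ratios(frames, registered):
    """For each other label, return the fraction of the registered subject's
    frames in which that label was also within the angle of view."""
    together = defaultdict(int)
    seen = 0
    for labels in frames:              # each frame: set of detected labels
        if registered not in labels:
            continue
        seen += 1
        for other in labels - {registered}:
            together[other] += 1
    if seen == 0:
        return {}
    return {label: count / seen for label, count in together.items()}


def important_companions(frames, registered, threshold=0.5):
    """Labels whose co-occurrence ratio exceeds the (assumed) threshold."""
    ratios = cooccurrence_ratios(frames, registered)
    return {label for label, ratio in ratios.items() if ratio > threshold}


# Hypothetical detection results for five search frames.
frames = [{"A", "B"}, {"A", "B", "cat"}, {"A"}, {"B"}, {"A", "B"}]
print(important_companions(frames, "A"))  # {'B'}
```

In this sketch, frames containing both person A and a label returned by `important_companions` would be the ones saved as learning data with a raised automatic shooting score.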

Also, when the smile level of person A, a subject registered for personal authentication, is detected, or when an expression such as "joy" or "surprise" is detected, the subjects appearing at the same time are learned as important. Conversely, when an expression such as "anger" or "neutral face" is detected, the subjects appearing at the same time are unlikely to be important, so processing such as not learning from them is performed.
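The expression-based gating can be sketched as a small predicate. The expression labels follow the text; the numeric smile level, its threshold of 0.7, and the function name are hypothetical.

```python
POSITIVE_EXPRESSIONS = {"joy", "surprise"}   # learn companions as important
SKIPPED_EXPRESSIONS = {"anger", "neutral"}   # do not learn from these


def learn_companions(expression, smile_level=0.0, smile_threshold=0.7):
    """Return True if subjects seen together with the registered person
    should be learned as important (threshold is an assumed value)."""
    if expression in SKIPPED_EXPRESSIONS:
        return False
    return expression in POSITIVE_EXPRESSIONS or smile_level >= smile_threshold
```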

Next, learning in cooperation with an external communication device is described. In the present embodiment it uses the following methods.

(3) Learning from images acquired by the external communication device
As described with FIG. 3, the imaging device 101 and the external device 301 have communication means for communications 302 and 303. Images are transmitted and received mainly over communication 302, and images in the imaging device 101 can be transferred to the external device 301 through a dedicated application on the external device 301. Thumbnails of the image data stored in the imaging device 101 can also be browsed through the same application. The user can thus select a favorite image from the thumbnails, check it, and issue an image acquisition instruction to transfer it to the external device 301.

Because the user has chosen the image and instructed its transfer, the acquired image is very likely one the user prefers. The acquired image is therefore judged to be an image that should be learned, and learning based on its learning information allows the user's preferences to be learned in various ways.

An operation example follows. FIG. 14 shows an example of browsing images in the imaging device 101 through the dedicated application on the external device 301, which is a smart device. Thumbnails (1404 to 1409) of the image data stored in the imaging device are displayed on the display device 407, and the user can select and acquire a favorite image. Display method change units (1401, 1402, 1403) for changing the display method are provided. Pressing 1401 switches the display order to the date-and-time priority display mode, in which images are displayed on the display device 407 in order of shooting date and time (for example, 1404 is the newest and 1409 the oldest). Pressing 1402 switches to the recommended-image priority display mode, in which images are displayed on the display device 407 in descending order of the score, calculated in S912 of FIG. 9, that rates each image against the user's preferences (for example, 1404 has the highest score and 1409 the lowest). Pressing 1403 allows a person or object subject to be designated; when a specific person or object is then designated, only the specified subject is displayed.

The settings 1401 to 1403 can also be turned ON at the same time. If all of them are ON, for example, only the designated subject is displayed, with priority given to images with newer shooting dates and to images with higher scores.
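One way to picture the three display settings is the following Python sketch. The `Thumb` record and the `order_thumbnails` function are hypothetical. The text does not say how date priority and score priority are combined when both are ON; the sketch assumes the date is the primary key and the score breaks ties.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Thumb:
    name: str
    shot_at: datetime
    score: float
    subjects: frozenset


def order_thumbnails(thumbs, by_date=False, by_score=False, subject=None):
    """Filter by subject (1403), then order by score (1402) and date (1401)."""
    shown = [t for t in thumbs if subject is None or subject in t.subjects]
    # Python's sort is stable, so sorting by the secondary key first and the
    # primary key second yields "date first, score as tie-break".
    if by_score:
        shown.sort(key=lambda t: t.score, reverse=True)
    if by_date:
        shown.sort(key=lambda t: t.shot_at, reverse=True)
    return shown


thumbs = [
    Thumb("a", datetime(2018, 1, 1), 0.2, frozenset({"A"})),
    Thumb("b", datetime(2018, 1, 2), 0.9, frozenset({"B"})),
    Thumb("c", datetime(2018, 1, 1), 0.7, frozenset({"A", "B"})),
]
print([t.name for t in order_thumbnails(thumbs, True, True, "A")])  # ['c', 'a']
```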

Because the user's preferences are learned for captured images as well, the images the user prefers can be extracted easily from a large number of captured images with only a simple check.

(4) Learning from a determination value input for an image through the external communication device
As described above, the imaging device 101 and the external device 301 have communication means, and the images stored in the imaging device 101 can be browsed through the dedicated application on the external device 301. The user may be allowed to score each image: a high score (for example, 5 points) for an image the user likes, or a low score (for example, 1 point) for an image the user does not like, so that the imaging device learns from the user's operations. The score of each image is used, together with the learning information, for relearning in the imaging device. Learning is performed so that the output of the neural network, given feature data from the designated image information as input, approaches the score designated by the user.
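The idea of driving the network output toward the user's score can be illustrated with a deliberately small stand-in: a single linear unit trained by batch gradient descent on squared error. This is not the network of the embodiment; the feature vectors, learning rate, and epoch count are assumptions chosen so that the sketch converges.

```python
def predict(weights, bias, x):
    """Output of a single linear unit (stand-in for the neural network)."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_to_scores(features, scores, lr=0.1, epochs=2000):
    """Batch gradient descent on mean squared error between output and score."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, scores):
            err = predict(weights, bias, x) - y
            for i, xi in enumerate(x):
                grad_w[i] += 2 * err * xi / n
            grad_b += 2 * err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Hypothetical feature vectors and user scores (5 = liked, 1 = disliked).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [5.0, 1.0, 5.0]
weights, bias = train_to_scores(features, scores)
```

After training, `predict(weights, bias, x)` for each rated image lies close to the score the user assigned, which is the behavior the paragraph above describes.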

In the present embodiment, the user inputs determination values for captured images through the communication device 301, but the determination values may instead be input directly by operating the imaging device 101. In that case, for example, the imaging device 101 is provided with a touch panel display, and the user presses a GUI button shown on the touch panel display to enter a mode for displaying captured images. The user can then input a determination value for each image while checking the captured images, which achieves the same learning.

(5) Learning from parameters changed on the external communication device
As described above, the imaging device 101 and the external device 301 have communication means, and the learning parameters currently set in the imaging device 101 can be transmitted to the external device 301 and saved in its storage circuit 404. Learning parameters include, for example, the weights of the neural network and the selection of subjects input to the neural network. Learning parameters set on a dedicated server can also be acquired through the public line control circuit 406, via the dedicated application on the external device 301, and set as the learning parameters of the imaging device 101. This makes it possible to save the parameters at a given point in time on the external device 301 and later set them in the imaging device 101 to restore that state, or to acquire another user's learning parameters through the dedicated server and set them in one's own imaging device 101.

Next, the learning process sequence is described.

The mode setting determination in S704 of FIG. 7 determines whether the learning process should be performed. If so, the mode is determined to be the learning mode and the learning mode process of S712 is performed.

The conditions for determining the learning mode are as follows. Whether to shift to the learning mode is determined from the time elapsed since the previous learning process, the amount of information available for learning, whether a learning process instruction was given through the communication device, and so on. FIG. 12 shows the flow, executed within the mode setting determination of S704, for determining whether to shift to the learning mode.

When the learning mode determination is started within the mode setting determination of S704, the process of FIG. 12 starts. S1201 determines whether there is a registration instruction from the external device 301, that is, a registration instruction for learning as described above, such as <learning from image information acquired by the communication device> or <learning from a determination value input for an image through the communication device>. If there is a registration instruction from the external device in S1201, the process proceeds to S1208, where the learning mode determination is set to TRUE so that the process of S712 is performed. If there is no registration instruction in S1201, the process proceeds to S1202. S1202 determines whether there is a learning instruction from the external device, that is, an instruction to set learning parameters, as in <learning from imaging device parameters changed on the communication device>. If there is a learning instruction in S1202, the process proceeds to S1208, where the learning mode determination is set to TRUE so that the process of S712 is performed, and the learning mode determination process ends. If there is no learning instruction in S1202, the process proceeds to S1203.

In S1203, the elapsed time TimeN since the previous learning process (recalculation of the neural network weights) is acquired, and the process proceeds to S1204. In S1204, the number DN of new data items to be learned (the number of images designated for learning during the elapsed time TimeN) is acquired, and the process proceeds to S1205. In S1205, a threshold DT is calculated from TimeN. For example, the threshold DTa used when TimeN is smaller than a predetermined value is set larger than the threshold DTb used when TimeN is larger than that value, so that the threshold decreases as time passes. As a result, relearning is performed after a long elapsed time even when little learning data is available, which makes it easier for the imaging device to change through learning in accordance with how long it has been used.

After the threshold DT is calculated in S1205, the process proceeds to S1206, which determines whether the number of data items to be learned, DN, is larger than the threshold DT. If DN is larger than DT, the process proceeds to S1207, where DN is set to 0, and then to S1208, where the learning mode determination is set to TRUE so that the process of S712 is performed, and the learning mode determination process ends.

If DN is equal to or less than the threshold DT in S1206, the process proceeds to S1209. Because there is neither a registration instruction nor a learning instruction from the external device, and the number of learning data items is at or below the predetermined value, the learning mode determination is set to FALSE so that the process of S712 is not performed, and the learning mode determination process ends.
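The determination flow of FIG. 12 can be summarized in a short Python sketch. The step labels in the comments follow the text. The concrete values (a switch point of 24 hours, DTa = 50, DTb = 10), the two-level form of the threshold, and the treatment of TimeN exactly equal to the predetermined value are assumptions, since the description gives only the relation DTa > DTb.

```python
def threshold_dt(time_n, switch_after=24.0, dt_a=50, dt_b=10):
    """S1205: threshold DT shrinks once enough time has passed (values assumed)."""
    return dt_a if time_n < switch_after else dt_b


def learning_mode_determination(registration_requested, learning_requested,
                                time_n, dn):
    """Return (learning_mode, new_dn) following the FIG. 12 flow."""
    if registration_requested:      # S1201 -> S1208
        return True, dn
    if learning_requested:          # S1202 -> S1208
        return True, dn
    dt = threshold_dt(time_n)       # S1203 to S1205
    if dn > dt:                     # S1206
        return True, 0              # S1207 resets DN, then S1208
    return False, dn                # S1209


# With few new images, learning starts only after enough time has elapsed.
print(learning_mode_determination(False, False, time_n=1.0, dn=20))   # (False, 20)
print(learning_mode_determination(False, False, time_n=48.0, dn=20))  # (True, 0)
```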

Next, the processing within the learning mode process (S712) is described. FIG. 13 shows its detailed flow.

When the learning mode is determined in S711 of FIG. 7 and the process proceeds to S712, the process of FIG. 13 starts. S1301 determines whether there is a registration instruction from the external device 301. If there is, the process proceeds to S1302, where the various registration processes are performed.

The registrations are registrations of features to be input to the neural network, such as face authentication, general object recognition, sound information, and location information.

When the registration process ends, the process proceeds to S1303, where the elements input to the neural network are changed based on the information registered in S1302.

When the process of S1303 ends, the process proceeds to S1307.

If there is no registration instruction from the external device 301 in S1301, the process proceeds to S1304, which determines whether there is a learning instruction from the external device 301. If there is, the process proceeds to S1305, where the learning parameters transmitted from the external device are set in each determiner (for example, as neural network weights), and then to S1307.

If there is no learning instruction from the external device in S1304, learning (recalculation of the neural network weights) is performed in S1306. As described with FIG. 12, S1306 is reached when the number of data items to be learned, DN, exceeds the threshold and each determiner can be retrained. Retraining uses a method such as error backpropagation or gradient descent to recalculate the neural network weights and change the parameters of each determiner. Once the learning parameters are set, the process proceeds to S1307.

In S1307, the images in the file are rescored. In the present embodiment, every captured image stored in the file (recording medium 221) is given a score based on the learning result, and automatic editing and automatic file deletion are performed according to those scores. When relearning is performed or learning parameters are set from the external device, the scores of already captured images must therefore be updated as well. S1307 recalculates a new score for each captured image stored in the file, and when this finishes the learning mode process ends.
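The branching of FIG. 13 can be sketched as follows. The `state` dictionary, its keys, and the `retrain` and `score` callables are hypothetical placeholders for the registration data, the determiner parameters, and the scoring of the embodiment; only the step order is taken from the text.

```python
def learning_mode_process(state, registration=None, external_weights=None):
    """Run one pass of the FIG. 13 flow and return the steps that were taken."""
    steps = []
    if registration is not None:                        # S1301: registration?
        state["registered"].append(registration)        # S1302: register feature
        state["nn_inputs"] = list(state["registered"])  # S1303: update NN inputs
        steps += ["S1302", "S1303"]
    elif external_weights is not None:                  # S1304: learning instruction?
        state["weights"] = external_weights             # S1305: set received params
        steps.append("S1305")
    else:
        state["weights"] = state["retrain"](state["weights"])  # S1306: relearn
        steps.append("S1306")
    # S1307: rescore every stored image with the current parameters.
    state["scores"] = {name: state["score"](feat, state["weights"])
                       for name, feat in state["images"].items()}
    steps.append("S1307")
    return steps


state = {
    "registered": [], "nn_inputs": [], "scores": {},
    "weights": [1.0, 0.0],
    "images": {"img1": [2.0, 3.0]},
    "retrain": lambda w: [x * 0.5 for x in w],                  # dummy retraining
    "score": lambda f, w: sum(a * b for a, b in zip(f, w)),     # dummy scoring
}
print(learning_mode_process(state, external_weights=[0.0, 1.0]))  # ['S1305', 'S1307']
```

Every branch ends in S1307, which reflects the statement that stored images are rescored whenever the inputs or parameters of the determiners change.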

The present embodiment has been described for a configuration in which learning takes place in the imaging device 101, but the same learning effect can be obtained with a configuration in which the external device 301 has the learning process, the data needed for learning is transmitted to the external device 301, and learning is executed only on the external device. In that case, as described in <learning from parameters changed on the communication device>, learning may be performed by transmitting parameters such as the neural network weights learned on the external device to the imaging device 101 and setting them there.

Alternatively, both the imaging device 101 and the external device 301 may have a learning process. For example, the learning information held by the external device 301 may be transmitted to the imaging device 101 at the timing when the learning mode process 716 is performed in the imaging device 101, and learning may be performed by merging the learning parameters.

Next, a method of compensating for a shortage of teacher data in neural network learning is described.

For a neural network to estimate output values accurately from input values, a sufficient amount of teacher data is needed. If the neural network model is complex and has many degrees of freedom relative to the amount of teacher data, raising the estimation accuracy is difficult. In machine learning, a process called Data Augmentation is sometimes applied so that estimation remains robust even for data that differs slightly from the teacher data. It is usually done by applying image processing to the teacher data (here, images), such as changing the aspect ratio, rotation (roll, pitch, yaw), blurring, adding noise, or shifting. However, the results do not necessarily match images that a camera can capture. For example, blur added by image processing is not necessarily the same as the blur obtained by actually opening the aperture or defocusing the camera.

If the data to be estimated by the neural network is not similar to the teacher data, the teacher data can lower the estimation accuracy of the neural network. Likewise, applying a predetermined rotation (roll, pitch, yaw) does not necessarily reproduce the angles at which a person actually shoots with a camera. Specifically, simply rotating an image by, say, 45 or 90 degrees about its center contributes little as teacher data for learning the user's preferences, because users rarely take photographs in which the subject is not upright.

It is thus difficult to make up for a shortage of teacher data by image processing, and it is preferable to increase the teacher data through actual shooting. Even when Data Augmentation is performed by image processing, images close to what the camera can capture are better than images the camera could not capture. The present embodiment therefore describes a method of increasing the teacher data by automatically performing actual shooting for learning.

As described above, when it is determined in S907 of FIG. 9 that the user has given a shooting instruction, the process proceeds to S910 and S914.

S914 determines whether the current number of teacher data items is smaller than a predetermined number N (N is a natural number). Only when it is smaller than N is the teacher data judged insufficient; the process then proceeds to S915, where automatic shooting for learning is performed to supplement the teacher data. N should be varied according to the complexity and degrees of freedom of the neural network (the number of nodes and the number of intermediate layers): a more complex network, or one with more degrees of freedom, needs more teacher data, so N is increased. If S914 finds that the current number of teacher data items ≥ N, so that sufficient teacher data has been accumulated, S915 is skipped and the process proceeds to S912.
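The S914 check might look like the following sketch. The text states only that N should grow with the number of nodes and intermediate layers; the product formula and the factor of 10 samples per node used here are purely illustrative assumptions, as are the function names.

```python
def required_teacher_count(num_nodes, num_hidden_layers, per_node=10):
    """Hypothetical rule for N: grows with node count and layer count."""
    return per_node * num_nodes * max(1, num_hidden_layers)


def step_after_manual_instruction(current_count, n):
    """S914: shoot for learning (S915) only while teacher data is short."""
    return "S915" if current_count < n else "S912"


n = required_teacher_count(num_nodes=10, num_hidden_layers=2)
print(n, step_after_manual_instruction(199, n))  # 200 S915
```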

Automatic shooting for learning is performed only when a manual shooting instruction is given in S907 because, as noted above, a manually captured image was very likely shot because of the user's preferred subject, scene, or place and time interval. Automatic shooting for learning at that moment is therefore likely to yield teacher data that reflects the user's preferences.

The automatic shooting for learning in S915 and the manual shooting in S910 cannot be performed simultaneously, so their timing is staggered. Either may come first, but the two are performed in succession. If the automatic shooting for learning is delayed, the user may move the camera and the composition may drift away from the preferred composition of the manual shot. If the manual shooting is delayed, on the other hand, the shutter timing is missed. In the following description, unless otherwise stated, manual shooting is performed first and automatic shooting for learning immediately afterward.

In S915, automatic shooting for learning is performed. Several methods are possible. The first is continuous shooting: after the manual shot, continuous shooting is performed automatically for learning and a series of images is acquired. If their timing is close to that of the manual shot, multiple teacher data items close to the user's preferred image can be acquired. In this case the image obtained by manual shooting is treated as a recorded image and recorded on the recording medium 221, whereas the images obtained by automatic shooting for learning are used only for learning and are not shown to the user.

The second is moving image shooting. General cameras and life-log cameras sometimes have a function that automatically shoots a moving image before or after a manual shot and provides the user with the still image combined with the moving image. Some cameras instead continuously overwrite a moving image of a fixed duration in a memory such as a ring buffer, and provide the user with the moving image for a predetermined period before and after the moment the still image was captured. The moving image acquired automatically in this way is decomposed into still images and used as teacher data. It is valuable as teacher data for the same reason as the continuous shooting of the first method. Learning is not limited to the case where such a function is used; a moving image may be shot solely for learning, in which case it is not provided to the user.
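The ring-buffer variant can be sketched with a bounded deque that keeps only the most recent frames and returns those around the moment of the still shot. The class name, the timestamped frame tuples, and the window sizes are assumptions for illustration; in a real device the frames after the shot would have to be pushed before the window is read.

```python
from collections import deque


class FrameRing:
    """Keeps the most recent `capacity` frames, overwriting the oldest."""

    def __init__(self, capacity):
        self.frames = deque(maxlen=capacity)

    def push(self, timestamp, frame):
        self.frames.append((timestamp, frame))

    def around(self, shot_time, before, after):
        """Frames within [shot_time - before, shot_time + after]."""
        return [frame for t, frame in self.frames
                if shot_time - before <= t <= shot_time + after]


ring = FrameRing(capacity=5)
for t in range(10):                  # frames 0..4 are overwritten by 5..9
    ring.push(t, f"frame{t}")
teacher_frames = ring.around(shot_time=7, before=1, after=1)
print(teacher_frames)  # ['frame6', 'frame7', 'frame8']
```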

The third is bracket shooting. Bracket shooting is performed by varying the shooting conditions of the manual shot little by little. The conditions varied may be any parameters that the camera can change, such as focus, exposure, white balance, strobe emission, zoom, and sharpness. Varying these shooting conditions can be expected to have the same effect as data augmentation. If data augmentation that the camera cannot realize were applied and the result used as teacher data, a neural network trained on it would estimate well only for inputs close to that teacher data, which would make it unsuitable as a neural network for a camera. Increasing the teacher data through bracket shooting, which the camera can realize, can therefore be expected to work as data augmentation.

Some kinds of bracket shooting must be performed immediately after the manual shot, while others can be performed after some time has passed. The former involve mechanical operation, such as focus and zoom. Unless these are shot in succession with the manual shot, the composition changes and the images cannot serve as teacher data. The latter rely on image processing, such as white balance, sharpness, and the development conditions of RAW image data. Even if these cannot be performed in succession with the manual shot, they can be generated from the manually shot image. In this case, the RAW data of the manually shot image may be recorded. When images are generated from the manually shot image, generation need not take place at the time of shooting and may be done, for example, while the camera is on standby.

Because some kinds of bracket shooting must be performed in succession with the manual shot and others need not be, automatic shooting may be performed with priorities assigned to the kinds of bracket shooting. That is, the bracket shooting that must be performed in succession with the manual shot is performed first.
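One way to picture this prioritization is the following Python sketch. The parameter names and the rule that any bracket type not known to be pure image processing is shot first are assumptions for illustration; the specification classifies only focus and zoom (mechanical) and white balance, sharpness, and RAW development (image processing).

```python
# Bracket types that the description says can be produced later by image processing.
IMAGE_PROCESSING = {"white_balance", "sharpness", "raw_development"}


def order_brackets(requested):
    """Put brackets that must follow the manual shot first (stable sort).

    Assumption: anything not listed in IMAGE_PROCESSING is treated as
    time-critical, e.g. focus and zoom, which need mechanical movement.
    """
    return sorted(requested, key=lambda name: 1 if name in IMAGE_PROCESSING else 0)


plan = order_brackets(["white_balance", "zoom", "sharpness", "focus"])
# Brackets that could instead be generated from the recorded RAW data on standby.
deferrable = [name for name in plan if name in IMAGE_PROCESSING]
```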

If, between the manual shot and the automatic shooting for learning, it is determined from information such as that of the angular velocity meter 106 and the accelerometer 107 of the camera that the user has moved the camera, the automatic shooting for learning may be canceled.
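A minimal sketch of such a cancellation check follows. The threshold values, the units, and the function name `camera_moved` are hypothetical; the specification does not state how the sensor outputs are evaluated.

```python
def camera_moved(gyro_samples, accel_samples, gyro_limit=0.2, accel_limit=0.5):
    """Decide whether the camera moved after the manual shot.

    gyro_samples:  angular-velocity magnitudes (assumed rad/s)
    accel_samples: deviations from the resting acceleration (assumed m/s^2)
    Both limits are illustrative values, not taken from the specification.
    """
    return (max(gyro_samples, default=0.0) > gyro_limit
            or max(accel_samples, default=0.0) > accel_limit)


# Samples collected between the manual shot and the planned learning shot.
cancel_learning_shot = camera_moved([0.01, 0.9], [0.1])
```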

In S912, learning information is generated for the manual shot and for the automatic shooting for learning, and teacher data is created. Learning information can be generated for the images obtained by automatic shooting for learning in the same way as for the image obtained by manual shooting. Because the image obtained by manual shooting is likely to match the user's preference, it is given a predetermined high score. That score is also given to the teacher data generated from the images obtained by automatic shooting for learning.

Alternatively, the images obtained by automatic shooting for learning may be given scores according to their relationship to the image obtained by manual shooting. For example, if the automatic shooting for learning is performed immediately after the manual shot, the resulting image is given a high score equal to that of the manually shot image. The score for images obtained by automatic shooting for learning can then be lowered as the interval between the manual shot and the automatic shooting for learning grows. As a result, the manually shot image, taken at the best-shot timing indicated by the user, receives the highest score, and images receive lower scores the further they deviate from that timing, so the user's preference in shutter timing can be learned. Alternatively, each image obtained by automatic shooting for learning may be compared with the manually shot image for similarity and scored according to that similarity. Furthermore, when the subject is moving or the scene containing the subject is changing, the images before and after the image shot at the timing of the manual shot may deliberately be used for learning as negative teacher data. This can be expected to allow the user's shutter timing preference to be learned more strictly.
Instead of the preceding and following images, those of the continuously captured images whose similarity to the manually shot image is lower than a threshold may be used as the negative teacher data.
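The timing-based scoring and the similarity threshold for negative teacher data might look like the sketch below. The linear decay, the rate of 20 points per second, the 0.5 threshold, and the similarity scale of 0 to 1 are all assumptions for illustration; the specification only says that the score falls as the interval grows and that frames below a similarity threshold may be used as negative examples.

```python
def timing_score(manual_score, dt_seconds, decay_per_second=20.0):
    """Score for a learning shot taken dt_seconds before or after the manual shot."""
    return max(0.0, manual_score - decay_per_second * abs(dt_seconds))


def is_negative_example(similarity, threshold=0.5):
    """True if the frame is too unlike the manual shot and is used as a negative."""
    return similarity < threshold


# Hypothetical burst: (time offset in seconds, similarity to the manual shot).
burst = [(0.0, 1.0), (0.5, 0.8), (-1.0, 0.3)]
labels = [(timing_score(100.0, dt), is_negative_example(sim)) for dt, sim in burst]
```

In this sketch the two criteria are computed independently; how a score and a negative label would be combined into a single teacher value is left open, as it is in the text.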

The same idea applies to bracket images: the score can be lowered as the shooting conditions set by bracketing move away from those set for the manual shot. For example, the image obtained by manual shooting is given the highest score, the bracket image with exposure compensation +1 is given the second highest score, and the image with exposure compensation +2 is given the third highest. This allows the user's preferred shooting conditions to be learned as well.
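The exposure example can be written out as a small Python sketch. The fixed step of 10 points per exposure-compensation stop and the top score of 100 are hypothetical; the specification fixes only the ordering of the scores.

```python
def bracket_scores(top_score, ev_offsets, step=10):
    """Map each exposure-compensation offset to a score that falls with distance.

    An offset of 0 stands for the manual shot itself; `step` is an assumed value.
    """
    return {ev: top_score - step * abs(ev) for ev in ev_offsets}


scores = bracket_scores(100, [0, 1, 2])
ranking = sorted(scores, key=scores.get, reverse=True)  # highest score first
```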

The learning information for images obtained by automatic shooting for learning may be reused from the learning information of the manually shot image. For example, the subject is likely to be the same in the automatic shooting for learning and in the manual shot, so general object recognition results, face detection results, and the like generated from the manually shot image can be reused as learning information for the images obtained by automatic shooting for learning. This shortens the time needed to generate the learning information.

The user's shooting instructions subject to the determination in S907 may include the voice command described above, a tap operation on the imaging apparatus 101, and shooting instructions from the external devices 301 and 501.

Because automatic shooting for learning is not itself instructed by the user, it is desirable to perform it with an electronic shutter, which has a quiet shutter sound.

Because automatic shooting for learning takes place at a timing other than the one the user intended, it may store personal information that the user did not intend to capture, which can be a privacy problem. The images obtained by automatic shooting for learning may therefore be discarded, with only the learning information generated from them being stored. The learning information consists, for example, of parameters corresponding to the input layer of the neural network and is in a form other than an image, so private information is hard to identify from it. Alternatively, person-related information such as a personal authentication ID may be left out of the learning information and replaced with a predetermined default value.
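Both privacy measures can be sketched together in a few lines of Python. The dictionary layout, the field names `image` and `person_id`, and the placeholder value `ANONYMOUS_ID` are hypothetical and serve only to illustrate the idea.

```python
ANONYMOUS_ID = 0  # assumed default value that replaces any personal ID


def to_stored_learning_info(learning_info):
    """Return a copy that is safe to keep: no image, no real personal ID."""
    stored = {key: value for key, value in learning_info.items() if key != "image"}
    if "person_id" in stored:
        stored["person_id"] = ANONYMOUS_ID
    return stored


record = to_stored_learning_info(
    {"image": b"...raw pixels...", "person_id": 42, "face_count": 2, "zoom": 1.5}
)
```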

The condition for performing automatic shooting for learning in S914 need not be the number of teacher data. For example, the determination in S914 may be NO once the estimation accuracy of the neural network is judged to have improved. Whether it has improved is verified as follows. When teacher data is acquired by automatic shooting for learning, it is input to the neural network to obtain an output value. If the difference between that output value and the teacher value is smaller than a predetermined value, the accuracy of the neural network can be judged to have improved. In other words, the output is close to the reference value even for newly input data, so the accuracy is judged to be high.

This "difference between the neural network output value and the teacher value" can also be used to remove, as outliers, those teacher data acquired by automatic shooting for learning that are unsuitable as teacher data. If the difference is larger than a predetermined value, estimation has failed, and the teacher data can be regarded as differing greatly in nature from the teacher data learned so far. In that case it is likely that the user had already moved the camera immediately after the manual shot and that it was pointing in an unintended direction, such as the sky or the ground, so the data is removed as an outlier. That is, it is not registered as teacher data.
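The two uses of the output-to-teacher difference, as a stopping condition for S914 and as an outlier filter, can be sketched as follows. The stand-in `network`, the limits, and the rule that every remaining sample must fall below the accuracy limit are assumptions; the specification does not say how the difference is aggregated over several samples.

```python
def prediction_error(network, features, teacher_value):
    """Absolute difference between the network output and the teacher value."""
    return abs(network(features) - teacher_value)


def drop_outliers(network, samples, outlier_limit):
    """Keep only samples whose error does not exceed outlier_limit."""
    return [(x, y) for x, y in samples
            if prediction_error(network, x, y) <= outlier_limit]


def needs_more_learning_shots(network, samples, accuracy_limit):
    """True while some sample is still predicted with error >= accuracy_limit."""
    return any(prediction_error(network, x, y) >= accuracy_limit for x, y in samples)


def network(features):
    # Hypothetical stand-in for the trained neural network.
    return sum(features) / len(features)


new_samples = [([80, 90], 85), ([10, 20], 90)]  # (features, teacher value)
kept = drop_outliers(network, new_samples, outlier_limit=50)
keep_shooting = needs_more_learning_shots(network, kept, accuracy_limit=5)
```

Filtering before the accuracy check is a design choice of this sketch: otherwise a single frame of the sky or the ground would keep the learning shots going indefinitely.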

Outliers among the images obtained by automatic shooting for learning can also be checked without passing them through the neural network. If, in a feature vector that combines the feature quantities of the input layer of the neural network, the difference between the image from automatic shooting for learning and the manually shot image is larger than a predetermined value, the image may be removed as an outlier.
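A minimal sketch of this network-free check is shown below. Euclidean distance is an assumption; the specification says only that the "difference" between the two feature vectors is compared with a predetermined value. `math.dist` requires Python 3.8 or later.

```python
import math


def is_feature_outlier(learning_features, manual_features, limit):
    """True if the learning shot's feature vector is too far from the manual shot's."""
    return math.dist(learning_features, manual_features) > limit


manual = [0.0, 0.0]    # hypothetical input-layer features of the manual shot
learning = [3.0, 4.0]  # features of a learning shot, at Euclidean distance 5
rejected = is_feature_outlier(learning, manual, limit=4.0)
```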

Automatic shooting for learning as described above makes it possible to increase the teacher data. The teacher data is used for learning the next time the learning mode is executed. The additional teacher data can be expected to improve the estimation accuracy of the neural network.

(Other embodiments)
The present invention can also be realized by processing in which a program that implements one or more functions of the embodiments described above is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

The present invention can be installed not only in digital cameras and digital video cameras but also in imaging apparatuses such as surveillance cameras, web cameras, and mobile phones.

101: imaging apparatus, 301: smart device, 501: wearable device, 104: tilt rotation unit, 105: pan rotation unit

Claims

An imaging apparatus comprising:
acquisition means for acquiring data related to a photographed image photographed by photographing means;
learning means for learning conditions of an image preferred by a user based on teacher data acquired by the acquisition means;
control means for determining automatic photographing by the photographing means based on the conditions learned by the learning means; and
registration means for registering, as the teacher data, the data acquired by the acquisition means for a photographed image for learning that is photographed in succession with photographing performed by the user's instruction.

The imaging device according to claim 1, further comprising a developing unit that develops the photographed image, wherein the developing unit develops the photographed image for learning under conditions different from those of the photographed image obtained from the photographing performed by the user's instruction.

The imaging apparatus according to claim 1, wherein the user's instruction is performed by inputting a predetermined voice command.

The imaging device according to any one of claims 1 to 3, wherein the instruction of the user is performed through an external device capable of communicating with the imaging device.

The imaging device according to claim 1, wherein the teacher value of the photographed image for learning is determined based on the teacher value of a recorded image, which is a photographed image obtained from the photographing performed by the user's instruction.

The imaging device according to claim 5, wherein the teacher value of the photographed image for learning is determined so that the difference between the teacher values of the recorded image and the photographed image for learning becomes larger as the difference between the photographing conditions of the recorded image and the photographed image for learning increases.

The imaging device according to claim 5, wherein the teacher value of the photographed image for learning is determined so that the difference between the teacher values of the recorded image and the photographed image for learning becomes larger as the difference between the shutter timings of the recorded image and the photographed image for learning increases.

The imaging device according to any one of claims 1 to 7, wherein the registration unit registers at least a part of the photographed image for learning as negative teacher data.

The imaging device according to claim 8, wherein the registration unit registers, as negative teacher data, those of the photographed images for learning whose degree of similarity to the photographed image obtained from the photographing performed by the user's instruction is lower than a threshold.

The imaging device according to any one of claims 1 to 9, wherein the photographed image for learning is an image photographed with at least one of focus, exposure, white balance, strobe light emission, and zoom changed from the photographing conditions of the photographed image obtained from the photographing performed by the user's instruction.

The imaging device according to any one of claims 1 to 10, wherein the photographed image for learning is generated from a moving image photographed immediately before or after the photographing performed by the user's instruction.

The imaging device according to any one of claims 1 to 10, wherein the photographed image for learning is a continuously shot image photographed immediately before or after the photographing performed by the user's instruction.

The imaging apparatus according to any one of claims 1 to 12, wherein the imaging unit captures the learning image by using an electronic shutter.

The imaging device according to any one of claims 1 to 13, wherein, when the number of the teacher data is smaller than a predetermined number, the control means causes the photographing means to capture the photographed image for learning.

The imaging apparatus according to any one of claims 1 to 13, wherein the control means determines the automatic photographing by a neural network based on the conditions learned by the learning means, and determines whether to cause the photographing means to capture the photographed image for learning based on the difference between a value output by inputting the photographed image for learning to the neural network and the teacher value of the teacher data.

The imaging device according to claim 15, wherein the control means does not cause the photographing means to capture the photographed image for learning when the difference between the value output by inputting the photographed image for learning into the neural network and the teacher value of the teacher data becomes smaller than a predetermined value.

The imaging device according to claim 15, wherein the control means does not use the photographed image for learning as teacher data when the difference between the value output by inputting the photographed image for learning into the neural network and the teacher value of the teacher data is equal to or more than a predetermined value.

The imaging apparatus according to any one of claims 1 to 13, wherein the determination of the automatic photographing is performed by a neural network based on the conditions learned by the learning means, and
the photographed image for learning is not used as teacher data when, in a vector combining the feature quantities of the input layer of the neural network, the difference between the data of the photographed image obtained from the photographing performed by the user's instruction and the data of the photographed image for learning is equal to or more than a predetermined value.

A control method of an imaging apparatus, comprising:
an acquisition step of acquiring data related to a photographed image photographed by photographing means;
a learning step of learning conditions of an image preferred by a user based on teacher data acquired in the acquisition step;
a control step of determining automatic photographing by the photographing means based on the conditions learned in the learning step; and
a registration step of registering, as the teacher data, the data acquired in the acquisition step for a photographed image for learning that is photographed in succession with photographing performed by the user's instruction.

A program for causing a computer to execute each step of the control method according to claim 19.

A computer readable storage medium storing a program for causing a computer to execute each step of the control method according to claim 19.