JP2021111960A

JP2021111960A - Imaging apparatus, control method of the same, and program

Info

Publication number: JP2021111960A
Application number: JP2020150367A
Authority: JP
Inventors: 陽介高木; Yosuke Takagi
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2020-01-06
Filing date: 2020-09-08
Publication date: 2021-08-02

Abstract

To solve the problem that automatic shooting by an imaging apparatus alone cannot always capture an image at the timing a user desires, so that shots may be missed. SOLUTION: The imaging apparatus includes sound collecting means for collecting sound, analysis means for analyzing the sound collected by the sound collecting means, automatic shooting means for shooting automatically, and setting means for setting a shooting frequency of the automatic shooting means. If the result of the analysis by the analysis means is a specific voice instruction, the imaging apparatus performs the operation according to the instruction and then sets the shooting frequency higher by the setting means. SELECTED DRAWING: Figure 5

Description

The present invention relates to an imaging apparatus capable of receiving instructions by voice.

In recent years, life-log cameras that automatically repeat shooting at regular intervals, and imaging apparatuses that judge the shooting situation themselves and shoot automatically, have been proposed. By shooting automatically, these devices aim to capture images of the scenes the user wants without the user having to think about it.

Patent Document 1: Japanese Patent Laid-Open No. 2019-110525 (JP-A-2019-110525)

The device described in Patent Document 1 shoots automatically, determining the shooting timing from information obtained by detecting the subject's face, the number of shots taken in the past, a target number of shots, and the like.

However, because the shooting is automatic, it does not necessarily reflect the user's intention. Automatic shooting alone therefore cannot always capture an image at the timing the user desires, and shots may be missed. An object of the present invention is accordingly to reduce the occurrence of missed shots.

To achieve the above object, the imaging apparatus of the present invention includes sound collecting means for collecting sound, analysis means for analyzing the sound collected by the sound collecting means, automatic shooting means for shooting automatically, and setting means for setting a shooting frequency of the automatic shooting means. When the result of the analysis by the analysis means is a specific voice instruction, the imaging apparatus performs an operation according to the instruction and then sets the shooting frequency higher by the setting means.
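The behavior claimed above (act on the instruction first, then raise the shooting frequency) can be illustrated with a minimal Python sketch. It is not the patent's implementation: the class name `AutoShooter`, the unit of shots per hour, the doubling factor, and the upper bound are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AutoShooter:
    """Toy model of automatic shooting with an adjustable shooting frequency."""
    shots_per_hour: float = 10.0        # assumed baseline frequency
    boost_factor: float = 2.0           # assumed multiplier applied after an instruction
    max_shots_per_hour: float = 120.0   # assumed upper bound

    def handle_voice(self, phrase, commands):
        """Run the action for a recognized instruction, then raise the frequency.

        `commands` maps a phrase to a zero-argument callable (hypothetical).
        Returns True if the phrase was a registered instruction.
        """
        action = commands.get(phrase)
        if action is None:
            return False                # not a specific voice instruction: no change
        action()                        # 1) operate according to the instruction
        self.shots_per_hour = min(      # 2) then set the shooting frequency higher
            self.shots_per_hour * self.boost_factor,
            self.max_shots_per_hour,
        )
        return True
```

The ordering matters in this sketch: the instructed operation completes before the frequency changes, matching the "after performing an operation according to the instruction" wording.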

According to the present invention, the occurrence of missed shots can be reduced.

(a) is a diagram showing an example of the external appearance of the imaging apparatus. (b) is a diagram for explaining the operation of the imaging apparatus.
A diagram showing the configuration of the imaging apparatus.
A diagram showing the configuration of the imaging apparatus and an external device.
A diagram showing the configuration of the external device.
A flowchart explaining the automatic shooting process.
A flowchart explaining the voice recognition process.
A flowchart explaining the frequency setting process.
A diagram for explaining area division within a captured image.
A diagram showing an example of a screen displayed on the external device.

Embodiments for carrying out the present invention are described in detail below with reference to the accompanying drawings.

The embodiments described below are examples of means for realizing the present invention, and may be modified or changed as appropriate depending on the configuration of the apparatus to which the present invention is applied and on various conditions. The embodiments may also be combined as appropriate.

<Configuration of Imaging Apparatus>
FIG. 1 is a diagram schematically showing the imaging apparatus according to the first embodiment.

The imaging apparatus 101 shown in FIG. 1(a) is provided with, among other things, an operation member for operating the power switch (hereinafter called the power button, although the operation may instead be a tap, flick, or swipe on a touch panel). The lens barrel 102, a housing that contains the imaging lens group and the image sensor, is attached to the imaging apparatus 101 through a rotation mechanism that can rotationally drive the lens barrel 102 relative to the fixed portion 103. The tilt rotation unit 104 is a motor drive mechanism that can rotate the lens barrel 102 in the pitch direction shown in FIG. 1(b), and the pan rotation unit 105 is a motor drive mechanism that can rotate the lens barrel 102 in the yaw direction. The lens barrel 102 can therefore rotate about one or more axes. FIG. 1(b) defines the axes at the position of the fixed portion 103. The angular velocity meter 106 and the accelerometer 107 are both mounted on the fixed portion 103 of the imaging apparatus 101. Vibration of the imaging apparatus 101 is detected based on the outputs of the angular velocity meter 106 and the accelerometer 107, and the tilt rotation unit 104 and the pan rotation unit 105 are rotationally driven based on the detected shake angle. This configuration corrects shake and tilt of the lens barrel 102, which is the movable portion.
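As a rough illustration of driving the tilt and pan units from a detected shake angle, the following Python sketch integrates gyro rates over one time step and returns counter-rotation commands. The text does not give the actual control law; simple integration, degree units, and the function name `correction_angles` are assumptions.

```python
def correction_angles(pitch_deg, yaw_deg, gyro_pitch_dps, gyro_yaw_dps, dt_s):
    """Integrate gyro rates over one time step and return counter-rotations.

    Returns (new_pitch, new_yaw, tilt_command, pan_command), all in degrees.
    """
    new_pitch = pitch_deg + gyro_pitch_dps * dt_s  # estimated shake angle, pitch axis
    new_yaw = yaw_deg + gyro_yaw_dps * dt_s        # estimated shake angle, yaw axis
    # Drive the tilt unit against pitch shake and the pan unit against yaw shake.
    return new_pitch, new_yaw, -new_pitch, -new_yaw
```

A real stabilizer would also filter gyro drift and fuse the accelerometer output; this sketch omits both.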

FIG. 2 is a block diagram showing the configuration of the imaging apparatus of the present embodiment.

In FIG. 2, the first control unit 223 consists of a processor (for example, a CPU, GPU, microprocessor, or MPU) and memory (for example, DRAM or SRAM). These execute various processes to control each block of the imaging apparatus 101 and to control data transfer between the blocks. The non-volatile memory (EEPROM) 216 is an electrically erasable and recordable memory that stores constants, programs, and the like for the operation of the first control unit 223.

In FIG. 2, the zoom unit 201 includes a zoom lens that changes magnification. The zoom drive control unit 202 drives and controls the zoom unit 201. The focus unit 203 includes a lens that adjusts focus. The focus drive control unit 204 drives and controls the focus unit 203.

In the imaging unit 206, the image sensor receives light incident through the lens groups and outputs charge information corresponding to the amount of light to the image processing unit 207 as analog image data. The image processing unit 207 applies image processing such as distortion correction, white balance adjustment, and color interpolation to the digital image data obtained by A/D conversion, and outputs the processed digital image data. The digital image data output from the image processing unit 207 is converted by the image recording unit 208 into a recording format such as JPEG and sent to the memory 215 and to the video output unit 217 described later.

The lens barrel rotation drive unit 205 drives the tilt rotation unit 104 and the pan rotation unit 105 to move the lens barrel 102 in the tilt and pan directions.

The apparatus shake detection unit 209 includes, for example, the angular velocity meter (gyro sensor) 106, which detects the angular velocity of the imaging apparatus 101 about three axes, and the accelerometer (acceleration sensor) 107, which detects the acceleration of the apparatus along three axes. Based on the detected signals, the apparatus shake detection unit 209 calculates the rotation angle of the apparatus, the shift amount of the apparatus, and the like.

The voice input unit 213 uses a microphone provided on the imaging apparatus 101 to acquire an audio signal collected from around the imaging apparatus 101, converts it from analog to digital, and sends it to the voice processing unit 214. The voice processing unit 214 performs audio-related processing such as optimization of the input digital audio signal. The audio signal processed by the voice processing unit 214 is then sent to the memory 215 by the first control unit 223. The memory 215 temporarily stores the image signal and the audio signal obtained by the image processing unit 207 and the voice processing unit 214.

The image processing unit 207 and the voice processing unit 214 read the image signal and the audio signal temporarily stored in the memory 215, encode them, and generate a compressed image signal and a compressed audio signal. The first control unit 223 sends the compressed image signal and the compressed audio signal to the recording/reproducing unit 220.

The recording/reproducing unit 220 records, on the recording medium 221, the compressed image signal and the compressed audio signal generated by the image processing unit 207 and the voice processing unit 214, together with other control data related to shooting. When the audio signal is not compression-encoded, the first control unit 223 sends the audio signal generated by the voice processing unit 214 and the compressed image signal generated by the image processing unit 207 to the recording/reproducing unit 220, which records them on the recording medium 221.

The recording medium 221 may be built into the imaging apparatus 101 or may be removable. The recording medium 221 can record various data generated by the imaging apparatus 101, such as compressed image signals, compressed audio signals, and audio signals, and a medium with a larger capacity than the non-volatile memory 216 is generally used. For example, the recording medium 221 may be of any type, including a hard disk, optical disc, magneto-optical disc, CD-R, DVD-R, magnetic tape, non-volatile semiconductor memory, or flash memory.

The recording/reproducing unit 220 reads (reproduces) the compressed image signals, compressed audio signals, audio signals, various data, and programs recorded on the recording medium 221. The first control unit 223 sends the read compressed image signal and compressed audio signal to the image processing unit 207 and the voice processing unit 214. The image processing unit 207 and the voice processing unit 214 temporarily store the compressed image signal and the compressed audio signal in the memory 215, decode them according to a predetermined procedure, and send the decoded signals to the video output unit 217 and the audio output unit 218.

For the voice input unit 213, a plurality of microphones are mounted on the imaging apparatus 101, and the voice processing unit 214 can detect the direction of a sound on the plane in which the microphones are installed. This direction is used for the search and automatic shooting described later. The voice processing unit 214 also detects specific voice commands. In addition to several commands registered in advance, the apparatus may be configured so that the user can register a specific voice in the imaging apparatus. The voice processing unit 214 also performs sound scene recognition, in which a sound scene is determined by a model trained in advance by machine learning on a large amount of audio data. Specific machine learning algorithms include the nearest neighbor method, the naive Bayes method, decision trees, and support vector machines. Another example is deep learning, which uses a neural network to generate the features and connection weighting coefficients for learning by itself. Any of these algorithms that is available can be applied to the present embodiment as appropriate.
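The text does not say how the sound direction is computed. One common approach, shown here purely as an assumption, estimates the time difference of arrival between two microphones by cross-correlation and converts it to an angle. The function names, the two-microphone geometry, and the speed-of-sound constant are illustrative.

```python
import math


def best_lag(a, b, max_lag):
    """Return the lag (in samples) of b relative to a that maximizes cross-correlation."""
    def corr(lag):
        return sum(a[i] * b[i + lag] for i in range(len(a)) if 0 <= i + lag < len(b))
    return max(range(-max_lag, max_lag + 1), key=corr)


def arrival_angle_deg(lag, sample_rate_hz, mic_spacing_m, speed_of_sound=343.0):
    """Convert a sample lag between two microphones into an arrival angle.

    0 degrees means the source is broadside (equidistant from both microphones).
    """
    s = speed_of_sound * lag / (sample_rate_hz * mic_spacing_m)
    s = max(-1.0, min(1.0, s))  # clamp against noise pushing |s| above 1
    return math.degrees(math.asin(s))
```

With more than two microphones in a plane, the same pairwise lags could be combined to resolve a full in-plane direction.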

In the present embodiment, a neural network for detecting specific scenes such as "cheering", "applause", and "someone speaking" is set in the voice processing unit 214. When a specific sound scene or a specific voice command is detected, a detection trigger signal is output to the first control unit 223 and the second control unit 211.

That is, the neural network of the voice processing unit 214 is trained in advance on prepared audio of "cheering", "applause", and "someone speaking" scenes, with the audio as input and the detection trigger signal as output.

The second control unit 211, which is provided separately from the first control unit 223 that controls the entire main system of the imaging apparatus 101, controls the power supplied to the first control unit 223.

The first power supply unit 210 and the second power supply unit 212 supply power for operating the first control unit 223 and the second control unit 211, respectively. When the power button on the imaging apparatus 101 is pressed, power is first supplied to both the first control unit 223 and the second control unit 211. As described later, however, the first control unit 223 is controlled so that the first power supply unit 210 turns OFF the power supply to the first control unit 223 itself. The second control unit 211 keeps operating while the first control unit 223 is not operating, and receives information from the apparatus shake detection unit 209 and the voice processing unit 214. Based on these inputs, the second control unit 211 determines whether to start the first control unit 223 and, when it decides to do so, instructs the first power supply unit 210 to supply power. In the present embodiment, the power supply units supply power from a battery. In other words, the imaging apparatus 101 is also a portable terminal.
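The start-up determination made by the second control unit can be pictured with the short Python sketch below. The decision criteria are not specified in this passage, so the OR rule, the threshold value, and the names `WakeInputs` and `should_start_first_controller` are assumptions.

```python
from dataclasses import dataclass


@dataclass
class WakeInputs:
    shake_magnitude: float  # from the apparatus shake detection unit (units assumed)
    sound_trigger: bool     # detection trigger from the voice processing unit


def should_start_first_controller(inputs, shake_threshold=0.5):
    """Decide whether the low-power controller should power up the main controller.

    Assumed rule: wake on any sound trigger, or on shake at or above a threshold.
    """
    return inputs.sound_trigger or inputs.shake_magnitude >= shake_threshold
```

Keeping this check on a separate always-on controller is what lets the main controller stay powered off between events.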

The audio output unit 218 outputs a preset audio pattern from a speaker built into the imaging apparatus 101, for example at the time of shooting.

The LED control unit 224 controls an LED provided on the imaging apparatus 101 so that it lights or blinks in a preset pattern, for example at the time of shooting.

The video output unit 217 consists of, for example, a video output terminal, and sends an image signal so that video is displayed on a connected external display or the like. The audio output unit 218 and the video output unit 217 may be a single combined terminal, such as an HDMI (registered trademark) (High-Definition Multimedia Interface) terminal.

The communication unit 222 performs communication between the imaging apparatus 101 and an external device, sending and receiving data such as audio signals, image signals, compressed audio signals, and compressed image signals. It also receives control signals related to shooting, such as shooting start and end commands and pan, tilt, and zoom drive commands, so that the imaging apparatus 101 is driven according to instructions from an external device that can communicate with it. In addition, it sends and receives information such as various learning-related parameters processed by the learning processing unit 219, described later, between the imaging apparatus 101 and the external device. The communication unit 222 is, for example, a wireless communication module such as an infrared communication module, a Bluetooth (registered trademark) communication module, a wireless LAN communication module, Wireless USB, or a GPS receiver.

<System Configuration with External Communication Device>
FIG. 3 is a diagram showing a configuration example of a wireless communication system between the imaging apparatus 101 and the external device 301. The imaging apparatus 101 is a digital camera having a shooting function, and the external device 301 is a smart device including a Bluetooth communication module and a wireless LAN communication module.

The imaging apparatus 101 and the smart device 301 can communicate via communication 302 over a wireless LAN compliant with, for example, the IEEE 802.11 standard series, and via communication 303 that has a master-slave relationship such as that between a control station and a subordinate station, for example Bluetooth Low Energy (hereinafter "BLE"). Wireless LAN and BLE are only examples of communication methods. Other methods may be used as long as each communication apparatus has two or more communication functions and one of them, for example the one that communicates in a control station and subordinate station relationship, can control the other. Without loss of generality, however, it is assumed that the first communication, such as wireless LAN, is faster than the second communication, such as BLE, and that the second communication consumes less power than the first, has a shorter communication range, or both.

<Configuration of External Communication Device>
The configuration of the smart device 301, as an example of the external communication device, is described with reference to FIG. 4. The smart device 301 is a so-called mobile phone, that is, a portable terminal.

The smart device 301 has, for example, a wireless LAN control unit 401 for wireless LAN and a BLE control unit 402 for BLE, as well as a public line control unit 406 for public wireless communication. The smart device 301 further has a packet transmission/reception unit 403. The wireless LAN control unit 401 performs RF control and communication processing for the wireless LAN, and runs the driver that performs various kinds of control of wireless LAN communication compliant with the IEEE 802.11 standard series as well as protocol processing for wireless LAN communication. The BLE control unit 402 performs RF control and communication processing for BLE, and runs the driver that performs various kinds of control of BLE communication as well as protocol processing for BLE communication. The public line control unit 406 performs RF control and communication processing for public wireless communication, and runs the driver that performs various kinds of control of public wireless communication as well as related protocol processing. Public wireless communication conforms to, for example, the IMT (International Mobile Telecommunications) standard or the LTE (Long Term Evolution) standard. The packet transmission/reception unit 403 performs processing for at least one of transmitting and receiving packets for wireless LAN communication, BLE communication, and public wireless communication. In this example, the smart device 301 is described as transmitting packets, receiving packets, or both, but a communication format other than packet switching, such as circuit switching, may also be used.

The smart device 301 further has, for example, a control unit 411, a storage unit 404, a GPS receiving unit 405, a display unit 407, an operation unit 408, a voice input/voice processing unit 409, and a power supply unit 410. The control unit 411 controls the entire smart device 301, for example by executing a control program stored in the storage unit 404. The storage unit 404 stores, for example, the control program executed by the control unit 411 and various information such as parameters required for communication. The various operations described later are realized by the control unit 411 executing the control program stored in the storage unit 404.

The power supply unit 410 supplies power to the smart device 301. The display unit 407 can output visually recognizable information, as with an LCD or LEDs, or output sound, as with a speaker, and displays various information. The operation unit 408 is, for example, a button or the like that accepts the user's operation of the smart device 301. The display unit 407 and the operation unit 408 may be implemented as a common member such as a touch panel.

The voice input/voice processing unit 409 may be configured to acquire the user's voice from, for example, a general-purpose microphone built into the smart device 301 and to obtain the user's operation command by voice recognition processing.

A voice command can also be acquired from the user's utterance through a dedicated application on the smart device. The command can then be registered, via the wireless LAN communication 302, as a specific voice command to be recognized by the voice processing unit 214 of the imaging apparatus 101.

The GPS (Global Positioning System) receiving unit 405 receives GPS signals from satellites, analyzes them, and estimates the current position (longitude and latitude) of the smart device 301. Alternatively, the current position of the smart device 301 may be estimated from information about surrounding wireless networks, using WPS (Wi-Fi Positioning System) or the like. When the acquired current GPS position lies within a preset position range (within a predetermined radius), movement information is sent to the imaging apparatus 101 via the BLE control unit 402 and used as a parameter for the automatic shooting and automatic editing described later. Likewise, when the GPS position changes by a predetermined amount or more, movement information is sent to the imaging apparatus 101 via the BLE control unit 402 and used as a parameter for the automatic shooting and automatic editing described later.
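The two notification conditions above (position inside a preset radius, and position change of at least a predetermined amount) can be sketched in Python as follows. The haversine distance, the event names, and the function signatures are assumptions chosen for illustration; the text does not specify how distance is computed or how notifications are encoded.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, used by the haversine formula


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (latitude, longitude) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def movement_notifications(current, previous, preset_center, radius_m, min_move_m):
    """Return the (hypothetical) events to send to the camera over BLE."""
    events = []
    if haversine_m(*current, *preset_center) <= radius_m:
        events.append("in_preset_area")  # current position is within the preset radius
    if haversine_m(*current, *previous) >= min_move_m:
        events.append("moved")           # position changed by at least the threshold
    return events
```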

As described above, the imaging apparatus 101 and the smart device 301 exchange data through communication using the wireless LAN control unit 401 and the BLE control unit 402. For example, they send and receive data such as audio signals, image signals, compressed audio signals, and compressed image signals. The smart device also sends operation instructions such as shooting instructions to the imaging apparatus 101, sends voice command registration data, and sends notifications of predetermined-position detection and of location movement based on GPS position information. Learning data is also sent and received through the dedicated application on the smart device.

<Sequence of Imaging Operation>
FIG. 5 is a flowchart of the automatic shooting process of the imaging apparatus 101 in the present embodiment.

The process of this flowchart starts when the user operates the power button on the imaging apparatus 101. In the present embodiment, it is assumed that a wireless connection between the imaging apparatus 101 and the smart device 301 is always established and that various operations can be performed from the dedicated application on the smart device 301. The processing of each step in the following flowchart is realized by the first control unit 223 controlling each unit of the imaging apparatus 101.

In S501, the first control unit 223 determines whether automatic shooting is currently stopped. Stopping automatic shooting is explained later with the flowchart of the voice recognition process. If automatic shooting is stopped, the first control unit 223 does nothing and waits until the stop is released. If automatic shooting is not stopped, the process proceeds to S502, where image recognition processing is performed.

In S502, the first control unit 223 causes the image processing unit 207 to process the signal captured by the imaging unit 206 and generate an image for subject recognition.

Subject recognition, such as person recognition and object recognition, is performed on the generated image.

人物を認識する場合、被写体の顔や人体を検出する。顔検出処理では、人物の顔を判断するためのパターンが予め定められており、撮像された画像内に含まれる該パターンに一致する箇所を人物の顔画像として検出することができる。 When recognizing a person, the face or human body of the subject is detected. In the face detection process, a pattern for determining the face of a person is predetermined, and a portion matching the pattern included in the captured image can be detected as a face image of the person.

また、被写体の顔としての確からしさを示す信頼度も同時に算出し、信頼度は、例えば画像内における顔領域の大きさや、顔パターンとの一致度等から算出される。 At the same time, the reliability indicating the certainty of the subject's face is also calculated, and the reliability is calculated from, for example, the size of the face region in the image, the degree of coincidence with the face pattern, and the like.

物体認識についても同様に、予め登録されたパターンに一致する物体を認識することができる。 Similarly, for object recognition, it is possible to recognize an object that matches a pre-registered pattern.

また、撮像された画像内の色相や彩度等のヒストグラムを使用する方法で特徴被写体を抽出する方法などもある。この場合、撮影画角内に捉えられている被写体の画像に関し、その色相や彩度等のヒストグラムから導出される分布を複数の区間に分け、区間ごとに撮像された画像を分類する処理が実行される。 There is also a method of extracting a characteristic subject by using histograms of hue, saturation, and the like in the captured image. In this case, for the image of the subject captured within the shooting angle of view, the distribution derived from the histograms of hue, saturation, and the like is divided into a plurality of sections, and the captured image is classified section by section.

例えば、撮像された画像について複数の色成分のヒストグラムが作成され、その山型の分布範囲で区分けし、同一の区間の組み合わせに属する領域にて撮像された画像が分類され、被写体の画像領域が認識される。 For example, histograms of a plurality of color components are created for the captured image and partitioned by their mountain-shaped distribution ranges; the captured image is then classified into regions belonging to the same combination of sections, and the image region of the subject is recognized.

認識された被写体の画像領域ごとに評価値を算出することで、当該評価値が最も高い被写体の画像領域を主被写体領域として判定することができる。 By calculating the evaluation value for each image area of the recognized subject, the image area of the subject having the highest evaluation value can be determined as the main subject area.

以上の方法で、撮像情報から各被写体情報を得ることができる。 By the above method, each subject information can be obtained from the imaging information.

Ｓ５０３では、第１制御部２２３は、像揺れ補正量の算出を行う。具体的には、まず、装置揺れ検出部２０９において取得した角速度および加速度情報に基づいて撮像装置の絶対角度の算出を行う。そして、絶対角度を打ち消す角度方向にチルト回転ユニット１０４およびパン回転ユニット１０５を動かす防振角度を求め、像揺れ補正量とする。なお、ここでの像揺れ補正量算出処理は、後述する学習処理によって、演算方法を変更することができる。 In S503, the first control unit 223 calculates the image shake correction amount. Specifically, first, the absolute angle of the image pickup device is calculated based on the angular velocity and acceleration information acquired by the device shake detection unit 209. Then, the vibration isolation angle for moving the tilt rotation unit 104 and the pan rotation unit 105 in the angle direction that cancels the absolute angle is obtained, and is used as the image shake correction amount. The calculation method of the image shake correction amount calculation process here can be changed by a learning process described later.
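The calculation in S503 can be sketched as follows. The text does not state how the absolute angle is derived from the angular velocity and acceleration, so this sketch assumes a complementary filter; the function names and the blend weight `ALPHA` are hypothetical and not taken from the source.

```python
import math

ALPHA = 0.98  # assumed blend weight for the gyro path (not from the source)

def update_absolute_angle(prev_angle_deg, gyro_dps, accel_y, accel_z, dt):
    """Estimate one absolute angle (degrees) by blending the integrated
    angular velocity with the gravity direction seen by the accelerometer.
    This complementary filter is an assumption, not the patent's method."""
    gyro_angle = prev_angle_deg + gyro_dps * dt
    accel_angle = math.degrees(math.atan2(accel_y, accel_z))
    return ALPHA * gyro_angle + (1.0 - ALPHA) * accel_angle

def shake_correction_angle(absolute_angle_deg):
    """The vibration isolation angle moves the pan/tilt unit in the
    direction that cancels the absolute angle."""
    return -absolute_angle_deg
```

The same pair of functions would be applied once for the tilt axis and once for the pan axis.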

Ｓ５０４では、第１制御部２２３は、撮像装置の状態判定を行う。角速度情報や加速度情報やＧＰＳ位置情報などで検出した角度や移動量などにより、現在、撮像装置がどのような振動／動き状態なのかを判定する。 In S504, the first control unit 223 determines the state of the image pickup apparatus. Based on the angle and the amount of movement detected by the angular velocity information, the acceleration information, the GPS position information, etc., it is determined what kind of vibration / movement state the image pickup apparatus is currently in.

例えば、車に撮像装置１０１を装着して撮影する場合、移動された距離によって大きく周りの風景などの被写体情報が変化する。 For example, when the image pickup device 101 is attached to a car for shooting, the subject information such as the surrounding landscape changes greatly depending on the distance traveled.

そのため、車などに装着して速い速度で移動している「乗り物移動状態」か否かを判定し、後に説明する自動被写体探索に使用することができる。 Therefore, it can be used for automatic subject search, which will be described later, by determining whether or not the vehicle is in a "vehicle moving state" in which the vehicle is mounted on a car or the like and is moving at a high speed.

また、角度の変化が大きいか否かを判定し、撮像装置１０１が揺れ角度がほとんどない「置き撮り状態」であるのかを判定する。 Further, it is determined whether or not the change in the angle is large, and it is determined whether or not the image pickup apparatus 101 is in the “placed shooting state” where there is almost no shaking angle.

「置き撮り状態」である場合は、撮像装置１０１自体の角度変化はないと考えてよいので、置き撮り用の被写体探索を行うことができる。 In the "placed shooting state", the angle of the image pickup apparatus 101 itself can be assumed not to change, so a subject search suited to placed shooting can be performed.

また、比較的、角度変化が大きい場合は、「手持ち状態」と判定され、手持ち用の被写体探索を行うことができる。 When the angle change is relatively large, the state is determined to be the "handheld state", and a subject search suited to handheld use can be performed.
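The three states distinguished in S504 can be sketched as a simple classifier. The thresholds, the `Motion` record, and the state labels are hypothetical; the text gives no numeric criteria.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the source gives no numeric values.
VEHICLE_SPEED_MPS = 5.0
PLACED_ANGLE_CHANGE_DEG = 0.5

@dataclass
class Motion:
    speed_mps: float         # e.g. derived from changes in GPS position
    angle_change_deg: float  # recent change in the camera angle

def classify_state(m: Motion) -> str:
    """Return 'vehicle', 'placed', or 'handheld' for the current motion."""
    if m.speed_mps >= VEHICLE_SPEED_MPS:
        return "vehicle"      # moving fast, e.g. mounted on a car
    if m.angle_change_deg <= PLACED_ANGLE_CHANGE_DEG:
        return "placed"       # almost no shake angle
    return "handheld"         # relatively large angle change
```

The result would then select which subject search strategy S505 uses.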

Ｓ５０５では、第１制御部２２３は、被写体探索処理を行う。被写体探索は、以下の処理によって構成される。 In S505, the first control unit 223 performs the subject search process. The subject search is composed of the following processes.

（１）エリア分割 (1) Area division
図８を用いて、エリア分割を説明する。図８（ａ）のように撮像装置（原点Ｏが撮像装置位置とする。）位置を中心として、全周囲でエリア分割を行う。図８（ａ）の例においては、チルト方向、パン方向それぞれ２２．５度で分割している。図８（ａ）のように分割すると、チルト方向の角度が０度から離れるにつれて、水平方向の円周が小さくなり、エリア領域が小さくなる。そこで、本実施形態の撮像装置は、図８（ｂ）のように、チルト角度が４５度以上の場合、水平方向のエリア範囲は２２．５度よりも大きく設定している。図８（ｃ）、（ｄ）に撮影画角内でのエリア分割された例を示す。軸１３０１は初期化時の撮像装置１０１の方向であり、この方向角度を基準位置としてエリア分割が行われる。１３０２は、撮像されている画像の画角エリアを示しており、そのときの画像例を図８（ｄ）に示す。画角に写し出されている画像内ではエリア分割に基づいて、図８（ｄ）の１３０３〜１３１８のように画像分割される。 The area division will be described with reference to FIG. 8. As shown in FIG. 8A, the entire circumference is divided into areas centered on the position of the image pickup apparatus (the origin O is taken as the position of the image pickup apparatus). In the example of FIG. 8A, the tilt direction and the pan direction are each divided in steps of 22.5 degrees. With the division of FIG. 8A, the horizontal circumference becomes smaller as the tilt angle moves away from 0 degrees, and each area becomes smaller. Therefore, in the image pickup apparatus of the present embodiment, as shown in FIG. 8B, the horizontal area range is set larger than 22.5 degrees when the tilt angle is 45 degrees or more. FIGS. 8C and 8D show an example of area division within the shooting angle of view. The axis 1301 is the direction of the image pickup apparatus 101 at initialization, and the area division is performed with this direction as the reference position. Reference numeral 1302 indicates the angle-of-view area of the image being captured, and an example of the image at that time is shown in FIG. 8D. Within the image captured at that angle of view, the image is divided as shown by 1303 to 1318 in FIG. 8D based on the area division.
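As a rough illustration of this division, the sketch below maps a pan/tilt direction to an area index. The 22.5-degree step comes from the text; the 45-degree pan width used at high tilt angles, the symmetric treatment of negative tilt, and the function name `area_index` are assumptions, since the text says only that the horizontal range is "larger than 22.5 degrees".

```python
from typing import Tuple

TILT_STEP_DEG = 22.5
PAN_STEP_DEG = 22.5
WIDE_PAN_STEP_DEG = 45.0  # assumed; the source only says "larger than 22.5"

def area_index(pan_deg: float, tilt_deg: float) -> Tuple[int, int]:
    """Return (tilt_band, pan_sector) for a direction measured from the
    initialization axis. Tilt is expected in [-90, 90]; pan may be any angle."""
    tilt_band = int((tilt_deg + 90.0) // TILT_STEP_DEG)
    tilt_band = min(tilt_band, int(180.0 / TILT_STEP_DEG) - 1)  # tilt == 90
    # Use wider pan sectors near the poles, where the circumference shrinks.
    pan_step = WIDE_PAN_STEP_DEG if abs(tilt_deg) >= 45.0 else PAN_STEP_DEG
    pan_sector = int((pan_deg % 360.0) // pan_step)
    return tilt_band, pan_sector
```

For example, a direction of 30 degrees pan falls in pan sector 1 at 0 degrees tilt but in pan sector 0 at 50 degrees tilt, because the sectors are wider there.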

（２）エリア毎の重要度レベルの算出 (2) Calculation of importance level for each area
前記のように分割した各エリアについて、エリア内に存在する被写体やエリアのシーン状況に応じて、探索を行う優先順位を示す重要度レベルを算出する。被写体の状況に基づいた重要度レベルは、例えば、エリア内に存在する人物の数、人物の顔の大きさ、顔向き、顔検出の確からしさ、人物の表情、人物の個人認証結果に基づいて算出する。また、シーンの状況に応じた重要度レベルは、例えば、一般物体認識結果、シーン判別結果（青空、逆光、夕景など）、エリアの方向からする音のレベルや音声認識結果、エリア内の動き検知情報等である。また、撮像装置の状態判定（Ｓ５０４）で、撮像装置の振動状態が検出されており、振動状態に応じても重要度レベルが変化するようにもすることができる。例えば、「置き撮り状態」と判定された場合、顔認証で登録されている中で優先度の高い被写体（例えば撮像装置のユーザである）を中心に被写体探索が行われるように、特定人物の顔認証を検出すると重要度レベルが高くなるように判定される。また、後述する自動撮影も上記顔を優先して行われることになり、撮像装置のユーザが撮像装置を身に着けて持ち歩き撮影を行っている時間が多くても、撮像装置を取り外して机の上などに置くことで、ユーザが写った画像も多く残すことができる。このときパン・チルトにより探索可能であることから、撮像装置の置き角度などを考えなくても、適当に設置するだけでユーザが写った画像やたくさんの顔が写った集合写真などを残すことができる。なお、上記条件だけでは、各エリアに変化がない限りは、最も重要度レベルが高いエリアが同じとなり、その結果探索されるエリアがずっと変わらないことになってしまう。そこで、過去の撮影情報に応じて重要度レベルを変化させる。具体的には、所定時間継続して探索エリアに指定され続けたエリアは重要度レベルを下げたり、後述するＳ５１３にて撮影を行ったエリアでは、所定時間の間重要度レベルを下げたりしてもよい。 For each area divided as described above, an importance level indicating the search priority is calculated according to the subjects present in the area and the scene conditions of the area. The importance level based on the subject's situation is calculated from, for example, the number of people in the area, the size of each face, the face orientation, the certainty of face detection, the facial expression, and the personal authentication result. The importance level based on the scene is calculated from, for example, the general object recognition result, the scene discrimination result (blue sky, backlight, evening scene, etc.), the level of sound coming from the direction of the area and the voice recognition result, and motion detection information within the area. In addition, since the vibration state of the image pickup apparatus is detected in the state determination (S504), the importance level can also be changed according to the vibration state. For example, when the "placed shooting state" is determined, the importance level is raised when the face of a specific person is authenticated, so that the subject search centers on a high-priority subject among those registered for face authentication (for example, the user of the image pickup apparatus). The automatic shooting described later also gives priority to that face. As a result, even if the user spends most of the time wearing and carrying the image pickup apparatus while shooting, many images of the user can still be recorded by taking the apparatus off and placing it on a desk or the like. Because the search uses pan and tilt, images of the user and group photos containing many faces can be recorded simply by setting the apparatus down, without regard to its placement angle. Under the above conditions alone, however, the area with the highest importance level stays the same as long as no area changes, and the searched area would never change. The importance level is therefore varied according to past shooting information. Specifically, the importance level may be lowered for an area that has been continuously designated as the search area for a predetermined time, or lowered for a predetermined time in an area where shooting was performed in S513, described later.
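A minimal sketch of such an importance level follows. The weights, the time limits, and the function signature are all hypothetical; the text lists the inputs but not how they are combined.

```python
def importance_level(num_faces, registered_face, placed_state,
                     seconds_as_target, seconds_since_shot):
    """Illustrative importance level for one area (all constants assumed).

    seconds_since_shot is None when no shot has been taken in the area."""
    level = 10.0 * num_faces
    if registered_face:
        # A registered person counts for more, and more still when placed.
        level += 50.0 if placed_state else 20.0
    if seconds_as_target > 30.0:
        level *= 0.5  # designated as the search area for too long
    if seconds_since_shot is not None and seconds_since_shot < 60.0:
        level *= 0.5  # recently photographed in S513
    return level
```

The two halving rules stand in for the lowering of the importance level based on past shooting information, which keeps the search from staying on one area indefinitely.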

（３）探索対象エリアの決定 (3) Determining the search target area
前記のように各エリアの重要度レベルが算出されたら、重要度レベルが高いエリアを探索対象エリアとして決定する。そして、探索対象エリアを画角に捉えるために必要なパン・チルト探索目標角度を算出する。 Once the importance level of each area has been calculated as described above, an area with a high importance level is determined as the search target area. The pan/tilt search target angle required to bring the search target area into the angle of view is then calculated.
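The selection step can be sketched as below, assuming each area is represented by its center direction so that the chosen key doubles as the pan/tilt search target angle. That representation and the function name are assumptions made for illustration.

```python
def choose_search_target(levels):
    """levels maps (pan_center_deg, tilt_center_deg) -> importance level.

    Returns the center of the most important area, used here as the
    pan/tilt search target angle."""
    return max(levels, key=levels.get)
```

For instance, given levels of 10, 35, and 20 for three areas, the area with level 35 is selected.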

Ｓ５０６では、第１制御部２２３は、パン・チルト駆動を行う。具体的には、像振れ補正量とパン・チルト探索目標角度に基づいた制御サンプリングでの駆動角度を加算することで、パン・チルト駆動量を算出し、鏡筒回転駆動部２０５によって、チルト回転ユニット１０４、パン回転ユニット１０５をそれぞれ駆動制御する。 In S506, the first control unit 223 performs pan / tilt drive. Specifically, the pan / tilt drive amount is calculated by adding the image shake correction amount and the drive angle in the control sampling based on the pan / tilt search target angle, and the lens barrel rotation drive unit 205 calculates the tilt rotation. The unit 104 and the pan rotation unit 105 are driven and controlled, respectively.
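The addition performed in S506 can be sketched for one axis as follows. The per-sample step limit `MAX_STEP_DEG` and the function name are hypothetical; the text states only that the drive angle for the control sample and the image shake correction amount are added.

```python
MAX_STEP_DEG = 1.0  # assumed limit on the step taken per control sample

def drive_amount(current_deg, target_deg, shake_correction_deg):
    """Pan (or tilt) drive for one control sample: the step toward the
    search target angle plus the image shake correction amount."""
    error = target_deg - current_deg
    step = max(-MAX_STEP_DEG, min(MAX_STEP_DEG, error))
    return step + shake_correction_deg
```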

Ｓ５０７では第１制御部２２３は、ズームユニット２０１を制御しズーム駆動を行う。具体的には、Ｓ５０５で決定した探索対象被写体の状態に応じてズームを駆動させる。例えば、探索対象被写体が人物の顔であるとき、画像上の顔が小さすぎると検出可能な最小サイズを下回ることで検出ができず、見失ってしまう恐れがある。そのような場合は、望遠側にズームすることで画像上の顔のサイズが大きくなるように制御する。一方で、画像上の顔が大きすぎる場合、被写体や撮像装置自体の動きによって被写体が画角から外れやすくなってしまう。そのような場合は、広角側にズームすることで、画面上の顔のサイズが小さくなるように制御する。このようにズーム制御を行うことで、被写体を追跡するのに適した状態を保つことができる。 In S507, the first control unit 223 controls the zoom unit 201 to drive the zoom. Specifically, the zoom is driven according to the state of the search target subject determined in S505. For example, when the subject to be searched is the face of a person, if the face on the image is too small, it may not be detected because it is smaller than the minimum detectable size, and the face may be lost. In such a case, the size of the face on the image is controlled to be increased by zooming to the telephoto side. On the other hand, if the face on the image is too large, the subject tends to deviate from the angle of view due to the movement of the subject or the imaging device itself. In such a case, zooming to the wide-angle side controls the size of the face on the screen to be smaller. By performing the zoom control in this way, it is possible to maintain a state suitable for tracking the subject.
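The zoom decision described for S507 amounts to keeping the face size within a band. In the sketch below the pixel bounds and the return labels are hypothetical.

```python
MIN_FACE_PX = 40   # hypothetical: below this, detection may fail
MAX_FACE_PX = 200  # hypothetical: above this, the face easily leaves the frame

def zoom_direction(face_px: int) -> str:
    """Return which way to drive the zoom for the current face size."""
    if face_px < MIN_FACE_PX:
        return "tele"  # zoom in so the face becomes larger
    if face_px > MAX_FACE_PX:
        return "wide"  # zoom out so the face becomes smaller
    return "hold"
```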

Ｓ５０５乃至Ｓ５０７では、パン・チルトやズーム駆動により被写体探索を行う方法を説明したが、広角なレンズを複数使用して全方位を一度に撮影する撮像システムで被写体探索を行ってもよい。全方位カメラの場合、撮像によって得られる信号すべてを入力画像として、被写体検出などの画像処理を行うと膨大な処理が必要となる。そこで、画像の一部を切り出して、切り出した画像範囲の中で被写体の探索処理を行う構成にする。上述した方法と同様にエリア毎の重要レベルを算出し、重要レベルに基づいて切り出し位置を変更し、後述する自動撮影の判定を行う。これにより画像処理による消費電力の低減や高速な被写体探索が可能となる。 In S505 to S507, a method of searching for a subject by pan / tilt or zoom drive has been described, but the subject search may be performed by an imaging system that uses a plurality of wide-angle lenses to shoot in all directions at once. In the case of an omnidirectional camera, enormous processing is required when performing image processing such as subject detection using all the signals obtained by imaging as input images. Therefore, a part of the image is cut out, and the subject search process is performed within the cut out image range. The important level for each area is calculated in the same manner as the above-mentioned method, the cutting position is changed based on the important level, and the automatic shooting determination described later is performed. This makes it possible to reduce power consumption by image processing and search for a subject at high speed.

Ｓ５０８では、第１制御部２２３は、頻度パラメータの読み込みを行う。頻度パラメータとは、自動撮影のされ易さを示す設定値である。スマートデバイス３０１の専用アプリケーションを介して、「低」「中」「高」といった選択肢の中からユーザが任意の頻度に設定が可能である。頻度を「高」に設定した場合には、「低」に設定した場合に比べて、所定時間あたりに多くの枚数が撮影されるようになる。「中」の設定は「低」と「高」の設定の間の枚数が撮影される。また、後述の頻度設定処理によって、自動的に変更され得る。 In S508, the first control unit 223 reads the frequency parameter. The frequency parameter is a set value indicating the ease of automatic shooting. The user can set any frequency from the options such as "low", "medium", and "high" via the dedicated application of the smart device 301. When the frequency is set to "high", a larger number of images are taken per predetermined time than when the frequency is set to "low". The "Medium" setting captures the number of shots between the "Low" and "High" settings. In addition, it can be automatically changed by the frequency setting process described later.

Ｓ５０９では、第１制御部２２３は、読み込んだ頻度パラメータが所定の値であるかを判定する。例えば、自動撮影を行う頻度として「最高」が設定されている場合には、Ｓ５１０へ進み、そうでない場合にはＳ５１２へ進む。なお、頻度が「最高」という設定は後述の頻度設定処理により自動的に変更された設定であり、スマートデバイス３０１の専用アプリケーションを用いた通常のユーザによる頻度の設定では、上記の通り「低」「中」「高」の選択肢から設定される。すなわちユーザ操作による設定では頻度「最高」には設定されない。 In S509, the first control unit 223 determines whether the read frequency parameter has a predetermined value. For example, if "highest" is set as the frequency for performing automatic shooting, the process proceeds to S510, and if not, the process proceeds to S512. The setting that the frequency is "highest" is a setting that is automatically changed by the frequency setting process described later, and the frequency setting by a normal user using the dedicated application of the smart device 301 is "low" as described above. It is set from "Medium" and "High" options. That is, the frequency is not set to "highest" in the setting by user operation.

Ｓ５１０では、第１制御部２２３は、後述するＳ７０５で開始した頻度パラメータの設定を「最高」から元に戻すまでの頻度ブースト時間が終了しているかを判定する。終了している場合にはＳ５１１へ進み、そうでない場合にはＳ５１２へ進む。 In S510, the first control unit 223 determines whether the frequency boost time from "highest" to returning the frequency parameter setting started in S705, which will be described later, has ended. If it is finished, the process proceeds to S511, and if not, the process proceeds to S512.

Ｓ５１１では、第１制御部２２３は、頻度ブースト時間が終了していたため、頻度パラメータを「最高」に設定される前の頻度設定に元に戻す。このとき、頻度ブースト時間中に、自動撮影によって所定枚数以上の撮影が行われた場合には、現在のシーンが撮影すべきシーンであると判断できるため、頻度ブースト時間を延長してもよい。そうすることで、さらにユーザが撮って欲しいシーンを撮り続けることができる。 In S511, since the frequency boost time has expired, the first control unit 223 restores the frequency parameter to the frequency setting before being set to "highest". At this time, if more than a predetermined number of shots are taken by automatic shooting during the frequency boost time, it can be determined that the current scene is the scene to be shot, so the frequency boost time may be extended. By doing so, it is possible to continue shooting the scenes that the user wants to shoot.
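The handling of the frequency boost in S509 to S511 can be sketched as a small state holder. The class, its method names, and the use of plain seconds for time are assumptions; only the behavior (restore the previous setting when the boost time ends, optionally extend it when enough shots were taken) follows the text.

```python
class FrequencySetting:
    def __init__(self, level="medium"):
        self.level = level       # "low", "medium", "high" or "highest"
        self._saved = None       # level to restore after the boost
        self._boost_end = None   # time at which the boost ends

    def start_boost(self, now, duration):
        """Automatic change to "highest" (the frequency setting process)."""
        if self.level != "highest":
            self._saved = self.level
        self.level = "highest"
        self._boost_end = now + duration

    def update(self, now, shots_in_boost=0, extend_threshold=None,
               extension=0.0):
        """S509 to S511: restore the earlier level once the boost time has
        ended, or extend the boost if enough shots were taken during it."""
        if self.level != "highest" or now < self._boost_end:
            return
        if extend_threshold is not None and shots_in_boost >= extend_threshold:
            self._boost_end = now + extension
            return
        self.level = self._saved
        self._saved = None
        self._boost_end = None
```

A caller would invoke `update` once per pass through the flowchart, just before the automatic shooting determination in S512.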

Ｓ５１２では、第１制御部２２３は、自動撮影を行うかどうかの判定を行う。 In S512, the first control unit 223 determines whether or not to perform automatic photographing.

ここで、自動撮影を行うかどうかの判定について説明する。自動撮影を行うかどうかの判定は、重要度スコアが所定値を超えるかどうかで行われる。重要度スコアとは、自動撮影を行うかどうかの判定に用いるパラメータであり、探索エリアを決定するための重要度レベルとは異なるものである。重要度スコアは、被写体の検出状況と時間経過に応じて得点が加点される。例えば、重要度スコアが２０００点を超えると自動撮影を行われるよう設計する場合を考える。この場合、まず、重要度スコアは初期値が０点であり、自動撮影のモードに入った時点からの時間経過によって加点されていく。優先度の高い被写体がいなければ、例えば１２０秒後に２０００点に達するような増加率で増加していく。優先度の高い被写体が検出されないまま１２０秒が経過した場合、時間経過による加点によって２０００点に達し、撮影が行われる。また、時間経過中に優先度の高い被写体を検出すると１０００点が加点される。このため、優先度の高い被写体が検出されている状態では、２０００点に達しやすくなり、結果的に撮影頻度が上がることになりやすい。 Here, the determination of whether or not to perform automatic shooting will be described. Judgment as to whether or not to perform automatic shooting is performed based on whether or not the importance score exceeds a predetermined value. The importance score is a parameter used for determining whether or not to perform automatic photographing, and is different from the importance level for determining the search area. Scores are added to the importance score according to the detection status of the subject and the passage of time. For example, consider a case where automatic shooting is performed when the importance score exceeds 2000 points. In this case, first, the importance score has an initial value of 0 points, and points are added according to the passage of time from the time when the automatic shooting mode is entered. If there is no subject with high priority, the number of points will increase at an increase rate of 2000 points after 120 seconds, for example. If 120 seconds elapse without detecting a high-priority subject, the number of points added over time reaches 2000 points, and shooting is performed. Further, if a subject having a high priority is detected during the lapse of time, 1000 points will be added. Therefore, in a state where a subject having a high priority is detected, it is easy to reach 2000 points, and as a result, the shooting frequency tends to increase.

また、例えば被写体の笑顔を認識した場合は、８００点が加点される。なお、この笑顔に基づく加点は、優先度の高い被写体でなくとも加点される。また、本実施形態では、笑顔に基づく加点の点数は優先度の高い被写体であるか否かに関わらず同じ点数である場合を例に挙げて説明するが、これに限られるものではない。例えば優先度の高い被写体の笑顔を検知したことに応じた加点の点数を、優先度が高くない被写体の笑顔を検知したことに応じた加点の点数よりも高くしてもよい。このようにすることで、よりユーザの意図に沿った撮影を行うことが可能になる。これらの被写体の表情変化に伴う加点により２０００点を超えれば自動撮影される。また、表情変化に伴う加点で２０００点を超えなくとも、その後の時間経過による加点で２０００点により短い時間で到達する。 Further, for example, when the smiling face of the subject is recognized, 800 points are added. The points added based on this smile are added even if the subject is not a high-priority subject. Further, in the present embodiment, the points to be added based on the smile will be described by taking as an example the case where the points are the same regardless of whether or not the subject has a high priority, but the present invention is not limited to this. For example, the score added according to the detection of the smile of the subject having a high priority may be higher than the score added according to the detection of the smile of the subject having a low priority. By doing so, it becomes possible to take a picture more in line with the user's intention. If the number of points exceeds 2000 due to the addition of points due to changes in the facial expressions of these subjects, automatic shooting is performed. In addition, even if the points added due to the change in facial expression do not exceed 2000 points, the points added after that time will reach 2000 points in a shorter time.

なお、時間経過による加点は、例えば１２０秒で２０００点になるよう加点する場合、１秒ごとに２０００／１２０点だけ加点する、すなわち時間に対して線形に加点する場合を例に挙げて説明するがこれに限られるものではない。例えば、１２０秒のうち１１０秒までは加点せず、１１０秒から１２０秒までの１０秒間で、秒間２００点ずつ加点して２０００点に達するような増加の仕方にしてもよい。このようにすることで、被写体の表情変化による加点で、優先度の高低に関わらず撮影される点数に達してしまうことを防ぐことができる。時間経過に伴い線形増加する加点方法の場合、すでに時間経過により加点されている状態が長いため、優先度の低い被写体の笑顔への変化に伴う加点であっても撮影される点数に達してしまうことが多く、優先度の高低がさほど反映されにくい。かといって表情変化に伴う加点の点数を低くすると表情変化のあるタイミングを逃すことになるため、加点の点数を下げることでの対応は避けたい。そこで、１１０秒までは加点しないようにする。このようにすれば、優先度の低い被写体は加点されないまま１１０秒が経過する。一方、優先度の高い被写体は検知した時点で１０００点が加点されるようにしているため、１１０秒まで時間経過による加点がなくとも１０００点は加点された状態になる。これにより、表情変化に伴う加点が行われる場合に、優先度の低い被写体は撮影を行う点数に達する可能性を、優先度の高い被写体にくらべて抑えることができ、優先度の高低が機能しやすい。上記の説明では表情変化を例に挙げたが、加点される基準はこのほかにも声が大きくなった場合や身振り手振りが大きくなった場合などが考えられる。これらについても優先度の高低を機能させやすくするために上記のような加点方法の差を設ければよい。 The points added with the passage of time will be described by taking, for example, a case where points are added so as to reach 2000 points in 120 seconds, and a case where only 2000/120 points are added every second, that is, points are added linearly with respect to time. Is not limited to this. For example, points may not be added up to 110 seconds out of 120 seconds, but may be increased so that points are added by 200 points per second to reach 2000 points in 10 seconds from 110 seconds to 120 seconds. By doing so, it is possible to prevent the number of points to be photographed from being reached regardless of the high or low priority due to the addition of points due to the change in the facial expression of the subject. In the case of the point addition method that linearly increases with the passage of time, points have already been added over time for a long time, so even if the points are added due to the change to the smile of a low-priority subject, the number of points to be shot will be reached. In many cases, it is difficult to reflect the high and low priorities. 
Lowering the points added for a change in facial expression, however, would cause the moment of the expression change to be missed, so reducing those points is not a desirable remedy. Instead, no points are added for elapsed time until 110 seconds. In this way, 110 seconds pass without any time-based points being added for a low-priority subject. A high-priority subject, on the other hand, receives 1000 points at the moment of detection, so it already holds 1000 points even though no time-based points are added up to 110 seconds. Consequently, when points are added for a change in facial expression, a low-priority subject is less likely than a high-priority subject to reach the score that triggers shooting, and the difference in priority takes effect more readily. The explanation above used a change in facial expression as the example, but other possible criteria for adding points include the voice becoming louder or gestures becoming larger. For these as well, a similar difference in the point-addition method may be provided so that the priority difference takes effect.

また、仮に被写体の行動によって２０００点を超えなくとも、時間経過によって必ず１２０秒で撮影されるため、一定期間まったく撮影されないということはない。 Even if the subject's behavior never pushes the score past 2000 points, the passage of time alone guarantees a shot at 120 seconds, so there is never a period in which no image is taken at all.

また、途中で被写体が検出された場合、１２０秒のうち、増加を開始する時間を前倒ししてもよい。つまり、例えば６０秒の時点で優先度の高い被写体が検出された場合、それによって１０００点が加点されてもまだ２０００点を超えないが、このまま１１０秒まで増加しないのではなく、被写体を検出したのち３０秒が経過したら線形増加を始めるようにしてもよい。あるいは、１２０秒の１０秒前ではなく２０秒前に線形増加を始めるようにしてもよい。このようにすれば、優先度の高い被写体が撮影される可能性が高まるため、よりユーザの意図に沿った撮影を実現しやすくなる。 When a subject is detected partway through, the time at which the increase starts within the 120 seconds may be brought forward. For example, if a high-priority subject is detected at the 60-second mark, the resulting 1000 points still do not exceed 2000 points; rather than leaving the score without time-based increase until 110 seconds, the linear increase may be started 30 seconds after the subject is detected. Alternatively, the linear increase may be started 20 seconds, rather than 10 seconds, before the 120-second mark. This raises the likelihood that a high-priority subject is photographed, making it easier to shoot in line with the user's intention.

自動撮影が行われると、重要度スコアは０点にリセットされる。再度２０００点を超えるまで自動撮影は行われない。 When automatic shooting is performed, the importance score is reset to 0 points. Automatic shooting will not be performed until the number exceeds 2000 points again.
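The scoring described above can be checked with a short sketch. The point values (2000, 1000, 800) and the 120-second and 10-second figures come from the text; the function names, the treatment of the bonuses as simple flags, and the use of "reaches 2000" as the trigger are simplifying assumptions.

```python
THRESHOLD = 2000.0

def time_points(elapsed, period=120.0, ramp=None):
    """Points gained from elapsed time alone.

    ramp=None: linear over the whole period.
    ramp=r:    no points until (period - r), then linear up to THRESHOLD."""
    elapsed = min(max(elapsed, 0.0), period)
    if ramp is None:
        return THRESHOLD * elapsed / period
    start = period - ramp
    return 0.0 if elapsed <= start else THRESHOLD * (elapsed - start) / ramp

def should_shoot(elapsed, priority_subject=False, smile=False, **kw):
    """True when time points plus bonuses reach the shooting threshold."""
    score = time_points(elapsed, **kw)
    score += 1000.0 if priority_subject else 0.0  # high-priority subject
    score += 800.0 if smile else 0.0              # smile detected
    return score >= THRESHOLD
```

With linear accumulation, a smile from a low-priority subject at 90 seconds already triggers a shot (1500 + 800 points). With `ramp=10.0`, the same smile at 111 seconds yields only 1000 points, while a smiling high-priority subject reaches 2000 points and is photographed, which is the difference in priority the delayed ramp is meant to preserve.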

ここで、頻度パラメータは、時間経過による重要度スコアの増加の仕方をコントロールするために用いられる。上記の例で被写体が検出されていない場合には自動撮影されるまで１２０秒かかるように設定されている。これは頻度パラメータが「中」の場合を例に挙げて説明したものだが、頻度ブーストの状態（頻度パラメータ「最高」）では６０秒で自動撮影が行われるように、重要度スコアの増加のさせ方を変更する。この場合、増加の仕方は１秒ごとに２０００／６０点を加点してもよいし、例えば５５秒まで加点せず、６０秒までの残り５秒で、毎秒４００点ずつ加点してもよい。後者のようにした場合の利点は上に述べた通りである。なお、ほかの頻度の例を挙げると、例えば頻度パラメータ「高」の場合は、１００秒で２０００点になるよう増加させ、頻度パラメータ「低」の場合は、２４０秒で２０００点になるよう増加させるなどと設計する。以上の通り、頻度パラメータ「最高」の場合は、最も短い時間（本実施形態の説明では６０秒の例）で少なくとも１枚撮影される頻度になる。したがって、撮影の頻度を上げるということは、加点の方法を変えることにより時間当たりに撮影される枚数を増やすことであり、撮影の頻度を下げるということは、加点の方法を変えることにより時間当たりに撮影される枚数を減らすことである。 Here, the frequency parameter is used to control how the importance score increases over time. In the example above, when no subject is detected, it takes 120 seconds until automatic shooting is performed. That example describes the case where the frequency parameter is "medium"; in the frequency boost state (frequency parameter "highest"), the way the importance score increases is changed so that automatic shooting is performed in 60 seconds. In this case, 2000/60 points may be added every second, or, for example, no points may be added until 55 seconds and then 400 points per second may be added over the remaining 5 seconds up to 60 seconds. The advantage of the latter approach is as described above. As examples for the other settings, the score may be designed to reach 2000 points in 100 seconds when the frequency parameter is "high" and in 240 seconds when it is "low". As described above, with the frequency parameter "highest", at least one image is taken within the shortest time (60 seconds in the description of the present embodiment). Raising the shooting frequency therefore means increasing the number of images taken per unit time by changing the point-addition method, and lowering the shooting frequency means reducing the number of images taken per unit time by changing the point-addition method.
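The relationship between the frequency parameter and the rate of point addition can be written out as a small table. The periods (240, 120, 100, and 60 seconds) are the example values from the text; the dictionary keys and function names are hypothetical.

```python
# Seconds needed to reach 2000 points from elapsed time alone (example values).
PERIOD_SECONDS = {"low": 240.0, "medium": 120.0, "high": 100.0, "highest": 60.0}

def points_per_second(level, ramp=None):
    """Rate of time-based point addition. With ramp=None the points are
    added linearly over the whole period; otherwise all 2000 points are
    added during the last `ramp` seconds of the period."""
    period = PERIOD_SECONDS[level]
    return 2000.0 / (period if ramp is None else ramp)

def min_shots_per_hour(level):
    """Lower bound on shots per hour when no subject is ever detected."""
    return 3600.0 / PERIOD_SECONDS[level]
```

For "highest" with a 5-second ramp this gives 400 points per second, matching the example in the text, and the guaranteed minimum rises from 15 shots per hour at "low" to 60 at "highest".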

以上が、自動撮影を行うかどうかの判定について説明である。上記の判断により、自動撮影すると判断した場合には、Ｓ５１３へ進み、撮影しないと判断した場合には、Ｓ５０１へと進む。 This concludes the description of the determination of whether or not to perform automatic shooting. If it is determined that automatic shooting is to be performed, the process proceeds to S513; if not, the process returns to S501.

Ｓ５１３では、第１制御部２２３は、撮影処理を実行する。ここでいう撮影処理とは、静止画撮影や動画撮影が挙げられる。 In S513, the first control unit 223 executes a photographing process. The shooting process referred to here includes still image shooting and moving image shooting.

図６は、本実施形態における撮像装置１０１の音声認識処理のフローチャートである。撮像装置１０１に内蔵されたマイクに、ユーザが発した音声が入力された場合、音声入力音声処理部４０９において音声認識処理を行いユーザの操作命令を取得する。 FIG. 6 is a flowchart of the voice recognition process of the image pickup apparatus 101 according to the present embodiment. When the voice emitted by the user is input to the microphone built in the image pickup apparatus 101, the voice input voice processing unit 409 performs voice recognition processing and acquires the user's operation command.

Ｓ６０１では、第１制御部２２３は、ウェイクワードの検出がされたかどうかの判定を行う。ウェイクワードとは、撮像装置１０１に対する具体的な指示を音声で行う音声コマンド認識を開始するための起動コマンドである。音声によって指示を行う場合、ウェイクワード認識後にコマンドワードを発生し、認識が成功する必要がある。ウェイクワードの検出がされた場合には、Ｓ６０２へ進み、検出されなかった場合には検出されるまでＳ６０１の処理を繰り返す。 In S601, the first control unit 223 determines whether or not the wake word has been detected. The wake word is an activation command for starting voice command recognition, through which specific instructions are given to the image pickup apparatus 101 by voice. To give an instruction by voice, the user must utter a command word after the wake word is recognized, and that command word must be recognized successfully. If the wake word is detected, the process proceeds to S602; if not, the process of S601 is repeated until it is detected.

Ｓ６０２では、第１制御部２２３は、自動撮影処理を停止状態にする。ウェイクワードを認識したら、コマンドワードの待ち受け状態となるため、自動撮影処理を停止する。自動撮影の停止とは、パン・チルト動作、ズーム動作を用いた被写体探索や撮影処理の実行を指す。自動撮影を停止する目的は、ウェイクワードの次に発せられるコマンドワードの指示に素早く反応するために、自動撮影の処理を停止してコマンドワード待ち受け状態にすることが挙げられる。また、音声指示によって撮影指示を与えようとしていた場合、パン・チルトを停止することでユーザが撮影しようとしていた方向で撮影できるようにすることが挙げられる。 In S602, the first control unit 223 puts the automatic shooting process into the stopped state. Once the wake word is recognized, the apparatus enters the command word standby state, so the automatic shooting process is stopped. Stopping automatic shooting here means suspending the subject search using pan/tilt and zoom operations and the execution of shooting processing. One purpose is to respond quickly to the command word uttered after the wake word by halting automatic shooting and waiting for the command word. Another is that, when the user is about to give a shooting instruction by voice, stopping pan/tilt allows the image to be taken in the direction the user intended.

Ｓ６０３では、第１制御部２２３は、ウェイクワードに認識成功をしたことをユーザに示すための認識音を鳴動させる。 In S603, the first control unit 223 sounds a recognition sound for indicating to the user that the wake word has been recognized successfully.

Ｓ６０４では、第１制御部２２３は、コマンドワードが検出されたかどうか判定を行う。コマンドワードが検出された場合にはＳ６０６に進み、検出されなかった場合にはＳ６０５に進む。 In S604, the first control unit 223 determines whether or not a command word has been detected. If the command word is detected, the process proceeds to S606, and if the command word is not detected, the process proceeds to S605.

Ｓ６０５では、第１制御部２２３は、ウェイクワードを検出し、コマンドワード待ち受け状態になってから所定時間が経過したかを判定する。所定時間が経過した場合にはＳ６０１に進み、コマンドワードの待ち受け状態を止めて、ウェイクワードの待ち受け状態となる。所定時間が経過していない場合には、コマンドワードが検出されるまでＳ６０４を繰り返す。 In S605, the first control unit 223 detects the wake word and determines whether a predetermined time has elapsed since the command word standby state was entered. When the predetermined time has elapsed, the process proceeds to S601, the standby state of the command word is stopped, and the standby state of the wake word is set. If the predetermined time has not elapsed, S604 is repeated until the command word is detected.
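Steps S601 to S605 form a two-state listener, which can be sketched as follows. The class, the example wake word, and the timeout value are hypothetical; the recognition sound of S603 is omitted, and what happens to the stopped automatic shooting after a timeout is left untouched because this part of the text does not state it.

```python
COMMAND_TIMEOUT_S = 5.0  # hypothetical; the source says "predetermined time"

class VoiceListener:
    def __init__(self):
        self.state = "wait_wake"
        self.auto_shooting_stopped = False
        self._wake_time = None

    def on_word(self, word, now, wake_word="hello camera", commands=()):
        """Feed one recognized phrase (None for silence) at time `now`.
        Returns the command word when one is detected, otherwise None."""
        if self.state == "wait_wake":
            if word == wake_word:                        # S601
                self.auto_shooting_stopped = True        # S602
                self.state, self._wake_time = "wait_command", now
            return None
        if word in commands:                             # S604
            self.state = "wait_wake"
            return word
        if now - self._wake_time >= COMMAND_TIMEOUT_S:   # S605
            self.state = "wait_wake"                     # back to S601
        return None
```

A command word spoken without a preceding wake word is ignored, which matches the requirement that the command word be uttered after the wake word is recognized.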

Ｓ６０６では、第１制御部２２３は、検出されたコマンドワードが静止画撮影コマンドかどうかの判定を行う。この静止画撮影コマンドは、撮像装置１０１に対して１枚の静止画の撮影・記録の実行要求を行うコマンドである。静止画撮影コマンドと判定した場合にはＳ６０７へ進み、そうでない場合にはＳ６０８へ進む。 In S606, the first control unit 223 determines whether or not the detected command word is a still image shooting command. This still image shooting command is a command for requesting the image pickup apparatus 101 to shoot and record a single still image. If it is determined that the command is still image shooting, the process proceeds to S607, and if not, the process proceeds to S608.

Ｓ６０７では、第１制御部２２３は、静止画撮影処理を行う。具体的には、撮像部２０６にて撮影した信号を画像処理部２０７において、例えばＪＰＥＧファイルに変換し、画像記録部２０８にて記録媒体２２１に記録を行う。 In S607, the first control unit 223 performs a still image photographing process. Specifically, the signal captured by the image pickup unit 206 is converted into, for example, a JPEG file by the image processing unit 207, and recorded on the recording medium 221 by the image recording unit 208.

Ｓ６０８では、第１制御部２２３は、検出されたコマンドワードが被写体探索コマンドかどうかの判定を行う。被写体探索コマンドと判定した場合にはＳ６０９へ進み、そうでない場合にはＳ６１０へ進む。 In S608, the first control unit 223 determines whether or not the detected command word is a subject search command. If it is determined that the command is a subject search command, the process proceeds to S609, and if not, the process proceeds to S610.

Ｓ６０９では、第１制御部２２３は、被写体探索処理を行う。すでにＳ５０５での被写体探索処理によって探索対象エリアが決定され、Ｓ５０６のパン・チルト駆動、Ｓ５０７のズーム駆動によって被写体を捉えている状態であれば、その被写体を追跡することを中止し、他の被写体を探すため、被写体探索処理を実行する。これは、被写体を捉えている状態で、ユーザが被写体探索を指示したのであれば、現在捉えている被写体とは別に撮影してほしい被写体が存在することを意味するためである。 In S609, the first control unit 223 performs the subject search process. If the search target area has already been determined by the subject search process in S505 and the subject is captured by the pan / tilt drive of S506 and the zoom drive of S507, the tracking of the subject is stopped and another subject is stopped. The subject search process is executed to search for. This is because if the user instructs the subject search while capturing the subject, it means that there is a subject to be photographed separately from the subject currently captured.

After the processing of S607 through S609 is completed, frequency setting processing is performed in S610. The frequency setting processing sets a frequency parameter that specifies how many images are shot within a predetermined time. The details are described later, but the frequency setting processing executed in S610 sets the shooting frequency higher.

In S611, the first control unit 223 determines whether the detected command word is a moving image recording start command. This command requests the imaging apparatus 101 to capture and record a moving image. If the command word is determined to be a moving image recording start command, the process proceeds to S612; otherwise, it proceeds to S613.

In S612, the first control unit 223 starts shooting a moving image using the imaging unit 206 and records it on the recording medium 221. While the moving image is being recorded, pan/tilt and zoom drive are not performed, no subject search is performed, and automatic shooting remains stopped.

In S613, the first control unit 223 determines whether the detected command word is a moving image recording stop command. If it is determined to be a moving image recording stop command, the process proceeds to S614; otherwise, it proceeds to S615.

In S614, the first control unit 223 stops shooting and recording the moving image using the imaging unit 206, and completes recording of the moving image file on the recording medium 221.

In S615, the first control unit 223 executes other voice command processing. Examples include processing for a command to pan/tilt in a direction specified by the user and processing for commands that change various shooting parameters, such as exposure compensation.

In S616 and S617, the first control unit 223 resumes the automatic shooting that was stopped in S602. This makes the processing of S502 to S510 operable again, and automatic shooting resumes.

Note that the frequency setting processing is not executed for instructions to start or stop moving image recording. Once recording has started, the signal from the imaging unit 206 is recorded continuously, so there is no point in raising the frequency setting. After recording has stopped, the user's instruction to stop indicates that the scene worth recording has ended, so the frequency is not raised needlessly, which avoids shooting unnecessary images.
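The branching of S606 to S615 can be summarized in a small sketch. The `Command` enum and the `steps_for` function are hypothetical names introduced only for illustration; the sketch also assumes that every command other than the still image and subject search commands bypasses S610, which matches the statement that movie start and stop instructions do not trigger frequency setting. The resume steps S616 and S617 are omitted.

```python
from enum import Enum, auto


class Command(Enum):
    """Hypothetical labels for the detected command word."""
    STILL_IMAGE = auto()
    SUBJECT_SEARCH = auto()
    MOVIE_START = auto()
    MOVIE_STOP = auto()
    OTHER = auto()


def steps_for(command: Command) -> list:
    """Return the step labels run for one detected command word."""
    if command is Command.STILL_IMAGE:      # S606 -> S607
        return ["S607", "S610"]             # shoot, then raise the frequency
    if command is Command.SUBJECT_SEARCH:   # S608 -> S609
        return ["S609", "S610"]             # search, then raise the frequency
    if command is Command.MOVIE_START:      # S611 -> S612
        return ["S612"]                     # no frequency setting
    if command is Command.MOVIE_STOP:       # S613 -> S614
        return ["S614"]                     # no frequency setting
    return ["S615"]                         # assumed: other commands skip S610


for command in Command:
    print(command.name, steps_for(command))
```

Only the first two branches include S610, so only still image shooting and subject search raise the shooting frequency.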

In addition, when the remaining battery level of the imaging apparatus 101 is low, or when the imaging apparatus 101 has reached or exceeded a predetermined temperature due to heat generation, it is preferable not to operate the imaging unit 206 and related components frequently. In such situations, the frequency parameter need not be set to "highest" in S704 of FIG. 7, described later.
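A guard of this kind might look like the following sketch. The function name `boost_allowed` and both threshold values are assumptions chosen for illustration; the text only speaks of a low battery level and a predetermined temperature.

```python
def boost_allowed(battery_percent: float, temperature_c: float,
                  min_battery_percent: float = 20.0,   # assumed threshold
                  max_temperature_c: float = 60.0) -> bool:  # assumed threshold
    """Return True when raising the frequency to "highest" is acceptable."""
    battery_ok = battery_percent >= min_battery_percent
    # "At or above the predetermined temperature" blocks the boost.
    temperature_ok = temperature_c < max_temperature_c
    return battery_ok and temperature_ok


print(boost_allowed(80.0, 35.0))
print(boost_allowed(10.0, 35.0))
print(boost_allowed(80.0, 60.0))
```

A caller would check this before S704 and leave the frequency parameter unchanged when it returns False.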

FIG. 7 is a flowchart of the frequency setting processing of the imaging apparatus 101 in this embodiment. One way for the user to set the frequency of automatic shooting is through a dedicated application on the smart device 301. The processing of this flowchart is started in response to execution of S610 in FIG. 6. It is also started when the user instructs a frequency change via the dedicated application on the smart device 301.

In S701, the first control unit 223 determines whether the frequency setting was requested via the dedicated application on the smart device 301. If so, the process proceeds to S702; otherwise (for example, when the processing is executed from S610), it proceeds to S703.

In S702, the first control unit 223 sets the frequency parameter to the value instructed by the user. For example, as shown in FIG. 9, the setting can be made by selecting "low", "medium", or "high" for the automatic shooting frequency item on the screen of the dedicated application on the smart device 301.

The application screen of FIG. 9 will now be described.

The dedicated application on the smart device 301 offers still images and moving images as the content to be captured automatically. The application can also be used to set whether still images or moving images are given priority as the automatically captured content. As shown in FIG. 9, this setting is changed by touching (flicking) the knob of a slider bar. When still images are prioritized, more still images are captured than moving images; when moving images are prioritized, more moving images are captured than still images.

It is also possible to set the angular range, measured from the front direction, within which the imaging apparatus searches for scenes to capture. In the example of FIG. 9, three patterns can be set: 30 degrees to each side of the front for a total of 60 degrees, 90 degrees to each side of the front for a total of 180 degrees, and the full circumference. The range may instead be entered as a numerical value to allow finer settings.

With automatic capture, there is also a concern that too much content will accumulate. A function for automatically deleting images is therefore provided, and it can be turned on and off from the smart device 301. Images may be deleted automatically, for example, in order from the oldest shooting date and time, or in order from the lowest importance. Importance here is a numerical value derived from parameters that predict whether the user is likely to want to keep the image; for a still image, examples are whether there is little blur and whether a person appears in it. For a moving image, importance is calculated by quantifying, for example, whether a person appears and whether human voices such as conversation are recorded. The higher the total value, the higher the importance.
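The two deletion orders described above can be illustrated with a short sketch. The `Item` fields, the equal weighting of one point per criterion, and the function names are all assumptions made for illustration; the text does not specify how the parameters are quantified.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical record for one captured still image or moving image."""
    name: str
    shot_at: float            # shooting time, e.g. seconds since some epoch
    has_person: bool
    low_blur: bool = False    # used for still images
    has_voice: bool = False   # used for moving images
    is_movie: bool = False


def importance(item: Item) -> int:
    """Sum one point per satisfied criterion (equal weights are assumed)."""
    if item.is_movie:
        return int(item.has_person) + int(item.has_voice)
    return int(item.low_blur) + int(item.has_person)


def deletion_order(items: list, by: str = "date") -> list:
    """Return items in the order they would be deleted."""
    if by == "date":
        return sorted(items, key=lambda item: item.shot_at)  # oldest first
    return sorted(items, key=importance)                     # least important first


items = [
    Item("a.jpg", shot_at=3.0, has_person=True, low_blur=True),
    Item("b.jpg", shot_at=1.0, has_person=True, low_blur=False),
    Item("c.mp4", shot_at=2.0, has_person=False, is_movie=True),
]
print([item.name for item in deletion_order(items, by="date")])
print([item.name for item in deletion_order(items, by="importance")])
```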

This concludes the description of FIG. 9. The description now returns to FIG. 7.

In S703, the first control unit 223 determines whether the frequency setting was called from the voice recognition processing. If so, the process proceeds to S704; otherwise, the frequency setting processing ends.

In S704, the first control unit 223 sets the frequency parameter to a frequency even higher than any that can be set in S702. The reason is that the moment at which the user instructs shooting is, at the very least, a moment the user wants captured. Because the situation at that moment is one the user wants photographed, scenes worth shooting are likely to arise in the period close to that moment. With this in mind, the imaging apparatus of this embodiment uses a voice instruction given by the user's voice command as a trigger, presumes that scenes worth shooting will occur for a certain period after the voice command is input, and raises the shooting frequency. This makes it possible to capture the images the user wants without missing them. This embodiment is described as setting the frequency parameter to "highest", but the frequency may instead be raised in steps each time the frequency is set by a voice command instruction. In that case, the upper limit of the frequency is the fastest frame rate of continuous shooting that the imaging apparatus 101 provides.

In S705, the first control unit 223 sets the frequency boost time, that is, the time until the frequency parameter set to "highest" in S704 is returned to its original value, and starts a countdown. For example, if the frequency setting is "medium" and is then set to "highest" by a voice command instruction, and the frequency boost time is 60 seconds, the frequency setting returns to "medium" after 60 seconds have elapsed (the actual processing is performed in S511). The frequency boost time here is the time for which the frequency is kept at its highest. It is set automatically, but the user may be allowed to set an arbitrary time.

Instead of restoring the setting when a predetermined time elapses, the setting may be restored when a predetermined number of images have been taken by automatic shooting.

If the frequency setting is set to "highest" again by another voice command before the countdown of the frequency boost time ends, the predetermined time or the predetermined number of images until the frequency setting is restored is extended.

Furthermore, the decision to restore the frequency setting may be based on whether subject search processing has been performed over all directions in the pan direction.
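The time-based variant of S702, S704, S705, and the restore in S511 can be sketched as a small state holder. The class and method names are hypothetical, time is passed in explicitly so that the example runs without a real clock, and two behaviors are assumptions: a repeated command "extends" the boost by restarting the countdown, and a setting made from the application during a boost cancels the boost.

```python
from dataclasses import dataclass
from typing import Optional

LEVELS = ("low", "medium", "high")   # selectable in the application (S702)
HIGHEST = "highest"                  # reachable only through S704


@dataclass
class FrequencySetting:
    level: str = "medium"
    saved_level: Optional[str] = None       # level to restore after the boost
    boost_deadline: Optional[float] = None  # None when no boost is active
    boost_seconds: float = 60.0             # frequency boost time

    def set_from_app(self, level: str) -> None:
        """S702: apply the level chosen by the user."""
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level}")
        self.level = level
        self.saved_level = None     # assumed: an app setting cancels a boost
        self.boost_deadline = None

    def boost(self, now: float) -> None:
        """S704/S705: raise to "highest" and start the countdown."""
        if self.boost_deadline is None:
            self.saved_level = self.level
            self.level = HIGHEST
        # Assumed form of "extension": restart the countdown from now.
        self.boost_deadline = now + self.boost_seconds

    def tick(self, now: float) -> None:
        """S511: restore the original level once the boost time has elapsed."""
        if self.boost_deadline is not None and now >= self.boost_deadline:
            self.level = self.saved_level
            self.saved_level = None
            self.boost_deadline = None


setting = FrequencySetting(level="medium")
setting.boost(now=0.0)     # voice command at t = 0 s
setting.boost(now=30.0)    # second command extends the boost to t = 90 s
setting.tick(now=60.0)
print(setting.level)       # still boosted
setting.tick(now=90.0)
print(setting.level)       # restored
```

A count-based variant would replace the deadline with a remaining-shots counter that is decremented on each automatic shot.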

Preferred embodiments of the present invention have been described above, but the present invention is not limited to these embodiments, and various modifications and changes are possible within the scope of its gist.

For example, the above embodiment was described using voice commands as the means by which the user gives a shooting instruction. In addition, when shooting is instructed via communication means from a smart device or a BLE remote controller, the frequency setting may likewise be set to "highest" after the instruction is executed. Similarly, when a specific vibration pattern is detected using the acceleration sensor in the imaging apparatus as an instruction to execute the processing associated with that pattern, the frequency setting may be set to "highest" after the instruction is executed. Furthermore, when the movement of the user's hand is analyzed through the imaging unit and a gesture instruction is received, the frequency setting may be set to "highest" after the instruction is executed.

This embodiment has been characterized by capturing the images the user wants by tracking the subject with pan/tilt drive and zoom drive. As an alternative, an implementation is conceivable in which a 360° camera is adopted as the imaging means, all directions are captured at all times, and an image of the subject is obtained by cropping the necessary range from the captured image. In that case, moving image recording is always running; when a crop instruction is input, the image is recorded in a still image format, and the frame rate of the moving image is then raised. As with the shooting frequency in the above embodiment, the frame rate may be set to the highest settable rate or to a value beyond the settable range. The condition for returning the raised frame rate to its original value may likewise be the elapse of a certain time, as in the above embodiment. As a result, recording is performed more frequently around the moment the user wants an image recorded, which makes it easier to obtain, for example, an image of a moving object without focus blur.

Note that if no shooting timing arrives within the frequency boost time, it is conceivable that not a single image is taken. To address this, when a still image shooting command is received, one image is first taken without pan/tilt or zoom drive and without a subject search. Three images are then taken in succession while searching for the subject. After that, the apparatus enters the frequency boost state for a predetermined time and performs automatic shooting. In this way, when the user deliberately instructs still image shooting with the still image shooting command, the case where no image is taken no longer occurs, and at least four images are taken.
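The guaranteed shots that precede the boost period can be listed in a brief sketch. The function name `guaranteed_shots` and the tuple layout are hypothetical; the counts of one immediate shot and three searching shots come from the paragraph above.

```python
def guaranteed_shots(burst_count: int = 3) -> list:
    """Return the shots taken before the boost period as (label, searching) pairs."""
    # First shot: no pan/tilt, no zoom drive, no subject search.
    shots = [("immediate", False)]
    # Then consecutive shots taken while searching for the subject.
    shots += [("burst", True)] * burst_count
    return shots


plan = guaranteed_shots()
print(len(plan), plan[0])
```

With the default of three searching shots the plan contains four entries, matching the stated minimum of four images.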

(Other Embodiments)
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiment is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

Claims

An imaging device comprising:
sound collecting means for collecting sound;
analysis means for analyzing the sound collected by the sound collecting means;
automatic shooting means for automatically shooting; and
setting means for setting a shooting frequency of the automatic shooting means,
wherein, when a result of analysis by the analysis means is a specific voice instruction, the setting means sets the shooting frequency higher after an operation according to the instruction is performed.

The imaging device according to claim 1, wherein the automatic shooting means automatically pans, tilts, and zooms the imaging device, tracks a subject, and automatically captures a still image or a moving image.

The imaging device according to claim 1, wherein the frequency set by the setting means is higher than any frequency that the user can set arbitrarily.

The imaging apparatus according to claim 1, wherein when a predetermined time elapses after the photographing frequency is set higher by the setting means, the photographing frequency is restored to the original value.

The imaging device according to claim 4, wherein the predetermined time is extended when the automatic shooting means takes a predetermined number of images or more in a state where the shooting frequency is set higher by the setting means.

The imaging device according to claim 4, wherein when a specific voice instruction is recognized by the analysis means in a state where the photographing frequency is set higher by the setting means, the predetermined time is extended.

The imaging apparatus according to claim 1, wherein the photographing frequency is set higher by the setting means, and then the photographing frequency is restored when a predetermined number of images are photographed by the automatic photographing means.

The imaging device according to claim 7, wherein the predetermined number of images is increased when the automatic shooting means takes a predetermined number of images or more in a state where the shooting frequency is set higher by the setting means.

The imaging device according to claim 7, wherein when a specific voice instruction is recognized by the analysis means in a state where the photographing frequency is set higher by the setting means, the predetermined number of images is increased.

The imaging device according to claim 1, further comprising rotating means for changing the direction of the imaging device, wherein, after the shooting frequency is set higher by the setting means, the shooting frequency is restored when subjects in all directions have been searched for using the rotating means.

The imaging device according to claim 1, wherein when the specific voice instruction analyzed by the analysis means is a shooting instruction, the shooting frequency is set higher by the setting means.

The imaging device according to claim 1, wherein when the specific voice instruction analyzed by the analysis means is an instruction to search for a subject, the shooting frequency is set higher by the setting means.

The imaging device according to claim 1, wherein the setting means does not set the frequency high when the specific voice instruction analyzed by the analysis means is an instruction to start recording a moving image.

The imaging device according to claim 1, wherein the setting means does not set the frequency high when the specific voice instruction analyzed by the analysis means is an instruction to stop recording a moving image.

The imaging device according to claim 1, wherein, when the remaining battery level of the imaging device is less than a predetermined amount, the setting means does not set the frequency high even if the voice analyzed by the analysis means is the specific voice instruction.

The imaging device according to claim 1, wherein, when the temperature of the imaging device is higher than a predetermined temperature, the setting means does not set the frequency high even if the voice analyzed by the analysis means is the specific voice instruction.

The imaging device according to claim 1, wherein the setting means also sets the frequency high when a specific instruction is detected from a mobile terminal via communication means, when a specific vibration pattern is detected using an acceleration sensor of the imaging device, or when a specific instruction is given by a gesture instruction made by movement of the user's hand.

A control method for an imaging device having sound collecting means for collecting sound, the method comprising:
an analysis step of analyzing the sound collected by the sound collecting means;
an automatic shooting step of automatically shooting; and
a setting step of setting a shooting frequency in the automatic shooting step,
wherein, when a result of analysis in the analysis step is a specific voice instruction, the setting step is executed after an operation according to the instruction is performed, so that the shooting frequency is set higher.

A computer-readable program for allowing a computer to function as each means of the imaging apparatus according to any one of claims 1 to 17.