JP7118746B2 - IMAGING DEVICE, CONTROL METHOD AND PROGRAM THEREOF

Info

Publication number: JP7118746B2
Application number: JP2018104913A
Authority: JP
Inventors: 龍介佐藤
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2018-05-31
Filing date: 2018-05-31
Publication date: 2022-08-16
Anticipated expiration: 2038-05-31
Also published as: JP2019212965A

Description

The present invention relates to an imaging apparatus, a control method therefor, and a program.

When still images or moving images are captured with an imaging apparatus such as a camera, the user normally decides on the subject through a viewfinder or the like, checks the shooting conditions personally, and adjusts the framing before capturing the image. Such imaging apparatuses have functions that detect a user's operation error and notify the user, or that sense the external environment and notify the user when it is unsuitable for shooting. Mechanisms that control the camera so that it enters a state suitable for shooting have also long existed.

In contrast to such imaging apparatuses, which shoot in response to a user operation, there are lifelog cameras that shoot periodically and continuously without the user giving a shooting instruction (Patent Document 1). A lifelog camera is used while attached to the user's body with a strap or the like, and records the scenes the user sees in daily life as images at fixed time intervals. Because a lifelog camera shoots at fixed intervals rather than at a timing the user intends, such as when releasing the shutter, it can preserve unexpected moments that would not normally be photographed.

Patent Document 1: Japanese Translation of PCT International Application Publication No. 2016-536868

However, conventional lifelog cameras of the type worn by the user shoot automatically at regular intervals, so the images obtained are likely to be unrelated to the user's intention. In addition, when the direction of a sound source is detected using audio input units such as microphones, a larger number of audio input units allows the direction to be detected more accurately but increases component cost. Constraints on the structure or design of the apparatus can also make it difficult to provide many audio input units.

The present invention has been made in view of the above problems, and aims to provide a technique for capturing an image with the intended composition at the timing the user intends, without requiring any special operation.

To solve this problem, an imaging apparatus of the present invention has, for example, the following configuration. That is, the imaging apparatus comprises:
a movable imaging unit in which an imaging unit is provided and which can rotate the imaging unit in a predetermined direction;
a plurality of microphones provided on the movable imaging unit;
sound direction detection means for performing sound direction detection, which detects the direction of a sound source using the plurality of microphones; and
control means for performing control so that processing for identifying the direction of the sound source is carried out based on (i) the sound direction detected by the sound direction detection means with the movable imaging unit facing a first direction and (ii) the sound direction detected by the sound direction detection means with the movable imaging unit facing a second direction, reached by rotating the movable imaging unit from the first direction in the predetermined direction,
wherein the second direction is the direction obtained by rotating the movable imaging unit in the predetermined direction from the first direction by a predetermined rotation angle that is greater than 0 degrees and not more than 90 degrees.
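The idea of combining two detections taken before and after a rotation can be illustrated with a minimal sketch. It assumes, as a simplification not stated in the claim itself, that a two-microphone array only reports the angle between the source and the line through the microphones, so each single measurement leaves two mirror-image candidates. The function names and the use of absolute angles in degrees are hypothetical choices for illustration.

```python
def angular_diff(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def measure(baseline_deg, source_deg):
    """Simulated two-microphone reading: angle between source and mic baseline.

    The sign is lost, which is the mirror ambiguity this sketch assumes.
    """
    return angular_diff(source_deg, baseline_deg)


def candidates(baseline_deg, measured_deg):
    """The two mirror-image source directions consistent with one reading."""
    return [(baseline_deg + measured_deg) % 360.0,
            (baseline_deg - measured_deg) % 360.0]


def resolve_source_direction(baseline1_deg, theta1_deg, rotation_deg, theta2_deg):
    """Pick the candidate from the first reading that the second reading confirms.

    rotation_deg is the pan applied between the two readings; the claim
    restricts it to more than 0 and at most 90 degrees.
    """
    if not 0.0 < rotation_deg <= 90.0:
        raise ValueError("rotation must be in (0, 90] degrees")
    first = candidates(baseline1_deg, theta1_deg)
    second = candidates(baseline1_deg + rotation_deg, theta2_deg)
    # The true direction appears in both candidate sets; take the closest pair.
    _, best, _ = min((angular_diff(a, b), a, b) for a in first for b in second)
    return best
```

For a source at 250 degrees, the first reading alone allows 110 or 250 degrees; after a 45-degree pan the second reading allows 200 or 250 degrees, so only 250 degrees is consistent with both.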

According to the present invention, an image with the intended composition can be captured at the timing the user intends, without any special operation.

FIG. 1 is a block diagram of an imaging apparatus according to an embodiment.
FIG. 2 is a detailed block diagram of an audio input unit and an audio signal processing unit according to the embodiment.
FIG. 3 shows external views and usage examples of the imaging apparatus according to the embodiment.
FIG. 4 shows the pan operation and tilt operation of the imaging apparatus according to the embodiment.
FIG. 5 is a flowchart showing the processing procedure of a central control unit in the embodiment.
FIG. 6 is a flowchart showing details of the voice command processing in FIG. 5.
FIG. 7 shows the relationship between the meanings of voice commands and the voice commands in the embodiment.
FIG. 8 is a timing chart from startup to the motion shooting start command in the embodiment.
FIG. 9 is a diagram for explaining sound direction detection processing according to the embodiment.
FIG. 10 is a diagram for explaining sound direction identification processing according to the embodiment.
FIG. 11 is another flowchart showing the processing procedure of the central control unit in the embodiment.

Embodiments of the present invention are described in detail below with reference to the drawings.

[First Embodiment]
FIG. 1 is a block diagram of the imaging apparatus 1 according to the first embodiment. The imaging apparatus 1 consists of a movable imaging unit 100, which includes an optical lens unit and whose imaging direction is variable, and a support unit 200, which includes a central control unit (CPU) that controls the driving of the movable imaging unit 100 and controls the imaging apparatus as a whole.

The support unit 200 is provided so that a plurality of vibrators 11 to 13, each including a piezoelectric element, are in contact with the surface of the movable imaging unit 100. Controlling the vibration of the vibrators 11 to 13 causes the movable imaging unit 100 to pan and tilt. The pan and tilt operations may instead be realized with servomotors or the like.

The movable imaging unit 100 has a lens unit 101, an imaging unit 102, a lens actuator control unit 103, and an audio input unit 104.

The lens unit 101 is a shooting optical system made up of a zoom lens, an aperture/shutter, a focus lens, and the like. The imaging unit 102 includes an image sensor such as a CMOS sensor or a CCD sensor, photoelectrically converts the optical image formed by the lens unit 101, and outputs an electric signal. The lens actuator control unit 103 includes a motor driver IC and drives the various actuators of the lens unit 101, such as those for the zoom lens, the aperture/shutter, and the focus lens. The actuators are driven based on actuator drive instruction data received from the central control unit 201 in the support unit 200, described later. The audio input unit 104 consists of a plurality of microphones (two in the embodiment); it converts sound into an electric signal, converts that signal into a digital signal (audio data), and outputs it.

The support unit 200, on the other hand, has a central control unit 201 that controls the imaging apparatus 1 as a whole. The central control unit 201 consists of a CPU, a ROM that stores the programs executed by the CPU, and a RAM used as the CPU's work area. The support unit 200 also has an imaging signal processing unit 202, a video signal processing unit 203, an audio signal processing unit 204, an operation unit 205, a storage unit 206, and a display unit 207. The support unit 200 further has an input/output terminal unit 208, an audio reproduction unit 209, a power supply unit 210, a power control unit 211, a position detection unit 212, a rotation control unit 213, a wireless communication unit 214, and the vibrators 11 to 13 described above.

The imaging signal processing unit 202 converts the electric signal output from the imaging unit 102 of the movable imaging unit 100 into a video signal. The video signal processing unit 203 processes the video signal output from the imaging signal processing unit 202 according to its intended use. This processing includes image cropping, electronic image stabilization by rotation processing, and subject detection processing that detects a subject (a face).

The audio signal processing unit 204 performs audio processing on the digital signal from the audio input unit 104. If the audio input unit 104 outputs an analog electric signal, the audio signal processing unit 204 may include a configuration that converts the analog signal into a digital signal. Details of the audio signal processing unit 204, including the audio input unit 104, are described later with reference to FIG. 2.

The operation unit 205 functions as the user interface between the imaging apparatus 1 and the user, and consists of various switches, buttons, and the like. The storage unit 206 stores various data such as video information obtained by shooting. The display unit 207 has a display such as an LCD and displays images as needed based on the signal output from the video signal processing unit 203. The display unit 207 also functions as part of the user interface by displaying various menus and the like. The external input/output terminal unit 208 inputs and outputs communication signals and video signals to and from external devices. The audio reproduction unit 209 includes a speaker, converts audio data into an electric signal, and reproduces the sound. The power supply unit 210 is the source of the power needed to drive the entire imaging apparatus (each of its elements), and in the embodiment is a rechargeable battery.

The power control unit 211 controls the supply and cutoff of power from the power supply unit 210 to each of the above components according to the state of the imaging apparatus 1. Depending on that state, some elements are not in use. Under the control of the central control unit 201, the power control unit 211 cuts off power to the elements that are unused in the current state of the imaging apparatus 1, thereby reducing power consumption. The supply and cutoff of power will become clear from the description below.

The position detection unit 212 includes a gyro, an acceleration sensor, GPS, and the like, and detects the movement of the imaging apparatus 1. The position detection unit 212 is provided so that the apparatus can also cope with being worn by the user. The rotation control unit 213 generates and outputs signals that drive the vibrators 11 to 13 in accordance with instructions from the central control unit 201. The vibrators 11 to 13 are made of piezoelectric elements and vibrate according to the drive signals applied by the rotation control unit 213. The vibrators 11 to 13 constitute a rotation drive unit (pan/tilt drive unit). As a result, the movable imaging unit 100 pans and tilts in the direction instructed by the central control unit 201.

The wireless communication unit 214 transmits data such as image data in compliance with wireless standards such as Wi-Fi and BLE.

Next, the configuration of the audio input unit 104 and the audio signal processing unit 204 in the embodiment, and the sound direction detection processing, are described with reference to FIG. 2. The figure shows the configuration of the audio input unit 104 and the audio signal processing unit 204, and the connections among the audio signal processing unit 204, the central control unit 201, and the power control unit 211.

The audio input unit 104 consists of two omnidirectional microphones (microphone 104a and microphone 104b). Each microphone has a built-in A/D converter. Each samples sound at a preset sampling rate (16 kHz for command detection and direction detection processing, 48 kHz for video recording) and outputs the sampled audio signal as digital audio data through the built-in A/D converter. Although the audio input unit 104 consists of two digital microphones in the embodiment, it may instead consist of microphones with analog output. In that case, corresponding A/D converters need only be provided in the audio signal processing unit 204. The number of microphones in the embodiment is two, but any number of two or more is acceptable.

When the imaging apparatus 1 is powered on, the microphone 104a is supplied with power unconditionally and is able to collect sound. The other microphone 104b, in contrast, is subject to power supply and cutoff by the power control unit 211 under the control of the central control unit 201, and its power is cut off in the initial state after the imaging apparatus 1 is powered on.

The audio signal processing unit 204 consists of a sound pressure level detection unit 2041, an audio memory 2042, a voice command recognition unit 2043, a sound direction detection unit 2044, a video audio processing unit 2045, and a command memory 2046.

When the output level represented by the audio data from the microphone 104a reaches or exceeds a preset threshold, the sound pressure level detection unit 2041 supplies a signal indicating sound detection to the power control unit 211 and the audio memory 2042.

When the power control unit 211 receives the signal indicating sound detection from the sound pressure level detection unit 2041, it supplies power to the voice command recognition unit 2043.

The audio memory 2042 is one of the targets of power supply and cutoff by the power control unit 211 under the control of the central control unit 201. The audio memory 2042 is a buffer memory that temporarily stores audio data from the microphone 104a. It has at least enough capacity to store all the sampled data when the longest voice command is spoken relatively slowly. The microphone 104a samples at 16 kHz and outputs 2 bytes (16 bits) of audio data per sample. If the longest voice command were 5 seconds, the audio memory 2042 would have a capacity of about 160 Kbytes (5 × 16 × 1000 × 2 = 160,000 bytes). Once the audio memory 2042 is filled with audio data from the microphone 104a, old audio data is overwritten with new audio data. As a result, the audio memory 2042 holds the audio data for the most recent predetermined period (about 5 seconds in the above example). The audio memory 2042 starts storing audio data from the microphone 104a in its sampling data area when triggered by receiving the signal indicating sound detection from the sound pressure level detection unit 2041.
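The capacity figure and the overwrite behavior described for the audio memory 2042 can be sketched as a simple ring buffer. This is only an illustration under the stated numbers (5 seconds, 16 kHz, 2 bytes per sample); the class name `AudioRingBuffer` and its methods are hypothetical and are not taken from the patent.

```python
class AudioRingBuffer:
    """Fixed-size buffer that keeps only the most recent audio bytes."""

    def __init__(self, seconds=5, sample_rate_hz=16_000, bytes_per_sample=2):
        # 5 s x 16,000 samples/s x 2 bytes = 160,000 bytes with the defaults.
        self.capacity = seconds * sample_rate_hz * bytes_per_sample
        self._buf = bytearray(self.capacity)
        self._write = 0    # index where the next byte will be written
        self._filled = 0   # number of valid bytes currently held

    def write(self, data):
        """Append data, overwriting the oldest bytes once the buffer is full."""
        data = bytes(data)[-self.capacity:]  # only the newest bytes can survive
        n = len(data)
        end = self._write + n
        if end <= self.capacity:
            self._buf[self._write:end] = data
        else:
            first = self.capacity - self._write
            self._buf[self._write:] = data[:first]
            self._buf[:end - self.capacity] = data[first:]
        self._write = end % self.capacity
        self._filled = min(self.capacity, self._filled + n)

    def snapshot(self):
        """Return the held bytes in chronological order (oldest first)."""
        if self._filled < self.capacity:
            return bytes(self._buf[:self._filled])
        return bytes(self._buf[self._write:] + self._buf[:self._write])
```

Because the buffer always holds the latest window in time order, a recognizer can refer to the first and last samples of a detected command by position within that window.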

The command memory 2046 is a non-volatile memory in which information on the voice commands recognized by the imaging apparatus is stored (registered) in advance. Although details are given later, the types of voice commands stored in the command memory 2046 are, for example, as shown in FIG. 7; information on multiple types of commands, beginning with the "start command", is stored.

The voice command recognition unit 2043 is one of the targets of power supply and cutoff by the power control unit 211 under the control of the central control unit 201. Because speech recognition itself is well known, its description is omitted here. The voice command recognition unit 2043 refers to the command memory 2046 and performs recognition processing on the audio data stored in the audio memory 2042. It determines whether the audio data collected by the microphone 104a is a voice command, and which registered voice command it matches. When the voice command recognition unit 2043 detects audio data that matches one of the voice commands stored in the command memory 2046, it supplies information indicating which command it is to the central control unit 201. It also supplies to the central control unit 201 the addresses (or timings) in the audio memory 2042 of the first and last audio data that determined the voice command.

The sound direction detection unit 2044 is one of the targets of power supply and cutoff by the power control unit 211 under the control of the central control unit 201. Based on the audio data from the two microphones 104a and 104b, the sound direction detection unit 2044 periodically performs processing to detect the direction in which a sound source exists. The sound direction detection unit 2044 has an internal buffer memory 2044a and stores information representing the detected sound source direction in it. The cycle at which the sound direction detection unit 2044 performs sound direction detection (for example, 16 kHz) may be sufficiently longer than the sampling cycle of the microphone 104a. However, the buffer memory 2044a is assumed to have enough capacity to store sound direction information for the same period as the audio data that can be stored in the audio memory 2042.
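As background for how two microphones can yield a direction, one common approach is to convert the arrival-time difference between the microphones into an angle. The sketch below shows only that geometric step and is not necessarily the method used by the sound direction detection unit 2044. It assumes a distant (far-field) source, a speed of sound of 343 m/s, and a 20 mm microphone spacing; the function name and parameters are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0   # assumed value for air at about 20 degrees C
MIC_SPACING_M = 0.020        # assumed 20 mm spacing between the two microphones


def angle_from_delay(delay_s, spacing_m=MIC_SPACING_M, c=SPEED_OF_SOUND_M_S):
    """Angle in degrees between the source direction and the microphone axis.

    delay_s is the arrival-time difference between the two microphones;
    the sign convention (positive = source nearer the first microphone)
    is an assumption of this sketch. Returns a value in 0..180.
    """
    ratio = c * delay_s / spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against noise pushing |ratio| > 1
    return math.degrees(math.acos(ratio))
```

A delay of zero maps to 90 degrees (source broadside to the microphone pair), and the largest possible delay, spacing divided by the speed of sound (about 58 microseconds for 20 mm), maps to 0 degrees. The result does not say on which side of the microphone axis the source lies, so a single two-microphone reading leaves a mirror ambiguity.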

The video audio processing unit 2045 is one of the targets of power supply and cutoff by the power control unit 211 under the control of the central control unit 201. The video audio processing unit 2045 takes the two audio data streams from the two microphones 104a and 104b as stereo audio data. It then performs audio processing for video sound, such as various filter processing, wind cut, stereo enhancement, drive noise removal, ALC (Auto Level Control), and compression. Although the details will become clear from the description below, in the embodiment the microphone 104a functions as the L-channel microphone of a stereo pair and the microphone 104b as the R-channel microphone.

In FIG. 2, the connections between the microphones of the audio input unit 104 and the blocks included in the audio signal processing unit 204 are shown at the minimum necessary for the two microphones, in consideration of power consumption and circuit configuration. However, as far as power and circuit configuration allow, the microphones may be shared among the blocks included in the audio signal processing unit 204. Also, although the microphone 104a is connected as the reference microphone in this embodiment, any microphone may serve as the reference.

External views and usage examples of the imaging apparatus 1 are described with reference to FIGS. 3(a) to 3(e). FIG. 3(a) shows top and front views of the exterior of the imaging apparatus 1 according to the embodiment. The movable imaging unit 100 of the imaging apparatus 1 is roughly hemispherical and has a first housing part 150, which has a cutout window spanning about 90 degrees from horizontal to vertical and can rotate through 360 degrees in the horizontal plane indicated by arrow A in the figure. The movable imaging unit 100 also has a second housing part 151, which can rotate together with the lens unit 101 and the imaging unit 102 along the cutout window within the horizontal-to-vertical range indicated by arrow B in the figure. The rotation of the first housing part 150 along arrow A corresponds to the pan operation and the rotation of the second housing part 151 along arrow B corresponds to the tilt operation; both are realized by driving the vibrators 11 to 13.

The microphones 104a and 104b are arranged at positions above the ring portion 102a of the lens unit 101 and the imaging unit 102 in the second housing part 151. The ring portion 102a is a ring-shaped member for protecting the lens unit 101 and is provided so as to surround the lens unit 101. As the figure shows, with the second housing part 151 held fixed, the positions of the microphones 104a and 104b relative to the lens unit 101 and the imaging unit 102 do not change no matter which direction the first housing part 150 is panned along arrow A. In other words, the microphone 104a is always on the left side and the microphone 104b is always on the right side with respect to the imaging direction of the imaging unit 102. The space represented by the image captured by the imaging unit 102 and the sound field acquired by the microphones 104a and 104b therefore maintain a fixed relationship.

As shown in FIG. 3(a), the two microphones 104a and 104b in the embodiment are arranged on a virtual plane representing the direction of the pan operation. The two microphones are assumed to lie on a single horizontal plane in FIG. 3(a), although slight deviations are acceptable.

The distance between the microphone 104a and the microphone 104b is desirably about 10 mm to 30 mm. The positions of the microphones 104a and 104b in FIG. 3(a) are only an example, and their arrangement may be changed as appropriate according to mechanical or design constraints.

FIGS. 3(b) to 3(e) show forms of use of the imaging apparatus 1 in the embodiment. FIG. 3(b) shows the imaging apparatus 1 placed on a desk or the like, with the aim of shooting the photographer and the subjects around the photographer. FIG. 3(c) shows the imaging apparatus 1 hung from the photographer's neck, mainly to shoot what is ahead of the photographer as the photographer moves. FIG. 3(d) shows the imaging apparatus 1 fixed to the photographer's shoulder; in the illustrated case the aim is to shoot the front, rear, and right side around the photographer. FIG. 3(e) shows the imaging apparatus 1 fixed to the end of a pole held by the user, so that shooting can be performed by moving the imaging apparatus 1 to the shooting position the user wants (a high place or a position out of reach).

The pan and tilt operations of the imaging apparatus 1 of this embodiment are described in more detail with reference to FIGS. 4(a) to 4(c). The description assumes the stationary usage example of FIG. 3(b), but the same applies to the other usage examples.

FIG. 4(a) shows the state in which the lens unit 101 faces horizontally. Taking FIG. 4(a) as the initial state, panning the first housing part 150 by 90 degrees counterclockwise as viewed from above gives the state of FIG. 4(b). Tilting the second housing part 151 by 90 degrees from the initial state of FIG. 4(a) gives the state of FIG. 4(c). As described above, the rotation of the first housing part 150 and the second housing part 151 is realized by the vibration of the vibrators 11 to 13 driven by the rotation control unit 213.

Next, the processing procedure of the central control unit 201 of the imaging apparatus 1 in the embodiment is described according to the flowchart of FIG. 5. The processing in the figure is the processing performed by the central control unit 201 when the main power of the imaging apparatus 1 is turned on or the apparatus is reset.

In step S101, the central control unit 201 performs initialization processing for the imaging apparatus 1. In this initialization, the central control unit 201 sets the horizontal-plane component of the current imaging direction of the imaging unit 102 of the movable imaging unit 100 as the reference angle (0 degrees) for the pan operation. The imaging direction of the imaging unit 102 of the movable imaging unit 100 can also be described as the optical axis direction (principal axis direction) of the lens unit 101.

Hereafter, the horizontal-plane component of the imaging direction after the movable imaging unit 100 pans is expressed as an angle relative to this reference angle. The horizontal-plane component of the sound source direction detected by the sound direction detection unit 2044 is expressed as an angle relative to a reference direction, which is the straight line connecting the current positions of the two microphones of the movable imaging unit 100 in the horizontal plane. As described in detail later, the sound direction detection unit 2044 also determines whether a sound source is located directly above the imaging apparatus 1 (in the axial direction of the pan rotation axis).

なお、この段階で、音声用メモリ２０４２、音方向検出部２０４４、動画用音声処理部２０４５、並び、マイク１０４ｂ乃至１０４への電力は遮断されている。 At this stage, power to the audio memory 2042, the sound direction detector 2044, the moving image audio processor 2045, and the microphones 104b to 104 is cut off.

初期化処理を終えると中央制御部２０１は、ステップＳ１０２にて、電源制御部２１１を制御して、音圧レベル検出部２０４１、マイク１０４ａへの電力の供給を開始する。この結果、音圧レベル検出部２０４１は、マイク１０４ａでサンプリングされた音声データに基づく音圧検出処理を実行し、予め設定された閾値を超える音圧レベルの音声データを検出したときにその旨を中央制御部に通知することになる。なお、この閾値は、例えば６０ｄＢＳＰＬ（ＳｏｕｎｄＰｒｅｓｓｕｒｅＬｅｖｅｌ）とするが、撮像装置１が環境等に応じて変更しても良いし、必要な周波数帯域だけに絞るようにしても良い。 After completing the initialization process, the central control unit 201 controls the power supply control unit 211 in step S102 to start supplying power to the sound pressure level detection unit 2041 and the microphone 104a. As a result, the sound pressure level detection unit 2041 executes sound pressure detection processing based on the audio data sampled by the microphone 104a and, when it detects audio data whose sound pressure level exceeds a preset threshold, notifies the central control unit accordingly. This threshold is, for example, 60 dB SPL (Sound Pressure Level), but the imaging device 1 may change it according to the environment or the like, or the detection may be limited to only the necessary frequency band.

中央制御部２０１は、ステップＳ１０３にて、音圧レベル検出部２０４１による閾値を超える音圧を表す音声データが検出されるのを待つ。閾値を超える音圧の音声データが検出されると、ステップＳ１０４にて、音声用メモリ２０４２はマイク１０４ａからの音声データの受信、格納処理を開始する。 In step S103, the central control unit 201 waits until the sound pressure level detection unit 2041 detects sound data representing a sound pressure exceeding the threshold. When audio data having a sound pressure exceeding the threshold is detected, in step S104, the audio memory 2042 starts receiving and storing audio data from the microphone 104a.
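The threshold check in steps S102 and S103 can be illustrated with a minimal Python sketch. It assumes the samples are already calibrated sound pressures in pascals and that the level is measured as an RMS value over a short block; the function names and the block-based approach are illustrative assumptions, not details given in the description.

```python
import math

REFERENCE_PRESSURE_PA = 20e-6  # 20 micropascals, the usual 0 dB SPL reference in air
THRESHOLD_DB_SPL = 60.0        # example threshold value mentioned in the description


def sound_pressure_level_db(samples_pa):
    """Return the RMS level of a block of pressure samples (pascals) in dB SPL."""
    if not samples_pa:
        return float("-inf")
    mean_square = sum(s * s for s in samples_pa) / len(samples_pa)
    if mean_square == 0.0:
        return float("-inf")  # silence has no finite level
    rms = math.sqrt(mean_square)
    return 20.0 * math.log10(rms / REFERENCE_PRESSURE_PA)


def exceeds_threshold(samples_pa, threshold_db=THRESHOLD_DB_SPL):
    """Hypothetical stand-in for the notification condition of unit 2041."""
    return sound_pressure_level_db(samples_pa) > threshold_db
```

In this sketch a block with an RMS pressure of 0.02 Pa sits right at about 60 dB SPL, so only louder blocks would trigger the notification that starts audio storage in step S104.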

また、ステップＳ１０５にて、中央制御部２０１は、電源制御部２１１を制御し、音声コマンド認識部２０４３への電力供給を開始する。この結果、音声コマンド認識部２０４３は、コマンドメモリ２０４６を参照した音声用メモリ２０４２に格納されていく音声データの認識処理を開始する。そして、音声コマンド認識部２０４３は、音声用メモリ２０４２に格納された音声データの認識処理を行う。コマンドメモリ２０４６内のいずれかの音声コマンドと一致すると音声コマンドを認識した場合、その認識された音声コマンドを特定する情報を中央制御部２０１に通知する。さらに、音声用メモリ２０４２内の、認識した音声コマンドを決定づけた最初と最後の音声データのアドレス（或いはタイミング）情報とを含む情報を中央制御部２０１に通知することになる。 Further, in step S105, the central control unit 201 controls the power supply control unit 211 to start supplying power to the voice command recognition unit 2043. As a result, the voice command recognition unit 2043 starts recognition processing, with reference to the command memory 2046, of the audio data being stored in the audio memory 2042. The voice command recognition unit 2043 then performs recognition processing on the audio data stored in the audio memory 2042. When it recognizes a voice command that matches any of the voice commands in the command memory 2046, it notifies the central control unit 201 of information identifying the recognized voice command. It also notifies the central control unit 201 of information including the addresses (or timing) in the audio memory 2042 of the first and last audio data that determined the recognized voice command.

ステップＳ１０６にて、中央制御部２０１は、電源制御部２１１を制御し、音方向検出部２０４４、マイク１０４ｂへの電力供給を開始する。この結果、音方向検出部２０４４は、２つのマイク１０４ａ，１０４ｂからの同時刻の音声データに基づく、音源方向の検出処理を開始することになる。音源の方向の検出処理は、所定周期で行われる。そして、音方向検出部２０４４は、検出した音方向を示す音方向情報を、内部のバッファメモリ２０４４ａに格納していく。このとき、音方向検出部２０４４は、音方向情報の決定に利用した音声データのタイミングが、音声用メモリ２０４２に格納された音声データのどのタイミングであったのかを対応付くように、バッファメモリ２０４４ａに格納する。典型的には、バッファメモリ２０４４ａに格納するのは、音方向と、音声用メモリ２０４２内の音声データのアドレスとすれば良い。なお、先に説明したように、水平面における現在の可動撮像部１００の２つのマイクの位置を結ぶ直線を基準方向とし、当該基準方向に対する音源の方向との差を表す検出角度が、音方向情報に含まれる。 In step S106, the central control unit 201 controls the power supply control unit 211 to start supplying power to the sound direction detection unit 2044 and the microphone 104b. As a result, the sound direction detection unit 2044 starts detecting the sound source direction based on audio data captured at the same time by the two microphones 104a and 104b. The sound source direction detection processing is performed at a predetermined cycle. The sound direction detection unit 2044 then stores sound direction information indicating the detected sound direction in its internal buffer memory 2044a. At this time, the sound direction detection unit 2044 stores the information in the buffer memory 2044a in such a way that the timing of the audio data used to determine the sound direction information is associated with the corresponding timing of the audio data stored in the audio memory 2042. Typically, what is stored in the buffer memory 2044a is the sound direction together with the address of the corresponding audio data in the audio memory 2042. As described above, the straight line connecting the current positions of the two microphones of the movable imaging unit 100 in the horizontal plane is taken as the reference direction, and the sound direction information includes a detection angle representing the difference between that reference direction and the direction of the sound source.

ステップＳ１０７にて、中央制御部２０１は、電源制御部２１１を制御し、撮像部１０２、及び、レンズアクチュエータ制御部１０３への電力供給を開始する。この結果、可動撮像部１００は、撮像装置としての機能し始めることになる。 In step S107, the central control unit 201 controls the power supply control unit 211 to start supplying power to the imaging unit 102 and the lens actuator control unit 103. As a result, the movable imaging unit 100 starts functioning as an imaging device.

ステップＳ１０８にて、中央制御部２０１は、音声コマンド認識部２０４３から、音声コマンドが認識されたことを示す情報を受信したか否かを判定する。否の場合、中央制御部２０１は、処理をステップＳ１１０に進め、音声コマンド認識部２０４３を起動させてからの経過時間が、予め設定された閾値を超えたか否かを判定する。そして、経過時間が閾値以内である限り、中央制御部２０１は、音声コマンド認識部２０４３による音声コマンドが認識されるのを待つ。そして、閾値が示す時間が経過しても、音声コマンド認識部２０４３が音声コマンドを認識しなかった場合、中央制御部２０１は処理をステップＳ１１１に進める。このステップＳ１１１にて、中央制御部２０１は、電源制御部２１１を制御して音声コマンド認識部２０４３への電力を遮断する。そして、中央制御部２０１は、処理をステップＳ１０２に戻す。 In step S108, the central control unit 201 determines whether it has received, from the voice command recognition unit 2043, information indicating that a voice command has been recognized. If not, the central control unit 201 advances the process to step S110 and determines whether or not the time elapsed since the voice command recognition unit 2043 was activated has exceeded a preset threshold. As long as the elapsed time is within the threshold, the central control unit 201 waits for the voice command recognition unit 2043 to recognize a voice command. If the voice command recognition unit 2043 has not recognized a voice command even after the time indicated by the threshold has elapsed, the central control unit 201 advances the process to step S111. In step S111, the central control unit 201 controls the power supply control unit 211 to cut off power to the voice command recognition unit 2043. The central control unit 201 then returns the process to step S102.

一方、中央制御部２０１が、音声コマンド認識部２０４３から、音声コマンドが認識されたことを示す情報を受信した場合、処理をステップＳ１０９に進める。このステップＳ１０９にて、中央制御部２０１は、認識された音声コマンドが、図７に示される起動コマンドに対応するか否かを判定する。そして、認識された音声コマンドが起動コマンド以外のコマンドであると判定した場合、中央制御部２０１は処理をステップＳ１１０に進める。また、認識された音声コマンドが起動コマンドであった場合、中央制御部２０１は処理をステップＳ１０９からステップＳ１１２に進める。 On the other hand, when the central control unit 201 receives, from the voice command recognition unit 2043, information indicating that a voice command has been recognized, it advances the process to step S109. In step S109, the central control unit 201 determines whether or not the recognized voice command corresponds to the activation command shown in FIG. 7. If it determines that the recognized voice command is a command other than the activation command, the central control unit 201 advances the process to step S110. If the recognized voice command is the activation command, the central control unit 201 advances the process from step S109 to step S112.
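The wait loop of steps S108 to S111 can be sketched as follows. This is only an illustration under stated assumptions: recognition results are modeled as a time-ordered list of `(timestamp, command)` pairs, the function name `wait_for_activation` is hypothetical, and running out of events is treated the same as a timeout.

```python
def wait_for_activation(events, start_time_s, timeout_s, activation="Hi, Camera"):
    """Model of the S108-S111 loop.

    events: iterable of (timestamp_s, recognized_command) pairs in time order
            (an assumed representation of notifications from unit 2043).
    Returns "activated" (go to S112) or "timeout" (go to S111, then back to S102).
    """
    for timestamp_s, command in events:
        # S110: has the preset time since activating the recognizer elapsed?
        if timestamp_s - start_time_s > timeout_s:
            return "timeout"
        # S109: only the activation command leaves the waiting state.
        if command == activation:
            return "activated"
        # Any other command is not acted on here; keep waiting.
    return "timeout"
```

For example, a non-activation command followed by "Hi, Camera" within the timeout yields "activated", while an activation command arriving after the timeout does not.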

中央制御部２０１は、ステップＳ１１２にて、音声コマンド認識部２０４３で認識された音声コマンドに同期する音方向情報を、音方向検出部２０４４のバッファメモリ２０４４ａから取得する。音声コマンド認識部２０４３は、先に説明したように、ステップＳ１０８にて音声コマンドを認識したとき、音声用メモリ２０４２内の音声コマンドを表す先頭と終端を表す２つのアドレスを中央制御部２０１に通知する。そこで、中央制御部２０１は、ステップＳ１０８にて認識された音声コマンドに同期した音方向として、この２つのアドレスが示す期間内で検出した音方向情報をバッファメモリ２０４４ａから取得する。２つのアドレスが示す期間内に複数の音方向情報が存在することもある。その場合、中央制御部２０１はその中の時間的に最も後の音方向情報をバッファメモリ２０４４ａから取得する。時間的に後の音方向情報の方が、その音声コマンドを発した人物（音源）の現在の位置を表している蓋然性が高いからである。 In step S112, the central control unit 201 acquires, from the buffer memory 2044a of the sound direction detection unit 2044, the sound direction information synchronized with the voice command recognized by the voice command recognition unit 2043. As described above, when the voice command recognition unit 2043 recognizes a voice command in step S108, it notifies the central control unit 201 of two addresses indicating the start and end of the voice command in the audio memory 2042. The central control unit 201 therefore acquires from the buffer memory 2044a, as the sound direction synchronized with the voice command recognized in step S108, the sound direction information detected within the period indicated by these two addresses. There may be multiple pieces of sound direction information within the period indicated by the two addresses. In that case, the central control unit 201 acquires the temporally latest of them from the buffer memory 2044a. This is because later sound direction information is more likely to represent the current position of the person (sound source) who uttered the voice command.
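The lookup described for step S112 can be sketched in Python as below. The data layout is an assumption for illustration: each buffer entry pairs a detected angle with the audio-memory address it was computed from, and addresses are assumed to increase monotonically with time (no ring-buffer wraparound). The names `DirectionEntry` and `latest_direction_in_range` are hypothetical.

```python
from typing import NamedTuple


class DirectionEntry(NamedTuple):
    audio_address: int  # address in the audio memory of the data used for the estimate
    angle_deg: float    # detected angle relative to the microphone baseline


def latest_direction_in_range(buffer, start_address, end_address):
    """Return the temporally latest entry whose address lies within the command span.

    Returns None when no direction was detected during the command.
    """
    candidates = [
        entry for entry in buffer
        if start_address <= entry.audio_address <= end_address
    ]
    if not candidates:
        return None
    # Later addresses correspond to later audio under the stated assumption.
    return max(candidates, key=lambda entry: entry.audio_address)
```

Choosing the entry with the highest address mirrors the rationale in the text: the latest estimate is the most likely to reflect where the speaker currently is.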

次に、Ｓ１１３にて、中央制御部２０１は、回動制御部２１３を制御して、可動撮像部１００のパン動作を行い、現在の撮像部１０２の撮像方向（光軸方向）の水平面の角度を、所定の角度だけ回転させる。所定の角度とは、例えば、３０度又は９０度等、０度より大きく９０度以下の任意の角度である。 Next, in S113, the central control unit 201 controls the rotation control unit 213 to pan the movable imaging unit 100, rotating the horizontal-plane angle of the current imaging direction (optical axis direction) of the imaging unit 102 by a predetermined angle. The predetermined angle is any angle greater than 0 degrees and not more than 90 degrees, for example 30 degrees or 90 degrees.

次に、ステップＳ１１４にて、中央制御部２０１は、音声コマンド認識部２０４３から、新たな音声コマンドが認識されたことを示す情報を受信したか否かを判定する。否の場合、中央制御部２０１は、処理をステップＳ１１５に進め、現在、ユーザからの指示に従った実行中のジョブがあるか否かを判定する。有の場合は、Ｓ１１４に戻り、否の場合は、Ｓ１１６に進む。詳細は図６のフローチャートの説明から明らかになるが、動画撮影記録や追尾処理等が上記ジョブに相当する。ここでは、そのような実行中のジョブは存在しないものとして説明を続ける。 Next, in step S114, the central control unit 201 determines whether information indicating that a new voice command has been recognized has been received from the voice command recognition unit 2043. If not, the central control unit 201 advances the process to step S115 and determines whether or not there is a job currently being executed according to the instruction from the user. If yes, go back to S114, otherwise go to S116. Although the details will become clear from the description of the flowchart in FIG. 6, moving image recording, tracking processing, and the like correspond to the jobs described above. Here, the explanation is continued assuming that such a job in execution does not exist.

ステップＳ１１６にて、前回の音声コマンドを認識してからの経過時間が、予め設定された閾値を超えるか否かを判定する。否の場合、中央制御部２０１は処理をステップＳ１１４に戻し、音声コマンドの認識を待つ。そして、実行中のジョブが無く、且つ、前回の音声コマンドを認識してから閾値を超える時間が経過しても、更なる音声コマンドが認識されない場合、中央制御部２０１は処理をステップＳ１１７に進める。このステップＳ１１７にて、中央制御部２０１は、電源制御部２１１を制御し、撮像部１０２、レンズアクチュエータ１０３への電力を遮断する。そして、中央制御部２０１は、ステップＳ１１８にて、電源制御部２１１を制御し、音方向検出部２０４４への電力も遮断し、処理をステップＳ１０８に戻す。 In step S116, it is determined whether or not the time elapsed since the previous voice command was recognized exceeds a preset threshold. If not, the central control unit 201 returns the process to step S114 and waits for a voice command to be recognized. If there is no job in execution and no further voice command is recognized even after a time exceeding the threshold has elapsed since the previous voice command was recognized, the central control unit 201 advances the process to step S117. In step S117, the central control unit 201 controls the power supply control unit 211 to cut off power to the imaging unit 102 and the lens actuator 103. Then, in step S118, the central control unit 201 controls the power supply control unit 211 to also cut off power to the sound direction detection unit 2044, and returns the process to step S108.

さて、中央制御部２０１が音声コマンド認識部２０４３から新たな音声コマンドが認識されたことを示す情報を受信したとする。この場合、音声コマンド認識部２０４３は、処理をステップＳ１１４からステップＳ１１９に進める。 Assume that the central control unit 201 receives information from the voice command recognition unit 2043 indicating that a new voice command has been recognized. In this case, the voice command recognition unit 2043 advances the process from step S114 to step S119.

中央制御部２０１は、ステップＳ１１９にて、音声コマンド認識部２０４３で認識された音声コマンドに同期する音方向情報を、音方向検出部２０４４のバッファメモリ２０４４ａから取得する。音声コマンド認識部２０４３は、先に説明したように、ステップＳ１１４にて音声コマンドを認識したとき、音声用メモリ２０４２内の音声コマンドを表す先頭と終端を表す２つのアドレスを中央制御部２０１に通知する。そこで、中央制御部２０１は、ステップＳ１１４にて認識された音声コマンドに同期した音方向として、この２つのアドレスが示す期間内で検出した音方向情報をバッファメモリ２０４４ａから取得する。 In step S119, the central control unit 201 acquires, from the buffer memory 2044a of the sound direction detection unit 2044, the sound direction information synchronized with the voice command recognized by the voice command recognition unit 2043. As described above, when the voice command recognition unit 2043 recognizes a voice command in step S114, it notifies the central control unit 201 of two addresses indicating the start and end of the voice command in the audio memory 2042. The central control unit 201 therefore acquires from the buffer memory 2044a, as the sound direction synchronized with the voice command recognized in step S114, the sound direction information detected within the period indicated by these two addresses.

次に、ステップＳ１２０にて、中央制御部２０１は、音源の音方向を特定する音方向特定処理を行う。具体的には、ステップＳ１１２で取得した音方向と、ステップＳ１１９で取得した音方向とに基づいて、音源の音方向を特定し、特定した音方向を音方向特定処理の結果として内部メモリに記憶する。音方向特定処理の詳細は、後で図１０を用いて説明する。なお、中央制御部２０１の代わりに、音方向検出部２０４４が音方向特定処理を行ってもよい。 Next, in step S120, the central control unit 201 performs sound direction specifying processing for specifying the sound direction of the sound source. Specifically, it specifies the sound direction of the sound source based on the sound direction acquired in step S112 and the sound direction acquired in step S119, and stores the specified sound direction in the internal memory as the result of the sound direction specifying processing. Details of the sound direction specifying processing will be described later with reference to FIG. 10. Note that the sound direction detection unit 2044 may perform the sound direction specifying processing instead of the central control unit 201.

次に、ステップＳ１２１にて、中央制御部２０１は、回動制御部２１３を制御して、可動撮像部１００のパン動作を行い、現在の撮像部１０２の撮像方向（光軸方向）の水平面の角度を、特定した音源の音方向の水平面の角度に一致させる。 Next, in step S121, the central control unit 201 controls the rotation control unit 213 to pan the movable imaging unit 100 so that the horizontal-plane angle of the current imaging direction (optical axis direction) of the imaging unit 102 matches the horizontal-plane angle of the specified sound direction of the sound source.

続いて、ステップＳ１２２にて、中央制御部２０１は、映像信号処理部２０３から撮像画像を受信し、撮像画像内に音声発生原となるオブジェクト（顔）が存在するか否かを画像認識処理により判定する。否の場合、中央制御部２０１は処理をステップＳ１２３に進め、回動制御部２１３を制御して、目標とするチルト角に向かって予め設定された角度だけ可動撮像部１００のチルト動作を行う。そして、ステップＳ１２４にて、中央制御部２０１は、撮像部１０２の撮像方向のチルト角が、チルト動作の上限（実施形態では水平方向に対して９０度）に到達したか否かを判定する。否の場合には、中央制御部２０１は処理をステップＳ１２２に戻す。こうして、中央制御部２０１は、チルト動作を行いながら、映像信号処理部２０３からの撮像画像の画角内に音声発生原となるオブジェクト（顔）が存在するか否かを画像認識処理により判定していく。そして、撮像部１０２の撮像方向のチルト角がチルトの上限に到達してもオブジェクトが検出されない場合、中央制御部２０１は処理をステップＳ１２４からステップＳ１１４に戻す。一方、撮像画像の画角内にオブジェクトが存在した場合、中央制御部２０１は処理をステップＳ１２５に進め、ステップＳ１１４で認識した音声コマンドに対応するジョブを実行する。 Subsequently, in step S122, the central control unit 201 receives a captured image from the video signal processing unit 203 and determines by image recognition processing whether or not an object (face) that is the source of the sound exists in the captured image. If not, the central control unit 201 advances the process to step S123 and controls the rotation control unit 213 to tilt the movable imaging unit 100 by a preset angle toward the target tilt angle. Then, in step S124, the central control unit 201 determines whether or not the tilt angle of the imaging direction of the imaging unit 102 has reached the upper limit of the tilt operation (90 degrees with respect to the horizontal direction in the embodiment). If not, the central control unit 201 returns the process to step S122. In this manner, while performing the tilt operation, the central control unit 201 repeatedly determines by image recognition processing whether or not an object (face) that is the source of the sound exists within the angle of view of the captured image from the video signal processing unit 203. If no object is detected even when the tilt angle of the imaging direction of the imaging unit 102 reaches the upper limit of the tilt, the central control unit 201 returns the process from step S124 to step S114. On the other hand, if an object exists within the angle of view of the captured image, the central control unit 201 advances the process to step S125 and executes the job corresponding to the voice command recognized in step S114.
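The tilt search of steps S122 to S124 can be sketched as a simple loop. In this illustration the image recognition is replaced by a caller-supplied predicate `face_visible_at`, and the starting angle and step size are assumed values; only the 90-degree upper limit comes from the description. The loop follows the literal step order (check, tilt, then test the limit).

```python
def tilt_search(face_visible_at, start_tilt_deg=0.0, step_deg=10.0, upper_limit_deg=90.0):
    """Model of the S122-S124 loop.

    face_visible_at: callable(tilt_deg) -> bool, a stand-in for face detection
                     on the image captured at that tilt angle.
    Returns the tilt angle at which a face was found (proceed to S125),
    or None when the upper limit is reached first (return to S114).
    """
    tilt_deg = start_tilt_deg
    while True:
        if face_visible_at(tilt_deg):                           # S122
            return tilt_deg
        tilt_deg = min(tilt_deg + step_deg, upper_limit_deg)    # S123
        if tilt_deg >= upper_limit_deg:                         # S124
            return None
```

A smaller step makes it less likely that a face is skipped between two tilt positions, at the cost of more recognition passes before the job in step S125 can start.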

次に、図６のフローチャート、並びに、図７に示す音声コマンドテーブルに基づいて、ステップＳ１２５の処理の詳細を説明する。図７の音声コマンドテーブルに示される“Ｈｉ，Ｃａｍｅｒａ”等の音声コマンドに対応する音声パターンデータはコマンドメモリ２０４６に格納されるものである。なお、図７には音声コマンドとして代表的な数例示しているが、この数に特に制限はない。また、以下の説明における音声コマンドは、図５のステップＳ１１４のタイミングで検出された音声コマンドである点に注意されたい。 Next, details of the processing in step S125 will be described based on the flowchart in FIG. 6 and the voice command table shown in FIG. 7. The command memory 2046 stores the voice pattern data corresponding to voice commands such as "Hi, Camera" shown in the voice command table of FIG. 7. FIG. 7 shows several representative voice commands as examples, but there is no particular limit on their number. Also note that the voice command in the following description is the voice command detected at the timing of step S114 in FIG. 5.

まず、ステップＳ２０１にて、中央制御部２０１は、音声コマンドが、起動コマンドであるか否かを判定する。 First, in step S201, the central control unit 201 determines whether or not the voice command is the activation command.

この起動コマンドは、撮像装置１に対し、撮像可能な状態に遷移させる音声コマンドである。この起動コマンドは、図５のステップＳ１０８で判定されるコマンドであり、撮像に係るジョブではない。よって、中央制御部２０１は、認識した音声コマンドが起動コマンドである場合には、そのコマンドについては無視し、処理をステップＳ１１４に戻す。 This activation command is a voice command that causes the imaging apparatus 1 to transition to an imaging-ready state. This activation command is a command determined in step S108 of FIG. 5, and is not a job related to imaging. Therefore, when the recognized voice command is the activation command, the central control unit 201 ignores the command and returns the process to step S114.

ステップＳ２０２にて、中央制御部２０１は、音声コマンドが、停止コマンドであるか否かを判定する。この停止コマンドは、一連の撮像可の状態から、起動コマンドの入力を待つ状態に遷移させるコマンドである。よって、中央制御部２０１は、認識した音声コマンドが停止コマンドである場合には、処理をステップＳ２１１に進める。ステップＳ２１１にて、中央制御部２０１は、電源制御部２１１を制御し、既に起動している撮像部１０２、音方向検出部２０４４、音声コマンド認識部２０４３、動画用音声処理部２０４５、マイク１０４ｂ乃至１０４ｄ等への電力を遮断し、これらを停止する。そして、中央制御部２０１は、処理を起動時のステップＳ１０３に戻す。 In step S202, the central control unit 201 determines whether or not the voice command is the stop command. The stop command is a command that causes a transition from the imaging-enabled state to a state of waiting for input of the activation command. Therefore, when the recognized voice command is the stop command, the central control unit 201 advances the process to step S211. In step S211, the central control unit 201 controls the power supply control unit 211 to cut off power to the already activated imaging unit 102, sound direction detection unit 2044, voice command recognition unit 2043, moving image audio processing unit 2045, microphones 104b to 104d, and so on, thereby stopping them. The central control unit 201 then returns the process to step S103 of the startup sequence.

ステップＳ２０３にて、中央制御部２０１は、音声コマンドが静止画撮影コマンドであるか否かを判定する。この静止画撮影コマンドは、撮像装置１に対して１枚の静止画の撮影・記録ジョブの実行の要求を行うコマンドである。よって、中央制御部２０１は、音声コマンドが静止画撮影コマンドであると判定した場合、処理をステップＳ２１２に進める。ステップＳ２１２にて、中央制御部２０１は、撮像部１０２で撮像した１枚の静止画像データを例えばＪＰＥＧファイルとして、記憶部２０６に格納する。なお、この静止画撮影コマンドのジョブが、１枚の静止画撮影記録により完結するので、先に説明した図５のステップＳ１１５で判定する対象のジョブとはならない。 In step S203, the central control unit 201 determines whether the voice command is a still image shooting command. This still image shooting command is a command for requesting the imaging apparatus 1 to perform a single still image shooting/recording job. Therefore, when the central control unit 201 determines that the voice command is the still image shooting command, the process proceeds to step S212. In step S212, the central control unit 201 stores the single still image data captured by the imaging unit 102 in the storage unit 206 as, for example, a JPEG file. Note that the job of this still image shooting command is completed by shooting and recording one still image, so it is not a job to be judged in step S115 of FIG. 5 described above.

ステップＳ２０４にて、中央制御部２０１は、音声コマンドが動画撮影コマンドであるか否かを判定する。動画撮影コマンドは、撮像装置１に対して音声付の動画像の撮像と記録を要求するコマンドである。中央制御部２０１は、音声コマンドが動画撮影コマンドであると判定した場合、処理をステップＳ２１３に進める。このステップＳ２１３にて、中央制御部２０１は、撮像部１０２による動画像の撮影と記録を開始し、処理をステップＳ１１４に戻す。実施形態では、撮像した動画像は記憶部２０６に格納されるものとするが、外部入出力端子部２０８を介してネットワーク上のファイルサーバに送信しても構わない。動画撮影コマンドは、動画像の撮像、記録を継続させるコマンドであるので、このコマンドによるジョブは、先に説明したステップＳ１１５で判定する対象のジョブとなる。 In step S204, the central control unit 201 determines whether or not the voice command is the moving image shooting command. The moving image shooting command is a command requesting the imaging device 1 to capture and record a moving image with sound. When the central control unit 201 determines that the voice command is the moving image shooting command, it advances the process to step S213. In step S213, the central control unit 201 starts capturing and recording a moving image with the imaging unit 102, and returns the process to step S114. In the embodiment, the captured moving image is stored in the storage unit 206, but it may instead be transmitted to a file server on a network via the external input/output terminal unit 208. Since the moving image shooting command causes the capture and recording of a moving image to continue, the job started by this command is one of the jobs checked in step S115 described above.

ステップＳ２０５にて、中央制御部２０１は、音声コマンドが動画撮影終了コマンドであるか否かを判定する。中央制御部２０１は、音声コマンドが動画撮影終了コマンドであり、尚且つ、現に動画像の撮像・記録中である場合には、その記録（ジョブ）を終了する。そして、中央制御部２０１は処理をステップＳ１１４に戻す。 In step S205, the central control unit 201 determines whether or not the voice command is a moving image shooting end command. If the voice command is a moving image shooting end command and the moving image is currently being shot/recorded, the central control unit 201 ends the recording (job). Then, the central control unit 201 returns the process to step S114.

ステップＳ２０６にて、中央制御部２０１は、音声コマンドが追尾コマンドであるか否かを判定する。追尾コマンドは、撮像装置１に対して、撮像部１０２の撮像方向に、ユーザを継続して位置させることを要求するコマンドである。中央制御部２０１は、音声コマンドが追尾コマンドであると判定した場合、処理をステップＳ２１５に進める。そして、ステップＳ２１５にて、中央制御部２０１は、映像信号処理部２０３で得られた映像の中心位置にオブジェクトが位置し続けるように、回動制御部２１３の制御を開始する。そして、中央制御部２０１は処理をステップＳ１１４に戻す。この結果、可動撮像部１００がパン動作、或いはチルト動作を行い、移動するユーザを追尾することになる。ただし、ユーザを追尾するものの、撮像した画像の記録は行わない。また、追尾している間は、先に説明した図５のステップＳ１１５で判定する対象のジョブとなる。そして、追尾終了コマンドを受信して初めて、中央制御部２０１はこの動画像の撮影記録を終了する。なお、追尾中に、例えば静止画撮影コマンドや動画撮影コマンドのジョブを実行しても構わない。 In step S206, the central control unit 201 determines whether or not the voice command is the tracking command. The tracking command is a command requesting the imaging device 1 to keep the user positioned in the imaging direction of the imaging unit 102. When the central control unit 201 determines that the voice command is the tracking command, it advances the process to step S215. In step S215, the central control unit 201 starts controlling the rotation control unit 213 so that the object remains at the center position of the image obtained by the video signal processing unit 203. The central control unit 201 then returns the process to step S114. As a result, the movable imaging unit 100 performs pan or tilt operations to track the moving user. Although the user is tracked, the captured images are not recorded. While tracking is in progress, it counts as a job checked in step S115 of FIG. 5 described above. Only upon receiving the tracking end command does the central control unit 201 end this shooting and recording of the moving image. During tracking, jobs such as those of the still image shooting command or the moving image shooting command may also be executed.

ステップＳ２０７にて、中央制御部２０１は、音声コマンドが追尾終了コマンドであるか否かを判定する。中央制御部２０１は、音声コマンドが追尾終了コマンドであり、尚且つ、現に追尾中である場合には、その記録（ジョブ）を終了する。そして、中央制御部２０１は処理をステップＳ１１４に戻す。 In step S207, the central control unit 201 determines whether the voice command is a tracking end command. If the voice command is a tracking end command and tracking is in progress, the central control unit 201 ends the recording (job). Then, the central control unit 201 returns the process to step S114.

以上であるが、上記以外の音声コマンドについては、ステップＳ２０７以降で実行されるが、ここでの説明は省略する。 As described above, voice commands other than the above are executed after step S207, but the description thereof is omitted here.
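The command handling of FIG. 6 (steps S201 to S207) can be summarized in a minimal Python sketch. The enum members, the `CameraState` class, and the returned step labels are hypothetical names introduced for illustration. Two points are assumptions rather than statements from the description: that a stop command also ends any running job, and that the still image command returns to step S114 afterwards.

```python
from enum import Enum, auto


class Command(Enum):
    ACTIVATE = auto()
    STOP = auto()
    STILL = auto()
    MOVIE_START = auto()
    MOVIE_END = auto()
    TRACK_START = auto()
    TRACK_END = auto()


class CameraState:
    def __init__(self):
        self.recording = False   # moving image job (checked in S115)
        self.tracking = False    # tracking job (checked in S115)
        self.stills_saved = 0
        self.powered = True

    def has_running_job(self):
        """Condition evaluated in step S115."""
        return self.recording or self.tracking


def handle_command(state, command):
    """Apply one recognized command; return the step to continue from."""
    if command is Command.ACTIVATE:        # S201: not an imaging job, ignored
        return "S114"
    if command is Command.STOP:            # S202 -> S211: power down, await activation
        state.recording = False            # assumption: running jobs end with power-off
        state.tracking = False
        state.powered = False
        return "S103"
    if command is Command.STILL:           # S203 -> S212: one-shot job
        state.stills_saved += 1
    elif command is Command.MOVIE_START:   # S204 -> S213
        state.recording = True
    elif command is Command.MOVIE_END:     # S205: only meaningful while recording
        state.recording = False
    elif command is Command.TRACK_START:   # S206 -> S215
        state.tracking = True
    elif command is Command.TRACK_END:     # S207: only meaningful while tracking
        state.tracking = False
    return "S114"
```

Keeping `recording` and `tracking` as separate flags reflects the note that still or moving image jobs may run while tracking is in progress, and that a single still image never counts as a running job.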

ここで、実施形態における撮像装置１におけるメイン電源ＯＮからの処理のシーケンスの一例を図８に示すタイミングチャートに従って説明する。 Here, an example of the processing sequence from when the main power supply is turned on in the imaging apparatus 1 according to the embodiment will be described with reference to the timing chart shown in FIG.

撮像装置１のメイン電源がＯＮになると、音圧レベル検出部２０４１はマイク１０４ａからの音声データの音圧レベルの検出処理を開始する。タイミングＴ６０１にて、ユーザは、起動コマンド“Ｈｉ，Ｃａｍｅｒａ”の発声を開始したとする。この結果、音圧レベル検出部２０４１が閾値以上の音圧を検出する。そして、これがトリガになって、タイミングＴ６０２にて、音声用メモリ２０４２がマイク１０４ａからの音声データの格納を開始し、音声コマンド認識部２０４３が音声コマンドの認識を開始する。また、上記トリガに応じて、タイミングＴ６０２にて、中央制御部２０１は、音方向検出部２０４４に電力供給を開始するとともに、撮像部１０２への電力供給も開始する。ユーザが起動コマンド“Ｈｉ，Ｃａｍｅｒａ”の発声を終えると、タイミングＴ６０４にて、音声コマンド認識部２０４３がその音声コマンドを認識し、且つ、認識した音声コマンドが起動コマンドであることを特定する。音声コマンド認識部２０４３は、音声用メモリ２０４２内の“Ｈｉ，Ｃａｍｅｒａ”を表す音声データの先頭と終端のアドレスと、認識結果を中央制御部２０１に通知する。中央制御部２０１は、受信した先頭と終端のアドレスが表す範囲を有効範囲として決定する。中央制御部２０１は、音方向検出部２０４４のバッファメモリ２０４４ａ内の有効範囲内から、タイミングＴ６０４～Ｔ６０５にて音声コマンド認識部２０４３で認識された音声コマンドに同期する音方向情報を取得する。 When the main power supply of the imaging device 1 is turned on, the sound pressure level detection unit 2041 starts detecting the sound pressure level of the audio data from the microphone 104a. Assume that at timing T601 the user starts uttering the activation command "Hi, Camera". As a result, the sound pressure level detection unit 2041 detects a sound pressure equal to or higher than the threshold. Triggered by this, at timing T602 the audio memory 2042 starts storing audio data from the microphone 104a, and the voice command recognition unit 2043 starts recognizing voice commands. Also in response to this trigger, at timing T602 the central control unit 201 starts supplying power to the sound direction detection unit 2044 and to the imaging unit 102. When the user finishes uttering the activation command "Hi, Camera", at timing T604 the voice command recognition unit 2043 recognizes the voice command and identifies it as the activation command. The voice command recognition unit 2043 notifies the central control unit 201 of the start and end addresses of the audio data representing "Hi, Camera" in the audio memory 2042, together with the recognition result. The central control unit 201 determines the range indicated by the received start and end addresses as the valid range. From within the valid range in the buffer memory 2044a of the sound direction detection unit 2044, the central control unit 201 acquires the sound direction information synchronized with the voice command recognized by the voice command recognition unit 2043 at timings T604 to T605.

中央制御部２０１は、この起動コマンドが認識されたことをトリガにして、タイミングＴ６０５にて、回動制御部２１３を制御して、可動撮像部１００のパン動作を開始し、所定の角度（例えば、３０度又は９０度等の任意の角度）だけ回転させる。 Triggered by the recognition of this activation command, at timing T605 the central control unit 201 controls the rotation control unit 213 to start the pan operation of the movable imaging unit 100 and rotate it by a predetermined angle (for example, any angle such as 30 degrees or 90 degrees).

ユーザは、タイミングＴ６０６にて、例えば“Ｍｏｖｉｅｓｔａｒｔ”の発声を開始したとする。この場合、発生の開始のタイミングの音声データは、タイミングＴ６０７から順に音声用メモリ２０４２に格納されていく。そして、タイミングＴ６０８にて、音声コマンド認識部２０４３が、音声データを“Ｍｏｖｉｅｓｔａｒｔ”を表す音声コマンドとして認識する。音声コマンド認識部２０４３は、音声用メモリ２０４２内の“Ｍｏｖｉｅｓｔａｒｔ”を表す音声データの先頭と終端のアドレスと、認識結果を中央制御部２０１に通知する。中央制御部２０１は、受信した先頭と終端のアドレスが表す範囲を有効範囲として決定する。そして、中央制御部２０１は、音方向検出部２０４４のバッファメモリ２０４４ａ内の有効範囲内から、タイミングＴ６０８～Ｔ６０９にて音声コマンド認識部２０４３で認識された音声コマンドに同期する音方向情報を取得する。そして、中央制御部２０１は、タイミングＴ６０４～Ｔ６０５にて認識された音声コマンドに同期する音方向情報と、タイミングＴ６０８～Ｔ６０９にて認識された音声コマンドに同期する音方向情報とに基づいて、音源の音方向を特定する。音源の音方向を特定する音方向特定処理の詳細は、後で図１０を用いて説明する。そして、タイミングＴ６０９にて、特定した音源の音方向に基づいて、回動制御部２１３を制御して、可動撮像部１００のパン動作、チルト動作を開始する。 Assume that the user starts uttering, for example, "Movie start" at timing T606. In this case, the audio data from the start of the utterance is stored in the audio memory 2042 in order from timing T607. Then, at timing T608, the voice command recognition unit 2043 recognizes the audio data as the voice command "Movie start". The voice command recognition unit 2043 notifies the central control unit 201 of the start and end addresses of the audio data representing "Movie start" in the audio memory 2042, together with the recognition result. The central control unit 201 determines the range indicated by the received start and end addresses as the valid range. From within the valid range in the buffer memory 2044a of the sound direction detection unit 2044, the central control unit 201 then acquires the sound direction information synchronized with the voice command recognized by the voice command recognition unit 2043 at timings T608 to T609. The central control unit 201 then specifies the sound direction of the sound source based on the sound direction information synchronized with the voice command recognized at timings T604 to T605 and the sound direction information synchronized with the voice command recognized at timings T608 to T609. Details of the sound direction specifying processing for specifying the sound direction of the sound source will be described later with reference to FIG. 10. Then, at timing T609, the central control unit 201 controls the rotation control unit 213 based on the specified sound direction of the sound source and starts the pan and tilt operations of the movable imaging unit 100.

可動撮像部１００のパン動作、チルト動作中に、タイミングＴ６１０にて、撮像部１０２で撮像画像に被写体（オブジェクト；顔）を検出したとする。すると、中央制御部２０１はパン動作、チルト動作を停止する（タイミングＴ６１１）。また、タイミングＴ６１２にて、中央制御部２０１は、動画用音声処理部２０４５に電力を供給して、マイク１０４ａ、及び、１０４ｂによるステレオ音声の収音状態にする。そして、中央制御部２０１は、タイミングＴ６１３にて、音声付の動画像の撮像と記録を開始する。 Assume that the imaging unit 102 detects a subject (object; face) in the captured image at timing T610 during panning and tilting operations of the movable imaging unit 100 . Then, the central control unit 201 stops panning and tilting (timing T611). Also, at timing T612, the central control unit 201 supplies power to the moving image sound processing unit 2045 to bring the microphones 104a and 104b into a stereo sound pickup state. Then, at timing T613, the central control unit 201 starts capturing and recording a moving image with sound.

次に、実施形態における音方向検出部２０４４による音源方向の検出処理を説明する。この処理は、図５のステップＳ１０６以降、周期的に、且つ、継続的に行われるようにしてもよい。 Next, the sound source direction detection processing by the sound direction detection unit 2044 in the embodiment will be described. This processing may be performed periodically and continuously from step S106 of FIG. 5 onward.

まず、図９を用いて、マイク１０４ａとマイク１０４ｂの２つのマイクを用いた音方向検出処理を説明する。同図は、マイク１０４ａとマイク１０４ｂが平面上（仮想平面上）に配置されているとする。マイク１０４ａとマイク１０４ｂの距離をｄ［ａ‐ｂ］と表す。距離ｄ［ａ‐ｂ］に対して、撮像装置１と音源間の距離は十分に大きいと仮定する。この場合、マイク１０４ａとマイク１０４ｂの音声を比較することによって、両者間の音声の遅延時間を特定することができる。 First, sound direction detection processing using the two microphones 104a and 104b will be described with reference to FIG. 9. In that figure, the microphones 104a and 104b are assumed to be arranged on a plane (a virtual plane). The distance between the microphone 104a and the microphone 104b is denoted d[a-b]. The distance between the imaging device 1 and the sound source is assumed to be sufficiently large compared with the distance d[a-b]. In this case, the delay time of the sound between the two microphones can be determined by comparing the audio from the microphone 104a with that from the microphone 104b.
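The description does not say how the two microphone signals are compared. One common way to estimate such a delay is to search for the lag that maximizes the cross-correlation of the two signals; the sketch below uses that technique purely as an assumed illustration, with hypothetical function names and plain Python lists as the signal representation.

```python
def estimate_delay_samples(sig_a, sig_b, max_lag):
    """Return the lag (in samples) at which sig_b best matches sig_a.

    A positive result means sig_b is delayed relative to sig_a.
    Brute-force cross-correlation, used here as an assumed technique.
    """
    best_lag = 0
    best_score = float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += a * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_seconds(lag_samples, sample_rate_hz):
    """Convert a lag in samples into an arrival delay time in seconds."""
    return lag_samples / sample_rate_hz
```

In practice `max_lag` only needs to cover the largest physically possible delay, which is the microphone spacing divided by the speed of sound, converted into samples.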

到達遅延時間に音速（空気中は３４０ｍ／ｓ）を乗じることで、距離ｌ［ａ‐ｂ］を特定することができる。その結果、次式で音源方向角度θ［ａ‐ｂ］を特定することができる。 By multiplying the arrival delay time by the speed of sound (340 m/s in air), the distance l[a-b] can be specified. As a result, the sound source direction angle θ[a-b] can be specified by the following equation.
θ[a-b] = acos(l[a-b] / d[a-b])
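As a worked illustration of this relation, the following Python sketch converts an arrival delay into the angle θ[a-b]. The function name and the clamping of the ratio to the valid acos domain (to absorb measurement noise) are assumptions added for the example; the speed of sound value is the one given in the text.

```python
import math

SPEED_OF_SOUND_M_S = 340.0  # value used in the description for air


def source_angle_deg(delay_s, mic_distance_m, speed_m_s=SPEED_OF_SOUND_M_S):
    """Return theta[a-b] in degrees from the arrival delay between two microphones."""
    path_difference_m = delay_s * speed_m_s        # l[a-b]
    ratio = path_difference_m / mic_distance_m     # l[a-b] / d[a-b]
    ratio = max(-1.0, min(1.0, ratio))             # keep acos defined despite noise
    return math.degrees(math.acos(ratio))
```

With a microphone spacing of 0.1 m, a zero delay corresponds to 90 degrees (a source broadside to the microphone pair), and a path difference of 0.05 m corresponds to 60 degrees.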

しかしながら、２つのマイクで求めた音方向は、求めた音源方向とθ［ａ‐ｂ］’との区別ができない。つまり、２つの方向のいずれであるのかまでは特定できないことになる。 However, with a sound direction obtained from two microphones, the obtained sound source direction cannot be distinguished from θ[a-b]'. In other words, it is not possible to determine which of the two directions is the actual one.

The sound direction specifying process in the embodiment is therefore described below with reference to FIGS. 10(a) to 10(e). Specifically, because two microphones yield two possible sound source directions, these two directions are treated as tentative directions. The sound source direction is obtained with the two microphones at two separate timings, yielding two tentative directions at each timing. The direction common to both sets is then determined to be the sound source direction.
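
The idea of picking the direction common to two sets of tentative directions can be sketched as follows. The conventions here are assumptions made for illustration and do not appear in the description: angles are expressed in a fixed frame in degrees, `axis_deg` is the orientation of the reference direction from which the detection angle is measured, and `tolerance_deg` is a hypothetical matching tolerance.

```python
def tentative_directions(axis_deg, theta_deg):
    """Two tentative directions, mirrored about the reference axis."""
    return ((axis_deg + theta_deg) % 360.0, (axis_deg - theta_deg) % 360.0)


def angular_gap(a_deg, b_deg):
    """Smallest absolute difference between two directions, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def common_direction(first, second, tolerance_deg=5.0):
    """Return the direction from `first` that best matches one in `second`.

    Returns None when no pair agrees within the (assumed) tolerance,
    for example because the sound source moved between measurements.
    """
    gap, direction = min(
        (angular_gap(p, q), p) for p in first for q in second
    )
    return direction if gap <= tolerance_deg else None


# Source at 60 degrees; the second measurement is taken after a 30 degree pan.
before = tentative_directions(0.0, 60.0)   # (60.0, 300.0)
after = tentative_directions(30.0, 30.0)   # (60.0, 0.0)
print(common_direction(before, after))
```

Only the true direction appears in both sets; the mirrored candidates shift by twice the pan angle relative to each other and therefore do not coincide.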

In FIG. 10(a), the imaging direction (optical axis direction) of the lens unit 101, which is orthogonal to the direction in which the microphones 104a and 104b are aligned, coincides with the Y-axis direction.

As described with reference to FIG. 9, the distance d[a-b] between the microphones 104a and 104b is known, so θ[a-b] can be determined once the distance l[a-b] is obtained from the audio data.

The sound direction specifying process for a sound source located in direction A (B) will be described with reference to FIGS. 10(a) and 10(b). As shown in FIG. 10(a), based on the sound direction information acquired in step S112 of FIG. 5, the central control unit 201 estimates that the sound source lies in direction A or direction A' on the XY plane. At this point, two detection angles θ1[a-b] and θ1[a-b]' are detected as tentative sound source directions.

Next, assume that the imaging device 1 has rotated by 30° as a result of the panning operation in step S113. The positional relationship between the microphones 104a and 104b and the sound source at step S119, after the panning operation, is as shown in FIG. 10(b), and the central control unit 201 estimates that the sound source lies in direction B or direction B' on the XY plane. At this point, two detection angles θ2[a-b] and θ2[a-b]' are detected as tentative sound source directions.

The central control unit 201 calculates the displacement between the detection angle θ1[a-b] (or θ1[a-b]') acquired in step S112 and the detection angle θ2[a-b] (or θ2[a-b]') detected in step S119. Here, the detection angles before and after the panning operation satisfy θ1[a-b] (or θ1[a-b]') < θ2[a-b] (or θ2[a-b]'), and the relation θ1[a-b] + 30 degrees = θ2[a-b] holds. This shows that the panning operation rotated the device away from the sound source. It can therefore be determined that the sound source lies in direction A (= B), which was detected on the side opposite to the rotation direction of the panning operation. In addition, when the sound source has not moved, the sound source can be determined to lie in the common direction A (= B) among the directions A, A', B, and B'. The central control unit 201 stores the specified sound source direction A (= B) in the internal memory as the result of the sound direction specifying process.

The sound direction specifying process for a sound source located in direction C (D) will be described with reference to FIGS. 10(c) and 10(d). As shown in FIG. 10(c), based on the sound direction information acquired in step S112 of FIG. 5, the central control unit 201 estimates that the sound source lies in direction C or direction C' on the XY plane. At this point, two detection angles θ1[a-b] and θ1[a-b]' are detected as tentative sound source directions.

Next, assume that the imaging device 1 has rotated by 30° as a result of the panning operation in step S113. The positional relationship between the microphones 104a and 104b and the sound source at step S119, after the panning operation, is as shown in FIG. 10(d), and the central control unit 201 estimates that the sound source lies in direction D or direction D' on the XY plane. At this point, two detection angles θ2[a-b] and θ2[a-b]' are detected as tentative sound source directions.

The central control unit 201 calculates the displacement between the detection angle θ1[a-b] (or θ1[a-b]') acquired in step S112 and the detection angle θ2[a-b] (or θ2[a-b]') detected in step S119. Here, the detection angles before and after the panning operation satisfy θ1[a-b] (or θ1[a-b]') > θ2[a-b] (or θ2[a-b]'), and the relation θ1[a-b] = θ2[a-b] + 30 degrees holds. This shows that the panning operation rotated the device toward the sound source. It can therefore be determined that the sound source lies in direction C (= D), which was detected on the same side as the rotation direction of the panning operation. In addition, when the sound source has not moved, the sound source can be determined to lie in the common direction C (= D) among the directions C, C', D, and D'. The central control unit 201 stores the specified sound source direction C (= D) in the internal memory as the result of the sound direction specifying process.
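
The two cases described so far (direction A = B and direction C = D) differ only in the magnitude relationship between the detection angles before and after the pan. A minimal sketch of that decision is shown below. The function name `source_side`, the string return values, and the tolerance `tolerance_deg` are hypothetical and introduced only for illustration.

```python
def source_side(theta1_deg, theta2_deg, pan_deg, tolerance_deg=5.0):
    """Classify which side of the pan rotation the sound source lies on.

    Returns "opposite" when theta1 + pan == theta2 (the pan rotated away
    from the source), "same" when theta1 == theta2 + pan (the pan rotated
    toward the source), and None when neither relation holds within the
    assumed tolerance, e.g. because the source moved during the pan.
    """
    if abs((theta1_deg + pan_deg) - theta2_deg) <= tolerance_deg:
        return "opposite"
    if abs(theta1_deg - (theta2_deg + pan_deg)) <= tolerance_deg:
        return "same"
    return None


# Detection angle grows by the pan angle: source is on the opposite side.
print(source_side(60.0, 90.0, 30.0))
# Detection angle shrinks by the pan angle: source is on the same side.
print(source_side(60.0, 30.0, 30.0))
```

Returning None rather than forcing a decision leaves it to the caller to repeat the measurement when the relation expected for a stationary source is not observed.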

Note that the panning operation in step S113 may instead rotate the imaging device 1 by 90°. This case will be described with reference to FIGS. 10(a) and 10(e), for a sound source located in direction A (E). FIG. 10(a) has already been described above, so its description is not repeated.

The panning operation in step S113 rotates the imaging device 1 by 90°. The positional relationship between the microphones 104a and 104b and the sound source at step S119, after the panning operation, is as shown in FIG. 10(e), and the central control unit 201 estimates that the sound source lies in direction E or direction E' on the XY plane. At this point, two detection angles θ3[a-b] and θ3[a-b]' are detected as tentative sound source directions.

The central control unit 201 calculates the displacement between the detection angle θ1[a-b] (or θ1[a-b]') acquired in step S112 and the detection angle θ3[a-b] (or θ3[a-b]') detected in step S119. Here, the detection angles before and after the panning operation satisfy θ1[a-b] (or θ1[a-b]') < θ3[a-b] (or θ3[a-b]'), and the relation θ1[a-b] + 90 degrees = θ3[a-b] holds. This shows that the panning operation rotated the device away from the sound source. It can therefore be determined that the sound source lies in direction E (= A), which was detected on the side opposite to the rotation direction of the panning operation. In addition, when the sound source has not moved, the sound source can be determined to lie in the common direction A (= E) among the directions A, A', E, and E'. The central control unit 201 stores the specified sound source direction A (= E) in the internal memory as the result of the sound direction specifying process.

Since the distance d[a-d] between the microphone 104a and the microphone 104d is known, θ[a-d] can likewise be determined once the distance l[a-d] is obtained from the audio data.

In step S113, the pan rotation angle by which the movable imaging unit 100 is rotated by the panning operation can be set in advance to any value in the range 0 < pan rotation angle ≤ 90 degrees, for example. The smaller the pan rotation angle, the shorter the drive time of the panning operation, so the result of the sound direction specifying process is obtained sooner, but the specified sound source direction is less accurate. Conversely, the larger the pan rotation angle, the longer the drive time, so the result is obtained later, but the specified sound source direction is more accurate. For example, if the sound source moves by 30 degrees or more during the period from time T603 to T607 in FIG. 8 and the pan rotation angle in step S113 is 30 degrees, the sound direction specifying process may be unable to specify the sound source direction accurately. However, even if the sound source moves by 30 degrees or more during that period, when the pan rotation angle in step S113 is 90 degrees, the sound direction specifying process can still specify the sound source direction with reasonable accuracy.

Note that if, during the panning operation of the movable imaging unit 100 in step S113, the sound pressure level detection unit 2041 detects audio data representing a sound pressure exceeding the threshold, the panning operation may be stopped at that timing. In addition, the pan rotation angle may be determined so that it increases in proportion to the length of the period during which the sound pressure level detection unit 2041 detected, in step S103 of FIG. 5, audio data representing a sound pressure exceeding the threshold.
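
As a rough illustration of the proportional rule above, the pan rotation angle could be derived from the detection period as shown below. The gain `degrees_per_second` and the function name are hypothetical values chosen for the sketch; only the proportionality and the upper limit of 90 degrees are taken from the description.

```python
def pan_angle_for_duration(duration_s, degrees_per_second=30.0,
                           max_angle_deg=90.0):
    """Pan rotation angle proportional to the above-threshold period.

    `degrees_per_second` is an assumed gain. The result is capped at
    `max_angle_deg` so that it stays within 0 < angle <= 90 degrees.
    """
    if duration_s <= 0.0:
        raise ValueError("duration_s must be positive")
    return min(duration_s * degrees_per_second, max_angle_deg)


print(pan_angle_for_duration(1.0))  # short utterance, small pan
print(pan_angle_for_duration(5.0))  # long utterance, capped at the maximum
```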

Furthermore, increasing the number of detection angles, such as θ2[a-b] and θ3[a-b], can improve the angular accuracy of direction detection. For example, if the central control unit 201 acquires sound direction detection results both for a pan rotation angle of 30 degrees and for a pan rotation angle of 90 degrees, and specifies the sound source direction based on the two results, the angular accuracy of direction detection improves. For example, when the two results of the sound direction specifying process differ, the sound source direction may be determined by averaging the two results. A flowchart for this case is shown in FIG. 11.
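
One way to average two direction results is sketched below. Note that it uses a circular mean (averaging unit vectors) rather than a plain arithmetic mean, which is a choice made for this sketch and not something the description specifies; a plain mean would give a misleading result for directions on either side of 0 degrees. The function name is hypothetical.

```python
import math


def mean_direction_deg(directions_deg):
    """Circular mean of directions given in degrees, in [0, 360)."""
    # Sum the unit vectors for each direction, then take the angle
    # of the resulting vector.
    x = sum(math.cos(math.radians(a)) for a in directions_deg)
    y = sum(math.sin(math.radians(a)) for a in directions_deg)
    return math.degrees(math.atan2(y, x)) % 360.0


# Two results of the sound direction specifying process that differ slightly.
print(mean_direction_deg([50.0, 70.0]))
```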

The flowchart of FIG. 11 adds steps S301 and S302 to the flowchart of FIG. 5. For example, in step S113, the central control unit 201 controls the rotation control unit 213 to pan the movable imaging unit 100, rotating the horizontal angle of the current imaging direction (optical axis direction) of the imaging unit 102 by 30 degrees.

Then, in step S301, the central control unit 201 acquires, from the buffer memory 2044a of the sound direction detection unit 2044, the sound direction information synchronized with the voice command recognized by the voice command recognition unit 2043. As described earlier, when the voice command recognition unit 2043 recognizes a voice command in step S108, it notifies the central control unit 201 of two addresses indicating the start and end of the voice command in the voice memory 2042. The central control unit 201 therefore acquires from the buffer memory 2044a the sound direction information detected within the period indicated by these two addresses, as the sound direction synchronized with the voice command recognized in step S108.

In step S302, the central control unit 201 controls the rotation control unit 213 to pan the movable imaging unit 100, rotating the horizontal angle of the current imaging direction (optical axis direction) of the imaging unit 102 by a further 60 degrees. The total pan rotation angle across steps S113 and S302 is therefore 90 degrees.

In this case, in step S119, the central control unit 201 acquires the detected sound direction information from the buffer memory 2044a as the sound direction synchronized with the voice command recognized in step S114. As a result, two sound directions are obtained: one detected with the horizontal angle of the imaging direction (optical axis direction) of the imaging unit 102 rotated by 30 degrees from the reference angle, and one detected with it rotated by 90 degrees. Then, in step S120, the central control unit 201 specifies the sound direction of the sound source based on the three sound directions acquired in steps S112, S301, and S119, and stores the specified sound direction in the internal memory as the result of the sound direction specifying process.

According to the above embodiment, an image with the intended composition can be captured at the timing intended by the user without any special operation. As a result, it is possible to avoid mistakenly treating anyone other than (the face of) the person who uttered the voice command as the subject. It also becomes possible to execute the job intended by the person who issued the voice command. In addition, the sound source direction can be detected with high accuracy using a small number of microphones. Because only a small number of microphones is needed, component cost is kept down and the sound source direction can be detected with high accuracy using a simple structure.

Furthermore, as described in the above embodiment, power is supplied to the microphones 104a and 104b and to each element of the audio signal processing unit 204, under the control of the central control unit 201, only at the stage where they are actually used. Power consumption can therefore be reduced compared with the case where all components are kept in an operating state.

In the above embodiment, an example has been described in which the sound pressure level detection unit 2041, the voice command recognition unit 2043, the sound direction detection unit 2044, the moving image audio processing unit 2045, and the like are processing units independent of the central control unit 201. However, the central control unit 201 may take the place of all or some of these units by executing a program.

(Other Embodiments)
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

1 imaging device
100 movable imaging unit
101 lens unit
102 imaging unit
104 voice input unit
104a, 104b microphones
201 central control unit
2041 sound pressure level detection unit
2042 voice memory
2043 voice command recognition unit
2044 sound direction detection unit
2045 moving image audio processing unit
2046 command memory

Claims

An imaging device comprising:
a movable imaging unit provided with an imaging unit, the imaging unit being rotatable in a predetermined direction;
a plurality of microphones provided in the movable imaging unit;
sound direction detection means for detecting the direction of a sound source using the plurality of microphones; and
control means for performing control so as to execute processing for specifying the direction of the sound source based on a sound direction detected by the sound direction detection means with the movable imaging unit directed in a first direction and a sound direction detected by the sound direction detection means with the movable imaging unit directed in a second direction by rotating the movable imaging unit from the first direction in the predetermined direction,
wherein the second direction is a direction obtained by rotating the movable imaging unit from the first direction in the predetermined direction by a predetermined rotation angle greater than 0 degrees and less than or equal to 90 degrees.

The imaging device, wherein the control means specifies, as the direction of the sound source, a sound direction common to a first sound direction and a second sound direction detected by the sound direction detection means with the movable imaging unit directed in the first direction, and a third sound direction and a fourth sound direction detected by the sound direction detection means with the movable imaging unit directed in the second direction.

The imaging device according to claim 1, wherein the control means specifies the direction of the sound source based on a magnitude relationship between (i) a difference between a reference direction and the first sound direction or the second sound direction detected by the sound direction detection means with the movable imaging unit directed in the first direction, and (ii) a difference between the reference direction and the third sound direction or the fourth sound direction detected by the sound direction detection means with the movable imaging unit directed in the second direction.

The movable imaging unit
a first housing unit provided with the imaging unit and rotatable in a tilting direction;
a second housing provided with the first housing and rotatable in a panning direction;
4. The imaging apparatus according to any one of claims 1 to 3, wherein the plurality of microphones are provided on the first casing.

5. The imaging device according to any one of claims 1 to 4, wherein two of the plurality of microphones are kept at positions on either side of the imaging direction of the imaging unit and function as stereo microphones during moving image shooting.

6. The imaging device according to any one of claims 1 to 5, wherein the control means performs a panning operation of the movable imaging unit until the imaging direction of the imaging unit coincides with the direction of the sound source and, after the panning operation, starts a tilting operation of the movable imaging unit so that the subject is positioned within the angle of view of the imaging unit.

Further comprising recognition means for recognizing a voice command represented by voice data based on voice data input from one of the plurality of microphones,
7. The imaging apparatus according to claim 4, wherein, when a voice command is recognized by said recognition means, said control means executes processing based on the recognized voice command.

8. The imaging device according to claim 7, wherein the voice commands include a command requesting shooting and recording of a still image, a command requesting shooting and recording of a moving image with sound, and a command requesting that the imaging direction of the imaging unit track the movement of the user.

9. The imaging device according to claim 7 or 8, further comprising sound pressure level detection means connected to one of the plurality of microphones and detecting audio data representing a sound pressure level exceeding a preset threshold,
wherein the control means:
when the imaging device is activated, waits for detection of audio data having a sound pressure level exceeding the threshold using only the microphone connected to the sound pressure level detection means among the plurality of microphones; and
triggered by detection of audio data having a sound pressure level exceeding the threshold, activates the recognition means to start speech recognition and starts power supply to the sound direction detection means.

10. The imaging apparatus according to any one of claims 7 to 9, wherein the plurality of microphones are arranged on a virtual plane representing the direction of the panning operation.

11. The imaging apparatus according to any one of claims 1 to 10, wherein the predetermined direction is a panning direction.

A control method for an imaging device comprising: a movable imaging unit provided with an imaging unit, the imaging unit being rotatable in a predetermined direction; and a plurality of microphones provided in the movable imaging unit, comprising:
a sound direction detection step of detecting the direction of a sound source using the plurality of microphones;
a control step of performing control so as to execute processing for specifying the direction of the sound source based on a sound direction detected in the sound direction detection step with the movable imaging unit directed in a first direction and a sound direction detected in the sound direction detection step with the movable imaging unit directed in a second direction by rotating the movable imaging unit from the first direction in the predetermined direction,
wherein the second direction is a direction obtained by rotating the movable imaging unit from the first direction in the predetermined direction by a predetermined rotation angle greater than 0 degrees and less than or equal to 90 degrees.

A computer-readable program for causing a computer to function as each means of the imaging apparatus according to any one of claims 1 to 11.