JP2009239346A

JP2009239346A - Photographing device

Info

Publication number: JP2009239346A
Application number: JP2008079115A
Authority: JP
Inventors: Akio Tamura; 明穂田村; Tetsuo Nishimoto; 哲夫西元; Katsuji Yoshimura; 克二吉村; Masayoshi Omura; 昌良大村
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2008-03-25
Filing date: 2008-03-25
Publication date: 2009-10-15

Abstract

PROBLEM TO BE SOLVED: To provide a technique for tracking and photographing a specific subject with a photographing device that has a simpler configuration than conventional ones. SOLUTION: The photographing device 1 has a plurality of microphones 15 arranged in a row. A control unit 11 of the photographing device 1 analyzes audio data representing the sounds picked up by the respective microphones 15 and estimates a plurality of sound source directions from the analysis results. The control unit 11 then collates the audio data representing the sound from each estimated sound source direction with collation data stored in a collation data storage area 121 and specifies the direction of the specific subject according to the degree of matching. The control unit 11 further analyzes the audio data from the microphones 15 to detect transitions in the direction of the subject, and controls a rotation mechanism 70 to change the orientation of the photographing device 1 so that the detected direction is included in the photographing range of the photographing device 1. COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a technique for photographing.

Techniques have been proposed for tracking and photographing a specific subject with a photographing device, such as a digital camera, that captures still images and moving images. For example, Patent Document 1 proposes an apparatus that automatically tracks and photographs a subject to which a transmitter is attached. Patent Document 2 proposes an apparatus in which a tag is attached to each of a plurality of subjects, a tuning signal is transmitted with a spread angle substantially equal to the angle of view, the response signal transmitted from each tag in response to the tuning signal is received to detect the tag direction, and photographing is performed with the photographing direction substantially aligned with the detected tag direction. According to this apparatus, even when there are a plurality of subjects, only the subjects within the angle of view can be reliably captured and photographed.
JP 2003-69884 A JP 2006-261999 A

However, in the techniques described in Patent Documents 1 and 2, the photographing device must be separately provided with a receiving unit for receiving the signal from the transmitter (tag), which complicates the configuration of the photographing device.
The present invention has been made under the above-described background, and an object of the present invention is to provide a technique capable of tracking and shooting a specific subject with a simpler configuration than in the past.

In order to solve the above problem, a photographing device according to a preferred aspect of the present invention comprises: photographing means for which a photographing range is set and which outputs video data representing the video within the photographing range; a plurality of microphones that each pick up sound around the photographing means and output it as audio data; estimation means that analyzes the audio data output from each of the plurality of microphones and estimates one or more sound source directions according to the analysis result; specifying means that specifies at least one of the sound source directions estimated by the estimation means; detection means that analyzes the audio data for each microphone and detects, according to the analysis result, a transition of the sound source direction specified by the specifying means; and photographing range changing means that changes the photographing range of the photographing means to a range including the sound source direction detected by the detection means.

In the above aspect, the photographing device may further comprise video data storage control means that stores the video data output from the photographing means in predetermined storage means when the photographing range is changed by the photographing range changing means.

In the above aspect, the estimation means may calculate a distribution of sound pressure around the photographing means based on the correlation of the audio data for each microphone, and may estimate the direction in which a sound pressure peak appears in the calculated distribution as the direction of the sound source.

In the above aspect, the photographing range changing means may change the orientation of the photographing means so that the sound source direction estimated by the direction estimation means is included in the photographing range.
In the above aspect, the detection means may detect a transition of a sound pressure peak in the sound pressure distribution calculated by the estimation means.

In the above aspect, the photographing device may further comprise collation data storage means that stores collation data representing features of a sound, and direction-specific audio data generation means that generates, from the audio data for each microphone, direction-specific audio data corresponding to each of the directions estimated by the estimation means; the specifying means may then collate the direction-specific audio data generated by the direction-specific audio data generation means with the collation data stored in the collation data storage means and specify the direction of the sound source based on the degree of matching between the two.

In the above aspect, the photographing device may further comprise collation data generation means that generates, from the audio data for each of the plurality of microphones, audio data corresponding to the direction specified by the specifying means as collation data, and direction-specific audio data generation means that generates, from the audio data for each microphone, direction-specific audio data corresponding to each of the directions estimated by the estimation means; the detection means may then collate each piece of direction-specific audio data generated by the direction-specific audio data generation means with the collation data generated by the collation data generation means and detect the transition of the sound source direction based on the degree of matching.
In the above aspect, the specifying means may specify the direction of the sound source according to a signal output from operation means.

In the above aspect, the collation data may include feature information of a specific individual's voice.
In the above aspect, the direction-specific audio data generation means may generate the direction-specific audio data by mixing the audio data, for each of the directions estimated by the estimation means, so that the sound pressure from that direction becomes high.

In the above aspect, the direction-specific audio data generation means may generate the direction-specific audio data by using independent component analysis to estimate, from the audio data, the audio data corresponding to each sound source.
In the above aspect, the estimation means may estimate the direction of the sound source using independent component analysis.

According to the present invention, a specific subject can be tracked and photographed with a simpler configuration than in the past.

Embodiments of the present invention will be described below with reference to the drawings.
<A: Configuration>
FIG. 1 is a block diagram showing the hardware configuration of a photographing device 1 according to an embodiment of the present invention, and FIG. 2 is a perspective view showing the appearance of the photographing device 1. The photographing device 1 is a device capable of photographing still images and moving images, for example, a digital camera. In FIG. 1, the control unit 11 includes a CPU (Central Processing Unit), a ROM (Read Only Memory), and a RAM (Random Access Memory), and controls each unit of the photographing device 1 via the bus BUS by reading and executing a computer program stored in the ROM or the storage unit 12. The storage unit 12 is storage means for storing the computer program executed by the control unit 11 and the data used during its execution, and is, for example, a hard disk device. The display unit 13 includes a liquid crystal panel or the like and displays various images under the control of the control unit 11. The operation unit 14 outputs to the control unit 11 a signal corresponding to an operation by the user of the photographing device 1. The operation unit 14 includes various buttons such as a cross key (not shown), a recording button B1 for starting and ending recording, and a shooting button B2 for taking still images and for starting and ending moving image photographing. By pressing these buttons, the user of the photographing device 1 can perform various operations such as still image photographing and moving image photographing. Switching between still image photographing and moving image photographing is performed with a changeover switch (not shown) provided on the photographing device 1.

The photographing unit 18 includes a photographing lens 18a and the like, captures images, and outputs video data representing the captured video. The photographing unit 18 can change the photographing range by moving the photographing lens 18a back and forth. The user of the photographing device 1 can set the photographing range of the photographing unit 18 using the cross key or the like of the operation unit 14, and the photographing unit 18 sets the photographing range by moving the photographing lens 18a in response to a signal from the operation unit 14. Note that the video data in this embodiment includes data representing still images and data representing moving images.

The microphone array MA consists of a plurality of microphones 151, 152, ..., 15n (n is a natural number of 2 or more) arranged in a row. As shown in FIG. 2, the microphones 151, 152, ..., 15n are arranged in a row on the front surface of the photographing device 1 (the same surface as that on which the photographing lens 18a is provided) and pick up sound around the photographing device 1. These microphones are preferably directional microphones. In the following description, when the microphones 151, 152, ..., 15n need not be distinguished from one another, they are referred to as “microphones 15”. Each microphone 15 is sound pickup means that picks up sound and outputs an analog signal representing the picked-up sound. The audio processing unit 16 A/D-converts the analog signal output from each microphone 15 to generate digital data. Under the control of the control unit 11, the audio processing unit 16 also D/A-converts digital audio data to generate an analog signal and outputs the generated analog signal to the speaker 17. The speaker 17 is sound emitting means that emits sound with an intensity corresponding to the analog signal supplied from the audio processing unit 16.

In this embodiment, the microphones 15 and the speaker 17 are included in the photographing device 1. However, the audio processing unit 16 may instead be provided with an input terminal and an output terminal, with an external microphone connected to the input terminal via an audio cable and, similarly, an external speaker connected to the output terminal via an audio cable. Also, in this embodiment, the audio signal input from the microphones 15 to the audio processing unit 16 and the audio signal output from the audio processing unit 16 to the speaker 17 are analog audio signals, but digital audio data may be input and output instead. In that case, the audio processing unit 16 need not perform A/D conversion or D/A conversion. The same applies to the display unit 13, the operation unit 14, and the photographing unit 18, each of which may be built into the photographing device 1 or attached externally.

As illustrated, the storage unit 12 has a collation data storage area 121 and a moving image data storage area 122. The collation data storage area 121 stores collation data representing the features (frequency characteristics and the like) of a pre-recorded voice of a specific person. This collation data is used when the control unit 11 performs the collation process described later. In the following description, for convenience, the voice represented by the collation data stored in the collation data storage area 121 is referred to as the “specific voice”. The moving image data storage area 122 stores moving image data that includes the video data output from the photographing unit 18 and audio data representing the sound picked up by the microphone array MA. When the photographer operates the shooting button B2 of the operation unit 14 and photographing is performed, the control unit 11 stores moving image data, including the video data output from the photographing unit 18 and the audio data representing the sound picked up by the microphone array MA, in the moving image data storage area 122.

As shown in FIG. 2, the photographing device 1 is attached to a rotation mechanism 70. The rotation mechanism 70 is installed on a stand 61 such as a desk, and the photographing device 1 can be rotated in the direction of arrow P by the rotation mechanism 70. FIG. 3 shows an example of the configuration of the rotation mechanism 70. As illustrated, the rotation mechanism 70 has a rotating part 71 and a fixed part 72. The rotating part 71 is fixed to the photographing device 1. The fixed part 72 is provided with a shaft 721, which is inserted into a bearing 711 of the rotating part 71 so that the rotating part 71 is rotatably supported on the shaft 721. The fixed part 72 is also provided with a drive gear 722 and a motor 723 that rotates the drive gear 722. The motor 723 rotates the drive gear 722 under the control of the control unit 11. When the drive gear 722 rotates, the driven gear 712 meshing with it rotates, so that driving the motor 723 rotates the rotating part 71 about the shaft 721.

<B: Operation>
<B-1: Collation data registration operation>
Next, the operation of this embodiment will be described. First, the user of the photographing device 1 operates the operation unit 14 to register collation data. When the photographer presses the recording button B1 to start recording, the operation unit 14 outputs an operation signal corresponding to the operation, and the control unit 11 controls the audio processing unit 16 to start recording in response to the signal supplied from the operation unit 14. The user's voice is picked up by the microphones 15, converted into an audio signal, and output to the audio processing unit 16. The audio processing unit 16 converts the audio signal output from the microphones 15 into digital data (hereinafter referred to as “audio data”). The control unit 11 applies predetermined filtering or similar processing to the audio data output from the audio processing unit 16 to generate feature data representing the features of the voice, and stores the generated feature data in the collation data storage area 121 as collation data. When the user presses the recording button B1 to end recording, the control unit 11 ends the recording in response to the signal supplied from the operation unit 14.
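The registration step above can be sketched in Python as follows. This is only an illustration under stated assumptions: the text says no more than "predetermined filtering or similar processing", so the feature chosen here (normalized energy in a few frequency bands, computed with a naive DFT) is an assumption, and the names `extract_features`, `register_collation_data`, and `collation_store` are hypothetical stand-ins, the last one for the collation data storage area 121.

```python
import cmath
import math


def extract_features(samples, num_bands=4):
    """Return normalized band energies of `samples` (an assumed feature, not the patent's)."""
    n = len(samples)
    half = n // 2
    # Naive DFT power spectrum for bins 1 .. half-1 (the DC bin is skipped).
    power = []
    for k in range(1, half):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(acc) ** 2)
    # Sum the bins into equal-width bands; leftover top bins are ignored.
    band_size = max(1, len(power) // num_bands)
    bands = [sum(power[i * band_size:(i + 1) * band_size]) for i in range(num_bands)]
    total = sum(bands) or 1.0
    return [b / total for b in bands]


# Hypothetical stand-in for the collation data storage area 121.
collation_store = {}


def register_collation_data(name, samples):
    """Generate feature data from recorded audio data and store it as collation data."""
    collation_store[name] = extract_features(samples)
    return collation_store[name]
```

A real device would more likely use an FFT and a richer voice feature; the point here is only the flow from recorded audio data to stored feature data.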

<B-2: Photographing operation>
Next, the photographing operation performed by the photographing device 1 will be described. The photographing device 1 can switch among three modes: a moving image photographing mode, a still image photographing mode, and an automatic photographing mode. The user can switch the photographing mode by operating the operation unit 14 of the photographing device 1. The control unit 11 performs the photographing process of the selected mode in accordance with the signal output from the operation unit 14. The operation when the automatic photographing mode is selected is described below. The operations for moving image photographing and still image photographing are the same as those of a conventional photographing device, and a detailed description is omitted here.

FIG. 4 is a flowchart showing the flow of the photographing process performed by the photographing device 1. The process shown in FIG. 4 starts when the photographer turns on the power of the photographing device 1 and selects the automatic photographing mode. When the automatic photographing mode is selected, the control unit 11 starts photographing a moving image. The microphones 15 convert the picked-up sound into audio signals and output them to the audio processing unit 16. The audio processing unit 16 converts the audio signals output from the microphones 15 into audio data. The control unit 11 mixes the audio data corresponding to each of the plurality of microphones 15 to generate overall audio data representing the overall sound, and stores the generated overall audio data together with the video data output from the photographing unit 18 in the moving image data storage area 122 as moving image data.

The control unit 11 also analyzes the audio data for each microphone 15 and estimates the direction of the sound source (hereinafter “sound source direction”) according to the analysis result (step S1). Here, the control unit 11 detects the sound pressure of the audio signal output from each of the plurality of microphones 15, calculates a sound pressure distribution based on the correlation of the detected sound pressures of the microphones 15, and estimates the direction in which a sound pressure peak appears in the calculated distribution as the direction of the sound source. A specific example of this estimation process is described with reference to FIG. 5.

FIG. 5 shows an example of the sound pressure distribution calculated by the control unit 11. In the figure, the horizontal axis indicates the angle with respect to the center direction of the microphone array MA, and the vertical axis indicates the sound pressure. The time taken for a sound wave generated by a sound source to reach each of the microphones 15 differs according to the direction (angle) of the sound source as seen from the photographing device 1. Using this principle, in this operation example a delay time corresponding to the angle is set in advance for each microphone 15 for every predetermined unit of angle. The control unit 11 delays the audio data of each microphone 15 by the delay time for that microphone, mixes the delayed audio data of the microphones 15, and calculates the sound pressure corresponding to each angle. The control unit 11 then detects one or more angles at which a peak appears in the calculated per-angle sound pressure (that is, the sound pressure distribution) and takes the detected angles as sound source directions. In the example shown in FIG. 5, the control unit 11 estimates the angles θ1, θ2, and θ3, at which sound pressure peaks appear, as sound source directions.
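The delay, mix, and peak-search procedure described above is commonly called delay-and-sum beamforming, and a minimal Python sketch of it follows. Several things here are assumptions not found in the text: the microphones are treated as an equally spaced linear array receiving a far-field plane wave, delays are rounded to whole samples, the speed of sound is taken as 343 m/s, and the "sound pressure" for an angle is approximated by the mean squared amplitude of the mix. The function names and parameters are hypothetical.

```python
import math


def steering_shifts(angle_deg, num_mics, spacing_m, fs_hz, speed=343.0):
    """Per-microphone delay, in whole samples, for a plane wave arriving from angle_deg."""
    per_mic = spacing_m * math.sin(math.radians(angle_deg)) / speed * fs_hz
    return [round(i * per_mic) for i in range(num_mics)]


def sound_pressure_distribution(mic_signals, angles_deg, spacing_m, fs_hz):
    """For each candidate angle, align the channels, mix them, and measure the level."""
    num_mics = len(mic_signals)
    length = len(mic_signals[0])
    distribution = []
    for angle in angles_deg:
        shifts = steering_shifts(angle, num_mics, spacing_m, fs_hz)
        lo = max(0, -min(shifts))            # keep t + shift inside every channel
        hi = length - max(0, max(shifts))
        energy = 0.0
        for t in range(lo, hi):
            mixed = sum(mic_signals[m][t + shifts[m]] for m in range(num_mics))
            energy += mixed * mixed
        distribution.append(energy / max(1, hi - lo))
    return distribution


def find_peaks(distribution, angles_deg, min_level=0.0):
    """Angles at which the distribution has a local maximum above min_level."""
    peaks = []
    for i in range(1, len(distribution) - 1):
        if (distribution[i] > distribution[i - 1]
                and distribution[i] >= distribution[i + 1]
                and distribution[i] > min_level):
            peaks.append(angles_deg[i])
    return peaks
```

In the situation of FIG. 5, `find_peaks` would be expected to return three angles corresponding to θ1, θ2, and θ3. Because the delays are rounded to whole samples, neighbouring candidate angles can map to the same shifts, so the angular step should not be finer than the array geometry and sampling rate can resolve.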

Next, the control unit 11 specifies at least one of the estimated sound source directions as the direction in which the specific subject is located (hereinafter referred to as the “specific direction”). In this operation example, the control unit 11 first mixes the audio data of the plurality of microphones 15, for each predetermined unit of angle, so that the sound pressure of the sound from that angle becomes high, thereby generating audio data for each angle (hereinafter referred to as “direction-specific audio data”) (step S2). The control unit 11 then applies predetermined filtering or similar processing to the generated direction-specific audio data to generate feature data representing the features of the sound, collates the generated feature data with the collation data stored in the collation data storage area 121, and specifies the direction with the highest degree of matching as the specific direction (step S3).
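The collation in step S3 can be illustrated with a short Python sketch. The text does not say how the degree of matching is computed, so cosine similarity between feature vectors is used here purely as an assumed measure; `match_degree` and `specify_direction` are hypothetical names, and the feature vectors are taken as already computed for each estimated direction.

```python
import math


def match_degree(features, reference):
    """Cosine similarity between two feature vectors (an assumed matching measure)."""
    dot = sum(a * b for a, b in zip(features, reference))
    norm = (math.sqrt(sum(a * a for a in features))
            * math.sqrt(sum(b * b for b in reference)))
    return dot / norm if norm else 0.0


def specify_direction(features_by_direction, reference):
    """Return (direction, degree) for the direction whose features best match the reference."""
    direction, features = max(
        features_by_direction.items(),
        key=lambda item: match_degree(item[1], reference),
    )
    return direction, match_degree(features, reference)
```

For example, with feature vectors for three estimated directions and the stored collation data as `reference`, `specify_direction` returns the angle to be treated as the specific direction. A practical implementation would probably also apply a minimum matching degree so that an unrelated sound is not selected; the text does not mention such a threshold.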

Once the specific direction has been specified, the control unit 11 controls the motor 723 of the rotation mechanism 70 to change the orientation of the photographing device 1 so that the specific direction is included in the photographing range of the photographing device 1 (step S4). At this time, the control unit 11 may change the orientation of the photographing device 1 so that the center direction of the photographing range coincides with the specific direction.
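The two options in step S4 (rotate just enough to include the specific direction, or rotate so that it is centered) can be sketched as follows. This assumes the specific direction is expressed in degrees relative to the current center of the photographing range and that the photographing range is symmetric with a known half angle; `rotation_needed` and its parameters are hypothetical and do not appear in the source.

```python
def rotation_needed(specific_deg, half_view_deg, center=False):
    """Degrees to rotate the device; specific_deg is relative to the current center direction."""
    if center:
        # Bring the specific direction to the center of the photographing range.
        return specific_deg
    if abs(specific_deg) <= half_view_deg:
        # The specific direction is already inside the photographing range.
        return 0.0
    # Rotate just far enough to bring the direction to the edge of the range.
    excess = abs(specific_deg) - half_view_deg
    return excess if specific_deg > 0 else -excess
```

Since the microphones 15 are mounted on the front surface of the photographing device 1, the array turns with the device, so angles measured after a rotation are again relative to the new center direction.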

The control unit 11 outputs moving image data, including the video data output from the photographing unit 18 and audio data representing the sound picked up by the microphones 15, to the moving image data storage area 122 (step S5). Next, the control unit 11 determines whether to end photographing (step S6). If the determination is affirmative (step S6; YES), it ends photographing (step S7). If the determination is negative (step S6; NO), the control unit 11 continues photographing.

During photographing, the control unit 11 analyzes the audio data for each microphone 15 and detects transitions of the specific direction according to the analysis result (step S8). In this operation example, the control unit 11 calculates the sound pressure distribution with respect to direction based on the correlation of the sound pressures of the audio data of the microphones 15, and detects the direction in which a sound pressure peak appears in the calculated distribution every predetermined unit time (for example, 10 ms). The control unit 11 then detects the transition of the sound pressure peak and, according to the detection result, the transition of the specific direction. As one way of detecting the transition of the specific direction, for example, the control unit 11 detects a sound pressure peak and, when the difference between the angle of the detected peak and the angle of the previously detected peak is equal to or less than a predetermined threshold, determines that the sound source at the previous peak position has moved. Specifically, suppose that, in the example shown in FIG. 5, the angle θ2 has been specified as the specific direction, and that the sound pressure distribution changes from the state shown in FIG. 5 to that shown in FIG. 6 after the predetermined unit time has elapsed. If the difference between the angle θ21 shown in FIG. 6 and the angle θ2 shown in FIG. 5 is equal to or less than the predetermined threshold, the control unit 11 determines that the sound source that was in the direction of angle θ2 at the time shown in FIG. 5 has moved to the direction of angle θ21, and detects the angle θ21 as the specific direction.
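The threshold test used for tracking can be sketched in Python as follows. The text only states what happens when a new peak lies within the threshold of the previous one; keeping the previous direction when no peak qualifies, and choosing the nearest peak when several qualify, are assumptions made for this illustration. The name `update_specific_direction` is hypothetical.

```python
def update_specific_direction(previous_deg, peak_angles_deg, threshold_deg):
    """Return the new specific direction given the peaks detected in the current unit time."""
    if not peak_angles_deg:
        # Assumption: with no peaks (for example, silence) the direction is kept.
        return previous_deg
    nearest = min(peak_angles_deg, key=lambda a: abs(a - previous_deg))
    if abs(nearest - previous_deg) <= threshold_deg:
        # The source at the previous peak is judged to have moved to `nearest`.
        return nearest
    # Assumption: a peak farther away than the threshold is a different source.
    return previous_deg
```

In the example of FIG. 5 and FIG. 6, calling this with `previous_deg` set to θ2 and a peak list containing θ21 returns θ21 whenever the two differ by no more than the threshold.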

In this way, the control unit 11 detects sound pressure peaks every predetermined unit time and detects movement of the specific direction according to the difference between the direction of the detected peak and the specific direction. The control unit 11 performs this detection successively throughout the photographing period and, according to the detection result, controls the rotation mechanism 70 so that the orientation of the photographing device 1 follows the specific direction. Thus, even when the specific subject moves, photographing can be performed while tracking the direction of the specific subject.

As described above, the photographing device 1 changes the photographing range of the photographing unit 18 so as to follow transitions of the specific direction, generates video data of the video within the photographing range, generates audio data representing the overall sound within the photographing range, and stores moving image data including these data in the moving image data storage area 122.

<C: Effects of the embodiment>
As described above, according to the present embodiment, the control unit 11 estimates a plurality of directions of subjects (sound sources), specifies the sound source desired by the user from among the estimated sound source directions, and changes the orientation of the photographing apparatus 1 so that the specified direction is included in the photographing range. This allows the photographer to track and shoot a particular subject (for example, a family member or a favorite bird).

Furthermore, in the present embodiment, the control unit 11 detects the transition of the direction of the sound source. Thus, even when the subject to be photographed moves, the photographer can track the movement and keep the desired subject within the photographing range.

Furthermore, in this embodiment, the sound source direction is specified using the microphone array MA, so there is no need to attach a transmitter to the subject or to separately provide the photographing apparatus 1 with a receiver for signals from such a transmitter. A specific subject can therefore be tracked and photographed with a simpler configuration than before.

Furthermore, according to the present embodiment, the distribution of sound pressure over direction is calculated based on the correlation of the sound pressures from the microphones 15, and the direction in which a sound pressure peak appears in the calculated distribution is estimated as the direction of a sound source. Because the direction is estimated from the sound pressure distribution in this way, it can be estimated without complicated processing, and the processing time required for sound source direction estimation can be shortened.
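The source does not name the algorithm used to obtain the sound pressure distribution. One common way to compute such a distribution for a row of microphones is delay-and-sum beamforming, and the Python sketch below assumes that technique. The function names, the equal microphone spacing, the speed of sound, and the linear interpolation are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def sample_at(signal, index):
    """Linearly interpolate `signal` at a fractional sample index (0 outside)."""
    lo = math.floor(index)
    frac = index - lo
    if lo < 0 or lo + 1 >= len(signal):
        return 0.0
    return signal[lo] * (1.0 - frac) + signal[lo + 1] * frac


def sound_pressure_distribution(mic_signals, spacing_m, sample_rate_hz, angles_deg):
    """Steered power for each candidate angle of a uniform linear array.

    mic_signals[i] is the sample list of microphone i; microphone i is assumed
    to hear a wave from angle a later than microphone 0 by
    i * spacing * sin(a) / c seconds.
    """
    n = len(mic_signals[0])
    powers = []
    for angle in angles_deg:
        delay = spacing_m * math.sin(math.radians(angle)) / SPEED_OF_SOUND * sample_rate_hz
        power = 0.0
        for t in range(n):
            # Advance microphone i by i * delay samples so that a wave arriving
            # from `angle` lines up across microphones before summing.
            total = sum(sample_at(sig, t + i * delay) for i, sig in enumerate(mic_signals))
            power += total * total
        powers.append(power)
    return powers


def estimate_peak_direction(powers, angles_deg):
    """Return the candidate angle at which the distribution peaks."""
    return angles_deg[max(range(len(powers)), key=powers.__getitem__)]
```

The angular resolution depends on the microphone spacing, the number of microphones, and the bandwidth of the sound, so the candidate-angle grid should not be much finer than the array can actually resolve.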

Furthermore, according to the present embodiment, direction-specific audio data is generated for each angle of a predetermined unit amount, each piece of direction-specific audio data is collated with the collation data stored in the collation data storage area 121, and the direction is specified based on the degree of matching. That is, simply by registering the sound of the desired subject in the photographing apparatus 1 in advance, the photographing apparatus 1 tracks and shoots the registered subject, so the photographer does not need to perform cumbersome operations such as moving or reorienting the photographing apparatus 1 every time the desired subject moves.
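The collation step can be sketched in Python as below. The source does not define how the degree of matching is computed. This sketch assumes that each piece of direction-specific audio data has already been reduced to a numeric feature vector and uses cosine similarity as the degree of matching. The function names and the minimum degree of 0.8 are hypothetical.

```python
import math


def matching_degree(features, collation_features):
    """Cosine similarity between two feature vectors (assumed matching measure)."""
    dot = sum(x * y for x, y in zip(features, collation_features))
    norm_a = math.sqrt(sum(x * x for x in features))
    norm_b = math.sqrt(sum(y * y for y in collation_features))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def specify_direction(features_by_direction, collation_features, min_degree=0.8):
    """Return the direction whose features best match the collation data.

    features_by_direction maps an angle in degrees to the feature vector of
    the direction-specific audio data for that angle. Returns None when no
    direction reaches `min_degree`.
    """
    best_direction, best_degree = None, min_degree
    for direction, features in features_by_direction.items():
        degree = matching_degree(features, collation_features)
        if degree >= best_degree:
            best_direction, best_degree = direction, degree
    return best_direction
```

Returning `None` when nothing matches lets the caller decide whether to keep the previous specific direction or to wait for the registered subject to make a sound.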

Incidentally, there are cases where the photographer cannot easily locate the subject to be photographed. For example, when trying to photograph a cicada singing on a tree, or a particular wild bird in a forest, it is often difficult to determine where the subject (the cicada, the bird, and so on) is, and even once it is found, the subject often moves away quickly and is lost again. Likewise, at a children's sports day or school play, it is often difficult for the photographer to pick out his or her own child from among a large number of children, and an important shot may be missed while searching. Conventional techniques include recognizing the subject by image analysis at the start of shooting, but these could not be applied to subjects that are hard to recognize by image analysis (a cicada, a bird, one's own child among children wearing the same gym uniform, and so on).
In contrast, in this embodiment the direction of the subject is specified by collating the subject's sound, so the subject can be tracked and photographed even when it is hard to see. Note that the frequency characteristics of the microphones 15 may extend beyond the human audible range; for example, the technique is also applicable to ultrasonic waves.

<D: Modifications>
Although an embodiment of the present invention has been described above, the present invention is not limited to that embodiment and can be implemented in various other forms. Examples are given below. The following aspects may be combined as appropriate.
(1) The sound source direction may be estimated using independent component analysis (ICA). Independent component analysis addresses the situation in which signals from a plurality of signal sources are mixed in space and arrive at a plurality of sensors. From the signals observed at these sensors, it estimates the direction of arrival of each source signal and separates the source signals, without knowledge of the mixing system. It is described, for example, in the background art of Japanese Patent No. 3881367 (Patent Document 3). The technique described in Patent Document 3 for obtaining the direction of arrival of a signal source may also be used.

(2) The method of generating the direction-specific audio data or the specific-direction audio data is not limited to the method described in the above embodiment. The data may instead be obtained by using the independent component analysis described above to estimate, from the audio data of the microphones 15, the audio data corresponding to each sound source. The technique described in Patent Document 3 may also be used.

(3) The sound features stored as collation data may be features of an individual's voice (such as a voiceprint). The control unit 11 may specify the direction of the sound source by analyzing the direction-specific audio data and determining whether the voice features of a specific individual have been detected.

(4) In the above embodiment, the control unit 11 changes the orientation of the photographing apparatus 1 so that the specific direction is included in the photographing range. Instead, the specific direction may be reported to the photographer. Specifically, for example, the photographing apparatus 1 may output a voice message guiding the photographer to the specific direction, or may display a message indicating the specific direction on the display unit 13. Alternatively, for example, a vibrator that rotates in the horizontal direction may be provided inside the photographing apparatus 1, and the specific direction may be reported to the photographer by vibration, by rotating the vibrator in the rotational sense that leads from the center direction of the photographing range toward the specific direction.
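As a rough illustration of the message-based notification, the Python sketch below turns the specific direction into guidance text that could be shown on the display unit 13 or spoken. The function name, the message wording, and the convention that positive angles lie to the photographer's right are assumptions, not details from the source.

```python
def direction_guidance(center_deg, specific_deg, half_view_deg):
    """Build a guidance message for the photographer.

    center_deg:    center direction of the current photographing range
    specific_deg:  specific direction of the tracked sound source
    half_view_deg: half of the horizontal angle of view (assumed known)
    """
    offset = specific_deg - center_deg
    if abs(offset) <= half_view_deg:
        return "The subject is within the photographing range."
    # Assumption: positive angles are to the photographer's right.
    side = "right" if offset > 0 else "left"
    return f"Turn {abs(offset):.0f} degrees to the {side}."
```

The same offset sign could select the rotation sense of the vibrator described in this modification.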

(5) In the above embodiment, the control unit 11 drives the motor 723 to change the orientation of the photographing apparatus 1, but the manner of changing the photographing range is not limited to this. For example, as shown in FIG. 7, a photographing apparatus 1A may be provided with a rotation mechanism that rotates a photographing unit 18A, which includes a photographing lens 18b, a CCD 18c, and the like, in the direction P in the figure, and the photographing range may be changed by rotating the photographing unit 18A. In the example shown in FIG. 7, a rotating part 75 is rotatably supported on the photographing apparatus 1A by rollers 76a, 76b, and 76c, and the photographing unit 18A is fixed to the rotating part 75, so the photographing unit 18A rotates together with the rotating part 75. A motor 77 rotates a roller 78 under the control of the control unit 11, and the rotating part 75 rotates with the rotation of the roller 78. The configuration for rotating the photographing unit 18A is not limited to this, and another rotation mechanism may be used.
Thus, the photographing range may be changed either by rotating the main body of the photographing apparatus or by rotating the photographing mechanism. In short, it suffices that the control unit 11 controls the photographing apparatus 1 so as to change the photographing range such that the direction determined to contain sound is included in the photographing range.

(6) In the above embodiment, the control unit 11 detects the movement of the subject by detecting the transition of the direction in which the sound pressure peak appears in the sound pressure distribution. Instead, the control unit 11 may generate direction-specific audio data for each direction of a predetermined unit amount by mixing so that the sound pressure of the sound from that direction becomes high, collate each piece of generated direction-specific audio data with stored collation data, and detect the transition of the direction of the subject (sound source) based on the degree of matching. The collation data used here may be the same as the collation data stored in the collation data storage area 121 in the above embodiment. That is, feature data obtained by filtering audio data to extract the features of the sound may be used as the collation data. The control unit 11 may then filter the audio data for each direction to extract the features of the sound, collate the feature data representing the extracted features with the collation data, and detect the transition of the direction of the specific subject based on the degree of matching.

(7) In the above embodiment, the control unit 11 collates the audio data for each sound source direction with the collation data stored in the collation data storage area 121 and specifies the specific direction based on the degree of matching. Instead, the photographer may view the positions of the subjects displayed on the display unit 13 and operate the operation unit 14 to input the direction of the subject to be photographed. Specifically, for example, after finishing the sound source direction estimation process shown in step S1 of FIG. 4, the control unit 11 may display the estimated sound source directions on the display unit 13 to report them to the photographer, and the photographer may select one of the displayed sound source directions using the operation unit 14. In this case, the control unit 11 specifies, as the specific direction, the sound source direction selected by the photographer from among the estimated sound source directions, according to the operation signal from the operation unit 14.

Thus, the control unit 11 may specify the specific direction from among the plurality of sound source directions by analyzing the audio data for each sound source direction, or may specify it according to an operation signal from the operation unit 14. In short, it suffices that the control unit 11 specifies at least one of the estimated sound source directions.
Also, although the control unit 11 specifies one specific direction in the above embodiment, a plurality of specific directions may be specified.

As another method of specifying the specific direction, the photographer may, for example, be allowed to select an arbitrary direction. In this case, the photographer uses the operation unit 14 to designate the direction from which sound is to be collected, and the control unit 11, according to the signal from the operation unit 14, treats the designated direction as the specific direction and generates specific-direction audio data representing the sound from that direction. Specifically, for example, when no peak can be detected in the direction of the subject because the ambient noise is loud, having the photographer designate the direction from which to collect sound allows the photographing apparatus 1 to generate the specific-direction audio data more suitably.

Also, the photographer may be allowed to use the operation unit 14 to select between a mode in which the photographer designates the direction from which sound is to be collected and a mode in which the photographing apparatus 1 automatically detects the specific direction as described in the above embodiment. In this case, the control unit 11 performs the process of specifying the specific direction, the process of generating the specific-direction audio data, and so on according to the mode selected by the operation signal from the operation unit 14.

(8) In the above embodiment, the photographing apparatus 1 has a microphone array in which a plurality of microphones 15 are arranged in a row, as shown in FIG. 2. However, the arrangement of the microphones 15 is not limited to this. For example, the microphone array may have the microphones 15 arranged in a plane (two-dimensionally). Alternatively, as shown in FIG. 8, the microphone array may be arranged three-dimensionally on the front and side surfaces of the photographing apparatus 1. In this case, the photographing apparatus 1A can estimate, as the angle of the sound source, not only the angle in the x-axis direction (see FIG. 8) but also the angles in the y-axis and z-axis directions (see FIG. 8), so the direction of the sound source can be estimated in three dimensions and in finer detail. In this case, sound sources can also be detected over a wider range.

Also, when a photographing apparatus having a microphone array in which the microphones are arranged two-dimensionally or three-dimensionally is used, it may be provided with a rotation mechanism 70A that rotates the photographing apparatus in the vertical direction (arrow Q direction in FIG. 8) in addition to the horizontal direction (arrow P direction in FIG. 8).

Also, small silicon microphones may be used as the microphones 15 in the above embodiment. Furthermore, small silicon microphones capable of picking up ultrasonic waves in addition to ordinary sound waves may be used: an ultrasonic tag that emits ultrasonic waves is attached to the subject, the silicon microphones receive the ultrasonic waves emitted from the tag, and the control unit 11 specifies the direction of the subject based on the received ultrasonic waves. In this case, although the subject must wear an ultrasonic tag, the microphone array used for recording can double as the ultrasonic receiver, so the direction of the subject can be specified with a simple device configuration.

(9) In the above embodiment, the storage unit 12, such as a hard disk device, is used as the storage means for the moving image data. However, the storage means is not limited to a hard disk device and may be, for example, a recording medium such as an SD card, a CD-R, or a CD-R/W. In short, it suffices that the control unit 11 records the moving image data on a computer-readable recording medium. The control unit 11 may also output the moving image data to a predetermined server apparatus via a communication network.

(10) In the above embodiment, the photographing apparatus 1 executes all the processes according to the embodiment. Instead, the processes may be shared among two or more devices connected by a communication network, a communication I/F, or the like, so that a system comprising those devices realizes the photographing apparatus 1 of the embodiment. Specifically, for example, the system may consist of a digital camera and a computer connected via a communication I/F such as USB.

(11) In the above embodiment, the control unit 11 of the photographing apparatus 1 calculates the sound pressure distribution and estimates the angle at which a peak value appears as the sound source direction. The method of estimating the sound source direction is not limited to this. For example, the sound pressure may be detected for each angle of a predetermined unit amount, and an angle at which the detected sound pressure is equal to or greater than a predetermined threshold may be detected as a sound source direction. In short, it suffices that the control unit 11 detects the sound pressure of the audio data output from the microphones 15 for each angle of a predetermined unit amount and estimates the sound source direction from the sound pressure detected for each angle.
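The threshold-based alternative is simple enough to state in a few lines of Python. The function name and the dictionary representation of the per-angle sound pressure are assumptions made for this sketch.

```python
def directions_above_threshold(pressure_by_angle, threshold):
    """Return, in ascending order, every angle whose sound pressure is at
    least `threshold`.

    pressure_by_angle maps each candidate angle (one per predetermined unit
    amount) to the sound pressure detected for that angle.
    """
    return [angle for angle, pressure in sorted(pressure_by_angle.items())
            if pressure >= threshold]
```

Unlike peak picking, this can return several adjacent angles for one loud source, so a caller may want to merge neighboring angles before treating them as separate sound sources.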

Also, although the sound source direction is estimated based on the sound pressure of the audio data in the above embodiment, this is not a limitation. The frequency characteristics of the audio data for each direction may be detected, and the sound source direction may be estimated based on the detected frequency characteristics.
Thus, the sound source direction may be detected based on the sound pressure of the audio data or based on its frequency. In short, it suffices that the control unit 11 analyzes the audio data output from the microphones 15 and estimates the sound source direction according to the analysis result.

Also, the control unit 11 may perform image analysis on the video data output from the photographing unit 18 to extract a person (or a face), and estimate the direction corresponding to the position of the extracted person (or face) as the sound source direction. The audio analysis described above and this image analysis may also be used together to estimate the sound source direction. Using the image analysis result in addition to the audio analysis result in this way can increase the accuracy of the sound source estimation process.

(12) In the above embodiment, the control unit 11 collates feature data representing the features of the audio data for the sound from each sound source direction with the collation data stored in the collation data storage area 121, and specifies the specific direction based on the degree of matching. However, the method of specifying the specific direction is not limited to this. For example, data representing an image of a specific subject may be stored in advance in the collation data storage area 121 as collation data. The control unit 11 may then perform image analysis on the video data output from the photographing unit 18, perform person extraction (or face extraction) according to the analysis result, collate the image data of the extracted person (or face) with the collation data stored in the collation data storage area 121, and specify the specific direction based on the degree of matching.

Also, in the above embodiment, the control unit 11 applies filtering and other processing to the audio data representing the sound picked up by the microphones 15 to generate feature data representing the features of the sound, and uses the generated feature data as the collation data. However, this is not a limitation, and the audio data representing the sound picked up by the microphones 15 may be used as the collation data as is.

(13) In the above embodiment, the control unit 11 may also control the photographing unit 18 so as to focus on the sound source (subject). In this case, for example, the control unit 11 may analyze the audio data representing the sound picked up by a plurality of different microphones 15 (for example, the microphone 151 and the microphone 15n shown in FIG. 2), calculate the differences in the time at which the sound reaches each of those microphones 15, calculate the distance between the photographing apparatus 1 and the sound source using the calculated time differences, and perform focus control according to the result.
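The source does not give the formula for converting time differences into a distance. For a distant source, a single time difference between two microphones gives mainly a bearing, so the Python sketch below assumes three equally spaced microphones in a row and uses the curvature of the wavefront. The function name, the microphone layout, and the speed of sound are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def distance_from_time_differences(tau_left_s, tau_right_s, spacing_m):
    """Distance from the center microphone to the sound source, in meters.

    tau_left_s / tau_right_s: arrival time at the left / right microphone
    minus the arrival time at the center microphone, in seconds.
    spacing_m: distance between adjacent microphones (assumed equal).
    Returns None when the wavefront is too flat to give a finite range.
    """
    d1 = SPEED_OF_SOUND * tau_left_s   # extra path length to the left microphone
    d2 = SPEED_OF_SOUND * tau_right_s  # extra path length to the right microphone
    curvature = d1 + d2                # tends to 0 for a distant (plane-wave) source
    if abs(curvature) < 1e-12:
        return None
    # From r_left^2 + r_right^2 = 2 * r_center^2 + 2 * spacing^2.
    return (2.0 * spacing_m ** 2 - d1 ** 2 - d2 ** 2) / (2.0 * curvature)
```

The returned distance could then be passed to focus control. Because the curvature term becomes very small for distant sources, the estimate is sensitive to timing errors, which is one reason to treat this only as a sketch.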

Also, the photographing apparatus 1 may change the direction of the camera toward the sound source and, in addition, automatically zoom in or out as the subject approaches or moves away. In this case, the control unit 11 may calculate the distance between the photographing apparatus 1 and the sound source by the process described above and automatically zoom in according to the calculated distance (for example, so that a person's face is shown in close-up).

(14) In the above embodiment, an example in which the photographing apparatus according to the present invention is applied to a digital camera has been described. However, the apparatus to which it is applied is not limited to a digital camera and may be, for example, a personal computer, a mobile communication terminal, or a computer game machine. The photographing apparatus according to the present invention is applicable to a variety of apparatuses.

(15) The program executed by the control unit 11 of the photographing apparatus 1 in the above embodiment can be provided recorded on a computer-readable recording medium such as a magnetic tape, a magnetic disk, a flexible disk, an optical recording medium, a magneto-optical recording medium, a RAM, or a ROM. It can also be downloaded to the photographing apparatus 1 via a network such as the Internet.

FIG. 1 is a block diagram showing an example of the hardware configuration of the photographing apparatus.
FIG. 2 is a perspective view showing an example of the appearance of the photographing apparatus.
FIG. 3 is a diagram showing an example of the configuration of the rotation mechanism.
FIG. 4 is a flowchart showing the flow of the photographing process performed by the photographing apparatus.
FIG. 5 is a diagram showing an example of the sound pressure distribution calculated by the control unit.
FIG. 6 is a diagram showing an example of the sound pressure distribution calculated by the control unit.
FIG. 7 is a diagram showing an example of the configuration of the rotation mechanism.
FIG. 8 is a perspective view showing an example of the appearance of the photographing apparatus.

Explanation of symbols

1 ... photographing apparatus, 11 ... control unit, 12 ... storage unit, 13 ... display unit, 14 ... operation unit, 15 ... microphone, 16 ... audio processing unit, 17 ... speaker, 18 ... photographing unit, 121 ... collation data storage area, 122 ... moving image data storage area.

Claims

A photographing apparatus comprising:
photographing means for which a photographing range is set and which outputs video data representing video within the photographing range;
a plurality of microphones, each of which picks up sound around the photographing means and outputs it as audio data;
estimating means for analyzing the audio data output from each of the plurality of microphones and estimating one or more sound source directions according to the analysis result;
specifying means for specifying at least one of the sound source directions estimated by the estimating means;
detecting means for analyzing the audio data from each microphone and detecting, according to the analysis result, a transition of the direction of the sound source specified by the specifying means; and
photographing range changing means for changing the photographing range of the photographing means to a range that includes the direction of the sound source detected by the detecting means.

The photographing apparatus according to claim 1, further comprising video data storage control means for storing, in predetermined storage means, the video data output from the photographing means when the photographing range has been changed by the photographing range changing means.

The photographing apparatus according to claim 1 or 2, wherein the estimating means calculates the distribution of sound pressure around the photographing means based on the correlation of the audio data from the microphones, and estimates the direction in which a sound pressure peak appears in the calculated distribution as the direction of a sound source.

The photographing range changing means changes the direction of the photographing means so that the direction of the sound source estimated by the direction estimating means is included in the photographing range. The imaging device according to item.

The imaging device according to claim 3, wherein the detection means detects a transition of a peak of sound pressure in the distribution of sound pressure calculated by the estimation means.

The photographing apparatus according to claim 1, further comprising:
collation data storage means for storing collation data representing the features of a sound; and
direction-specific audio data generating means for generating, from the audio data from each microphone, direction-specific audio data corresponding to each of the directions estimated by the estimating means,
wherein the specifying means collates the direction-specific audio data generated by the direction-specific audio data generating means with the collation data stored in the collation data storage means, and specifies the direction of the sound source based on the degree of matching between the two.

The photographing apparatus according to claim 1, further comprising:
collation data generating means for generating, as collation data, audio data corresponding to the direction specified by the specifying means from the audio data from each of the plurality of microphones; and
direction-specific audio data generating means for generating, from the audio data from each microphone, direction-specific audio data corresponding to each of the directions estimated by the estimating means,
wherein the detecting means collates each piece of direction-specific audio data generated by the direction-specific audio data generating means with the collation data generated by the collation data generating means, and detects the transition of the direction of the sound source based on the degree of matching.

8. The photographing apparatus according to claim 6, wherein the collation data includes characteristic information of a specific individual voice.

The photographing apparatus according to any one of claims 6 to 8, wherein the direction-specific audio data generating means generates the direction-specific audio data by mixing, for each of the directions estimated by the estimating means, so that the sound pressure of the sound from that direction becomes high.

9. The direction-specific sound data generating means generates direction-specific sound data by estimating sound data corresponding to a sound source from the sound data using independent component analysis. The imaging device according to item 1.

The imaging apparatus according to claim 1, wherein the estimation unit estimates a direction of a sound source using independent component analysis.