JP2011061461A

JP2011061461A - Imaging apparatus, directivity control method, and program therefor

Info

Publication number: JP2011061461A
Application number: JP2009208483A
Authority: JP
Inventors: Tatsuya Koizumi; 達哉小泉
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2009-09-09
Filing date: 2009-09-09
Publication date: 2011-03-24

Abstract

PROBLEM TO BE SOLVED: To emphasize and input the voice uttered by a subject more reliably.

SOLUTION: A DVC 100 recognizes a person's face in a through image and controls the directivity of a microphone unit 111 on the basis of the range that the face occupies in the through image, so that sound is input with a directivity that emphasizes sound arriving from the range occupied by the face within the imaging range. The person's voice is therefore emphasized and input regardless of which part of the face it is uttered from, and is emphasized more reliably.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to an imaging apparatus, a directivity control method, and a program therefor, and is suitable for application to, for example, controlling the directivity of a microphone in an imaging apparatus.

Among imaging apparatuses such as digital video cameras (hereinafter also referred to as DVCs), apparatuses that emphasize and input the sound emitted by a subject have been proposed. For example, an imaging apparatus has been proposed that detects the direction of the subject as seen from the imaging apparatus and emphasizes the sound arriving from that direction (see, for example, Patent Document 1).

Patent Document 1: JP 2008-193196 A

However, the sound emitted by a subject does not necessarily come from the center of the subject as seen from the imaging apparatus; it may come from the right edge or the left edge of the subject.

In such a case, if the apparatus described above emphasizes, for example, sound traveling from the center of the subject toward the imaging apparatus, it cannot emphasize sound emitted from the right edge or the left edge of the subject.

In other words, the imaging apparatus described above can emphasize only sound arriving from a single point, so it cannot be said to always emphasize the sound emitted by the subject.

The present invention has been made in view of the above points, and proposes an imaging apparatus, a directivity control method, and a program therefor that can emphasize and input the sound emitted by a subject more reliably.

To solve this problem, the imaging apparatus of the present invention is provided with an imaging unit that acquires a captured image, an audio input unit that inputs sound, a recognition unit that recognizes a subject in the captured image, and a control unit that controls the directivity of the audio input unit on the basis of the range occupied by the subject in the captured image.

By controlling the directivity of the audio input unit on the basis of the range occupied by the subject in the captured image, the imaging apparatus of the present invention can input sound with a directivity that emphasizes sound arriving from the range occupied by the subject within the imaging range. The imaging apparatus can thus emphasize and input the sound emitted by the subject regardless of which part of the subject, as seen from the imaging apparatus, the sound comes from.

According to the present invention, controlling the directivity of the audio input unit on the basis of the range occupied by the subject in the captured image makes it possible to input sound with a directivity that emphasizes sound arriving from the range occupied by the subject within the imaging range. As a result, the sound emitted by the subject can be emphasized and input regardless of which part of the subject, as seen from the imaging apparatus, the sound comes from. It is thus possible to realize an imaging apparatus, a directivity control method, and a program therefor that can emphasize and input the sound emitted by a subject more reliably.

FIG. 1 is a functional block diagram showing an outline of the first embodiment.
FIG. 2 is a block diagram showing the hardware configuration of the DVC (digital video camera).
FIG. 3 is a block diagram showing a specific example of the hardware configuration of the microphone unit and the directivity angle varying unit.
FIG. 4 is a schematic diagram used to explain an example of the directional characteristics of the microphone unit.
FIG. 5 is a schematic diagram used to explain the outline of the directivity angle control process.
FIG. 6 is a schematic diagram used to explain the method of calculating the imaging angle of view in the horizontal direction.
FIG. 7 is a schematic diagram used to explain the method of calculating the face angle of view.
FIG. 8 is a schematic diagram used to explain the method of calculating the face angle of view when a plurality of face frames are detected.
FIG. 9 is a schematic diagram used to explain the relationship between the face angle of view and the appropriate directivity angle.
FIG. 10 is a flowchart showing the directivity angle control processing procedure.
FIG. 11 is a functional block diagram showing an outline of the second to fourth embodiments.
FIG. 12 is a schematic diagram used to explain the directivity angle control process in the second embodiment.
FIG. 13 is a schematic diagram used to explain the directivity angle control process in the third embodiment.
FIG. 14 is a schematic diagram used to explain the directivity angle control process in the fourth embodiment.
FIG. 15 is a schematic diagram used to explain directivity angle control process (1) in another embodiment.
FIG. 16 is a schematic diagram used to explain directivity angle control process (2) in another embodiment.
FIG. 17 is a schematic diagram used to explain directivity angle control process (3) in another embodiment.
FIG. 18 is a schematic diagram used to explain directivity angle control process (4) in another embodiment.

Hereinafter, the best mode for carrying out the invention (hereinafter referred to as an embodiment) will be described. The description will be given in the following order.
1. First embodiment
2. Second embodiment
3. Third embodiment
4. Fourth embodiment
5. Other embodiments

<1. First Embodiment>
[1-1. Outline of First Embodiment]
First, an outline of the first embodiment will be described, followed by a specific example of the embodiment.

In FIG. 1, reference numeral 1 denotes an imaging apparatus. The imaging apparatus 1 includes an imaging unit 2 that acquires a captured image, an audio input unit 3 that inputs sound, a recognition unit 4 that recognizes a subject in the captured image, and a control unit 5 that controls the directivity of the audio input unit 3 on the basis of the range occupied by the subject in the captured image.

With this configuration, the imaging apparatus 1 can input sound with a directivity that emphasizes sound arriving from the range occupied by the subject within the imaging range. The imaging apparatus 1 can thus emphasize and input the sound emitted by the subject regardless of which part of the subject, as seen from the imaging apparatus 1, the sound comes from.

A specific example of the imaging apparatus 1 configured in this way is described in detail below.

[1-2. Hardware Configuration of DVC]
The hardware configuration of a digital video camera (DVC), which is a specific example of the imaging apparatus 1 described above, will be described with reference to FIG. 2.

In the DVC 100, a control unit 101 executes various processes by loading a program written in a built-in flash memory 102 into a RAM 103 and executing it, and controls each unit in accordance with input signals from a touch panel 104 and an operation unit 105. RAM is an abbreviation for Random Access Memory.

The touch panel 104 is a device that, together with a liquid crystal panel 106, constitutes a touch screen 107. When an arbitrary position on the touch panel 104 is touched with a finger, the touch panel 104 detects the touched position as coordinates on the screen displayed on the liquid crystal panel 106, and sends an input signal corresponding to those coordinates to the control unit 101.

The operation unit 105 is a device including a zoom lever (TELE/WIDE), a shutter button, a power button, a mode switching button, and the like, and sends input signals corresponding to operations on these controls to the control unit 101.

When instructed via the touch panel 104 or the operation unit 105 to switch to the shooting mode, the control unit 101 switches the operation mode to the shooting mode.

An imaging unit 108 then, under the control of the control unit 101, obtains an analog image signal by using an imaging element to convert light from the subject, taken in through a lens unit 109, into an electrical signal (that is, by photoelectric conversion). The imaging unit 108 converts this image signal into a digital image signal and sends it to the control unit 101.

The control unit 101 applies predetermined processing to the image signal sent from the imaging unit 108 and sends it to the liquid crystal panel 106. As a result, the image of the subject is displayed on the liquid crystal panel 106 as a through image. In this way, the DVC 100 lets the photographer check the subject.

At this time, the control unit 101 also sends the image signal from the imaging unit 108 to a face recognition processing unit 110. Under the control of the control unit 101, the face recognition processing unit 110 analyzes the received image signal and performs a process of recognizing a person's face in the image based on that signal, that is, the through image (this process is also called face recognition processing). The face recognition processing unit 110 then returns to the control unit 101, as the result of the face recognition processing, whether a person's face was recognized in the through image, which part of the through image was recognized as a face, and so on.

The control unit 101 generates graphics signals such as icons and a rectangular frame indicating the part recognized as a face (also called a face frame), and superimposes them on the image signal. As a result, icons, the face frame, and the like are displayed on the liquid crystal panel 106 together with the through image.

At this time, the control unit 101 also calculates, on the basis of the result of the face recognition processing, a directivity angle of a microphone unit 111 that is suitable for emphasizing and inputting the voice uttered by a person (also called the appropriate directivity angle). The method of calculating the appropriate directivity angle is explained in detail in the description of the directivity angle control process below.

The control unit 101 then controls the directivity angle of the microphone unit 111, via a directivity angle varying unit 112, so that it becomes the appropriate directivity angle.

A specific example of the hardware configuration of the microphone unit 111 and the directivity angle varying unit 112 will now be described with reference to FIG. 3.

For example, the microphone unit 111 consists of a sharp directional microphone 111A and an omnidirectional microphone 111B. FIG. 4(A) shows the directional characteristic (polar pattern) of the sharp directional microphone 111A, and FIG. 4(B) shows the directional characteristic of the omnidirectional microphone 111B.

Here, the directivity angle of a microphone is taken to be, for example, the angular range over which the sensitivity is -6 [dB] or higher when the sensitivity on the main directional axis is taken as 0 [dB]. It is also assumed here that the main directional axis of the sharp directional microphone 111A coincides with the main imaging axis of the imaging unit 108 (that is, the front direction of the DVC 100).

The directivity angle varying unit 112 (FIG. 3) consists of level varying units 112A and 112B and an adder 112C. Under the control of the control unit 101, the level varying unit 112A changes the level of the audio signal sent from the sharp directional microphone 111A, and the level varying unit 112B changes the level of the audio signal sent from the omnidirectional microphone 111B.

The adder 112C combines the audio signal of the sharp directional microphone 111A sent from the level varying unit 112A with the audio signal of the omnidirectional microphone 111B sent from the level varying unit 112B.

FIG. 4(C) shows the directional characteristic of the microphone unit 111 when, for example, the level of the sharp directional microphone 111A is 50% and the level of the omnidirectional microphone 111B is 50%. As FIG. 4(C) shows, the directional characteristic of the microphone unit 111 is the combination of the directional characteristics of the sharp directional microphone 111A and the omnidirectional microphone 111B in accordance with their level ratio. The main directional axis of the microphone unit 111 coincides with that of the sharp directional microphone 111A (that is, with the main imaging axis of the imaging unit 108).

To reduce the directivity angle of the microphone unit 111, the control unit 101 controls the level varying unit 112A to raise the level of the sharp directional microphone 111A and controls the level varying unit 112B to lower the level of the omnidirectional microphone 111B.

Conversely, to increase the directivity angle of the microphone unit 111, the control unit 101 controls the level varying unit 112A to lower the level of the sharp directional microphone 111A and controls the level varying unit 112B to raise the level of the omnidirectional microphone 111B.

The minimum directivity angle of the microphone unit 111 is the directivity angle of the sharp directional microphone 111A, and the maximum is the directivity angle of the omnidirectional microphone 111B (that is, 360 degrees).
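The level-ratio mixing described above can be illustrated with a small numerical sketch. This is not the patent's implementation: the `cos^4` polar pattern for the sharp directional microphone, the in-phase amplitude summation, and the function names `mixed_response` and `directivity_angle` are all assumptions made for illustration. Only the -6 dB definition of the directivity angle and the weighted sum of the two microphone signals come from the text.

```python
import math

def mixed_response(angle_deg, directional_level, omni_level, sharpness=4):
    """Amplitude of the mixed signal at an off-axis angle, relative to the main axis."""
    on_axis = directional_level + omni_level
    if on_axis <= 0:
        raise ValueError("at least one level must be positive")
    # Assumed (hypothetical) polar pattern of the sharp directional microphone:
    # cos^sharpness in the front hemisphere, zero behind.
    c = math.cos(math.radians(angle_deg))
    directional = c ** sharpness if c > 0 else 0.0
    omni = 1.0  # an omnidirectional microphone is equally sensitive in every direction
    return (directional_level * directional + omni_level * omni) / on_axis

def directivity_angle(directional_level, omni_level, sharpness=4):
    """Total angle (degrees) over which the response stays at -6 dB or higher."""
    threshold = 10 ** (-6 / 20)  # -6 dB expressed as an amplitude ratio
    half_angle = 0.0
    for tenth in range(1, 1801):  # scan 0.1 degree steps up to 180 degrees
        angle = tenth / 10
        if mixed_response(angle, directional_level, omni_level, sharpness) < threshold:
            break
        half_angle = angle
    return 2 * half_angle

if __name__ == "__main__":
    for d, o in [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]:
        print(d, o, round(directivity_angle(d, o), 1))
```

Under these assumptions, raising the omnidirectional level widens the directivity angle and raising the directional level narrows it, with the omnidirectional-only case reaching 360 degrees, which matches the behavior the text describes.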

Incidentally, the configuration for variably controlling the directivity angle of the microphone unit 111 is not limited to the one described above, and various other configurations may be used. Likewise, the definition of the directivity angle of a microphone is not limited to the one given above; any other definition may be used, such as one based on the audible effect, as long as it indicates the angular range over which the microphone can emphasize and input sound.

Suppose now that the shutter button of the operation unit 105 (FIG. 2) is pressed. The control unit 101 then starts recording a moving image in response. Specifically, the control unit 101 temporarily stores in the RAM 103 the image signal sent from the imaging unit 108 and the audio signal input from the microphone unit 111 via the directivity angle varying unit 112, and sends the image signal to a moving image encoder 113.

The moving image encoder 113 generates moving image data by compressing this image signal in a predetermined moving image format. Here, the H.264 format, for example, is used as the predetermined moving image format.

The control unit 101 also generates audio data by compressing the audio signal temporarily stored in the RAM 103 in a predetermined audio format. The control unit 101 then generates moving image and audio data by multiplexing this audio data with the moving image data generated by the moving image encoder 113.

The control unit 101 further writes this moving image and audio data back to the RAM 103 and then records it in the flash memory 102 or a recording medium 114.

When the shutter button of the operation unit 105 is subsequently pressed again, the control unit 101 ends the recording of the moving image. Specifically, the control unit 101 records the moving image and audio data remaining in the RAM 103 at that time in the flash memory 102 or the recording medium 114, thereby completing the recording of the series of moving image and audio data from the start to the end of shooting. The control unit 101 then attaches supplementary information, such as the shooting date and time, to this data and records it in the flash memory 102 or the recording medium 114 as a moving image and audio file. In this way, the DVC 100 records a moving image.

When an operation to switch to the playback mode is performed via the touch panel 104 or the operation unit 105, the control unit 101 switches to the playback mode. The control unit 101 then reads the designated moving image and audio file from the flash memory 102 or the recording medium 114 and temporarily stores it in the RAM 103.

The control unit 101 separates the moving image data and the audio data from this file and sends the moving image data to a moving image decoder 115.

The moving image decoder 115 obtains the original image signal by decompressing the moving image data in the same moving image format in which it was compressed, and this image signal is written back to the RAM 103.

The control unit 101 likewise obtains the original audio signal by decompressing the audio data in the same audio format in which it was compressed, and writes it back to the RAM 103.

The control unit 101 then reads the image signal from the RAM 103, applies predetermined processing, and sends it to the liquid crystal panel 106. At the same time, the control unit 101 reads the audio signal from the RAM 103, applies predetermined processing, and sends it to a speaker 116.

As a result, a moving image based on the image signal is displayed on the liquid crystal panel 106, and sound based on the audio signal is output from the speaker 116. In this way, the DVC 100 plays back moving images and sound.

The imaging unit 108 of the DVC 100 is the hardware corresponding to the imaging unit 2 of the imaging apparatus 1 described above, and the microphone unit 111 and the directivity angle varying unit 112 of the DVC 100 are the hardware corresponding to the audio input unit 3 of the imaging apparatus 1. Similarly, the face recognition processing unit 110 of the DVC 100 is the hardware corresponding to the recognition unit 4 of the imaging apparatus 1, and the control unit 101 of the DVC 100 is the hardware corresponding to the control unit 5 of the imaging apparatus 1.

[1-3. Directivity Angle Control Process]
As described above, the DVC 100 calculates a directivity angle of the microphone unit 111 that is appropriate for emphasizing and inputting the voice uttered by a person (the appropriate directivity angle), and controls the directivity angle of the microphone unit 111 so that it becomes the appropriate directivity angle. The process of controlling the directivity angle of the microphone unit 111 in this way (hereinafter also called the directivity angle control process) is described in detail below.

First, an outline of the directivity angle control process will be described with reference to FIG. 5.

FIGS. 5(A) and 5(B) show a person P, who is the subject, and the DVC 100 as viewed from directly above.

To emphasize and input the voice uttered by the person P, it is considered sufficient to control the directivity angle of the microphone unit 111 so as to emphasize sound arriving from the range Af that the face Pf, the source of the person P's voice, occupies within the imaging range Ac.

Therefore, the DVC 100 controls the directivity angle of the microphone unit 111 in accordance with the angle of view α of the range Af occupied by the face Pf (hereinafter also called the face angle of view) within the angle of view θ of the imaging range Ac of the imaging unit 108 (also called the imaging angle of view). In other words, the face angle of view α is the angle of view at which only the range Af occupied by the face Pf would be captured.

The DVC 100 uses the imaging angle of view θ and the face angle of view α in the horizontal direction. This is because the directivity of the human ear with respect to sound is affected more strongly in the horizontal direction.

For example, when the person P is far from the DVC 100 as shown in FIG. 5(A), the face angle of view α is smaller than when the person P is close to the DVC 100. In this case, the DVC 100 therefore reduces the directivity angle of the microphone unit 111 in the directivity angle control process. That is, the DVC 100 narrows the directivity of the microphone unit 111 as shown in FIG. 5(C).

On the other hand, when the person P is close to the DVC 100 as shown in FIG. 5(B), the face angle of view α is larger than when the person P is far from the DVC 100. In this case, the DVC 100 therefore increases the directivity angle of the microphone unit 111 in the directivity angle control process. That is, the DVC 100 widens the directivity of the microphone unit 111 as shown in FIG. 5(D).

In this way, in the directivity angle control process, the DVC 100 makes the directivity angle of the microphone unit 111 smaller the farther the person P is from the DVC 100, and larger the closer the person P is to the DVC 100.

The above is the outline of the directivity angle control process. The specific processing is described next. When instructed to switch the operation mode to the shooting mode, the control unit 101 switches to the shooting mode and starts the directivity angle control process.

The control unit 101 then calculates the face angle of view α as follows.

First, the control unit 101 calculates the imaging angle of view θ of the imaging unit 108. As shown in FIG. 6(A), a 35 mm film frame is 36 [mm] wide and 24 [mm] high.

As shown in FIG. 6(B), the horizontal imaging angle of view θ is equal to the apex angle of an isosceles triangle whose height is the focal length f (35 mm film equivalent) and whose base is the 36 [mm] horizontal length of a 35 mm film frame.

The control unit 101 therefore acquires the current focal length f from the imaging unit 108 and calculates the imaging angle of view θ from the focal length f using equations (1) and (2). Equation (2) is a rearrangement of equation (1).
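Equations (1) and (2) are not reproduced in this text. From the isosceles triangle described above (height f, base 36 mm), they presumably take the following form; the exact notation of the original is an assumption, with f expressed in millimetres as a 35 mm film equivalent.

```latex
\begin{align}
\tan\frac{\theta}{2} &= \frac{36/2}{f} \tag{1}\\
\theta &= 2\arctan\frac{18}{f} \tag{2}
\end{align}
```

For example, f = 36 mm would give θ = 2 arctan(0.5), or roughly 53.1 degrees.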

Next, on the basis of the face recognition result acquired from the face recognition processing unit 110, the control unit 101 detects the position and size of the face frame Fs in the through image Tp, as shown in FIG. 7(A).

When one face frame Fs is detected, the control unit 101 compares the length sR1 from the vertical center line O of the through image Tp to the right edge of the face frame Fs with the length sL1 from the center line O to the left edge of the face frame Fs. The control unit 101 then sets the longer of the two as the length s used to calculate the face angle of view α (also called the calculation length).

In the case shown in FIG. 7(A), for example, the length sR1 is longer than the length sL1, so the control unit 101 sets the length sR1 as the calculation length s.

Having determined the calculation length s in this way, the control unit 101 calculates the ratio n of the calculation length s to the length w, which is half the horizontal width of the through image Tp, using equation (3).

図７（Ｂ）に、顔が認識された人物ＰとＤＶＣ１００とを真上から見下ろした様子を示す。このとき制御部１０１は、撮像範囲において顔枠Ｆｓに対応する顔（つまり人物Ｐの顔Ｐｆ）が占める範囲の画角を顔画角αとして算出する。 FIG. 7B shows a state in which the person P whose face is recognized and the DVC 100 are looked down from directly above. At this time, the control unit 101 calculates the angle of view of the range occupied by the face corresponding to the face frame Fs (that is, the face Pf of the person P) in the imaging range as the face angle of view α.

図７（Ｂ）に示すように、撮像画角θの半分の角度（θ／２）の正接であるｔａｎ（θ／２）と、顔画角αの半分の角度（α／２）の正接であるｔａｎ（α／２）との比は、長さｗと算出用長さｓとの比ｎと一致する。ゆえに制御部１０１は、顔画角αを、比ｎと撮像画角θとを用いて式（４）より算出する。 As shown in FIG. 7B, the ratio of tan(θ/2), the tangent of half the imaging angle of view θ, to tan(α/2), the tangent of half the face angle of view α, matches the ratio of the length w to the calculation length s, that is, tan(α/2) / tan(θ/2) = s / w = n. Therefore, the control unit 101 calculates the face angle of view α from Equation (4) using the ratio n and the imaging angle of view θ.

このように制御部１０１は、スルー画像Ｔｐにおいて、中心線Ｏを中心とする、顔枠Ｆｓが占める範囲を含む最小の範囲（つまり中心線Ｏから左右に算出用長さｓの範囲）を検出する。そして制御部１０１は、当該範囲と撮像画角θとに基づいて、撮像画角θにおける人物の顔が占める範囲の画角を顔画角αとして算出するようになされている。 In this way, the control unit 101 detects, in the through image Tp, the minimum range centered on the center line O that includes the range occupied by the face frame Fs (that is, the range extending the calculation length s to the left and right of the center line O). Based on this range and the imaging angle of view θ, the control unit 101 calculates the angle of view of the range occupied by the person's face within the imaging angle of view θ as the face angle of view α.
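A minimal numeric sketch of this calculation follows. Equation (4) is not reproduced in this text; the code assumes it is α = 2·arctan(n·tan(θ/2)), which follows from the tangent ratio described above, and assumes θ = 2·arctan(18 / f) from the 36 mm frame width. The function names are hypothetical.

```python
import math

FRAME_WIDTH_MM = 36.0  # horizontal length of a 35 mm film frame


def imaging_angle_of_view(focal_length_mm: float) -> float:
    """Horizontal imaging angle of view theta in radians (assumed form of Equation (2))."""
    return 2.0 * math.atan((FRAME_WIDTH_MM / 2.0) / focal_length_mm)


def face_angle_of_view(n: float, theta: float) -> float:
    """Face angle of view alpha in radians (assumed form of Equation (4))."""
    return 2.0 * math.atan(n * math.tan(theta / 2.0))


theta = imaging_angle_of_view(18.0)      # f = 18 mm gives theta = 90 degrees
alpha = face_angle_of_view(0.5, theta)   # n = 0.5 gives alpha of about 53.13 degrees
print(math.degrees(theta), math.degrees(alpha))
```

Note that α is not simply n·θ: the ratio applies to the tangents of the half angles, so the two agree only for small angles.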

一方、顔認識処理の結果より顔枠Ｆｓが複数検出された場合、制御部１０１は、当該複数の顔枠Ｆｓのうち中心線Ｏから一番遠い顔枠Ｆｓの端までの長さを、算出用長さｓとして設定する。 On the other hand, when a plurality of face frames Fs are detected from the result of the face recognition process, the control unit 101 sets, as the calculation length s, the length from the center line O to the end of the face frame Fs that is farthest from the center line O among the plurality of face frames Fs.

例えば図８に示すように、左から順に顔枠Ｆｓ１、顔枠Ｆｓ２、顔枠Ｆｓ３が検出されたとする。ここでは、中心線Ｏから顔枠Ｆｓ１の左端までの長さｓＬ２の方が、中心線Ｏから顔枠Ｆｓ３の右端までの長さｓＲ２よりも長く、顔枠Ｆｓ１が中心線Ｏから一番遠いので、制御部１０１は、長さｓＬ２を算出用長さｓとして設定する。 For example, as shown in FIG. 8, it is assumed that a face frame Fs1, a face frame Fs2, and a face frame Fs3 are detected in order from the left. Here, the length sL2 from the center line O to the left end of the face frame Fs1 is longer than the length sR2 from the center line O to the right end of the face frame Fs3, and the face frame Fs1 is farthest from the center line O. Therefore, the control unit 101 sets the length sL2 as the calculation length s.

そして制御部１０１は、顔枠Ｆｓが１つ検出された場合と同様に、式（３）及び式（４）を用いて顔画角αを算出する。 Then, as in the case where one face frame Fs is detected, the control unit 101 calculates the face angle of view α using Equations (3) and (4).

このように制御部１０１は、スルー画像Ｔｐにおいて、中心線Ｏを中心とする、複数の顔枠Ｆｓを全て含む最小の範囲（つまり、中心線Ｏから左右に算出用長さｓの範囲）を検出する。そして制御部１０１は、当該範囲と撮像画角θとに基づいて、認識された複数の顔を全て含む最小の範囲の画角を顔画角αとして算出するようになされている。 In this way, the control unit 101 detects, in the through image Tp, the minimum range centered on the center line O that includes all of the plurality of face frames Fs (that is, the range extending the calculation length s to the left and right of the center line O). Based on this range and the imaging angle of view θ, the control unit 101 calculates the angle of view of the minimum range including all the recognized faces as the face angle of view α.
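The multi-face case can be sketched in the same way: the calculation length s is the largest distance from the center line O to any edge of any face frame. The coordinates below are hypothetical and only mimic the layout of FIG. 8, where the left end of the leftmost frame is farthest from the center line.

```python
from typing import Iterable, Tuple

FaceFrame = Tuple[float, float]  # (left, right) x-coordinates of one face frame, in pixels


def calculation_length_all(center_x: float, frames: Iterable[FaceFrame]) -> float:
    """Largest distance from the center line O to an edge of any face frame."""
    return max(
        max(abs(left - center_x), abs(right - center_x))
        for left, right in frames
    )


# Hypothetical frames Fs1, Fs2, Fs3 from left to right in a 640-pixel-wide image.
frames = [(40.0, 140.0), (260.0, 360.0), (430.0, 560.0)]
s = calculation_length_all(320.0, frames)
print(s)  # 280.0, the distance sL2 from O to the left end of Fs1
```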

このようにして顔画角αを算出すると、制御部１０１は、顔画角αを用いて適切指向角βを算出する。 When the face angle of view α is calculated in this way, the control unit 101 calculates an appropriate directivity angle β using the face angle of view α.

ここで理想的には、図９（Ａ）に示すグラフのように、適切指向角βは顔画角αと同じ値である方が望ましい。こうすることで、マイクロホン部１１１が、顔Ｐｆが占める範囲Ａｆ（図５）から入力される音声のみを強調して入力できるからである。 Ideally, the appropriate directivity angle β would have the same value as the face angle of view α, as in the graph shown in FIG. 9A. This is because the microphone unit 111 could then emphasize only the sound input from the range Af occupied by the face Pf (FIG. 5).

しかし、指向角を小さいところまで制御しうるマイクロホン、つまり非常に鋭い指向性に制御可能なマイクロホンは製造が困難でありコストが高いので、使用できないことも多いと考えられる。 However, a microphone that can control the directivity angle to a small level, that is, a microphone that can be controlled to have a very sharp directivity is difficult to manufacture and expensive, so it is considered that the microphone cannot be used in many cases.

ゆえにこのＤＶＣ１００において、顔画角αと適切指向角βとの関係は、図９（Ｂ）に示すグラフのようになっている。すなわち顔画角αが大きくなるほど適切指向角βが大きくなり、適切指向角βはマイクロホン部１１１における指向角の最小値βｍｉｎ以上の値をとる。 Therefore, in this DVC 100, the relationship between the face angle of view α and the appropriate directivity angle β is as shown in the graph of FIG. 9B. That is, the appropriate directivity angle β increases as the face angle of view α increases, and β takes a value equal to or greater than the minimum directivity angle βmin of the microphone unit 111.

また顔画角αは、撮像部１０８における撮像画角θの最大値αｍａｘ以下の値をとる。ゆえに適切指向角βは、例えば撮像画角θの最大値αｍａｘに適した指向角であるβｍａｘ以下の値をとる。 Further, the face angle of view α takes a value that is less than or equal to the maximum value αmax of the imaging angle of view θ in the imaging unit 108. Therefore, the appropriate directivity angle β takes a value equal to or less than βmax, which is a directivity angle suitable for the maximum value αmax of the imaging field angle θ, for example.

このように顔画角αと適切指向角βとが対応付けられるよう、制御部１０１は、顔画角αを用いて、例えば式（５）を用いて適切指向角βを算出するようになされている。尚、係数ｋは０以上であり、顔画角αがαｍａｘのとき適切指向角βがβｍａｘとなるような係数である。 So that the face angle of view α and the appropriate directivity angle β are associated in this way, the control unit 101 calculates the appropriate directivity angle β from the face angle of view α using, for example, Equation (5). The coefficient k is 0 or more and is chosen so that the appropriate directivity angle β equals βmax when the face angle of view α equals αmax.

因みに顔画角αから適切指向角βを算出する式としては、顔画角αが示す範囲から入力される音声を強調するような適切指向角βを算出する式であれば、式（５）に限らず、この他種々の式を用いるようにしてもよい。 Incidentally, the expression for calculating the appropriate directivity angle β from the face angle of view α is an expression for calculating the appropriate directivity angle β that emphasizes the voice input from the range indicated by the face angle of view α. Not limited to this, various other formulas may be used.
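Equation (5) is not reproduced in this text. One mapping that satisfies the stated conditions (β increases with α, β ≥ βmin, and β = βmax at α = αmax with k ≥ 0) is the linear form β = k·α + βmin with k = (βmax − βmin) / αmax. The sketch below uses that form purely as an assumption; the actual Equation (5) may differ, and the numeric limits are hypothetical.

```python
def appropriate_directivity_angle(alpha: float, alpha_max: float,
                                  beta_min: float, beta_max: float) -> float:
    """Map a face angle of view alpha to a directivity angle beta.

    Assumed linear stand-in for Equation (5): beta = k * alpha + beta_min,
    where k is chosen so that beta == beta_max when alpha == alpha_max.
    All angles are in the same unit (degrees in the example below).
    """
    k = (beta_max - beta_min) / alpha_max  # k >= 0 as long as beta_max >= beta_min
    return k * alpha + beta_min


# Hypothetical limits: alpha_max = 60, beta_min = 30, beta_max = 90 (degrees).
print(appropriate_directivity_angle(0.0, 60.0, 30.0, 90.0))   # 30.0
print(appropriate_directivity_angle(20.0, 60.0, 30.0, 90.0))  # 50.0
print(appropriate_directivity_angle(60.0, 60.0, 30.0, 90.0))  # 90.0
```

With this form, even a very small face angle of view never requests a directivity narrower than the microphone can provide.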

このようにして適切指向角βを算出すると、制御部１０１は、指向角可変部１１２を介して、マイクロホン部１１１の指向角を適切指向角βとなるように制御し、指向角制御処理を終了する。 When the appropriate directivity angle β has been calculated in this way, the control unit 101 controls the directivity angle of the microphone unit 111, via the directivity angle variable unit 112, so that it becomes the appropriate directivity angle β, and then ends the directivity angle control process.

以上のようにして制御部１０１は、顔認識処理により認識された顔が占める範囲の画角（顔画角α）に基づいて適切指向角βを算出し、適切指向角βとなるようにマイクロホン部１１１の指向角を制御するようになされている。 As described above, the control unit 101 calculates the appropriate directivity angle β based on the angle of view of the range occupied by the face recognized by the face recognition process (the face angle of view α), and controls the directivity angle of the microphone unit 111 so that it becomes the appropriate directivity angle β.

［１−４．指向角制御処理手順］ [1-4. Directivity angle control processing procedure]
次に上述した指向角制御処理の動作処理手順（これを指向角制御処理手順とも呼ぶ）について、図１０に示すフローチャートを用いて説明する。 Next, the operation processing procedure of the directivity angle control process described above (also referred to as the directivity angle control processing procedure) will be described with reference to the flowchart shown in FIG. 10.

因みにこの指向角制御処理手順ＲＴ１は、ＤＶＣ１００の制御部１０１が、フラッシュメモリ１０２に書き込まれているプログラムに従って実行する処理手順である。 Incidentally, this directivity angle control processing procedure RT1 is a processing procedure executed by the control unit 101 of the DVC 100 according to a program written in the flash memory 102.

制御部１０１は、タッチパネル１０４又は操作部１０５を介して、撮影モードへ切り替えるよう指示されると、動作モードを撮影モードに切り替えると共に指向角制御処理手順ＲＴ１を開始して、ステップＳＰ１に移る。 When the control unit 101 is instructed to switch to the shooting mode via the touch panel 104 or the operation unit 105, the control unit 101 switches the operation mode to the shooting mode and starts the directivity angle control processing procedure RT1, and proceeds to step SP1.

ステップＳＰ１において制御部１０１は、撮像部１０８からスルー画像Ｔｐを取得し、顔認識処理部１１０に送る。そして制御部１０１は、顔認識処理部１１０から送られてきた顔認識処理の結果に基づいて、スルー画像Ｔｐから人物の顔が認識されたか否かを判別する。 In step SP1, the control unit 101 acquires the through image Tp from the imaging unit 108 and sends it to the face recognition processing unit 110. Then, the control unit 101 determines whether or not a human face has been recognized from the through image Tp based on the result of the face recognition process sent from the face recognition processing unit 110.

このステップＳＰ１において否定結果が得られると、このことは、ＤＶＣ１００において人物が撮影されていないことを意味する。このとき制御部１０１は、マイクロホン部１１１の指向角の制御を行わず（つまり現在の指向角を変化させず）、再度ステップＳＰ１に戻り、スルー画像Ｔｐから人物の顔が認識されるまで待ち受ける。 If a negative result is obtained in this step SP1, this means that no person is photographed in the DVC 100. At this time, the control unit 101 does not control the directivity angle of the microphone unit 111 (that is, does not change the current directivity angle), returns to step SP1 again, and waits until a human face is recognized from the through image Tp.

一方ステップＳＰ１において肯定結果が得られると、このことは、ＤＶＣ１００において人物が撮影されていることを意味し、このとき制御部１０１は次のステップＳＰ２に移る。 On the other hand, if a positive result is obtained in step SP1, this means that a person is photographed in the DVC 100. At this time, the control unit 101 proceeds to the next step SP2.

ステップＳＰ２において制御部１０１は、撮像部１０８から現在の焦点距離ｆを取得し、次のステップＳＰ３に移る。 In step SP2, the control unit 101 acquires the current focal length f from the imaging unit 108, and proceeds to the next step SP3.

ステップＳＰ３において制御部１０１は、焦点距離ｆを用いて撮像画角θを算出する。また制御部１０１は、顔認識処理の結果に基づいて、スルー画像Ｔｐにおける顔枠Ｆｓが占める範囲を検出する。そして制御部１０１は、撮像画角θとスルー画像Ｔｐにおける顔枠Ｆｓが占める範囲とに基づいて顔画角αを算出し、次のステップＳＰ４に移る。 In step SP3, the control unit 101 calculates the imaging field angle θ using the focal length f. Further, the control unit 101 detects a range occupied by the face frame Fs in the through image Tp based on the result of the face recognition process. Then, the control unit 101 calculates the face angle of view α based on the imaging angle of view θ and the range occupied by the face frame Fs in the through image Tp, and proceeds to the next step SP4.

ステップＳＰ４において制御部１０１は、顔画角αから適切指向角βを算出して、次のステップＳＰ５に移る。 In step SP4, the control unit 101 calculates an appropriate directivity angle β from the face angle of view α, and proceeds to the next step SP5.

ステップＳＰ５において制御部１０１は、指向角可変部１１２を介して、適切指向角βと現在のマイクロホン部１１１の指向角とが一致するか否かを判別する。 In step SP5, the control unit 101 determines whether or not the appropriate directivity angle β matches the current directivity angle of the microphone unit 111 via the directivity angle variable unit 112.

このステップＳＰ５において否定結果が得られると、このとき制御部１０１は次のステップＳＰ６に移る。 If a negative result is obtained in step SP5, the control unit 101 moves to next step SP6.

ステップＳＰ６において制御部１０１は、指向角可変部１１２を介して、マイクロホン部１１１の指向角を適切指向角βとなるように制御して、再度ステップＳＰ５に戻る。 In step SP6, the control unit 101 controls the directivity angle of the microphone unit 111 to be an appropriate directivity angle β via the directivity angle variable unit 112, and returns to step SP5 again.

一方ステップＳＰ５において適切指向角βと現在のマイクロホン部１１１の指向角とが一致することより肯定結果が得られると、このとき制御部１０１は次のステップＳＰ７に移る。 On the other hand, if a positive result is obtained in step SP5 that the appropriate directivity angle β matches the current directivity angle of the microphone unit 111, the control unit 101 proceeds to the next step SP7.

ステップＳＰ７において制御部１０１は、タッチパネル１０４又は操作部１０５を介して指向角制御処理を終了するよう指示されたか否かを判別する。 In step SP7, the control unit 101 determines whether an instruction to end the directivity angle control process is given via the touch panel 104 or the operation unit 105.

このステップＳＰ７において否定結果が得られると、制御部１０１は再度ステップＳＰ１に戻り、ステップＳＰ１〜ＳＰ７を繰り返す。 If a negative result is obtained in step SP7, the control unit 101 returns to step SP1 again and repeats steps SP1 to SP7.

一方ステップＳＰ７において肯定結果が得られると、制御部１０１は、指向角制御処理手順ＲＴ１を終了する。 On the other hand, if a positive result is obtained in step SP7, the control unit 101 ends the directivity angle control processing procedure RT1.

このような指向角制御処理手順ＲＴ１により、ＤＶＣ１００は、適切指向角βを算出し、適切指向角βとなるようにマイクロホン部１１１の指向角を制御するようになされている。 By such a directivity angle control processing procedure RT1, the DVC 100 calculates an appropriate directivity angle β and controls the directivity angle of the microphone unit 111 so that the proper directivity angle β is obtained.
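The flow of steps SP1 to SP7 can be summarized in a short sketch. Everything here is a simplification under stated assumptions: face frames are given as hypothetical (left, right) pixel pairs, the center line O is taken at half the image width, the equations use the assumed forms θ = 2·arctan(18 / f) and α = 2·arctan(n·tan(θ/2)), the mapping from α to β is passed in by the caller, and the end of the input sequence stands in for the end instruction checked in SP7.

```python
import math
from typing import Callable, Iterable, List, Tuple

Frame = Tuple[float, float]  # (left, right) x-coordinates of one face frame, in pixels


def control_pass(frames: List[Frame], focal_length_mm: float, image_width: float,
                 current_angle: float, to_beta: Callable[[float], float]) -> float:
    """One pass through SP1 to SP6; returns the directivity angle afterwards (radians)."""
    if not frames:                                    # SP1: no face recognized,
        return current_angle                          # keep the current directivity angle
    theta = 2.0 * math.atan(18.0 / focal_length_mm)   # SP2, SP3: imaging angle of view
    half_width = image_width / 2.0                    # w, also the x position of O here
    s = max(max(abs(left - half_width), abs(right - half_width))
            for left, right in frames)                # SP3: calculation length s
    alpha = 2.0 * math.atan((s / half_width) * math.tan(theta / 2.0))
    beta = to_beta(alpha)                             # SP4: appropriate directivity angle
    if beta != current_angle:                         # SP5: compare with current angle
        current_angle = beta                          # SP6: update the microphone angle
    return current_angle


def run_rt1(samples: Iterable[Tuple[List[Frame], float]], image_width: float,
            initial_angle: float, to_beta: Callable[[float], float]) -> List[float]:
    """Repeat the pass for each (frames, focal length) sample and record the angle."""
    angle = initial_angle
    history = []
    for frames, focal_length_mm in samples:           # SP7: loop until input ends
        angle = control_pass(frames, focal_length_mm, image_width, angle, to_beta)
        history.append(angle)
    return history


# First sample has no face, second has one face spanning the middle half of the image.
history = run_rt1([([], 36.0), ([(160.0, 480.0)], 18.0)], 640.0, 1.0, lambda a: a)
print(history)
```

In the first pass the angle stays at its initial value because no face is recognized; in the second it is updated to the value derived from the face frame.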

［１−５．第１の実施の形態における動作及び効果］ [1-5. Operation and Effect in First Embodiment]
以上の構成において、ＤＶＣ１００の顔認識処理部１１０は、ＤＶＣ１００の撮像部１０８で撮像された画像（スルー画像Ｔｐ）から、被写体となっている人物の顔を認識する処理（顔認識処理）を行う。そして顔認識処理部１１０は、顔認識処理の結果をＤＶＣ１００の制御部１０１に送る。 In the above configuration, the face recognition processing unit 110 of the DVC 100 performs processing (face recognition processing) for recognizing the face of the person who is the subject from the image (through image Tp) captured by the imaging unit 108 of the DVC 100. Then, the face recognition processing unit 110 sends the result of the face recognition processing to the control unit 101 of the DVC 100.

制御部１０１は、撮像部１０８から、画像が撮像された際の焦点距離ｆを取得し、撮像画角θを算出する。 The control unit 101 acquires the focal length f when the image is captured from the imaging unit 108 and calculates the imaging angle of view θ.

また制御部１０１は、顔認識処理の結果に基づいて、スルー画像Ｔｐにおいて、人物の顔と認識された部分を示す顔枠Ｆｓが占める範囲を検出する。 Further, the control unit 101 detects a range occupied by the face frame Fs indicating a portion recognized as a human face in the through image Tp based on the result of the face recognition process.

そして制御部１０１は、スルー画像Ｔｐの中心線Ｏを中心とする、顔枠Ｆｓが占める範囲を含む最小の範囲を検出し、当該範囲に基づいて撮像画角θにおける人物の顔が占める範囲の画角（顔画角α）を算出する。 Then, the control unit 101 detects the minimum range centered on the center line O of the through image Tp that includes the range occupied by the face frame Fs and, based on this range, calculates the angle of view of the range occupied by the person's face within the imaging angle of view θ (the face angle of view α).

そして制御部１０１は、この顔画角αを用いて、この顔画角αが示す範囲から入力される音声を強調するような適切指向角βを算出する。そして制御部１０１は、指向角可変部１１２を介して、適切指向角βとなるようにマイクロホン部１１１の指向角を制御する。 Then, the control unit 101 uses the face angle of view α to calculate an appropriate directivity angle β that enhances the voice input from the range indicated by the face angle of view α. Then, the control unit 101 controls the directivity angle of the microphone unit 111 through the directivity angle variable unit 112 so that the proper directivity angle β is obtained.

これによりＤＶＣ１００は、撮像範囲のうち人物の顔が占める範囲から入力される音声を強調する指向性で音声を入力できるので、ＤＶＣ１００から見て人物の顔のどの部分から発せられたかによらず人物が発する声を強調して入力することができる。 As a result, the DVC 100 can input sound with a directivity that emphasizes the sound coming from the range occupied by the person's face within the imaging range, so it can emphasize and input the voice uttered by the person regardless of which part of the face, as viewed from the DVC 100, the voice comes from.

ゆえに例えば人物がＤＶＣ１００に対して正面を向けている場合、つまりＤＶＣ１００から見て人物の顔のほぼ中心から声が発せられる場合でも、ＤＶＣ１００は、人物が発する声を強調して入力することができる。また人物がＤＶＣ１００に対して横を向けている場合、つまりＤＶＣ１００から見て人物の顔のおよそ右端又は左端から声が発せられる場合でも、ＤＶＣ１００は、人物が発する声を強調して入力することができる。 Therefore, for example, when a person faces the DVC 100 directly, that is, when the voice is uttered from almost the center of the person's face as viewed from the DVC 100, the DVC 100 can emphasize and input the voice uttered by the person. Likewise, when a person faces sideways with respect to the DVC 100, that is, when the voice is uttered from approximately the right end or the left end of the person's face as viewed from the DVC 100, the DVC 100 can still emphasize and input the voice uttered by the person.

また制御部１０１は、顔認識処理により複数の顔が認識された場合、スルー画像Ｔｐの中心線Ｏを中心とする、当該複数の顔を示す複数の顔枠Ｆｓが全て含まれる最小の範囲を検出する。 In addition, when a plurality of faces are recognized by the face recognition process, the control unit 101 detects the minimum range centered on the center line O of the through image Tp that includes all of the face frames Fs indicating those faces.

そして制御部１０１は、当該範囲と撮像画角θとに基づいて、認識された複数の顔が全て含まれる範囲の画角（顔画角α）を算出し、この顔画角αに基づいて、マイクロホン部１１１の指向性を制御するようにした。 Then, based on this range and the imaging angle of view θ, the control unit 101 calculates the angle of view of the range including all the recognized faces (the face angle of view α), and controls the directivity of the microphone unit 111 based on this face angle of view α.

これによりＤＶＣ１００は、撮像範囲のうち複数の人物の顔が占める範囲から入力される音声を強調する指向性で音声を入力できるので、一人のみならず、複数の人物が発する声を強調して入力することができる。 As a result, the DVC 100 can input sound with a directivity that emphasizes the sound coming from the range occupied by the faces of a plurality of persons within the imaging range, so it can emphasize and input the voices uttered not only by one person but by a plurality of persons.

ところで、被写体となる人物が移動する場合、人物の移動に合わせてマイクロホン部１１１の指向主軸の方向を移動させることで、人物の発する声を強調して入力することが考えられる。 By the way, when the person who becomes the subject moves, it is conceivable that the voice of the person is emphasized and inputted by moving the direction of the directional main axis of the microphone unit 111 in accordance with the movement of the person.

しかしこのようにマイクロホン部１１１の指向主軸の方向を移動させると、それに伴ってマイクロホン部１１１に入力される音声の音像定位が移動してしまうので、当該音声が再生されたときにユーザに違和感を与えてしまうこととなる。 However, if the direction of the directional main axis of the microphone unit 111 is moved in this way, the sound image localization of the sound input to the microphone unit 111 moves with it, so the user feels a sense of incongruity when the sound is reproduced.

これに対して本発明のＤＶＣ１００は、顔画角αが示す範囲、強調して入力するようにマイクロホン部１１１の指向角の大きさを制御するようにした。つまりＤＶＣ１００は、人物の顔が占める範囲を含む、撮像主軸を中心とする範囲を強調して入力するようにマイクロホン部１１１の指向角の大きさを制御するようにした。 On the other hand, the DVC 100 of the present invention controls the size of the directivity angle of the microphone unit 111 so that the range indicated by the face angle of view α is input with emphasis. That is, the DVC 100 controls the size of the directivity angle of the microphone unit 111 so that the range centered on the imaging main axis including the range occupied by the person's face is emphasized.

これによりＤＶＣ１００は、撮像主軸を中心として、つまりマイクロホン部１１１の指向主軸を中心として、マイクロホン部１１１の指向角の大きさを変化させるだけで、人物の発する声を強調して入力できる。ゆえにＤＶＣ１００は、マイクロホン部１１１の指向主軸を変化させなくても人物の発する声を強調して入力できるので、音像定位を移動させることなく、ユーザに違和感を与えないようにできる。 As a result, the DVC 100 can emphasize and input the voice uttered by a person simply by changing the size of the directivity angle of the microphone unit 111 about the imaging main axis, that is, about the directional main axis of the microphone unit 111. Because the DVC 100 can emphasize and input the voice without changing the directional main axis of the microphone unit 111, the sound image localization does not move and the user is not given a sense of incongruity.

またＤＶＣ１００は、顔画角αを、マイクロホン部１１１において制御されうる範囲内（つまり最小指向角βｍｉｎ以上）の指向角と対応付ける式（５）を用いて、顔画角αから適切指向角βを算出するようにした。そしてＤＶＣ１００は、このようにして算出した適切指向角βとなるようにマイクロホン部１１１の指向角を制御するようにした。 In addition, the DVC 100 calculates the appropriate directivity angle β from the face angle of view α using Equation (5), which associates the face angle of view α with a directivity angle within the range that the microphone unit 111 can be controlled to (that is, at least the minimum directivity angle βmin). The DVC 100 then controls the directivity angle of the microphone unit 111 so that it becomes the appropriate directivity angle β calculated in this way.

これによりＤＶＣ１００は、マイクロホン部１１１において指向角が制御されうる範囲によらず、顔画角αに基づいてマイクロホン部１１１の指向角を制御することができる。つまりＤＶＣ１００において、例えば鋭指向性に制御可能なマイクロホンでなくても、指向性を可変制御しうるマイクロホンであれば、種々のマイクロホンを用いることができる。 As a result, the DVC 100 can control the directivity angle of the microphone unit 111 based on the face angle of view α regardless of the range over which the directivity angle of the microphone unit 111 can be controlled. In other words, the DVC 100 can use various microphones whose directivity can be variably controlled, even ones that cannot be controlled to a sharp directivity.

以上の構成によれば、ＤＶＣ１００は、スルー画像における人物の顔を認識し、スルー画像における人物の顔が占める範囲に基づいて、マイクロホン部１１１の指向性を制御するようにした。 According to the above configuration, the DVC 100 recognizes the person's face in the through image and controls the directivity of the microphone unit 111 based on the range occupied by the person's face in the through image.

これによりＤＶＣ１００は、撮像範囲のうち人物の顔が占める範囲から入力される音声を強調する指向性で音声を入力できるので、人物の顔のどの部分から発せられたかによらず人物が発する声を強調して入力することができる。かくしてＤＶＣ１００は、一段と確実に人物が発する声を強調して入力することができる。 As a result, the DVC 100 can input sound with a directivity that emphasizes the sound coming from the range occupied by the person's face within the imaging range, so it can emphasize and input the voice uttered by the person regardless of which part of the face the voice comes from. The DVC 100 can thus emphasize and input the voice uttered by a person more reliably.

＜２．第２の実施の形態＞ <2. Second Embodiment>
［２−１．第２の実施の形態の概要］ [2-1. Outline of Second Embodiment]
次に第２の実施の形態について説明する。因みにこの概要を説明した後、本実施の形態の具体例の説明に移る。 Next, a second embodiment will be described. After this outline, the description moves on to a specific example of the present embodiment.

図１と対応する部分について同様の符号を付した図１１において、１０は、第２の実施の形態における撮像装置を示す。この撮像装置１０は、上述した第１の実施の形態と同様の撮像部２、音声入力部３及び認識部４を有している。 In FIG. 11, in which parts corresponding to those in FIG. 1 are denoted by the same reference numerals, reference numeral 10 denotes an imaging apparatus according to the second embodiment. The imaging device 10 includes the same imaging unit 2, voice input unit 3, and recognition unit 4 as those in the first embodiment described above.

また撮像装置１０は、認識部４により認識された被写体の中から、任意の被写体を選択する選択部１１を有している。 In addition, the imaging apparatus 10 includes a selection unit 11 that selects an arbitrary subject from among the subjects recognized by the recognition unit 4.

さらに撮像装置１０は、撮像画像において、選択部１１により選択された一又は複数の被写体が全て含まれる範囲を検出し、当該範囲に基づいて音声入力部３の指向性を制御する制御部１２を有している。 Furthermore, the imaging device 10 has a control unit 12 that detects, in the captured image, a range including all of the one or more subjects selected by the selection unit 11, and controls the directivity of the audio input unit 3 based on that range.

このような構成でなる撮像装置１０の具体例であるＤＶＣ２００について、以下、詳しく説明する。尚ＤＶＣ２００のハードウェア構成については、第１の実施の形態におけるＤＶＣ１００のハードウェア構成（図２）と同様であるので第１の実施の形態を参照とする。 The DVC 200, which is a specific example of the imaging apparatus 10 having such a configuration, will be described in detail below. The hardware configuration of the DVC 200 is the same as the hardware configuration (FIG. 2) of the DVC 100 in the first embodiment, and therefore the first embodiment is referred to.

尚第２の実施の形態において、ＤＶＣ２００の撮像部１０８が、上述した撮像装置１０の撮像部２に相当するハードウェアである。またＤＶＣ２００のマイクロホン部１１１及び指向角可変部１１２が、上述した撮像装置１０の音声入力部３に相当するハードウェアである。さらにＤＶＣ２００の顔認識処理部１１０が、上述した撮像装置１０の認識部４に相当するハードウェアである。さらにＤＶＣ２００の制御部１０１が、上述した撮像装置１０の選択部１１及び制御部１２に相当するハードウェアである。 In the second embodiment, the imaging unit 108 of the DVC 200 is hardware corresponding to the imaging unit 2 of the imaging apparatus 10 described above. Further, the microphone unit 111 and the directivity angle varying unit 112 of the DVC 200 are hardware corresponding to the audio input unit 3 of the imaging device 10 described above. Furthermore, the face recognition processing unit 110 of the DVC 200 is hardware corresponding to the recognition unit 4 of the imaging device 10 described above. Furthermore, the control unit 101 of the DVC 200 is hardware corresponding to the selection unit 11 and the control unit 12 of the imaging device 10 described above.

［２−２．指向角制御処理］ [2-2. Directivity angle control processing]
第２の実施の形態におけるＤＶＣ２００の制御部１０１は、顔認識処理部１１０から顔認識処理の結果を取得すると、これに基づいて、スルー画像Ｔｐにおける顔枠Ｆｓの位置及び大きさを検出する。 When the control unit 101 of the DVC 200 in the second embodiment acquires the result of the face recognition processing from the face recognition processing unit 110, it detects the position and size of the face frame Fs in the through image Tp based on that result.

顔枠Ｆｓが１つ検出された場合、制御部１０１は、第１の実施の形態と同様に、顔枠Ｆｓが含まれる範囲を検出し、この範囲を用いて、顔枠Ｆｓが示す顔が占める範囲の画角を顔画角αとして算出する。 When one face frame Fs is detected, the control unit 101 detects a range including the face frame Fs and uses the range to detect the face indicated by the face frame Fs, as in the first embodiment. The angle of view of the occupied range is calculated as the face angle of view α.

一方顔枠Ｆｓが複数検出された場合、制御部１０１は、それぞれの顔枠Ｆｓの面積を算出し、最も面積の大きい顔枠Ｆｓがどれかを判別する。最も面積の大きい顔枠Ｆｓは、ＤＶＣ２００に最も距離が近い顔を示している。つまり最も面積の大きい顔枠Ｆｓが示す顔の人物は、ＤＶＣ２００に向かって声を発している可能性が高いと考えられる。 On the other hand, when a plurality of face frames Fs are detected, the control unit 101 calculates the area of each face frame Fs and determines which face frame Fs has the largest area. The face frame Fs having the largest area indicates the face closest to the DVC 200. That is, it is considered that there is a high possibility that the person with the face indicated by the face frame Fs having the largest area is speaking toward the DVC 200.

ゆえに制御部１０１は、最も面積の大きい顔枠Ｆｓが示す顔の人物を、声を発している人物であると予測し、当該最も面積の大きい顔枠Ｆｓを選択する。そして制御部１０１は、選択した顔枠Ｆｓにおいて、中心線Ｏから右端までの長さ及び中心線Ｏから左端までの長さを算出し、これらのうち長い方を算出用長さｓとして設定する。 Therefore, the control unit 101 presumes that the person whose face is indicated by the face frame Fs with the largest area is the person who is speaking, and selects that face frame Fs. Then, for the selected face frame Fs, the control unit 101 calculates the length from the center line O to the right end and the length from the center line O to the left end, and sets the longer of the two as the calculation length s.

例えば図１２に示すように、顔認識処理により、左から順に顔枠Ｆｓ４、顔枠Ｆｓ５、顔枠Ｆｓ６が検出されたとする。このとき制御部１０１は、顔枠Ｆｓ６の面積が一番大きいと判別したとすると、顔枠Ｆｓ６を選択する。ここでは中心線Ｏから顔枠Ｆｓ６の左端までの長さｓＬ３よりも、中心線Ｏから顔枠Ｆｓ６の右端までの長さｓＲ３の方が長いので、制御部１０１は、長さｓＲ３を算出用長さｓとして設定する。 For example, as shown in FIG. 12, it is assumed that a face frame Fs4, a face frame Fs5, and a face frame Fs6 are detected in order from the left by the face recognition process. At this time, if it is determined that the area of the face frame Fs6 is the largest, the control unit 101 selects the face frame Fs6. Here, since the length sR3 from the center line O to the right end of the face frame Fs6 is longer than the length sL3 from the center line O to the left end of the face frame Fs6, the control unit 101 calculates the length sR3. Set as length s.

そして制御部１０１は、上述した第１の実施の形態と同様に、式（３）及び式（４）を用いて顔画角αを算出する。 Then, as in the first embodiment described above, the control unit 101 calculates the face angle of view α using Equations (3) and (4).

つまり制御部１０１は、スルー画像Ｔｐにおいて一番面積の大きい顔枠Ｆｓが占める範囲を検出する。そして制御部１０１は、中心線Ｏを中心とする、当該顔枠Ｆｓが占める範囲を含む最小の範囲（つまり中心線Ｏから左右に算出用長さｓの範囲）を検出し、この範囲を用いて当該顔枠Ｆｓが示す顔が占める範囲の画角を顔画角αとして算出する。 That is, the control unit 101 detects a range occupied by the face frame Fs having the largest area in the through image Tp. Then, the control unit 101 detects a minimum range including the range occupied by the face frame Fs with the center line O as the center (that is, a range of the calculation length s from the center line O to the left and right), and uses this range. Then, the angle of view of the range occupied by the face indicated by the face frame Fs is calculated as the face angle of view α.

そして制御部１０１は、上述した第１の実施の形態と同様に、顔画角αを用いて式（５）により適切指向角βを算出し、指向角可変部１１２を介して、マイクロホン部１１１の指向角を適切指向角βとなるように制御する。 Then, as in the first embodiment described above, the control unit 101 calculates the appropriate directivity angle β from the face angle of view α using Equation (5), and controls the directivity angle of the microphone unit 111, via the directivity angle variable unit 112, so that it becomes the appropriate directivity angle β.
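The selection rule of the second embodiment can be sketched as follows: pick the face frame with the largest area, then take the distance from the center line O to its farther edge as the calculation length s. The `FaceFrame` type, the function names and the coordinates are hypothetical; the layout only mimics FIG. 12, where the rightmost frame is the largest.

```python
from typing import List, NamedTuple


class FaceFrame(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def area(frame: FaceFrame) -> float:
    """Area of a face frame in square pixels."""
    return (frame.right - frame.left) * (frame.bottom - frame.top)


def select_largest(frames: List[FaceFrame]) -> FaceFrame:
    """Face frame with the largest area, presumed to be the face closest to the camera."""
    return max(frames, key=area)


def calculation_length(frame: FaceFrame, center_x: float) -> float:
    """Distance s from the center line O to the farther edge of the selected frame."""
    return max(abs(frame.left - center_x), abs(frame.right - center_x))


# Hypothetical frames Fs4, Fs5, Fs6 from left to right in a 640-pixel-wide image.
frames = [
    FaceFrame(40.0, 200.0, 100.0, 260.0),
    FaceFrame(250.0, 190.0, 330.0, 270.0),
    FaceFrame(380.0, 120.0, 560.0, 300.0),
]
selected = select_largest(frames)
print(selected, calculation_length(selected, 320.0))  # Fs6 is selected, s = 240.0
```

If two frames tie for the largest area, `max` returns the first one; the source does not say how the DVC 200 resolves such a tie.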

以上の構成によればＤＶＣ２００は、複数の顔が認識された場合、複数の顔枠Ｆｓの中から最も面積の大きい顔枠Ｆｓが示す顔の人物を、ＤＶＣ２００に向かって声を発している人物であると予測し、当該最も面積の大きい顔枠Ｆｓを選択する。 According to the above configuration, when a plurality of faces are recognized, the DVC 200 presumes that the person whose face is indicated by the face frame Fs with the largest area among the plurality of face frames Fs is the person speaking toward the DVC 200, and selects that face frame Fs.

そしてＤＶＣ２００は、選択した顔枠Ｆｓが含まれる範囲に基づいて、当該顔枠Ｆｓが示す人物の顔が占める範囲の画角（顔画角α）を算出し、この顔画角αに基づいて、マイクロホン部１１１の指向性を制御するようにした。 The DVC 200 calculates an angle of view (face angle of view α) of a range occupied by the face of the person indicated by the face frame Fs based on the range including the selected face frame Fs, and based on the face angle of view α. The directivity of the microphone unit 111 is controlled.

これによりＤＶＣ２００は、ＤＶＣ２００に最も距離が近い人物、つまりＤＶＣ２００に向かって声を発している可能性が高い人物の顔が占める範囲から入力される音声を強調する指向性で音声を入力できる。ゆえにＤＶＣ２００は、ＤＶＣ２００に向かって声を発している可能性が高い人物の声を強調して入力できるので、一段と確実に人物が発する声を強調して入力することができる。 As a result, the DVC 200 can input the voice with a directivity that emphasizes the voice input from the range occupied by the face of the person closest to the DVC 200, that is, the person who is likely to speak to the DVC 200. Therefore, the DVC 200 can emphasize and input the voice of a person who has a high possibility of speaking toward the DVC 200, and thus can more reliably emphasize and input the voice of the person.

＜３．第３の実施の形態＞ <3. Third Embodiment>
次に第３の実施の形態について説明する。第３の実施の形態における撮像装置２０は、上述した第２の実施の形態における撮像装置１０（図１１）と機能構成については同様であるので、第２の実施の形態を参照とする。 Next, a third embodiment will be described. Since the imaging apparatus 20 in the third embodiment has the same functional configuration as the imaging apparatus 10 (FIG. 11) in the second embodiment described above, reference is made to the second embodiment.

このような撮像装置２０の具体例であるＤＶＣ３００について、以下、詳しく説明する。尚ＤＶＣ３００のハードウェア構成については、第１の実施の形態におけるＤＶＣ１００のハードウェア構成（図３）と同様であるので第１の実施の形態を参照とする。 The DVC 300 that is a specific example of such an imaging apparatus 20 will be described in detail below. Note that the hardware configuration of the DVC 300 is the same as the hardware configuration of the DVC 100 in the first embodiment (FIG. 3), so the first embodiment will be referred to.

［３−１．指向角制御処理］ [3-1. Directivity angle control processing]
第３の実施の形態におけるＤＶＣ３００の顔認識処理部１１０は、顔認識処理において、第１の実施の形態と同様にスルー画像Ｔｐから人物の顔を認識すると共に、人物の口を認識する。そして顔認識処理部１１０は、認識された顔において口が認識されたか否かも顔認識処理の結果として制御部１０１に返す。 In the face recognition process, the face recognition processing unit 110 of the DVC 300 in the third embodiment recognizes a person's face from the through image Tp in the same manner as in the first embodiment, and also recognizes the person's mouth. The face recognition processing unit 110 then returns to the control unit 101, as part of the result of the face recognition processing, whether or not a mouth has been recognized in each recognized face.

ここで口が認識された顔の人物は、ＤＶＣ３００に向かって声を発している可能性が高いと考えられる。ゆえに制御部１０１は、顔認識処理部１１０から顔認識処理の結果を取得すると、口が認識された顔枠Ｆｓが示す顔の人物を、声を発している人物であると予測し、当該口が認識された顔枠Ｆｓを選択する。 A person whose mouth has been recognized is considered likely to be speaking toward the DVC 300. Therefore, when the control unit 101 acquires the result of the face recognition processing from the face recognition processing unit 110, it presumes that the person whose face is indicated by a face frame Fs in which a mouth has been recognized is the person who is speaking, and selects that face frame Fs.

例えば図１３に示すように、顔認識処理により、左から順に顔枠Ｆｓ７、顔枠Ｆｓ８、顔枠Ｆｓ９が検出されたとする。尚、顔枠Ｆｓ７においては、例えば人物が横を向いているために口が認識されていないとし、顔枠Ｆｓ８及び顔枠Ｆｓ９においては、口が認識されているとする。このとき制御部１０１は、口が認識されている顔枠Ｆｓ８及び顔枠Ｆｓ９を選択する。 For example, as shown in FIG. 13, it is assumed that a face frame Fs7, a face frame Fs8, and a face frame Fs9 are detected in order from the left by the face recognition process. In the face frame Fs7, for example, it is assumed that the mouth is not recognized because a person is facing sideways, and the mouth is recognized in the face frames Fs8 and Fs9. At this time, the control unit 101 selects the face frame Fs8 and the face frame Fs9 whose mouth is recognized.

そして制御部１０１は、選択した顔枠Ｆｓの中で、中心線Ｏから一番遠い顔枠Ｆｓの端までの長さを算出用長さｓとして設定する。 Then, the control unit 101 sets, as the calculation length s, the length from the center line O to the end of the face frame Fs that is farthest from the center line O among the selected face frames Fs.

図１３に示す場合では、中心線Ｏから顔枠Ｆｓ８の左端までの長さｓＬ４よりも、中心線Ｏから顔枠Ｆｓ９の右端までの長さｓＲ４の方が長いので、制御部１０１は、長さｓＲ４を算出用長さｓとして設定する。 In the case illustrated in FIG. 13, the length sR4 from the center line O to the right end of the face frame Fs9 is longer than the length sL4 from the center line O to the left end of the face frame Fs8. The length sR4 is set as the calculation length s.

つまり制御部１０１は、スルー画像Ｔｐにおいて、中心線Ｏを中心とする、口が認識された顔枠Ｆｓを全て含む最小の範囲（つまり中心線Ｏから左右に算出用長さｓの範囲）を検出する。そして制御部１０１は、この範囲と撮像画角θとに基づいて、口が認識された顔を全て含む最小の範囲の画角を顔画角αとして算出する。 That is, the control unit 101 detects, in the through image Tp, the minimum range centered on the center line O that includes all the face frames Fs in which a mouth has been recognized (that is, the range extending the calculation length s to the left and right of the center line O). Then, based on this range and the imaging angle of view θ, the control unit 101 calculates the angle of view of the minimum range including all the faces whose mouths have been recognized as the face angle of view α.
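The third embodiment's selection can be sketched as a filter followed by the same farthest-edge rule. The `FaceFrame` type, its `mouth_recognized` flag and the coordinates are hypothetical; the layout mimics FIG. 13, where the leftmost face is in profile and has no recognized mouth. The source does not say what happens when no mouth is recognized in any face, so the sketch returns `None` in that case as a placeholder.

```python
from typing import List, NamedTuple, Optional


class FaceFrame(NamedTuple):
    left: float
    right: float
    mouth_recognized: bool


def calculation_length_for_speakers(frames: List[FaceFrame],
                                    center_x: float) -> Optional[float]:
    """Calculation length s over the face frames in which a mouth was recognized.

    Returns None when no frame has a recognized mouth (behavior not specified
    in the source; chosen here only so the sketch is well defined).
    """
    selected = [f for f in frames if f.mouth_recognized]
    if not selected:
        return None
    return max(max(abs(f.left - center_x), abs(f.right - center_x))
               for f in selected)


# Hypothetical frames Fs7, Fs8, Fs9 from left to right in a 640-pixel-wide image.
frames = [
    FaceFrame(20.0, 120.0, False),   # Fs7: profile view, mouth not recognized
    FaceFrame(200.0, 300.0, True),   # Fs8
    FaceFrame(400.0, 520.0, True),   # Fs9
]
print(calculation_length_for_speakers(frames, 320.0))  # 200.0, from O to the right end of Fs9
```

Without the filter, the far edge of Fs7 would give s = 300.0, so excluding faces without a recognized mouth narrows the range that the microphone emphasizes.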

以上の構成によればＤＶＣ３００は、認識された顔枠Ｆｓの中から口が認識された一又は複数の顔枠Ｆｓが示す顔の人物を、ＤＶＣ３００に向かって声を発している人物であると予測し、当該顔枠Ｆｓを選択する。 According to the above configuration, the DVC 300 predicts that the person or persons whose faces are indicated by the one or more face frames Fs in which a mouth has been recognized, among the recognized face frames Fs, are speaking toward the DVC 300, and selects those face frames Fs.

そしてＤＶＣ３００は、選択した顔枠Ｆｓが全て含まれる範囲に基づいて、当該顔枠Ｆｓが示す人物の顔が全て含まれるような範囲の画角（顔画角α）を算出し、この顔画角αに基づいて、マイクロホン部１１１の指向性を制御するようにした。 Then, based on the range that includes all of the selected face frames Fs, the DVC 300 calculates the angle of view (face angle of view α) of a range that includes all the faces of the persons indicated by those face frames Fs, and controls the directivity of the microphone unit 111 based on this face angle of view α.

これによりＤＶＣ３００は、ＤＶＣ３００に向かって口を向けている人物、つまりＤＶＣ３００に向かって声を発している可能性が高い人物の顔が占める範囲から入力される音声を強調する指向性で音声を入力できる。ゆえにＤＶＣ３００は、ＤＶＣ３００に向かって声を発している可能性が高い一又は複数の人物の声を強調して入力できるので、一段と確実に人物が発する声を強調して入力することができる。 As a result, the DVC 300 can input sound with a directivity that emphasizes sound arriving from the range occupied by the faces of persons whose mouths are turned toward the DVC 300, that is, persons who are highly likely to be speaking toward the DVC 300. Because the DVC 300 can thus emphasize the voices of one or more persons who are highly likely to be speaking toward it, it can emphasize and input the voice uttered by a person more reliably.

＜４．第４の実施の形態＞ <4. Fourth Embodiment>
次に第４の実施の形態について説明する。第４の実施の形態における撮像装置４０は、上述した第２の実施の形態における撮像装置１０（図１１）と機能構成については同様であるので、第２の実施の形態を参照とする。 Next, a fourth embodiment will be described. Since the imaging apparatus 40 in the fourth embodiment is similar in functional configuration to the imaging apparatus 10 (FIG. 11) in the second embodiment described above, reference is made to the second embodiment.

このような撮像装置４０の具体例であるＤＶＣ４００について、以下、詳しく説明する。尚ＤＶＣ４００のハードウェア構成についても、第１の実施の形態におけるＤＶＣ１００のハードウェア構成（図３）と同様であるので第１の実施の形態を参照とする。 The DVC 400, which is a specific example of such an imaging apparatus 40, will be described in detail below. The hardware configuration of the DVC 400 is the same as the hardware configuration (FIG. 3) of the DVC 100 in the first embodiment, and therefore the first embodiment is referred to.

［４−１．指向角制御処理］ [4-1. Directivity Angle Control Processing]
第４の実施の形態におけるＤＶＣ１００の制御部１０１は、顔認識処理部１１０から顔認識処理の結果を取得すると、これに基づいて、スルー画像Ｔｐにおける顔枠Ｆｓの位置及び大きさを検出する。そして制御部１０１は、液晶パネル１０６に表示されているスルー画像Ｔｐ上に顔枠Ｆｓを表示させる。 When the control unit 101 of the DVC 100 according to the fourth embodiment acquires the result of the face recognition processing from the face recognition processing unit 110, it detects the position and size of each face frame Fs in the through image Tp based on this result. Then, the control unit 101 displays the face frames Fs on the through image Tp displayed on the liquid crystal panel 106.

そして制御部１０１は、ユーザ操作によりタッチパネル１０４又は操作部１０５を介して、任意の顔枠Ｆｓが指定されると、指定された顔枠Ｆｓを選択する。 Then, when an arbitrary face frame Fs is designated by the user operation via the touch panel 104 or the operation unit 105, the control unit 101 selects the designated face frame Fs.

例えば図１４に示すように、顔認識処理により、左から順に顔枠Ｆｓ１０、顔枠Ｆｓ１１、顔枠Ｆｓ１２が検出されたとする。このとき、ユーザ操作によりタッチパネル１０４又は操作部１０５を介して、例えば顔枠Ｆｓ１１が指定された場合、制御部１０１は、顔枠Ｆｓ１１を選択する。 For example, as shown in FIG. 14, it is assumed that a face frame Fs10, a face frame Fs11, and a face frame Fs12 are detected in order from the left by the face recognition process. At this time, for example, when the face frame Fs11 is designated by the user operation via the touch panel 104 or the operation unit 105, the control unit 101 selects the face frame Fs11.

そして制御部１０１は、選択した顔枠Ｆｓにおいて、中心線Ｏから右端までの長さ及び中心線Ｏから左端までの長さを算出し、これらのうち長い方を算出用長さｓとして設定する。 Then, for the selected face frame Fs, the control unit 101 calculates the length from the center line O to the right end and the length from the center line O to the left end, and sets the longer of the two as the calculation length s.

図１４に示す場合では、中心線Ｏから顔枠Ｆｓ１１の左端までの長さｓＬ５の方が中心線Ｏから顔枠Ｆｓ１１の右端までの長さｓＲ５よりも長いので、制御部１０１は、長さｓＬ５を算出用長さｓとして設定する。 In the case illustrated in FIG. 14, the length sL5 from the center line O to the left end of the face frame Fs11 is longer than the length sR5 from the center line O to the right end of the face frame Fs11, so the control unit 101 sets the length sL5 as the calculation length s.

つまり制御部１０１は、スルー画像Ｔｐにおいて、中心線Ｏを中心とする、ユーザ入力に基づいて選択した顔枠Ｆｓが占める範囲を含む最小の範囲（つまり中心線Ｏから左右に算出用長さｓの範囲）を検出する。そして制御部１０１は、この範囲を用いて選択した顔枠Ｆｓに対応する顔が占める範囲の画角を顔画角αとして算出する。 That is, in the through image Tp, the control unit 101 detects the minimum range centered on the center line O that includes the range occupied by the face frame Fs selected based on the user input (that is, the range extending the calculation length s to the left and right of the center line O). Then, using this range, the control unit 101 calculates the angle of view of the range occupied by the face corresponding to the selected face frame Fs as the face angle of view α.

また制御部１０１は、図１４（Ｂ）に示すように、選択している顔枠Ｆｓ１１を強調して（例えば二重線などで）表示させる。これと共に制御部１０１は、指向角制御処理を行っていることを示す指向角制御アイコンＩｃをスルー画像Ｔｐ上に表示させる。 Further, as shown in FIG. 14B, the control unit 101 emphasizes the selected face frame Fs11 and displays it (for example, with a double line). At the same time, the control unit 101 displays a directivity angle control icon Ic indicating that directivity angle control processing is being performed on the through image Tp.

またこれと共に制御部１０１は、中心線Ｏから左右に算出用長さｓの範囲、つまりマイクロホン部１１１に声が強調して入力される範囲を示すマイクロホンバーＢｍをスルー画像Ｔｐ上に表示させる。マイクロホンバーＢｍは、塗りつぶされた範囲がマイクロホン部１１１に声が強調して入力される範囲を示すようになされている。 At the same time, the control unit 101 displays on the through image Tp a microphone bar Bm indicating the range of the calculation length s from the center line O to the left and right, that is, the range in which voice is emphasized and input to the microphone unit 111. In the microphone bar Bm, the filled range indicates a range where the voice is input to the microphone unit 111 with emphasis.

これによりＤＶＣ４００は、現在どの範囲がマイクロホン部１１１に声が強調して入力されるかをユーザに通知することができるようになされている。 As a result, the DVC 400 can notify the user of the range from which voice is currently being input to the microphone unit 111 with emphasis.
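
How the calculation length s of a single selected face frame could be turned into the filled portion of the microphone bar Bm can be sketched as follows. This is a hedged sketch under stated assumptions: the function names, the pixel coordinates, and the representation of the filled portion as fractions of the through-image width are all hypothetical and are not taken from the embodiment.

```python
def single_frame_length(left, right, center_x):
    """Longer of the distances from center line O to the frame's two edges."""
    return max(abs(center_x - left), abs(right - center_x))


def microphone_bar_span(s, image_width):
    """Return (start, end) of the filled part of the bar as fractions of the
    image width, clamped to the visible range 0.0..1.0."""
    center_x = image_width / 2
    start = max(0.0, (center_x - s) / image_width)
    end = min(1.0, (center_x + s) / image_width)
    return start, end


# Assumed coordinates for a frame like Fs11, which sits left of center.
s = single_frame_length(left=200, right=360, center_x=320)
span = microphone_bar_span(s, image_width=640)
```

Because the range is always symmetric about the center line O, a frame that sits off-center produces a filled portion wider than the frame itself.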

以上の構成によればＤＶＣ４００は、認識された顔枠Ｆｓの中から、タッチパネル１０４又は操作部１０５を介したユーザ操作に基づいて顔枠Ｆｓを選択する。そしてＤＶＣ４００は、選択した顔枠Ｆｓが含まれる範囲に基づいて、当該顔枠Ｆｓが示す人物の顔が占める範囲の画角（顔画角α）を算出する。そしてＤＶＣ４００は、この顔画角αに基づいてマイクロホン部１１１の指向性を制御するようにした。 According to the above configuration, the DVC 400 selects the face frame Fs from the recognized face frames Fs based on a user operation via the touch panel 104 or the operation unit 105. Then, the DVC 400 calculates the angle of view (face angle of view α) of the range occupied by the face of the person indicated by the face frame Fs based on the range including the selected face frame Fs. The DVC 400 controls the directivity of the microphone unit 111 based on the face angle of view α.

これによりＤＶＣ４００は、ユーザ操作に基づいて選択した人物の顔が占める範囲から入力される声を強調する指向性で声を入力できるので、ユーザの所望する人物が発する声を強調して入力することができる。 As a result, the DVC 400 can input voice with a directivity that emphasizes voice arriving from the range occupied by the face of the person selected based on the user operation, and can therefore emphasize and input the voice uttered by the person the user desires.

＜５．他の実施の形態＞ <5. Other Embodiments>
［５−１．他の実施の形態１］ [5-1. Other Embodiment 1]
尚上述した第１の実施の形態では、制御部１０１は、顔認識処理により認識された人物の顔が占める範囲の画角（顔画角α）を算出し、これに基づいてマイクロホン部１１１の指向性を制御するようにした。 In the first embodiment described above, the control unit 101 calculates the angle of view (face angle of view α) of the range occupied by the face of the person recognized by the face recognition process, and controls the directivity of the microphone unit 111 based on it.

これに限らず制御部１０１は、顔認識処理により人物の口が認識された場合、人物の口が占める範囲の画角を算出し、これに基づいてマイクロホン部１１１の指向性を制御するようにしてもよい。 The present invention is not limited to this. When a person's mouth is recognized by the face recognition process, the control unit 101 may calculate the angle of view of the range occupied by the person's mouth and control the directivity of the microphone unit 111 based on it.

この場合、制御部１０１は、顔認識処理の結果から、図１５に示すように口と認識された部分を示す矩形の枠（これを口枠とも呼ぶ）Ｍｓの位置及び大きさを検出する。 In this case, the control unit 101 detects the position and size of a rectangular frame (also referred to as a mouth frame) Ms indicating a portion recognized as a mouth as shown in FIG. 15 from the result of the face recognition process.

そして制御部１０１は、口枠Ｍｓにおいて、中心線Ｏから右端までの長さｓＲ６及び中心線Ｏから左端ｓＬ６までの長さを算出し、これらのうち長い方（図１５ではｓＬ６）を算出用長さｓとして設定する。つまり制御部１０１は、スルー画像Ｔｐの中心線Ｏを中心とする、口枠Ｍｓが占める範囲を含む最小の範囲を検出する。 Then, for the mouth frame Ms, the control unit 101 calculates the length sR6 from the center line O to the right end and the length sL6 from the center line O to the left end, and sets the longer of the two (sL6 in FIG. 15) as the calculation length s. That is, the control unit 101 detects the minimum range centered on the center line O of the through image Tp that includes the range occupied by the mouth frame Ms.

そして制御部１０１は、上述した第１の実施の形態と同様の方法で、顔画角αの代わりに、撮像画角θにおける口が占める範囲の画角を算出し、これを用いて適切指向角βを算出する。そして制御部１０１は、指向角可変部１１２を介して、適切指向角βとなるようにマイクロホン部１１１の指向角を制御する。 Then, in the same manner as in the first embodiment described above, the control unit 101 calculates, instead of the face angle of view α, the angle of view of the range occupied by the mouth within the imaging angle of view θ, and uses it to calculate the appropriate directivity angle β. The control unit 101 then controls the directivity angle of the microphone unit 111 via the directivity angle variable unit 112 so that it becomes the appropriate directivity angle β.

これによりＤＶＣ１００は、撮像範囲において、人物の発する声の音源である口が占める範囲から入力される音声を強調する指向性で音声を入力できるので、一段と確実に人物の発する声を強調して入力することができる。 As a result, the DVC 100 can input sound with a directivity that emphasizes sound arriving from the range, within the imaging range, occupied by the mouth, which is the source of the person's voice, and can therefore emphasize and input the voice uttered by the person more reliably.

またこれに限らず制御部１０１は、音声を発する被写体であれば、例えば動物など、この他種々の被写体が占める範囲の画角に基づいて、マイクロホン部１１１の指向性を制御するようにしてもよい。 Furthermore, the control unit 101 may control the directivity of the microphone unit 111 based on the angle of view of the range occupied by various other subjects, such as animals, as long as the subject emits sound.

［５−２．他の実施の形態２］ [5-2. Other Embodiment 2]
また上述した第２の実施の形態では、制御部１０１は、最も面積の大きい顔枠Ｆｓが示す顔の人物を、声を発している人物であると予測し、この人物の顔が占める範囲の画角に基づいてマイクロホン部１１１の指向性を制御するようにした。 In the second embodiment described above, the control unit 101 predicts that the person whose face is indicated by the face frame Fs having the largest area is the person who is speaking, and controls the directivity of the microphone unit 111 based on the angle of view of the range occupied by that person's face.

これに限らず制御部１０１は、この他種々の方法で声を発している被写体を予測し、声を発していると予測された被写体が占める範囲の画角に基づいてマイクロホン部１１１の指向性を制御してもよい。 The present invention is not limited to this. The control unit 101 may predict the subject that is speaking by various other methods, and control the directivity of the microphone unit 111 based on the angle of view of the range occupied by the subject predicted to be speaking.

例えば制御部１０１は、最も横幅の広い顔枠Ｆｓが示す顔の人物を、声を発している人物であると予測し、この人物の顔が占める範囲の画角に基づいてマイクロホン部１１１の指向性を制御してもよい。 For example, the control unit 101 may predict that the person whose face is indicated by the widest face frame Fs is the person who is speaking, and control the directivity of the microphone unit 111 based on the angle of view of the range occupied by that person's face.

この場合制御部１０１は、顔認識処理の結果に基づいて、検出された顔枠Ｆｓの横幅をそれぞれ算出し、最も横幅の広い顔枠Ｆｓがどれかを判別する。最も横幅の広い顔枠Ｆｓは、ＤＶＣ２００に距離が近く且つＤＶＣ２００に正面を向けていると考えられる。つまり最も横幅の広い顔枠Ｆｓが示す顔の人物は、ＤＶＣ２００に向かって声を発している可能性が高いと考えられる。 In this case, the control unit 101 calculates the width of each detected face frame Fs based on the result of the face recognition process, and determines which face frame Fs is the widest. The face indicated by the widest face frame Fs is considered to be close to the DVC 200 and facing it directly. That is, the person whose face is indicated by the widest face frame Fs is considered highly likely to be speaking toward the DVC 200.

例えば図１６（Ａ）に示すように、顔認識処理により、左から順に顔枠Ｆｓ１３、顔枠Ｆｓ１４、顔枠Ｆｓ１５が検出されたとする。このとき顔枠Ｆｓ１３が示す顔の人物はＤＶＣ２００から遠いために、図１６（Ｂ）に示すように、顔枠Ｆｓ１３の横幅Ｌ１３は最も狭いとする。 For example, as shown in FIG. 16A, assume that a face frame Fs13, a face frame Fs14, and a face frame Fs15 are detected in order from the left by the face recognition process. Because the person whose face is indicated by the face frame Fs13 is far from the DVC 200, the width L13 of the face frame Fs13 is assumed to be the narrowest, as shown in FIG. 16B.

また顔枠Ｆｓ１４が示す顔の人物は、最もＤＶＣ２００に近いため顔枠Ｆｓ１４の面積は最も大きいが、ＤＶＣ２００に対して斜めを向いているため、その横幅Ｌ１４は、顔枠Ｆｓ１５の横幅Ｌ１５よりも狭いとする。 The person whose face is indicated by the face frame Fs14 is closest to the DVC 200, so the face frame Fs14 has the largest area. However, because that person is facing the DVC 200 at an angle, the width L14 is assumed to be narrower than the width L15 of the face frame Fs15.

また顔枠Ｆｓ１５が示す顔の人物は、顔枠Ｆｓ１４よりも面積は小さいが、ＤＶＣ２００に対して正面を向いているため、その横幅Ｌ１５は最も広いとする。 Further, the face person indicated by the face frame Fs15 has a smaller area than the face frame Fs14, but faces the front with respect to the DVC 200, and therefore the width L15 is assumed to be the widest.

このとき制御部１０１は、最も横幅の広い顔枠Ｆｓ１５が示す顔の人物を、声を発している人物であると予測し、顔枠Ｆｓ１５を選択する。そして制御部１０１は、選択した顔枠Ｆｓ１５において、中心線Ｏから右端までの長さｓＲ７及び中心線Ｏから左端までの長さｓＬ７を算出し、これらのうち長い方（図１６の場合はｓＬ７）を算出用長さｓとして設定する。つまり制御部１０１は、スルー画像Ｔｐの中心線Ｏを中心とする、選択した顔枠Ｆｓ１５が占める範囲を含む最小の範囲を検出する。 At this time, the control unit 101 predicts the person with the face indicated by the widest face frame Fs15 as the person who is speaking, and selects the face frame Fs15. Then, the control unit 101 calculates the length sR7 from the center line O to the right end and the length sL7 from the center line O to the left end in the selected face frame Fs15, and the longer one (sL7 in the case of FIG. 16). ) Is set as the calculation length s. That is, the control unit 101 detects the minimum range including the range occupied by the selected face frame Fs15 centered on the center line O of the through image Tp.

そして制御部１０１は、上述した第１の実施の形態と同様に、顔画角α及び適切指向角βを算出し、指向角可変部１１２を介して、マイクロホン部１１１の指向角を適切指向角βとなるように制御する。 Then, as in the first embodiment described above, the control unit 101 calculates the face angle of view α and the appropriate directivity angle β, and controls the directivity angle of the microphone unit 111 via the directivity angle variable unit 112 so that it becomes the appropriate directivity angle β.

こうすることでＤＶＣ２００は、ＤＶＣ２００に向かって声を発している可能性が高い人物の顔が占める範囲から入力される音声を強調する指向性で音声を入力できる。ゆえにＤＶＣ２００は、声を発している可能性が高い人物が発する声を強調して入力できるので、一段と確実に人物が発する声を強調して入力することができる。 By doing so, the DVC 200 can input the voice with the directivity that emphasizes the voice input from the range occupied by the face of a person who is likely to speak to the DVC 200. Therefore, the DVC 200 can emphasize and input a voice uttered by a person who has a high possibility of uttering voice, so that the voice uttered by the person can be more reliably emphasized and input.
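
The difference between choosing the widest face frame and choosing the face frame with the largest area can be sketched as follows. The `FaceFrame` type and the frame sizes are assumptions chosen only to reproduce the situation described for FIG. 16; they are not values from the embodiment.

```python
from dataclasses import dataclass


@dataclass
class FaceFrame:
    """Hypothetical face frame described by its size in pixels."""
    name: str
    width: float
    height: float

    @property
    def area(self):
        return self.width * self.height


# Assumed sizes: Fs14 is nearest but turned at an angle, Fs15 faces front.
frames = [
    FaceFrame("Fs13", 40, 50),    # far away: narrowest
    FaceFrame("Fs14", 90, 160),   # largest area, but narrower than Fs15
    FaceFrame("Fs15", 110, 120),  # widest
]

widest = max(frames, key=lambda f: f.width)   # criterion of this variation
largest = max(frames, key=lambda f: f.area)   # criterion of the second embodiment
```

With these assumed sizes the two criteria pick different persons, which is the case this variation is meant to handle.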

またこれに限らず制御部１０１は、例えば口が開いていると認識された顔の人物を、声を発している人物であると予測し、この人物の顔が占める範囲の画角に基づいてマイクロホン部１１１の指向性を制御するようにしてもよい。 For example, the control unit 101 predicts a person whose face is recognized as having an open mouth as a person who speaks, and based on the angle of view of the range occupied by the person's face. The directivity of the microphone unit 111 may be controlled.

この場合、顔認識処理部１１０は、顔認識処理においてスルー画像から人物の顔を認識すると共に人物の口が開いているか否かを認識し、顔認識処理の結果として制御部１０１に返す。 In this case, the face recognition processing unit 110 recognizes a person's face from the through image in the face recognition process, recognizes whether the person's mouth is open, and returns the result to the control unit 101 as a result of the face recognition process.

例えば図１７に示すように、顔認識処理により、左から順に顔枠Ｆｓ１６、顔枠Ｆｓ１７、顔枠Ｆｓ１８が検出されたとする。尚、顔枠Ｆｓ１６及びＦｓ１８においては、例えば人物が口を閉じているために口が開いていないと認識され、顔枠Ｆｓ１７においては、口が開いていると認識されたとする。 For example, as shown in FIG. 17, it is assumed that a face frame Fs16, a face frame Fs17, and a face frame Fs18 are detected in order from the left by the face recognition process. In the face frames Fs16 and Fs18, for example, it is recognized that the mouth is not open because a person closes his mouth, and in the face frame Fs17, it is recognized that the mouth is open.

ここで口が開いていると認識された顔の人物は、ＤＶＣ３００に向かって声を発している可能性が一段と高いと考えられる。ゆえに制御部１０１は、顔認識処理部１１０から顔認識処理の結果を取得すると、口が開いていると認識された顔枠Ｆｓ１７が示す顔の人物を、声を発している人物であると予測し、当該顔枠Ｆｓ１７を選択する。 Here, it is considered that the person whose face is recognized as having an open mouth is more likely to be speaking toward the DVC 300. Therefore, when the control unit 101 acquires the result of the face recognition processing from the face recognition processing unit 110, the control unit 101 predicts that the person of the face indicated by the face frame Fs17 recognized as having an open mouth is a person who is speaking. Then, the face frame Fs17 is selected.

そして制御部１０１は、選択した顔枠Ｆｓ１７において、中心線Ｏから右端までの長さｓＲ８及び中心線Ｏから左端ｓＬ８までの長さを算出し、これらのうち長い方（図１７ではｓＬ８）を算出用長さｓとして設定する。つまり制御部１０１は、スルー画像Ｔｐの中心線Ｏを中心とする、選択した顔枠Ｆｓ１７が占める範囲を含む最小の範囲を検出する。 Then, for the selected face frame Fs17, the control unit 101 calculates the length sR8 from the center line O to the right end and the length sL8 from the center line O to the left end, and sets the longer of the two (sL8 in FIG. 17) as the calculation length s. That is, the control unit 101 detects the minimum range centered on the center line O of the through image Tp that includes the range occupied by the selected face frame Fs17.

これによりＤＶＣ３００は、口を開けている人物、つまり声を発している可能性が一段と高い人物の顔が占める範囲から入力される音声を強調する指向性で音声を入力できる。ゆえにＤＶＣ３００は、声を発している可能性が一段と高い人物の発する声を強調して入力できるので、一段と確実に人物が発する声を強調して入力することができる。 As a result, the DVC 300 can input the voice with directivity that emphasizes the voice input from the range occupied by the face of the person whose mouth is open, that is, the person who is more likely to speak. Therefore, the DVC 300 can emphasize and input a voice uttered by a person who has a higher possibility of speaking, so that the voice uttered by a person can be more surely input.

またこれに限らず制御部１０１は、顔認識処理によって認識された顔の中から、主被写体（例えば構図のバランスが最もよい被写体など）を認識し、主被写体として認識された顔の人物を、声を発している人物であると予測するようにしてもよい。そして制御部１０１は、声を発していると予測された人物を選択し、選択した人物の顔が占める範囲の画角に基づいてマイクロホン部１１１の指向性を制御するようにしてもよい。 In addition, the control unit 101 recognizes a main subject (for example, a subject with the best composition balance) from the faces recognized by the face recognition process, and determines the person of the face recognized as the main subject. It may be predicted that the person is speaking. Then, the control unit 101 may select a person predicted to be speaking and control the directivity of the microphone unit 111 based on the angle of view of the range occupied by the face of the selected person.

［５−３．他の実施の形態３］ [5-3. Other Embodiment 3]
さらに上述した第４の実施の形態では、制御部１０１は、ユーザ入力により指定された顔枠Ｆｓを選択し、選択した顔枠Ｆｓが示す顔が占める範囲の画角に基づいてマイクロホン部１１１の指向性を制御するようにした。 Furthermore, in the fourth embodiment described above, the control unit 101 selects the face frame Fs designated by the user input, and controls the directivity of the microphone unit 111 based on the angle of view of the range occupied by the face indicated by the selected face frame Fs.

これに限らず制御部１０１は、認識された顔枠Ｆｓの中からこの他種々の方法で顔枠Ｆｓを選択し、選択した顔枠Ｆｓが示す顔が占める範囲の画角に基づいてマイクロホン部１１１の指向性を制御してもよい。 Not limited to this, the control unit 101 selects the face frame Fs from the recognized face frame Fs by various other methods, and the microphone unit based on the angle of view of the range occupied by the face indicated by the selected face frame Fs. The directivity of 111 may be controlled.

例えばＤＶＣ４００は、予め各個人に優先度が設定されている場合、この優先度に基づいて顔枠Ｆｓを選択するようにしてもよい。 For example, when a priority is set for each individual in advance, the DVC 400 may select the face frame Fs based on this priority.

この場合、顔認識処理部１１０は、顔認識処理により検出された顔枠Ｆｓがどの個人であるかを、予めフラッシュメモリ１０２等に記録されている各個人の顔の特徴量に基づいて認識する個人認識処理を行い、その結果を制御部１０１に送る。 In this case, the face recognition processing unit 110 recognizes which individual the face frame Fs detected by the face recognition process is based on the feature amount of each individual face recorded in advance in the flash memory 102 or the like. Individual recognition processing is performed, and the result is sent to the control unit 101.

制御部１０１は、予めフラッシュメモリ１０２等に記録されている各個人の優先度に基づいて、認識された個人の優先度を判別し、最も優先度の高い個人の顔を示す顔枠Ｆｓを選択する。 The control unit 101 determines the priority of each recognized individual based on the per-individual priorities recorded in advance in the flash memory 102 or the like, and selects the face frame Fs indicating the face of the individual with the highest priority.

例えば図１８（Ａ）に示すように、顔認識処理により、左から順に顔枠Ｆｓ１９、顔枠Ｆｓ２０、顔枠Ｆｓ２１が検出されたとする。またここでは、図１８（Ｂ）に示すように、顔枠Ｆｓ１９が示す顔の個人は優先度が１であるとし、顔枠Ｆｓ２０が示す顔の個人は優先度が３であるとし、顔枠Ｆｓ２１が示す顔の個人は優先度が２であるとする。 For example, as shown in FIG. 18A, it is assumed that the face frame Fs19, the face frame Fs20, and the face frame Fs21 are detected in order from the left by the face recognition process. Further, here, as shown in FIG. 18B, it is assumed that the individual of the face indicated by the face frame Fs19 has a priority of 1, the individual of the face indicated by the face frame Fs20 has a priority of 3, and the face frame The individual of the face indicated by Fs21 has a priority of 2.

このとき制御部１０１は、最も優先度の高い個人の顔を示す顔枠Ｆｓ２０を選択して、中心線Ｏから右端までの長さｓＲ９及び中心線Ｏから左端ｓＬ９までの長さを算出し、これらのうち長い方（図１８ではｓＬ９）を算出用長さｓとして設定する。つまり制御部１０１は、スルー画像Ｔｐの中心線Ｏを中心とする、選択した顔枠Ｆｓ２０が占める範囲を含む最小の範囲を検出する。 At this time, the control unit 101 selects the face frame Fs20 indicating the face of the individual with the highest priority, calculates the length sR9 from the center line O to the right end and the length sL9 from the center line O to the left end, and sets the longer of the two (sL9 in FIG. 18) as the calculation length s. That is, the control unit 101 detects the minimum range centered on the center line O of the through image Tp that includes the range occupied by the selected face frame Fs20.

これによりＤＶＣ４００は、予め設定された優先度が最も高い人物の顔が占める範囲から入力される音声を強調する指向性で音声を入力できるので、当該優先度が最も高い人物の発する声を強調して入力できる。 As a result, the DVC 400 can input sound with a directivity that emphasizes sound arriving from the range occupied by the face of the person with the highest preset priority, and can therefore emphasize and input the voice uttered by that person.
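
The priority-based selection can be sketched as follows. In the FIG. 18 example the face frame Fs20 with priority 3 is treated as the highest priority, so the sketch assumes that a larger number means a higher priority. The individual identifiers, the dictionary layout, and the handling of unregistered individuals are hypothetical.

```python
# Hypothetical priority table, standing in for data recorded in flash memory.
priorities = {"person_a": 1, "person_b": 3, "person_c": 2}

# Hypothetical result of the individual recognition process: frame -> individual.
recognized = {"Fs19": "person_a", "Fs20": "person_b", "Fs21": "person_c"}


def select_by_priority(recognized, priorities):
    """Return the frame whose individual has the largest priority value.

    Individuals without a registered priority are treated as priority 0,
    which is an assumption made for this sketch. Returns None if no frame
    was recognized.
    """
    if not recognized:
        return None
    return max(recognized, key=lambda frame: priorities.get(recognized[frame], 0))


selected = select_by_priority(recognized, priorities)
```

The calculation length s is then obtained from the selected frame alone, in the same way as for a frame designated by the user.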

また例えば制御部１０１は、子供の優先度を大人の優先度よりも高く設定しておき、この優先度に基づいて顔枠Ｆｓを選択するようにしてもよい。 For example, the control unit 101 may set the priority of the child higher than the priority of the adult, and may select the face frame Fs based on this priority.

この場合、顔認識処理部１１０は、顔認識処理により検出された顔枠Ｆｓが示す人物の年齢を判別する年齢判別処理を行い、その結果を制御部１０１に送る。 In this case, the face recognition processing unit 110 performs an age determination process for determining the age of the person indicated by the face frame Fs detected by the face recognition process, and sends the result to the control unit 101.

制御部１０１は、判別した年齢に基づいて、認識された顔の優先度を判別し、最も優先度の高い人物（例えば子供）の顔を示す顔枠Ｆｓを選択する。 The control unit 101 determines the priority of the recognized face based on the determined age, and selects the face frame Fs indicating the face of the person with the highest priority (for example, a child).

またこれに限らず制御部１０１は、認識された顔枠Ｆｓの各々に対してこの他種々の方法で設定された優先度に基づいて顔枠Ｆｓを選択し、選択した顔枠Ｆｓが示す顔が占める範囲の画角に基づいてマイクロホン部１１１の指向性を制御してもよい。 Furthermore, the control unit 101 may select a face frame Fs based on priorities set for each of the recognized face frames Fs by various other methods, and control the directivity of the microphone unit 111 based on the angle of view of the range occupied by the face indicated by the selected face frame Fs.

［５−４．他の実施の形態４］ [5-4. Other Embodiment 4]
さらに上述した第１の実施の形態では、制御部１０１は、スルー画像Ｔｐから人物の顔が認識されなかった場合、マイクロホン部１１１の指向角の制御を行わないようにした。 Furthermore, in the first embodiment described above, the control unit 101 does not control the directivity angle of the microphone unit 111 when a human face is not recognized from the through image Tp.

これに限らず制御部１０１は、スルー画像Ｔｐから人物の顔が認識されなかった場合、マイクロホン部１１１を無指向性にするようにしてもよい。 Not limited to this, the control unit 101 may make the microphone unit 111 omnidirectional when a human face is not recognized from the through image Tp.

これによりＤＶＣ１００は、撮影時の状況に適した指向性で音声を入力することができる。スルー画像Ｔｐから人物の顔が認識されなかった場合は、風景などを撮影していることが多く、様々な方向から音声が入力されると考えられるからである。 As a result, the DVC 100 can input sound with directivity suitable for the situation at the time of shooting. This is because when a person's face is not recognized from the through image Tp, a landscape or the like is often photographed, and it is considered that sound is input from various directions.

またこれに限らず、マイクロホン部１１１がサラウンドマイクロホンとしての機能を有するのであれば、制御部１０１は、スルー画像Ｔｐから人物の顔が認識されなかった場合、マイクロホン部１１１をサラウンドマイクロホンとして機能させるようにしてもよい。 Furthermore, if the microphone unit 111 has a function as a surround microphone, the control unit 101 may cause the microphone unit 111 to function as a surround microphone when no human face is recognized from the through image Tp.
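
The fallback behavior when no face is recognized can be sketched as follows. The source presents the omnidirectional setting and the surround setting as two separate alternatives; preferring surround whenever the microphone supports it is an assumption of this sketch, as are the `MicMode` enumeration and the function name.

```python
from enum import Enum


class MicMode(Enum):
    """Hypothetical operating modes of the microphone unit."""
    DIRECTIONAL = "directional"          # directivity follows the face range
    OMNIDIRECTIONAL = "omnidirectional"
    SURROUND = "surround"


def choose_mic_mode(face_count, supports_surround):
    """Pick a mode from the number of recognized faces."""
    if face_count > 0:
        return MicMode.DIRECTIONAL
    # No face recognized: a landscape or similar scene is likely being shot,
    # so sound is expected to arrive from many directions.
    return MicMode.SURROUND if supports_surround else MicMode.OMNIDIRECTIONAL
```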

［５−５．他の実施の形態５］ [5-5. Other Embodiment 5]
さらに上述した第４の実施の形態では、制御部１０１は、ユーザ操作により指定された顔枠Ｆｓを選択し、選択した顔枠Ｆｓが示す顔の人物が発する声を強調するように、マイクロホン部１１１の指向性を制御するようにした。 Furthermore, in the fourth embodiment described above, the control unit 101 selects the face frame Fs designated by the user operation, and controls the directivity of the microphone unit 111 so as to emphasize the voice uttered by the person whose face is indicated by the selected face frame Fs.

これに限らず、制御部１０１は、ユーザ操作により指定された顔枠Ｆｓが示す顔の人物が発する声を強調しないように、マイクロホン部１１１の指向性を制御するようにしてもよい。 Not limited to this, the control unit 101 may control the directivity of the microphone unit 111 so as not to emphasize the voice uttered by the face person indicated by the face frame Fs specified by the user operation.

具体的に、例えば図１４（Ａ）に示すように、顔枠Ｆｓ１０、顔枠Ｆｓ１１、顔枠Ｆｓ１２が検出されたとする。このとき制御部１０１は、ユーザ操作により顔枠Ｆ１０が指定されたと認識すると、顔枠Ｆｓ１０よりも中心線Ｏに近い顔枠Ｆｓ（つまり顔枠Ｆｓ１１及び顔枠Ｆｓ１２）のみが含まれる範囲を検出する。そして制御部１０１は、この範囲を用いて、顔枠Ｆｓ１１及び顔枠Ｆｓ１２が示す顔の人物のみが含まれる範囲の画角を算出し、これに基づいてマイクロホン部１１１の指向角を制御する。 Specifically, assume for example that a face frame Fs10, a face frame Fs11, and a face frame Fs12 are detected as shown in FIG. 14A. When the control unit 101 recognizes that the face frame Fs10 has been designated by a user operation, it detects a range that includes only the face frames Fs closer to the center line O than the face frame Fs10 (that is, the face frames Fs11 and Fs12). Then, using this range, the control unit 101 calculates the angle of view of a range that includes only the persons whose faces are indicated by the face frames Fs11 and Fs12, and controls the directivity angle of the microphone unit 111 based on it.

これによりＤＶＣ４００は、ユーザ操作により指定された顔枠Ｆｓ１０が示す顔の人物が発する声は強調して入力されないようにできる。またこれと共にＤＶＣ１００は、顔枠Ｆｓ１０よりも中心線Ｏ寄りの顔枠Ｆｓ１１及び顔枠Ｆｓ１２が示す顔の人物が発する声が強調して入力されるようにすることができる。 As a result, the DVC 400 can keep the voice uttered by the person whose face is indicated by the face frame Fs10 designated by the user operation from being emphasized when it is input. At the same time, the DVC 100 can emphasize and input the voices uttered by the persons whose faces are indicated by the face frames Fs11 and Fs12, which are closer to the center line O than the face frame Fs10.
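
Excluding a designated face frame can be sketched as follows. The source does not define how "closer to the center line O" is measured; this sketch assumes it is judged by the horizontal offset of each frame's midpoint, and that is an interpretation rather than the method of the embodiment. The coordinates and the function name are also hypothetical.

```python
def range_excluding(frames, excluded, center_x):
    """Return (s, kept): the calculation length covering only the frames that
    are nearer to center line O than the excluded frame, and their names.

    frames maps a frame name to its (left, right) edges in pixels.
    Returns (None, []) when no frame is nearer than the excluded one.
    """
    def offset(name):
        left, right = frames[name]
        return abs((left + right) / 2 - center_x)

    kept = [n for n in frames if n != excluded and offset(n) < offset(excluded)]
    if not kept:
        return None, []
    s = max(
        max(abs(frames[n][0] - center_x), abs(frames[n][1] - center_x))
        for n in kept
    )
    return s, kept


# Assumed layout: Fs10 at the far left, Fs11 near the center, Fs12 to the right.
frames = {"Fs10": (40, 120), "Fs11": (200, 360), "Fs12": (400, 480)}
s, kept = range_excluding(frames, excluded="Fs10", center_x=320)
```

With these assumed coordinates the resulting range spans 160 to 480 pixels, which leaves Fs10 outside it. That outcome depends on the layout and is not guaranteed for every arrangement of frames.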

［５−６．他の実施の形態６］ [5-6. Other Embodiment 6]
さらに上述した第１の実施の形態では、制御部１０１は、撮影モードに切り替わると共に指向角制御処理を開始するようにした。そして制御部１０１は、タッチパネル１０４又は操作部１０５を介して指向角制御処理を終了するよう指示されない限り、指向角制御処理を継続して実行するようにした。 Furthermore, in the first embodiment described above, the control unit 101 starts the directivity angle control process upon switching to the shooting mode. The control unit 101 then continues to execute the directivity angle control process unless instructed via the touch panel 104 or the operation unit 105 to end it.

これに限らず、制御部１０１は、この他種々のタイミングで指向角制御処理を開始したり終了したりするようにしてもよい。 However, the present invention is not limited to this, and the control unit 101 may start and end the directivity control process at various other timings.

例えば制御部１０１は、ＤＶＣ１００が通常撮影される向きから９０度回転された向きで撮影されているとき（つまり縦撮りされているとき）は、指向角制御処理を実行しないようにしてもよい。この場合ＤＶＣ１００には、ＤＶＣ１００が９０度回転されたことを認識できるセンサ（例えばジャイロセンサなど）が設けられているとする。 For example, the control unit 101 may not execute the directivity control process when the DVC 100 is photographed in a direction rotated 90 degrees from the normal photographing direction (that is, when the DVC 100 is photographed vertically). In this case, it is assumed that the DVC 100 is provided with a sensor (for example, a gyro sensor) that can recognize that the DVC 100 has been rotated 90 degrees.

［５−７．他の実施の形態７］ [5-7. Other Embodiment 7]
さらに上述した第４の実施の形態では、制御部１０１は、ユーザ操作によりスルー画像Ｔｐ上の顔枠Ｆｓを指定させることで、マイクロホン部１１１に声が強調して入力される範囲を決定するようにした。 Furthermore, in the fourth embodiment described above, the control unit 101 determines the range from which voice is input to the microphone unit 111 with emphasis by having the user designate a face frame Fs on the through image Tp.

これに限らず制御部１０１は、この他種々のユーザ操作により、マイクロホン部１１１に声が強調して入力される範囲を決定するようにしてもよい。 However, the present invention is not limited to this, and the control unit 101 may determine a range in which voice is emphasized and input to the microphone unit 111 by various other user operations.

例えば制御部１０１は、マイクロホンバーＢｍ（図１３（Ｂ））に対するタッチ操作によりマイクロホン部１１１に声が強調して入力される範囲を決定するようにしてもよい。 For example, the control unit 101 may determine a range in which voice is emphasized and input to the microphone unit 111 by a touch operation on the microphone bar Bm (FIG. 13B).

この場合制御部１０１は、マイクロホンバーＢｍがタッチ操作されるごとにマイクロホンバーＢｍの範囲を切り替える。図１３（Ａ）に示す場合を考えると、例えば顔枠Ｆｓ１１のみが含まれる範囲、顔枠Ｆｓ１１及び顔枠Ｆｓ１２が含まれる範囲、全ての顔枠Ｆｓが含まれる範囲といった順に切り替える。 In this case, the control unit 101 switches the range of the microphone bar Bm every time the microphone bar Bm is touched. Considering the case shown in FIG. 13A, for example, a range including only the face frame Fs11, a range including the face frame Fs11 and the face frame Fs12, and a range including all the face frames Fs are switched in this order.

そして制御部１０１は、マイクロホンバーＢｍが示す範囲をマイクロホン部１１１に声が強調して入力される範囲として決定し、この範囲の画角に基づいて、マイクロホン部１１１の指向性を制御する。 Then, the control unit 101 determines the range indicated by the microphone bar Bm as a range in which voice is emphasized and input to the microphone unit 111, and controls the directivity of the microphone unit 111 based on the angle of view of this range.
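
Switching the microphone bar range on each touch can be sketched as follows. The sketch assumes that the candidate ranges are nested and ordered by how far each frame reaches from the center line O, and that the selection wraps around after the widest range. Neither the ordering rule nor the wrap-around is stated in the source, and the coordinates are made up.

```python
def bar_ranges(frames, center_x):
    """Return candidate calculation lengths in ascending order, one per frame.

    frames maps a frame name to its (left, right) edges in pixels. The k-th
    length covers every frame whose far edge lies within it.
    """
    return sorted(
        max(abs(left - center_x), abs(right - center_x))
        for left, right in frames.values()
    )


def next_index(current, count):
    """Advance to the next candidate range on a touch, wrapping around."""
    return (current + 1) % count


frames = {"Fs10": (40, 120), "Fs11": (200, 360), "Fs12": (400, 480)}
ranges = bar_ranges(frames, center_x=320)

index = 0                               # start with the narrowest range
index = next_index(index, len(ranges))  # first touch
s = ranges[index]
```

With these assumed coordinates the three candidates cover Fs11 only, then Fs11 and Fs12, then all frames, which matches the order of the example in the text.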

［５−８．他の実施の形態８］ [5-8. Other Embodiment 8]
さらに上述した第１の実施の形態では、制御部１０１は、認識された人物の顔が占める範囲の画角（顔画角α）を用いて式（５）により適切指向角βを算出して、適切指向角βとなるようマイクロホン部１１１の指向角を制御するようにした。 Furthermore, in the first embodiment described above, the control unit 101 calculates the appropriate directivity angle β by Equation (5) using the angle of view (face angle of view α) of the range occupied by the recognized human face, and controls the directivity angle of the microphone unit 111 so that it becomes the appropriate directivity angle β.

これに限らず制御部１０１は、顔画角αをマイクロホン部１１１において制御されうる範囲内の指向角と対応付け、対応付けた指向角となるようにマイクロホン部１１１の指向角を制御するのであれば、この他種々の方法でマイクロホン部１１１の指向性を制御するようにしてもよい。 Not limited to this, the control unit 101 associates the face angle of view α with a directivity angle within a range that can be controlled by the microphone unit 111, and controls the directivity angle of the microphone unit 111 so that the correlated directivity angle is obtained. For example, the directivity of the microphone unit 111 may be controlled by various other methods.

例えば制御部１０１は、顔画角αの値と、当該値に対応する適切指向角βとを、予め対応付けてフラッシュメモリ１０２等に記録しておくようにしてもよい。この場合、例えば顔画角αがｘ度以上ｙ度以下の場合に、適切指向角βをｚ度とするといったように、顔画角αの範囲と適切指向角βとを対応付けておくようにしてもよい。 For example, the control unit 101 may record values of the face angle of view α in association with the corresponding appropriate directivity angles β in advance in the flash memory 102 or the like. In this case, ranges of the face angle of view α may be associated with appropriate directivity angles β, for example so that the appropriate directivity angle β is z degrees when the face angle of view α is at least x degrees and at most y degrees.
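
A table that maps ranges of the face angle of view α to an appropriate directivity angle β can be sketched as follows. The source leaves x, y, and z unspecified, so every boundary and angle below is a made-up placeholder, and the table layout and the name `lookup_beta` are assumptions for illustration.

```python
import bisect

# Hypothetical table. ALPHA_UPPER_BOUNDS holds the inclusive upper bound of
# each alpha range in degrees; BETA_VALUES has one extra entry that applies
# to alpha values above the last bound.
ALPHA_UPPER_BOUNDS = [10.0, 20.0, 40.0]
BETA_VALUES = [30.0, 60.0, 90.0, 120.0]


def lookup_beta(alpha):
    """Return the directivity angle associated with the range containing alpha."""
    # bisect_left returns the index of the first bound >= alpha, so a value
    # equal to a bound falls into the range that the bound closes.
    return BETA_VALUES[bisect.bisect_left(ALPHA_UPPER_BOUNDS, alpha)]
```

A lookup of this kind replaces the evaluation of Equation (5) with a search over precomputed ranges, at the cost of β changing in steps rather than continuously.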

［５−９．他の実施の形態９］ [5-9. Other Embodiment 9]
さらに上述した第１の実施の形態では、ＤＶＣ１００に指向角が連続的に可変であるマイクロホン部１１１及び指向角可変部１１２を設けるようにした。これに限らず、ＤＶＣ１００では、指向性が可変なマイクロホンであれば、この他種々のマイクロホンを用いるようにしてもよい。 Furthermore, in the first embodiment described above, the DVC 100 is provided with the microphone unit 111 and the directivity angle variable unit 112, whose directivity angle is continuously variable. The DVC 100 is not limited to this, and various other microphones may be used as long as their directivity is variable.

[5-10. Other Embodiment 10]
Furthermore, in the first embodiment described above, the control unit 101 calculates the angle of view of the range occupied by the person's face (face angle of view α) based on the range occupied by the person's face in the through image Tp and the imaging angle of view θ, and controls the directivity of the microphone unit 111 based on this face angle of view α.
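
As a rough illustration of how a face angle of view α could be derived from the range the face occupies and the imaging angle of view θ, the following sketch assumes an ideal pinhole projection and considers the horizontal direction only. The equations of the first embodiment are not reproduced in this section, so the projection model, the function name, and the pixel-coordinate parameters are all assumptions made for illustration.

```python
import math


def face_angle_of_view(
    face_left_px: float,
    face_right_px: float,
    image_width_px: float,
    imaging_angle_deg: float,
) -> float:
    """Horizontal angle (degrees) subtended by the face, assuming a pinhole model.

    face_left_px / face_right_px: horizontal edges of the face range in the image.
    imaging_angle_deg: horizontal imaging angle of view (theta in the text).
    """
    if not 0.0 <= face_left_px <= face_right_px <= image_width_px:
        raise ValueError("face range must lie inside the image")
    if not 0.0 < imaging_angle_deg < 180.0:
        raise ValueError("imaging angle of view must be between 0 and 180 degrees")

    half_width = image_width_px / 2.0
    # Distance from the projection center to the image plane, in pixel units.
    focal_px = half_width / math.tan(math.radians(imaging_angle_deg) / 2.0)
    # Angles of the two face edges measured from the optical axis.
    left = math.atan((face_left_px - half_width) / focal_px)
    right = math.atan((face_right_px - half_width) / focal_px)
    return math.degrees(right - left)
```

Under this assumed model, a face that fills the full image width yields α equal to θ, and a face range of zero width yields α equal to zero.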

The control unit 101 is not limited to the above-described method and may use various other methods, as long as it controls the directivity of the microphone unit 111 based on the range occupied by the subject in the through image Tp. For example, the control unit 101 may widen the directivity of the microphone unit 111 as the range occupied by the person's face in the through image Tp becomes wider, and narrow the directivity of the microphone unit 111 as that range becomes narrower.
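
One simple way to realize such a "wider range, wider directivity" rule without computing an angle of view is a clamped linear mapping from the fraction of the image width occupied by the face to a directivity angle. This is a minimal sketch under stated assumptions: the linear form, the function name, and the default minimum and maximum angles are hypothetical and are not taken from the embodiment.

```python
def directivity_from_occupancy(
    face_width_px: float,
    image_width_px: float,
    min_angle_deg: float = 30.0,   # hypothetical narrowest controllable angle
    max_angle_deg: float = 120.0,  # hypothetical widest controllable angle
) -> float:
    """Map the share of the image width occupied by the face to a directivity angle.

    The mapping is monotonic: a wider face range never gives a narrower angle.
    """
    if image_width_px <= 0:
        raise ValueError("image width must be positive")
    # Clamp the occupancy ratio to [0, 1] so the result stays in the controllable range.
    ratio = min(max(face_width_px / image_width_px, 0.0), 1.0)
    return min_angle_deg + ratio * (max_angle_deg - min_angle_deg)
```

Any other monotonically increasing mapping would satisfy the same description; the linear form is chosen here only for brevity.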

[5-11. Other Embodiment 11]
Furthermore, in the first embodiment described above, the DVC 100 serving as the imaging apparatus 1 is provided with the imaging unit 108 serving as the imaging unit 2, the microphone unit 111 and the directivity angle variable unit 112 serving as the audio input unit 3, and the face recognition processing unit 110 serving as the recognition unit 4. The DVC 100 serving as the imaging apparatus 1 is also provided with the control unit 101 serving as the control unit 5.

Furthermore, in the second embodiment described above, the DVC 200 serving as the imaging apparatus 10 is provided with the imaging unit 108 serving as the imaging unit 2, the microphone unit 111 and the directivity angle variable unit 112 serving as the audio input unit 3, and the face recognition processing unit 110 serving as the recognition unit 4. The DVC 200 serving as the imaging apparatus 10 is also provided with the control unit 101 serving as the selection unit 11 and the control unit 12.

The present invention is not limited to this, and each functional unit of the DVC 100 or the DVC 200 described above may be implemented by various other hardware or software, as long as it has a similar function.

Furthermore, in the first embodiment described above, the present invention is applied to the DVC 100. The present invention is not limited to this and can be applied to various other imaging apparatuses, such as a personal computer or a mobile phone equipped with a camera, as long as the imaging apparatus has a microphone with variable directivity.

[5-12. Other Embodiment 12]
Furthermore, in the embodiments described above, the program for executing the directivity angle control processing procedure RT1 is written in advance in the flash memory 102 of the DVC 100.

The present invention is not limited to this. For example, the program may be recorded on the recording medium 114, and the control unit 101 of the DVC 100 may read the program from the recording medium 114 and execute it. The program read from the recording medium 114 may also be installed in the flash memory 102.

[5-13. Other Embodiment 13]
Furthermore, the present invention is not limited to the first to fourth embodiments and the other embodiments described above. That is, the scope of the present invention also extends to forms in which some or all of the first to fourth embodiments and the other embodiments described above are arbitrarily combined, and to forms in which parts of them are extracted.

For example, the third embodiment described above and Other Embodiment 4 may be combined. In this case, the control unit 101 controls the microphone unit 111 to be omnidirectional when no mouth is recognized as a result of the face recognition processing.
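
The fallback in this combination can be sketched as a single branch. The sketch assumes that, when a mouth is recognized, the directivity is derived from an angle of view for the mouth range by some mapping supplied by the caller; the function name, the `OMNIDIRECTIONAL` marker, and the callback parameter are hypothetical names introduced for illustration.

```python
from typing import Callable, Optional, Union

# Hypothetical marker meaning "no directional emphasis".
OMNIDIRECTIONAL = "omnidirectional"


def choose_directivity(
    mouth_angle_deg: Optional[float],
    angle_to_directivity: Callable[[float], float],
) -> Union[str, float]:
    """Return a directivity setting for the microphone.

    mouth_angle_deg is None when face recognition found no mouth; in that
    case the microphone falls back to omnidirectional pickup. Otherwise the
    caller-supplied mapping converts the mouth angle of view to a directivity angle.
    """
    if mouth_angle_deg is None:
        return OMNIDIRECTIONAL
    return angle_to_directivity(mouth_angle_deg)
```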

The present invention can be widely used in imaging apparatuses having a microphone, such as digital video cameras.

1, 10, 20, 30 ... imaging apparatus; 2, 108 ... imaging unit; 3 ... audio input unit; 4 ... recognition unit; 5, 12, 101 ... control unit; 11 ... selection unit; 100, 200, 300, 400 ... DVC; 110 ... face recognition processing unit; 111 ... microphone unit; 112 ... directivity angle variable unit; θ ... imaging angle of view; α ... face angle of view; β ... appropriate directivity angle.

Claims

An imaging apparatus comprising:
an imaging unit that acquires a captured image;
an audio input unit that inputs audio;
a recognition unit that recognizes a subject in the captured image; and
a control unit that controls directivity of the audio input unit based on a range occupied by the subject in the captured image.

The imaging apparatus according to claim 1, further comprising a selection unit that selects an arbitrary subject from among the subjects recognized by the recognition unit,
wherein the control unit detects, in the captured image, a range that includes all of the one or more subjects selected by the selection unit, and controls the directivity of the audio input unit based on that range.

The imaging apparatus described, wherein the selection unit predicts, from among the subjects recognized by the recognition unit, a subject that is emitting sound toward the imaging apparatus, and selects the subject so predicted.

The imaging apparatus according to claim 3, wherein the recognition unit recognizes a face as the subject and recognizes a mouth in the face, and
the selection unit predicts that a subject whose mouth has been recognized by the recognition unit is a subject emitting sound toward the imaging apparatus, and selects that subject.

The imaging apparatus according to claim 3, wherein the selection unit predicts that the subject occupying the largest range in the captured image is a subject emitting sound toward the imaging apparatus, and selects that subject.

The imaging apparatus according to claim 2, wherein the selection unit selects a subject from among the subjects recognized by the recognition unit based on a priority set in advance for each of the subjects.

The imaging apparatus according to claim 1, wherein the recognition unit recognizes a person's face as the subject in the captured image, and
the control unit controls the directivity of the audio input unit based on a range occupied by the face in the captured image.

The imaging apparatus according to claim 1, wherein the recognition unit recognizes a person's mouth as the subject in the captured image, and
the control unit controls the directivity of the audio input unit based on a range occupied by the mouth in the captured image.

The imaging apparatus according to claim 1, wherein the control unit calculates a subject angle of view, which is the angle of view of the range occupied by the subject within the angle of view of the imaging unit, based on the angle of view of the imaging unit at the time the captured image was acquired and the range occupied by the subject in the captured image, and controls the directivity of the audio input unit based on the subject angle of view.

The imaging apparatus according to claim 9, wherein the control unit associates the subject angle of view with a directivity angle within a range that the audio input unit can be controlled to, and controls the directivity angle of the audio input unit so that it becomes the associated directivity angle.

A directivity control method comprising:
acquiring, by an imaging apparatus, a captured image;
recognizing, by the imaging apparatus, a subject in the captured image; and
controlling, by the imaging apparatus, directivity of an audio input unit of the imaging apparatus based on a range occupied by the subject in the captured image.

A directivity control program that causes an imaging apparatus to execute:
an acquisition step of acquiring a captured image;
a recognition step of recognizing a subject in the captured image; and
a control step of controlling directivity of an audio input unit of the imaging apparatus based on a range occupied by the subject in the captured image.