JP2011166407A

JP2011166407A - Acoustic setting device and method

Info

Publication number: JP2011166407A
Application number: JP2010026308A
Authority: JP
Inventors: Kei Ito; 圭伊藤
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2010-02-09
Filing date: 2010-02-09
Publication date: 2011-08-25

Abstract

PROBLEM TO BE SOLVED: To provide an acoustic setting device that detects a stationary object placed in a listening space and, when the relationship of that stationary object to the speakers changes, sets the audio output conditions governing the output state of the listening sound accordingly.
SOLUTION: An acoustic system 100 sets the audio output conditions governing how the listening sound is output toward a listening space SP. This sound is output from a plurality of speakers 101 installed in the listening space SP and is intended to be heard by a listener. The system includes three units. An audio output unit drives the speakers 101 on the basis of an audio signal. A listening space environment detector detects stationary objects present in the listening space SP. An audio output condition setter changes the audio output condition setting to one corresponding to a change whenever the listening space environment detector detects a change in the relationship of an object in the space to the speakers 101.
COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to an acoustic setting device and an acoustic setting method. They set audio output conditions, relative to a listening space, for the listening sound that the speakers of audio equipment or the like output toward that space for a listener to hear.

In recent years, various technologies have been proposed for acoustic systems that use a plurality of speakers to reproduce a realistic sound field. Such an acoustic system includes an acoustic setting device that realizes the sound field by setting audio output conditions for the listening sound according to the identified listening position of the listener. These conditions include speaker positions, audio filter processing, sound pressure and output timing.

A technique is also known in which a camera is installed in an acoustic system. When a face enters the captured frame, skin-color components are detected and the audio output conditions are set according to their position (see, for example, Patent Document 1).

However, the conventional acoustic setting devices described above must first be configured before use. The audio output conditions governing how the listening sound is output to the listening space have to be set according to the environment of that space. In a 5.1-channel surround system, for example, the listener had to measure the distance from the listening position to each speaker (front, rear, subwoofer and so on) and then set the audio output conditions personally.
More recently, some systems avoid this manual measurement. A sound collecting device such as a headset is placed at the listening position beforehand and used to check the environment of the listening space. This approach has two drawbacks. First, the environment must be kept free of the influence of the listener, obstacles and the like, which is troublesome for the listener. Second, setup takes time, because each frequency range must be checked separately during sound collection.

Above all, whenever the environment within the configured listening space changes, the listener must set the position of each speaker and so on again. Suppose, for example, that a chair in the listening space is replaced with one of a different size and moved to a different location. Both the listener's position and the degree of sound reflection then change. The audio output conditions that govern how the listening sound is output to the listening space, such as the position and orientation of each speaker, must therefore be reset.
The technique of Patent Document 1 has further limitations. It sets the audio output conditions from skin-color components only when a face enters the frame, so it cannot cope with changes in the size, position and so on of stationary objects as described above. Moreover, if no skin-color component (face) was detected even though a listener was present, appropriate audio output conditions for the listening sound could not be set for the listening space.

The present invention aims to solve the problems described above. Its object is to provide an acoustic setting device and an acoustic setting method that detect a stationary object placed in a listening space. They set the audio output conditions governing the output state of the listening sound according to changes in the relationship between the stationary object and the speakers.

To achieve the above object, the acoustic setting device of the present invention sets audio output conditions for a listening space in which a plurality of speakers are installed. These conditions govern the output state of the listening sound that the speakers output toward the listening space for a listener to hear. The device comprises:
audio output means for driving the plurality of speakers based on an audio signal;
listening space environment detection means for detecting a stationary object present in the listening space; and
audio output condition setting means for setting audio output conditions corresponding to a change when the listening space environment detection means detects that the relationship of the stationary object to the speakers has changed.

The acoustic setting device of the present invention monitors stationary objects placed in the listening space. When the relationship of such an object to the speakers changes, the device sets the audio output conditions governing the output state of the listening sound to match that change. Optimum audio output conditions can therefore be set even when the relationship between a stationary object and the speakers in the listening space changes.

FIG. 1 is a perspective view, seen obliquely from the front, of the external appearance of an acoustic system including the acoustic setting device according to Embodiment 1 of the present invention.
FIG. 2 is a plan view of the external appearance of the acoustic system including the acoustic setting device of Embodiment 1.
FIG. 3 is a block diagram outlining the acoustic system including the acoustic setting device of Embodiment 1.
FIG. 4 is a block diagram outlining the camera unit incorporated in the acoustic system including the acoustic setting device of Embodiment 1.
FIG. 5 shows the subdivided areas of the RGB image data used in Embodiment 1.
FIG. 6 is a flowchart of the audio output condition setting process in the acoustic system including the acoustic setting device of Embodiment 1.
FIG. 7 is a perspective view of an example listening space SP used to explain the operation of Embodiment 1.
FIG. 8 is a perspective view of the acoustic setting device of Embodiment 1 operating in the listening space SP of FIG. 7.
FIG. 9 is a flowchart detailing the AF process executed in step S6 of FIG. 6.
FIG. 10 is a perspective view of an example change in the environment of the listening space SP, used to explain the operation of Embodiment 1.
FIG. 11 illustrates the operation of Embodiment 1: (a) shows image data of the listening space SP of FIG. 7, and (b) shows the grouping areas obtained by AF processing of that image data.
FIG. 12 illustrates the operation of Embodiment 1: (a) shows image data of the listening space SP of FIG. 10, and (b) shows the grouping areas obtained by AF processing of that image data.
FIG. 13 shows the acoustic setting device of Embodiment 1 after it has rotated the speakers 101, 101 horizontally to angles θ1, θ2 according to the environment of the listening space SP of FIG. 7.
FIG. 14 shows the acoustic setting device of Embodiment 1 after it has rotated the speakers 101, 101 horizontally to angles θ1, θ2 according to the environment of the listening space SP of FIG. 10.
FIG. 15 is a flowchart of the audio output condition setting process in an acoustic system including the acoustic setting device of Embodiment 2.
FIG. 16 is a perspective view of an example listening space SP used to explain the operation of Embodiment 2.
FIG. 17 is a perspective view of an example change in the environment of the listening space SP, used to explain the operation of Embodiment 2.
FIG. 18 illustrates the operation of Embodiment 2: (a) shows image data of the listening space SP of FIG. 16, (b) shows the grouping areas obtained by AF processing of that image data, and (c) shows the face detection range within the grouping areas.
FIG. 19 illustrates the operation of Embodiment 2: (a) shows image data of the listening space SP of FIG. 17, (b) shows the grouping areas obtained by AF processing of that image data, and (c) shows the face detection range within the grouping areas.
FIG. 20 shows the acoustic setting device of Embodiment 2 after it has rotated the speakers 101, 101 horizontally to angles θ1, θ2 according to the environment of the listening space SP of FIG. 16.
FIG. 21 shows the acoustic setting device of Embodiment 2 after it has rotated the speakers 101, 101 horizontally to angles θ1, θ2 according to the environment of the listening space SP of FIG. 17.
FIG. 22 is a flowchart of the first half of the audio output condition setting process in an acoustic system including the acoustic setting device of Embodiment 3.
FIG. 23 is a flowchart of the second half of the audio output condition setting process in the acoustic system including the acoustic setting device of Embodiment 3.
FIG. 24 is a perspective view of an example listening space SP used to explain the operation of Embodiment 3.
FIG. 25 illustrates an example of the acoustic setting device of Embodiment 3 operating according to the environment of the listening space SP of FIG. 24: (a) shows image data of the listening space SP, (b) shows the grouping areas obtained by AF processing of that image data, and (c) shows the face detection range within the grouping areas.
FIG. 26 shows the acoustic setting device of Embodiment 3 after it has rotated the speakers 101, 101 horizontally to angles θ1, θ2 according to the environment of the listening space SP of FIG. 24.
FIG. 27 is a flowchart of the first half of the audio output condition setting process in an acoustic system including the acoustic setting device of Embodiment 4.
FIG. 28 is a flowchart of the second half of the audio output condition setting process in the acoustic system including the acoustic setting device of Embodiment 4.
FIG. 29 illustrates the line-of-sight detection process of the acoustic setting device of Embodiment 4: (a) shows an example template used in the process, and (b) shows an example matching process using this template.
FIG. 30 illustrates an example of the acoustic setting device of Embodiment 4 operating according to the environment of the listening space SP. Panel (a) shows image data of the listening space SP, and (b) shows the grouping areas obtained by AF processing of that image data. Panel (c) shows the face detection range before the line-of-sight detection process, and (d) shows it after that process.
FIG. 31 illustrates an operation example of the acoustic setting device of Embodiment 4. It shows the speakers 101, 101 after they have been rotated horizontally to angles θ1, θ2 in the listening space SP of FIG. 24, for the environment in which listener MA, on the right as viewed from the device, has fallen asleep.

DESCRIPTION OF EMBODIMENTS
Modes for carrying out the present invention are described below by way of embodiments with reference to the accompanying drawings.

[Configuration]
First, the configuration of the acoustic setting device of Embodiment 1 is described with reference to FIGS. 1 to 4.
FIG. 1 is a perspective view, seen obliquely from the front, of the external appearance of an acoustic system 100 including the acoustic setting device according to Embodiment 1 of the present invention.
FIG. 2 is a plan view of the external appearance of the acoustic system 100.
FIG. 3 is a block diagram outlining the acoustic system 100.
FIG. 4 is a block diagram outlining the camera unit 102 incorporated in the acoustic system 100.

The acoustic system 100 has a pair of speakers 101, 101, a camera unit 102, drive stages 103 and a power switch 104. Each drive stage 103 rotates a speaker 101 about a vertical axis.

As shown in FIG. 3, the acoustic system 100 includes an external device I/F block 111, a D/A conversion block 112, a signal processing block 113, a drive block 114, the camera unit 102 and an acoustic output unit (audio output means) 30. The camera unit 102 includes a camera control block 21 and a lens unit block 22. The acoustic output unit 30 includes an amplifier block 31.
The external device I/F block 111 has an audio input terminal for receiving an audio signal from an external device such as a television or a DVD player. The signal processing block 113 converts the audio signal received by the external device I/F block 111 into digital output data. The signal processing block 113 is a DSP (Digital Signal Processor) and applies processing such as noise reduction and filtering to the input data.

The D/A conversion block 112 converts the digital output data from the signal processing block 113 into analog data, which is input to the acoustic output unit 30. The acoustic output unit 30 can amplify this analog data with the amplifier block 31.
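The playback path described above runs from DSP filtering through D/A conversion to amplification. It can be sketched on plain Python lists. This is a minimal illustration under stated assumptions, not the patent's circuitry. The source says only that the DSP performs "noise reduction and filter processing". The moving-average filter, the 16-bit full-scale mapping and the fixed gain of 10 below are hypothetical choices.

```python
from typing import List


def dsp_filter(samples: List[int], taps: int = 3) -> List[int]:
    """Causal moving-average FIR, a hypothetical stand-in for the DSP's filtering.

    y[n] = (x[n] + x[n-1] + ... + x[n-taps+1]) / taps, with x[k] = 0 for k < 0.
    The result is rounded back to integer sample codes.
    """
    return [round(sum(samples[max(0, n - taps + 1): n + 1]) / taps)
            for n in range(len(samples))]


def d_to_a(codes: List[int], full_scale_volts: float = 1.0, bits: int = 16) -> List[float]:
    """D/A conversion: map signed integer codes to voltages (2**(bits-1) -> full scale)."""
    return [c / (1 << (bits - 1)) * full_scale_volts for c in codes]


def amplify(volts: List[float], gain: float = 10.0) -> List[float]:
    """Amplifier block: scale the analog signal by an assumed fixed voltage gain."""
    return [v * gain for v in volts]


def output_chain(samples: List[int]) -> List[float]:
    """Digital input -> DSP filter -> D/A -> amplifier, in that order."""
    return amplify(d_to_a(dsp_filter(samples)))
```

For example, `output_chain([0, 0, 24576])` smooths the step to a code of 8192. That code maps to 0.25 V and is amplified to 2.5 V, giving `[0.0, 0.0, 2.5]`.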

The drive block 114 can rotate the drive stage 103 under each speaker 101 to a predetermined angle. The drive block 114 and the drive stages 103 together serve as the means for changing the orientation of the speakers 101, 101.

Next, the configuration of the camera unit 102 is described with reference to FIG. 4.
The camera unit 102 is imaging means that photographs the inside of the listening space SP (see FIG. 7) to obtain image data. It includes the camera control block 21 and the lens unit block 22. In Embodiment 1 it serves as the listening space environment detection means described later. As shown in FIG. 7, the listening space SP is the space in which the acoustic system 100 is placed and into which the speakers 101, 101 output audio. FIG. 7 shows, as an example, a room containing a chair CH1.

Returning to FIG. 4, the camera control block 21 processes the image signal obtained by the lens unit block 22. It also controls the lens unit block 22 to perform autofocus (hereinafter "AF") processing, described later, among other operations.

The camera control block 21 includes a camera processor 7 (hereinafter simply "processor 7").
The processor 7 includes a CCD1 signal processing block 72, a CCD2 signal processing block 76, a CPU block 73, a local SRAM 77, a memory controller block 71 and an I2C block 75, all interconnected by a bus line. Outside the processor 7 are an SDRAM 9 that stores YUV image data and a ROM 8 that stores the control program. Each is connected to the processor 7 by a bus line.

The lens unit block 22 includes a lens barrel unit 5. The lens barrel unit 5 has a zoom optical system 51 with a zoom lens 51a and a focus optical system 52 with a focus lens 52a. These are driven by a zoom motor 51b and a focus motor 52b respectively. Both motors 51b, 52b are controlled by a motor driver 55, which is in turn controlled by the CPU block 73 of the processor 7.
The zoom motor 51b is a DC motor. When the power switch 104 is turned on, the zoom motor is controlled to set a focal length equivalent to 35 mm in 35 mm film terms.
The focus motor 52b is a stepping motor. The focus lens 52a is driven so that the focus distance in front of the acoustic system 100 ranges from 1 m to 5 m. This range reflects the listening-space environment in which the acoustic system 100 will actually be installed. The AF processing described later is performed based on the extension amount (pulse count) of the focus lens 52a within this range.

The lens barrel unit 5 has a photographic lens that forms a subject image on the CCD 10, which is the image sensor. The CCD 10 converts the subject image into an image signal and inputs it to the F/E-IC 6. The F/E-IC 6 has a CDS circuit 61, an ADC circuit 62 and an A/D converter 63. It applies predetermined processing to the image signal, converts it to a digital signal and inputs it to the CCD1 signal processing block 72 of the processor 7. These signal processing operations are controlled via a timing generator 64 by a VD signal (vertical drive signal) and an HD signal (horizontal drive signal). Both signals are output from the CCD1 signal processing block 72 of the processor 7.

Next, the AF operation of the camera unit 102 is described.
Light entering the CCD 10 through the lenses 51a and 52a is converted into electrical signals. These are sent as analog R, G and B signals to the CDS circuit 61 and the A/D converter 63. The A/D converter 63 digitizes them, and a YUV converter (not shown) in the SDRAM 9 converts the results into a YUV signal. The memory controller block 71 then writes this signal into frame memory. Because the YUV signal is output for each VD signal, the frame memory is refreshed every frame. The memory controller block 71 also reads the YUV signal and calculates two values from it: an AF evaluation value indicating the degree of focus of the image data, and an evaluation value indicating its exposure state.
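The RGB-to-YUV conversion mentioned above can be sketched per pixel. This is only an assumption-laden illustration. The source does not state which coefficients the YUV converter uses. The sketch assumes the common ITU-R BT.601 luma weights with analog-YUV chroma scaling. Only the Y (luminance) plane is needed for the later focus and change checks.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]


def rgb_to_yuv(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Convert one RGB pixel to YUV, assuming BT.601 weights (not given in the source)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luminance
    u = 0.492 * (b - y)                    # blue-difference chroma
    v = 0.877 * (r - y)                    # red-difference chroma
    return y, u, v


def frame_to_luma(frame: List[List[Pixel]]) -> List[List[float]]:
    """Convert a frame given as rows of (R, G, B) tuples into a 2-D list of Y values."""
    return [[rgb_to_yuv(*px)[0] for px in row] for row in frame]
```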

The CPU block 73 reads the AF evaluation value data as feature data and uses it for AF processing. The AF evaluation value is the integral (average) over the image plane of the high-frequency Fourier components. It is highest when the image is in focus, because the subject's edges are then sharp. Exploiting this, AF focus detection obtains the AF evaluation value at each focus lens position and detects the point where it reaches a maximum (the peak position). There may be several local maxima. If so, the device examines the magnitude of the evaluation value at each peak and how steeply the surrounding evaluation values rise and fall. It then takes the most reliable point as the in-focus position and executes AF.
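The peak search described above can be sketched as follows. It samples the AF evaluation value at each focus lens position, finds every local maximum and picks the most reliable one. This is a sketch under assumptions, not Ricoh's algorithm. The source names the criteria (peak height and how steeply values rise and fall around it) but not how they are combined. The additive score and the `sharpness_weight` parameter are hypothetical.

```python
from typing import List, Optional, Sequence


def local_peaks(values: Sequence[float]) -> List[int]:
    """Indices of local maxima of the AF evaluation curve.

    A point is a peak if it is strictly higher than its left neighbor and at
    least as high as its right neighbor, so a plateau reports its first sample.
    End points only need to beat their single neighbor.
    """
    n = len(values)
    peaks = []
    for i in range(n):
        rises_into = i == 0 or values[i] > values[i - 1]
        falls_after = i == n - 1 or values[i] >= values[i + 1]
        if rises_into and falls_after:
            peaks.append(i)
    return peaks


def peak_reliability(values: Sequence[float], i: int, sharpness_weight: float) -> float:
    """Hypothetical reliability score: peak height plus weighted drop to each neighbor."""
    drop_left = values[i] - values[i - 1] if i > 0 else 0.0
    drop_right = values[i] - values[i + 1] if i < len(values) - 1 else 0.0
    return values[i] + sharpness_weight * (drop_left + drop_right)


def find_focus_position(positions: Sequence[int], values: Sequence[float],
                        sharpness_weight: float = 1.0) -> Optional[int]:
    """Return the lens position (e.g. a motor pulse count) judged to be in focus."""
    if len(positions) != len(values):
        raise ValueError("one evaluation value is needed per lens position")
    peaks = local_peaks(values)
    if not peaks:
        return None
    best = max(peaks, key=lambda i: peak_reliability(values, i, sharpness_weight))
    return positions[best]
```

Consider evaluation values `[1, 5, 2, 1, 6, 7, 6]` sampled at positions `0, 10, ..., 60`. With the default weight, the sharp peak at 10 (score 12) beats the taller but flatter peak at 50 (score 9). Setting `sharpness_weight=0.0` reduces the choice to the highest peak, at 50.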

In Embodiment 1, the AF evaluation value is calculated separately for each of several subdivided areas of the digital RGB signal. FIG. 5 shows these areas in the RGB image data used in Embodiment 1: the image is divided into eight areas both horizontally and vertically. Each area has coordinates. The upper-left area is (horizontal 1, vertical 1) and the lower-right area is (horizontal 8, vertical 8). The number of divisions is not limited to this and may be made finer.
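The 8 × 8 area layout above can be sketched with one evaluation value per area, keyed by 1-based (horizontal, vertical) coordinates with (1, 1) at the upper left. The source's evaluation value is built from high-frequency Fourier components. As a labeled simplification, this sketch uses the sum of absolute horizontal luminance differences inside each area as a cheap high-pass proxy.

```python
from typing import Dict, List, Tuple

GRID = 8  # 8 x 8 areas, as in Embodiment 1


def area_bounds(index: int, length: int, divisions: int = GRID) -> Tuple[int, int]:
    """Pixel range [start, end) of 1-based area `index` along an axis of `length` pixels."""
    return (index - 1) * length // divisions, index * length // divisions


def area_af_values(luma: List[List[float]],
                   divisions: int = GRID) -> Dict[Tuple[int, int], float]:
    """Return {(horizontal, vertical): focus value} for every area of a luminance image."""
    height, width = len(luma), len(luma[0])
    values: Dict[Tuple[int, int], float] = {}
    for v in range(1, divisions + 1):
        y0, y1 = area_bounds(v, height, divisions)
        for h in range(1, divisions + 1):
            x0, x1 = area_bounds(h, width, divisions)
            total = 0.0
            for y in range(y0, y1):
                row = luma[y]
                # Differences between neighboring pixels within the area only,
                # so edges that straddle two areas are not counted.
                for x in range(x0 + 1, x1):
                    total += abs(row[x] - row[x - 1])
            values[(h, v)] = total
    return values
```

Keeping values per area, rather than one value for the whole frame, lets each area's peak lens position be found separately. Areas whose peaks fall at similar positions could then be grouped, as the grouping areas in the figures suggest.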

The flow of the audio output condition setting process in the acoustic system 100 of Embodiment 1 is described with reference to the flowchart of FIG. 6.

First, in step S1, the system checks whether the power switch 104 of the acoustic system 100 is on. If it is not, the process ends; if it is, the process proceeds to step S2.

In step S2, reached when the power is on, the system checks whether the external device I/F block 111 is receiving audio data. If it is not, the process ends; if it is, the process proceeds to step S3.

In step S3, the system checks whether the drive stages 103 of the acoustic system 100 have already been driven. If not, the process proceeds to step S6; if so, it proceeds to step S4.

In step S4, the camera unit 102 images the listening space SP, and a change check determines whether the environment in the listening space SP has changed. The process then proceeds to step S5.
In Embodiment 1, an environmental change is a change in the relationship, relative to the speakers 101, 101, of objects present in the space, including stationary objects placed in the listening space SP. A stationary object is an object placed in the listening space SP that remains still, such as the chair CH1 in the example listening space SP of FIG. 7.
Environmental changes can be detected in various ways. Embodiment 1 detects them from image variation, that is, the difference between the image data that the camera unit 102 outputs for the current frame and that of the immediately preceding frame. The calculation of this image variation is described later.

In step S5, the system checks whether the image has changed. If it has, the process proceeds to step S6; if not, it proceeds to step S8. Steps S4 and S5 thus detect the displacement of objects in the space relative to the camera unit 102. This reveals whether the relationship of the stationary object to the speakers 101, 101 has changed. The part of the acoustic system 100 that performs this processing corresponds to the listening space environment detection means.

In step S6, AF processing is executed and the process proceeds to step S7. The AF processing uses the AF function of the camera unit 102 to measure the distance between the camera unit 102 and the stationary objects in the listening space SP. The part of the camera unit 102 that performs this processing corresponds to distance detection means; its details are described later.
In step S7, a process for driving the drive stage 103 by the drive block 114 is executed based on the result of the AF process, and then the process proceeds to step S8. In the first embodiment, the process of driving the drive stage 103 is a process of rotating the drive stage 103 around a vertical axis (not shown). As a result, the horizontal direction of each speaker 101, 101 changes, and the audio output condition setting of the listening space SP changes. The parameter for setting the audio output condition is an angle. As shown in FIG. 8, the angles θ1 and θ2 with respect to the direction in front of the two speakers 101 and 101 and along the optical axis O of the camera unit 102 are determined. . The calculation of the angles θ1 and θ2 is performed by either the CPU block 73 or the signal processing block 113.
In step S <b> 8, the sound output unit 30 performs sound output processing for reproducing the audio signal from the external device obtained by the external device I / F block 111 using the speakers 101 and 101, and then the processing ends. Details of the sound output processing in step S8 will be described later.

As described above, through the processing of steps S7 and S8, the audio output condition is set by changing the orientation of the speakers 101, 101 in response to a change in a stationary object. In the acoustic system 100, the part that performs this processing corresponds to the audio environment setting means.
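The control flow of steps S4 to S8 can be sketched as a single pass in Python. This is a minimal sketch, not the device's firmware: the four callables (`image_changed`, `measure_angles`, `drive_stage`, `play_audio`) are hypothetical hooks standing in for the camera unit 102, the AF processing plus angle calculation, the drive block 114, and the sound output unit 30.

```python
from typing import Callable, Tuple

def output_cycle(
    image_changed: Callable[[], bool],                  # S4 + S5: capture and compare frames
    measure_angles: Callable[[], Tuple[float, float]],  # S6 + angle calculation -> (theta1, theta2)
    drive_stage: Callable[[float, float], None],        # S7: rotate the drive stage
    play_audio: Callable[[], None],                     # S8: sound output processing
) -> bool:
    """Run one pass of steps S4-S8 and report whether the speakers were re-aimed."""
    if image_changed():
        theta1, theta2 = measure_angles()
        drive_stage(theta1, theta2)
        play_audio()
        return True
    play_audio()  # no change: keep the current orientation (S5 -> S8)
    return False
```

Injecting the hardware steps as callables keeps the branching of FIG. 6 testable without a camera or motors.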

Next, the change confirmation process of step S4 is described in more detail.
In Example 1, the image change check used in the change confirmation process stores image data, acquired continuously at timings synchronized with the VD signal, in a buffer memory in the SDRAM 9, and compares it with the next acquired image data using integration results calculated from luminance differences. For example, the previous image data is first stored in the buffer memory, and the difference between this stored image data and the image data acquired at the latest timing is calculated. After this calculation, the image data acquired at the latest timing overwrites the previous image data in the buffer memory. The difference calculation is then repeated using this overwritten latest image data and the image data acquired at the next timing.

The difference calculation using the image data acquired at the latest timing and the image data acquired at the previous timing, stored in the SDRAM 9, proceeds as follows. For each set of image data, the luminance differences between adjacent pixels are integrated in the horizontal direction and in the vertical direction; each integration result is compared with the corresponding result obtained at the previous timing; and the change confirmation evaluation value Q is calculated from the sum of the horizontal difference result and the vertical difference result. The evaluation value Q is calculated at every VD signal timing.
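The overwrite-and-compare buffering described above can be sketched as follows. As a simplifying assumption, this version keeps only the previous integration results H'(v) and V'(h) rather than the whole previous frame, and it takes Q as a sum of absolute differences; that form of Q is an assumption, since the exact expression is the source's Equation (3). The class name `IntegrationBuffer` is hypothetical.

```python
from typing import List, Optional, Sequence

class IntegrationBuffer:
    """Stands in for the buffer memory in SDRAM 9 (sketch).

    Stores the previous frame's integration results H'(v) and V'(h).
    Assumption: Q is the sum of absolute differences between profiles.
    """

    def __init__(self) -> None:
        self.prev_h: Optional[List[float]] = None
        self.prev_v: Optional[List[float]] = None

    def update(self, h: Sequence[float], v: Sequence[float]) -> Optional[float]:
        """Compare the latest profiles with the stored ones, then overwrite them.

        Returns None for the very first frame, when nothing is stored yet.
        """
        q = None
        if self.prev_h is not None and self.prev_v is not None:
            q = (sum(abs(a - b) for a, b in zip(h, self.prev_h))
                 + sum(abs(a - b) for a, b in zip(v, self.prev_v)))
        # Overwrite the stored results so the next VD timing compares against these.
        self.prev_h, self.prev_v = list(h), list(v)
        return q
```

Storing the compact profiles instead of full frames is a design choice for the sketch; it is sufficient for the comparison described above but is not what the source states.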

The equations used in the difference calculation of the change confirmation process are as follows.
Let H(v) be the integration result, at the latest timing, of the luminance differences between horizontally adjacent pixels. It is given by Equation (1).

[Equation (1)]

In Equation (1), D(i, j) denotes the coordinates of a pixel in the AF processing area, Hstart is the horizontal start position of the AF processing area, and m is the horizontal range of the AF processing area.
Similarly, let V(h) be the integration result of the luminance differences between vertically adjacent pixels. It is given by Equation (2).

[Equation (2)]

In Equation (2), D(i, j) denotes the coordinates of a pixel in the AF processing area, Vstart is the vertical start position of the AF processing area, and n is the vertical range of the AF processing area.
Using H(v) and V(h) calculated by Equations (1) and (2), together with the integration results H'(v) and V'(h) calculated at the immediately preceding timing, the sum of their differences, Q(t), is given by Equation (3).

[Equation (3)]

In Equation (3), t is the VD timing count.
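Because Equations (1) to (3) are given here only through their definitions, the sketch below shows one plausible reading, stated as an assumption: H(v) sums absolute luminance differences between horizontally adjacent pixels along row v of the AF processing area, V(h) does the same down column h, and Q(t) adds the absolute differences between the current and previous profiles. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

def integrate_profiles(
    lum: Sequence[Sequence[float]],  # lum[y][x]: luminance of pixel D(x, y)
    hstart: int, m: int,             # horizontal start and range of the AF area
    vstart: int, n: int,             # vertical start and range of the AF area
) -> Tuple[List[float], List[float]]:
    """Return (H, V): H[v] integrates horizontal neighbour differences in row v,
    V[h] integrates vertical neighbour differences in column h (assumed form)."""
    xs = range(hstart, hstart + m)
    ys = range(vstart, vstart + n)
    H = [sum(abs(lum[y][x + 1] - lum[y][x]) for x in xs[:-1]) for y in ys]
    V = [sum(abs(lum[y + 1][x] - lum[y][x]) for y in ys[:-1]) for x in xs]
    return H, V

def evaluation_value(H: Sequence[float], V: Sequence[float],
                     H_prev: Sequence[float], V_prev: Sequence[float]) -> float:
    """Q(t): horizontal and vertical profile differences added together (assumed form)."""
    return (sum(abs(a - b) for a, b in zip(H, H_prev))
            + sum(abs(a - b) for a, b in zip(V, V_prev)))
```

Under this reading, adding the same brightness offset to every pixel leaves H and V unchanged, which is one reason the threshold in step S5 can tolerate some change in lighting; a multiplicative brightness change, by contrast, does scale the profiles.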

This evaluation value Q(t) is calculated at every image update timing (VD timing), and in step S5 it is determined that an environmental change has occurred if Q(t) is equal to or greater than a preset determination threshold. Because the image data also changes with brightness, the threshold is preferably set to a value that tolerates a certain degree of change in brightness. Further, although Example 1 detects changes in stationary objects from the inter-frame difference of the image data as described above, the invention is not limited to this: changes may instead be detected by extracting differences between histograms, or, if processing speed permits, by other methods such as computing optical flow from the image difference.
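The decision in step S5 reduces to comparing Q(t) with the threshold. The calibration rule below (the largest Q observed while only the lighting changes, multiplied by a safety margin) is a hypothetical way to choose a threshold that tolerates brightness changes; the source only says the threshold should allow some change in light and dark.

```python
from typing import Iterable, List, Sequence

def calibrate_threshold(lighting_only_q: Sequence[float], margin: float = 1.5) -> float:
    """Hypothetical rule: place the threshold above Q values caused by lighting alone."""
    return max(lighting_only_q) * margin

def changed_timings(q_values: Iterable[float], threshold: float) -> List[int]:
    """Return the VD timing counts t at which Q(t) >= threshold (S5 branches to S6)."""
    return [t for t, q in enumerate(q_values) if q >= threshold]
```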

Next, the details of the AF processing executed in step S6 are described with reference to the flowchart of FIG. 9. In the AF processing, the distance measurement range extends from 1 m to 5 m from the camera unit 102 along the optical axis, and at the start of the AF processing the focus lens 52a is set to focus at a distance of 1 m.
First, in step S61, the process waits until a falling edge of the VD signal is detected; when it is detected, the process proceeds to step S62.
In step S62, the focus motor 52b is driven by a predetermined number of pulses so that the focus lens 52a starts moving toward the position that focuses at 5 m, and the process proceeds to step S63.

In step S63, the video signal obtained after the focus lens 52a has moved is acquired, an AF evaluation value is calculated from the image data based on this video signal, and the process proceeds to step S64. In step S64, it is determined whether the focus lens 52a has reached the end position (in Example 1, the position corresponding to a focus distance of 5 m). If it has, the process proceeds to step S65; if not, the process returns to step S62, and focus driving and AF evaluation value acquisition are repeated.
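Steps S62 to S65 amount to a contrast-detection focus sweep. The sketch below steps a focus distance from 1 m to 5 m and keeps the position with the highest AF evaluation value. It works in metres rather than focus-motor pulses, skips the VD wait of step S61, also evaluates the starting position, and uses a hypothetical callback `af_value_at` in place of the camera's AF evaluation; the 0.25 m step is likewise an assumption.

```python
from typing import Callable, List, Tuple

def af_scan(
    af_value_at: Callable[[float], float],  # hypothetical: AF evaluation value when focused at d metres
    start_m: float = 1.0,
    end_m: float = 5.0,
    step_m: float = 0.25,
) -> Tuple[float, List[Tuple[float, float]]]:
    """Sweep the focus from start_m to end_m and return (peak distance, samples)."""
    n_steps = int(round((end_m - start_m) / step_m))
    samples: List[Tuple[float, float]] = []
    for k in range(n_steps + 1):
        d = start_m + k * step_m             # S62: move the focus one increment
        samples.append((d, af_value_at(d)))  # S63: evaluate contrast at this position
    # S64 ends the loop at end_m; S65 keeps the position with the highest value.
    peak_d, _ = max(samples, key=lambda s: s[1])
    return peak_d, samples
```

In the device, this sweep runs once per subdivided area of FIG. 5, giving each area its own peak position.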

In step S65, peak position detection processing is executed, and the process proceeds to step S66. In this processing, a peak position is first detected for each of the subdivided areas shown in FIG. 5 from the integrated value of the high-frequency components of its pixels. Areas whose peak positions appear to correspond to the same distance are then grouped, and the area numbers at the ends of the grouped range are obtained.

The grouping is explained further below.
As an example, consider the case where the chair CH1 in the listening space SP is replaced. That is, in the listening space SP where the acoustic system 100 is installed, the multi-seat chair CH1 shown in FIG. 7 is replaced by the single-seat chair CH2 shown in FIG. 10. The difference in grouping caused by such a change in a stationary object is described below.

FIGS. 11 and 12 show the image data and grouping results obtained when the AF processing is performed on the listening spaces SP shown in FIGS. 7 and 10, respectively. When the listening space SP is in the state shown in FIG. 7, the image data shown in FIG. 11(a) is obtained and grouped into the hatched areas in FIG. 11(b); in this example, the grouping area spans areas 1 to 7 horizontally and 6 to 8 vertically. When the listening space SP is in the state shown in FIG. 10, the image data shown in FIG. 12(a) is obtained and grouped into the hatched areas in FIG. 12(b); in this example, the grouping area spans areas 2 to 5 horizontally and 6 to 8 vertically. Thus, even when an environmental change in which a stationary object changes occurs in the listening space SP, the grouping allows the change to be detected and the positions of objects present in the listening space SP to be estimated.
Finally, in step S66, the focus lens 52a is moved to focus at the peak position, and the AF processing ends.
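The grouping of step S65 can be sketched as below, assuming each subdivided area already has a peak distance (or `None` when no peak lies between 1 m and 5 m). Quantising distances to a tolerance and keeping the most common distance is a hypothetical stand-in for "areas considered to be at the same distance", and the 8 x 9 area grid in the checks is also assumed, since the number of areas in FIG. 5 is not stated here.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

Range = Tuple[int, int]

def group_areas(
    peaks: Sequence[Sequence[Optional[float]]],  # peaks[row][col]: peak distance in m, or None
    tol: float = 0.25,
) -> Optional[Tuple[Range, Range]]:
    """Group areas at roughly the same peak distance and return the 1-based
    (horizontal, vertical) area-number ranges of the largest group."""
    cells = [(r, c, d) for r, row in enumerate(peaks)
             for c, d in enumerate(row) if d is not None]
    if not cells:
        return None
    # Quantise distances so that areas at about the same distance share a bin.
    bins = Counter(round(d / tol) for _, _, d in cells)
    best_bin, _ = bins.most_common(1)[0]
    rows = [r for r, _, d in cells if round(d / tol) == best_bin]
    cols = [c for _, c, d in cells if round(d / tol) == best_bin]
    # Report the end area numbers of the grouped range (1-based, inclusive).
    return (min(cols) + 1, max(cols) + 1), (min(rows) + 1, max(rows) + 1)
```

With a chair at 2.5 m covering columns 1 to 7 and rows 6 to 8, this returns `((1, 7), (6, 8))`, the same ranges as FIG. 11(b).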

Next, the stage drive processing of step S7 in the flowchart of FIG. 6 is described. As noted above, the parameter for driving the stage is an angle; as shown in FIG. 8, the two angles θ1 and θ2 are calculated from the detected values at both horizontal ends of the grouping area obtained by the AF processing. In Example 1, the speakers 101, 101 rotate only horizontally, so they are rotated until their fronts face the two horizontal ends of the grouping area, which makes the audio output of the left and right speakers 101, 101 uniform across the grouping area. To this end, the horizontal area positions corresponding to both ends of the grouping area are calculated from the angle of view at a focal length of 35 mm and the listening distance d; the differences L1 and L2 between the edges of the angle of view and these horizontal area positions are calculated (see FIG. 12(b)); and the angles θ1 and θ2 are calculated as the arctangent of L1 and L2 over the listening distance d, according to Equations (4) and (5):
θ1 = atan(L1 / d)    (4)
θ2 = atan(L2 / d)    (5)
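Equations (4) and (5) translate directly into code. The helper `edge_offsets` is a hypothetical pinhole-model reading of how L1 and L2 might be derived from the angle of view: it assumes a 36 mm-wide frame behind the 35 mm focal length and areas of equal width, neither of which the text specifies.

```python
import math
from typing import Tuple

def stage_angles(l1_m: float, l2_m: float, d_m: float) -> Tuple[float, float]:
    """Equations (4) and (5): theta = atan(L / d), returned in degrees."""
    return math.degrees(math.atan(l1_m / d_m)), math.degrees(math.atan(l2_m / d_m))

def edge_offsets(first_area: int, last_area: int, n_areas: int, d_m: float,
                 focal_mm: float = 35.0, frame_width_mm: float = 36.0) -> Tuple[float, float]:
    """Hypothetical: L1 and L2 from the grouping's end areas via a pinhole model."""
    scene_width_m = d_m * frame_width_mm / focal_mm  # width covered by the view at distance d
    area_width_m = scene_width_m / n_areas
    l1 = (first_area - 1) * area_width_m       # left view edge -> left end of the group
    l2 = (n_areas - last_area) * area_width_m  # right end of the group -> right view edge
    return l1, l2
```

For example, L1 = L2 = 0.5 m at d = 2 m gives θ1 = θ2 ≈ 14.0°.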
Next, the sound output processing of step S8 in the flowchart of FIG. 6 is described.
In the sound output processing, the digitized audio signal undergoes amplification and similar processing and is output to the speakers 101, 101. If the output timing needs to vary with the listening position, filter processing may be used to offset the output timings of the two speakers 101, 101. In that case, the listening distance d from the stationary object estimated by the grouping in the AF processing (for example, the chair CH1 or CH2) to each speaker 101, 101 is calculated, and the filter is applied accordingly.
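The timing offset mentioned above can be sketched as a pure delay that holds back the nearer speaker so that its sound arrives together with the farther one's. This is an illustrative assumption rather than the filter the source uses: it takes a speed of sound of 343 m/s and a 48 kHz sample rate, and rounds to whole samples.

```python
from typing import List, Sequence

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees C

def alignment_delays(distances_m: Sequence[float], sample_rate_hz: int = 48_000) -> List[int]:
    """Delay (in samples) for each speaker so that all arrivals coincide at the listener."""
    farthest = max(distances_m)
    return [round((farthest - d) / SPEED_OF_SOUND_M_S * sample_rate_hz) for d in distances_m]

def apply_delay(samples: Sequence[float], delay: int) -> List[float]:
    """Whole-sample delay by zero padding (equivalent to a pure-delay FIR filter)."""
    return [0.0] * delay + list(samples)
```

A speaker 0.343 m closer to the listener than the other is delayed by about 1 ms, or 48 samples at 48 kHz.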

Next, the operation of Example 1 is described using the following case: in the listening space SP where the acoustic system 100 is placed, the chair CH1 shown in FIG. 7, a stationary object, is replaced by the chair CH2 shown in FIG. 10; that is, the relative relationship of a stationary object to the speakers 101, 101 changes.

In the acoustic system 100, once the power switch 104 is turned on, the audio output condition setting process is executed continuously. At the first pass of this process, the drive stage 103 has not yet been driven, so the AF processing is executed following steps S1, S2, S3, and S6. If the two- or three-seat chair CH1 is placed in the listening space SP at the position shown in FIG. 7, the peak position detection processing of step S65 produces a grouping spanning areas 1 to 7 horizontally and 6 to 8 vertically, as shown in FIG. 11(b). The stage drive processing of step S7 then obtains the angles θ1 and θ2 (see FIG. 13) that point the fronts of the speakers 101, 101 at the two horizontal ends of the grouping area, and the drive stage 103 is rotated according to these angles. The speakers 101, 101 rotate together with the drive stage 103 so that their fronts face slightly outward, as shown in FIG. 13, after which the sound output processing is executed.

If the chair CH1 is subsequently left where it is, no image change is detected in step S5, and the sound output processing is executed with the orientation of the speakers 101, 101 unchanged.

When the acoustic system 100 operates in this way, its sound field is set to the optimal audio output condition, rich in presence, for the listener MA (see FIG. 16) seated on the chair CH1.

If, on the other hand, the chair is replaced by the single-seat chair CH2 and placed as shown in FIG. 10, producing an environmental change in the listening space SP, an image change is detected in step S5, the AF processing and the stage drive processing are executed again, and the audio output condition is set anew.

In this case, the peak position detection processing of step S65 produces a grouping spanning areas 2 to 5 horizontally and 6 to 8 vertically, as shown in FIG. 12(b). The stage drive processing of step S7 then obtains the angles θ1 and θ2 shown in FIG. 14, corresponding to the two horizontal ends of the grouping area. In this example, θ1 does not change but θ2 changes greatly; accordingly, the drive stage 103 on the right side of the apparatus is rotated, and with it the speaker 101 on the right side of the apparatus, as shown in FIG. 14.

The sound field of the acoustic system 100 is therefore changed to audio output conditions that form the optimal sound field, rich in presence, for the listener MA seated on the chair CH2.

As described above, Example 1 detects the stationary objects placed in the listening space SP and sets the audio output conditions (acoustic parameters) according to them. When the relative relationship of a stationary object to the speakers 101, 101 changes in the listening space SP (an image change), the audio output conditions are changed to follow the change in the stationary object, enabling optimal audio output.
The listener MA therefore does not need to measure, for example, the distance from the speakers 101, which saves the listener effort. In addition, the audio output condition setting can be changed in response to changes in the relative relationship to the speakers 101, 101 of stationary objects that move only when someone moves them, such as the chairs CH1 and CH2. The listener MA therefore does not need to move such objects for the purpose of setting the audio output conditions, and the setting can be made quickly and without effort.

Above all, even if the relative relationship of a stationary object changes in the listening space SP, the audio output condition setting is changed automatically without the listener MA having to redo the setting, which saves the listener MA even more effort.

(Other Examples)
Acoustic setting devices according to other examples are described below.
In describing these examples, components identical to those of Example 1 are given the same reference signs and their description is omitted. Likewise, only operations that differ from Example 1 are described; operations identical to those of Example 1 are not described again.

<Example 2>
The acoustic setting device of Example 2 differs from Example 1 in the processing performed in the acoustic system 100: listener detection means for detecting the listener MA is added to the listening space environment detection means that detects the listening space environment of the listening space SP.

The processing flow in the acoustic system 100 of Example 2 is described with reference to the flowchart of FIG. 15. Steps S1 to S6 of this flowchart are the same as in Example 1, so they carry the same step numbers and their description is omitted; only the processing steps that differ from Example 1 are described.

When the AF processing of step S6 is complete, the process proceeds to step S21, where face detection determination processing is executed, and then to step S22.
In the face detection determination processing of step S21, whether the listener MA is present is determined by whether a face exists in the listening space SP; if the listener MA is present, grouping is performed again for the position where the face exists. Face detection methods already known in digital still camera technology include methods a) to c) listed below, and Example 2 uses any one of them.
a) As described in "Proposal of a modified HSV color system effective for face region extraction", Journal of the Institute of Television Engineers of Japan, Vol. 49, No. 6, pp. 787-797 (1995): a method that converts a color image into a mosaic image and extracts the face region by focusing on skin-colored regions.
b) As described in "A method for extracting face regions from still gray-scale scene images", Journal of the Institute of Electronics, Information and Communication Engineers, Vol. 74-D-II, No. 11, pp. 1625-1627 (1991): a method that extracts the head region of a frontal person using geometric shape features of the parts that make up the head of a frontal figure, such as the hair, eyes, and mouth.
c) As described in "Face region detection for videophones and its effects", Image Lab 1991-11 (1991): for moving images, a method that extracts a frontal person image using the contour edges of the person image produced by subtle movements of the person between frames.
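Method a) can be illustrated with a much simplified sketch: average the image into a mosaic, convert each tile to HSV with Python's `colorsys`, and flag tiles inside a skin-colour box. This is not the modified HSV colour system of the cited paper; the hue, saturation, and value limits are hypothetical placeholders.

```python
import colorsys
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]

def mosaic(img: Sequence[Sequence[RGB]], block: int) -> List[List[RGB]]:
    """Average each block x block tile (partial tiles at the edges are dropped)."""
    h, w = len(img), len(img[0])
    tiles: List[List[RGB]] = []
    for y0 in range(0, h - h % block, block):
        row: List[RGB] = []
        for x0 in range(0, w - w % block, block):
            px = [img[y][x] for y in range(y0, y0 + block) for x in range(x0, x0 + block)]
            row.append(tuple(sum(p[k] for p in px) // len(px) for k in range(3)))
        tiles.append(row)
    return tiles

def skin_mask(tiles: Sequence[Sequence[RGB]], hue_max: float = 50 / 360,
              sat_range: Tuple[float, float] = (0.2, 0.7),
              val_min: float = 0.35) -> List[List[bool]]:
    """Flag tiles whose HSV values fall inside a hypothetical skin-colour box."""
    mask = []
    for tile_row in tiles:
        out = []
        for r, g, b in tile_row:
            hue, sat, val = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            out.append(hue <= hue_max and sat_range[0] <= sat <= sat_range[1] and val >= val_min)
        mask.append(out)
    return mask
```

Working on the mosaic rather than single pixels smooths out noise and shrinks the search, which matches the motivation for the mosaic step in method a).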

In step S22, the presence or absence of the listener MA is determined from the result of the face detection determination processing of step S21. If the listener MA is present, the process proceeds to step S23; if not, it proceeds to step S7.

In step S23, face detection range setting processing is performed, which sets the range anew by grouping at the face detection position, and the process then proceeds to step S7. The part that performs steps S21 to S23, added in Example 2, corresponds to the listener detection means.
The processing from step S7 onward is the same as in Example 1, so its description is omitted.
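The re-grouping at the face position in step S23 needs the detected face expressed in the same area numbers as the AF grouping. A minimal sketch, assuming the face detector returns an inclusive pixel bounding box and that the image is divided into a uniform grid of areas (the 900 x 800 pixel image and 9 x 8 grid in the checks are hypothetical). When no face is found, the AF grouping is kept, mirroring the branch from step S22 straight to step S7.

```python
from typing import Optional, Tuple

Range = Tuple[int, int]
Box = Tuple[Range, Range]  # ((h_first, h_last), (v_first, v_last)), 1-based area numbers

def area_of(coord: int, size: int, n_areas: int) -> int:
    """1-based area number containing pixel coordinate coord."""
    return coord * n_areas // size + 1

def face_range(face_px: Tuple[int, int, int, int], img_w: int, img_h: int,
               n_cols: int, n_rows: int) -> Box:
    """Map an inclusive face box (x0, y0, x1, y1) in pixels to area-number ranges."""
    x0, y0, x1, y1 = face_px
    return ((area_of(x0, img_w, n_cols), area_of(x1, img_w, n_cols)),
            (area_of(y0, img_h, n_rows), area_of(y1, img_h, n_rows)))

def aim_range(af_group: Box, face: Optional[Box]) -> Box:
    """S22/S23: use the face detection range if a face was found, else the AF grouping."""
    return face if face is not None else af_group
```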

Next, the operation of Example 2 is described.
As the environmental change in the listening space SP, consider the case where the listener MA, seated on the chair CH1 as shown in FIG. 16, changes to lying on the chair CH1 as shown in FIG. 17. In this case, the relative relationship of the chair CH1, a stationary object, to the speakers 101, 101 stays the same, while the relative relationship of the listener MA changes.

Suppose the acoustic setting device of Example 2 starts operating in the listening space environment of the listening space SP shown in FIG. 16. Processing is first executed in the order of steps S1 to S3, S6, S21, S22, S23, S7, and S8: the drive stage 103 is driven according to the listening space environment of the listening space SP, the orientation of the speakers 101 is set, and sound is then output.

In this case, the AF processing of step S6 yields the image data shown in FIG. 18(a) and the grouping result shown in FIG. 18(b), in which the grouped areas span 1 to 7 horizontally and 4 to 8 vertically.
In Example 2, the face detection range setting processing of step S23 further sets the area in which the face is detected, shown hatched in FIG. 18(c); in the illustrated example, the face detection range spans 4 to 5 horizontally and 4 to 6 vertically.

In the stage drive processing of step S7 in Example 2, as in Example 1, the angles θ1 and θ2 are calculated as the arctangent of the differences L1 and L2 (between the horizontal ends of the face detection range and the edges of the angle of view) over the listening distance d, and the drive stage 103 is driven accordingly. As a result, the two speakers 101, 101 turn toward the front center of the acoustic system 100 so that they face the two ends of the face detection range, as shown in FIG. 20.
The acoustic system 100 therefore produces the sound field, full of presence, that is optimal for the listener MA seated near the widthwise center of the chair CH1.

Next, consider the case where the posture of the listener MA changes from sitting on the chair CH1, as shown in FIG. 16, to lying on the chair CH1, as shown in FIG. 17.

In this case, the change in the relative relationship of the listener MA to the speakers 101, 101 appears as an image change, and the AF processing, the face detection range setting processing, and so on are executed again following steps S5, S6, and S21 to S23.

The AF processing then yields the image data shown in FIG. 19(a) and the grouping area shown in FIG. 19(b). The horizontal range of this grouping area is the same as in the example of FIG. 18.

In Example 2, however, the face detection range setting processing of step S23 sets the face detection range shown in FIG. 19(c), which spans 2 to 4 horizontally and 5 to 8 vertically, a different range from the state shown in FIG. 18.

In the stage drive processing of step S7, the angles θ1 and θ2 are therefore changed based on this face detection range, as shown in FIG. 21, and the orientation of the speakers 101, 101 is changed by rotating the drive stages 103, 103.
Here too, the two speakers 101, 101 face the two ends of the face detection range, producing the sound field, full of presence, that is optimal for the listener MA lying on the chair CH1.

As described above, Example 2 can set the audio output conditions in response not only to changes in the relative relationship of stationary objects such as the chairs CH1 and CH2 to the speakers 101, 101, but also to changes in the relative relationship of the listener MA, and can therefore give the listener MA an even more realistic sound field.

Moreover, Example 2 performs not only face detection but also, as in Example 1, AF processing on the objects present in the space, including the stationary objects in the listening space SP. Consequently, even when a face cannot be detected, for example because feature points such as the eyes or mouth are hidden, the audio output conditions can be set, as in Example 1, based on the result of the AF processing for the stationary object (the chair in the example of FIG. 16). An audio output condition suited to the listening space SP can thus be set even when no face is detected.

<Example 3>
Next, Example 3 is described.
Example 3 determines the number of listeners MA in the listening space SP so that changes in the number of listeners MA can also be handled.

The flowcharts of FIGS. 22 and 23 show the processing flow of Example 3. Only part of the processing flow in the acoustic system 100 differs from Example 2; processing identical to Example 2 carries the same step numbers as in Examples 1 and 2 and is not described again, and only the differences are described.
The difference from Examples 1 and 2 is that face detection count search processing is performed in step S31, between the determination in step S22 that the listener MA is present and step S23, where the face detection range setting processing is performed.

The face detection number search process detects the number of listeners MA in the listening space SP. The number of faces that can be detected depends on factors such as the computing capacity of the CPU block 73 in the camera unit 102, but the ability to detect at least about four people is considered necessary.

Next, with the number of persons fixed by the face detection number search process in step S31, the face detection range setting process, which groups the face detection range, is performed again. This process itself is the same as step S23 of the second embodiment; when two listeners MA are present, as shown in FIG. 24, it proceeds as follows.

That is, the AF processing in step S6 performs grouping based on the image data shown in FIG. 25(a), yielding the grouping result shown in FIG. 25(b). In this case, the grouping area spans 1 to 7 horizontally and 4 to 8 vertically. In contrast, the face detection range set by the face detection range setting process in step S23 is the range containing all of the faces (in this example, two faces); as indicated by hatching in FIG. 25(c), it spans 3 to 7 horizontally and 4 to 6 vertically.
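The face detection range described above is the smallest block of grid areas that still contains every detected face. A minimal Python sketch, assuming each face has already been reduced to an inclusive, 1-indexed rectangle of grid areas; the `GridBox` type and the per-face boxes below are hypothetical, chosen only to reproduce the 3 to 7 by 4 to 6 range of FIG. 25(c):

```python
from typing import NamedTuple, Optional, Sequence

class GridBox(NamedTuple):
    """Inclusive, 1-indexed rectangle of grid areas (hypothetical representation)."""
    h_start: int
    h_end: int
    v_start: int
    v_end: int

def face_detection_range(faces: Sequence[GridBox]) -> Optional[GridBox]:
    """Smallest grid rectangle containing every detected face, or None if there are no faces."""
    if not faces:
        return None
    return GridBox(
        h_start=min(f.h_start for f in faces),
        h_end=max(f.h_end for f in faces),
        v_start=min(f.v_start for f in faces),
        v_end=max(f.v_end for f in faces),
    )

# Hypothetical per-face boxes for the two listeners of FIG. 24.
left_face = GridBox(3, 4, 4, 6)
right_face = GridBox(6, 7, 4, 5)
print(face_detection_range([left_face, right_face]))
# GridBox(h_start=3, h_end=7, v_start=4, v_end=6)
```

Returning `None` for an empty list leaves the caller free to fall back to the AF grouping range, as the embodiments do when no face is detected.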

Therefore, in the stage driving process of step S7, as in the first embodiment, the angles are calculated as θ1 = atan(L1/d) and θ2 = atan(L2/d), where L1 and L2 are the differences between the two ends of the face detection range and the corresponding view-angle ends and d is the listening distance, and the drive stage 103 is driven by these angles.
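The relation θ = atan(L/d) can be sketched as follows. The text does not say how the grid-area range is converted into the physical distances L1 and L2; this sketch assumes the image is split into `n_cols` equal columns that together span a horizontal field of view `fov_deg` at the listening distance d. All parameter names and example values are hypothetical.

```python
import math
from typing import Tuple

def stage_angles(h_start: int, h_end: int, n_cols: int,
                 fov_deg: float, d: float) -> Tuple[float, float]:
    """Return (theta1, theta2) in degrees using theta = atan(L / d).

    Assumption: n_cols equal, 1-indexed columns span fov_deg horizontally
    at listening distance d (same length unit as d).
    """
    # Physical width covered by the field of view at distance d, and one column's share.
    field_width = 2.0 * d * math.tan(math.radians(fov_deg) / 2.0)
    col_width = field_width / n_cols
    # L1, L2: gaps between each end of the detection range and the view-angle end.
    l1 = (h_start - 1) * col_width
    l2 = (n_cols - h_end) * col_width
    theta1 = math.degrees(math.atan2(l1, d))
    theta2 = math.degrees(math.atan2(l2, d))
    return theta1, theta2

# Hypothetical 9-column grid, 60 degree field of view, listener 3 m away.
print(stage_angles(3, 7, n_cols=9, fov_deg=60.0, d=3.0))  # equal angles, about 14.4 degrees each
print(stage_angles(3, 4, n_cols=9, fov_deg=60.0, d=3.0))  # second angle clearly larger
```

With the two-face range 3 to 7 the gaps on both sides are equal, so θ1 = θ2; with a single face at 3 to 4 the right-hand gap is larger, so one speaker must turn noticeably further than the other.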

Next, an operation example of the third embodiment will be described.
First, when the listening space SP is in the state shown in FIG. 16, with one listener MA sitting on the chair CH1, the face detection range shown in FIG. 18(c) is set, as described for the second embodiment, and the speakers 101, 101 are tilted as shown in FIG. 20. This forms the optimal audio output condition setting for the single listener MA sitting on the chair CH1.

When the listening space environment then changes so that two listeners MA sit on the chair CH1, as shown in FIG. 24, a screen change is detected in S5, and the AF process (S6), face detection determination process (S21), face detection number search process (S31), and face detection range setting process (S23) are executed.
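Step S5 decides whether the screen has changed, but the text does not say how. One common approach, shown here purely as an assumption, compares consecutive grayscale frames by their mean absolute pixel difference against a threshold; the function name and the threshold value are hypothetical.

```python
from typing import Sequence

Frame = Sequence[Sequence[int]]  # grayscale image: rows of pixel values

def screen_changed(prev: Frame, curr: Frame, threshold: float = 10.0) -> bool:
    """True if the mean absolute pixel difference between two equal-size
    frames exceeds `threshold` (a hypothetical tuning value)."""
    total = 0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += abs(a - b)
            count += 1
    return count > 0 and total / count > threshold

before = [[0] * 8 for _ in range(8)]               # empty scene
after = [[200] * 4 + [0] * 4 for _ in range(8)]    # something bright appears on the left half
print(screen_changed(before, after))   # True (mean difference 100)
print(screen_changed(before, before))  # False
```

Averaging over the whole frame keeps the check cheap and tolerant of small sensor noise; a per-area comparison would localize the change but is not needed just to trigger the AF and face detection steps.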

Through these processes, a face detection range containing both faces is set, as shown in FIG. 25(c). As described above, the stage driving process calculates the angles θ1 and θ2 as the arctangents of the differences L1 and L2 between the horizontal edges of the face detection range and the view-angle ends, divided by the listening distance d, and drives the drive stage 103. The speakers 101, 101 therefore change from the center-tilted state shown in FIG. 20 to face slightly outward, as shown in FIG. 26, forming an even, optimal sound field for both listeners MA.

As described above, in the third embodiment, when a plurality of listeners MA are present in the listening space SP, audio is output toward a range that contains all of the listeners MA whose faces were detected, so that an optimal sound field can be provided to every one of them.

As in the second embodiment, even when no face can be detected, the audio output condition can be set based on the AF processing result for stationary objects, yielding an audio output condition suited to the listening space SP.

<Example 4>
The sound setting device of the fourth embodiment makes it possible to perform optimal sound setting when a listener MA falls asleep.

The flowcharts of FIGS. 27 and 28 show the processing flow of the fourth embodiment. Only part of the processing flow in the acoustic system 100 differs from the third embodiment; processing identical to that of the third embodiment is given the same step codes as in the first to third embodiments, its description is omitted, and only the differences are described.
The fourth embodiment differs from the third in that, between the face detection number search process in step S31 and the face detection range setting process in step S23, a line-of-sight detection process is performed depending on the number of detected persons.

That is, in step S41, which follows the determination of the number of persons by the face detection number search process in step S31, it is determined whether two or more persons have been detected. If only one person is detected, the process proceeds to the face detection range setting process in step S23; if two or more are detected, it proceeds to step S42.

In step S42, a line-of-sight detection process is performed on each detected face image. This process checks whether each listener MA has their eyes open (is awake) or closed (is asleep). The eye image template TP shown in FIG. 29(a) is matched against the face image while being enlarged and reduced, searching the image data G for an eye, that is, detecting whether the eyes are open. FIG. 29(b) shows a matching state of the eye image template TP. If no open eye is detected in the face image of a listener MA whose face was detected, that face is treated as not detected and excluded from the listeners MA.
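The multi-scale template search in step S42 can be sketched in plain Python. The text does not specify the similarity measure; zero-mean normalized cross-correlation (NCC) is used here as an assumption, together with hypothetical scale factors and a hypothetical acceptance threshold. Images are lists of rows of grayscale values.

```python
import math
from typing import List, Sequence

Image = Sequence[Sequence[float]]  # grayscale image: rows of pixel values

def resize_nearest(img: Image, scale: float) -> List[List[float]]:
    """Nearest-neighbour rescale, used to enlarge or reduce the eye template."""
    h, w = len(img), len(img[0])
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    return [[img[min(h - 1, int(r / scale))][min(w - 1, int(c / scale))]
             for c in range(nw)] for r in range(nh)]

def ncc_at(face: Image, tpl: Image, top: int, left: int) -> float:
    """Zero-mean normalised cross-correlation between tpl and the face patch at (top, left)."""
    th, tw = len(tpl), len(tpl[0])
    a = [face[top + r][left + c] for r in range(th) for c in range(tw)]
    b = [tpl[r][c] for r in range(th) for c in range(tw)]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0 else 0.0  # flat patch or template: treat as no match

def eyes_open(face: Image, eye_template: Image,
              scales: Sequence[float] = (0.75, 1.0, 1.5),
              threshold: float = 0.8) -> bool:
    """Report an open eye if any rescaled template matches somewhere in the face
    with NCC >= threshold (scales and threshold are hypothetical defaults)."""
    best = -1.0
    for s in scales:
        tpl = resize_nearest(eye_template, s)
        th, tw = len(tpl), len(tpl[0])
        if th > len(face) or tw > len(face[0]):
            continue  # template larger than the face image at this scale
        for top in range(len(face) - th + 1):
            for left in range(len(face[0]) - tw + 1):
                best = max(best, ncc_at(face, tpl, top, left))
    return best >= threshold
```

NCC is insensitive to overall brightness and contrast, which is why it is a common choice for template matching; in practice an optimized matching routine would normally replace the nested loops, which are kept here only to make the search explicit.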

Consequently, when the face detection range containing all of the listeners MA is set, if open eyes are not detected for the listener MA at a horizontal end (and the listener next to it) and those listeners are excluded, they are no longer included in the face detection range covering the multiple face images.

Thus, in the fourth embodiment, when two listeners MA, MA are present in the listening space SP and both have their eyes open, as shown in FIG. 24, a face detection range containing both faces is set, as in the third embodiment (the hatched range in FIG. 25(c)). In this case, the drive stages 103, 103 rotate by the angles θ1 and θ2 shown in FIG. 26, and both speakers 101, 101 are tilted toward the center so as to face the two horizontal ends of the face detection range.

Next, consider the case, shown in the image data of FIG. 30(a), in which the listener MA on the right side of the front of the apparatus has their eyes closed. Here, the grouping by AF processing again spans 1 to 7 horizontally and 4 to 8 vertically, as in the example of FIG. 25(b), and the face detection range at the time the face detection number search process of step S31 is executed spans 3 to 7 horizontally and 4 to 6 vertically, as in the example of FIG. 25(c).

The line-of-sight detection process of step S42 is then executed. Because the listener MA on the right has closed eyes, that listener is excluded from the face detection range, so the face detection range set by the face detection range setting process in step S23 contains only the face of the listener MA on the left: it spans 3 to 4 horizontally and 4 to 6 vertically, as indicated by hatching in FIG. 30(d).
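Steps S41 and S42 and the exclusion rule can be combined into one small sketch. Each face is assumed to arrive with its grid box and the outcome of the eye check; the `GridBox` and `DetectedFace` types and the right-hand face box are hypothetical. Falling back to the AF grouping range when every face has been excluded is an assumption, based on the rule that excluded faces are treated as not detected.

```python
from typing import NamedTuple, Sequence

class GridBox(NamedTuple):
    """Inclusive, 1-indexed rectangle of grid areas (hypothetical representation)."""
    h_start: int
    h_end: int
    v_start: int
    v_end: int

class DetectedFace(NamedTuple):
    box: GridBox
    eyes_open: bool  # outcome of the S42 eye check (hypothetical field)

def listener_range(faces: Sequence[DetectedFace], af_range: GridBox) -> GridBox:
    """Range the speakers should cover, following steps S41, S42 and S23 as described."""
    candidates = list(faces)
    if len(candidates) >= 2:
        # S41 -> S42: only with two or more faces are closed-eye faces dropped.
        candidates = [f for f in candidates if f.eyes_open]
    if not candidates:
        # No usable face: fall back to the AF grouping range (assumption).
        return af_range
    boxes = [f.box for f in candidates]
    return GridBox(min(b.h_start for b in boxes), max(b.h_end for b in boxes),
                   min(b.v_start for b in boxes), max(b.v_end for b in boxes))

af = GridBox(1, 7, 4, 8)                                     # FIG. 25(b) grouping result
left = DetectedFace(GridBox(3, 4, 4, 6), eyes_open=True)
right = DetectedFace(GridBox(6, 7, 4, 5), eyes_open=False)   # hypothetical box, eyes closed
print(listener_range([left, right], af))  # GridBox(h_start=3, h_end=4, v_start=4, v_end=6)
```

A single detected face is kept even with closed eyes, because S41 routes the one-person case straight to S23 without the eye check.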

Then, in the stage driving process of step S7, as in the first embodiment and as shown in FIG. 31, the angles θ1 and θ2 are calculated as the arctangents of the differences L1 and L2 between the horizontal ends of the face detection range and the view-angle ends, divided by the listening distance d. The drive stage 103 is driven accordingly, and the speaker 101 on the left side of the apparatus is tilted sharply.

Thus, in the fourth embodiment, when there are multiple listeners MA, the device checks whether each listener's eyes are closed and can set an audio output condition that forms an optimal sound field only for the listeners MA who appear to be awake with their eyes open.

Furthermore, if a listener MA whose face had been detected closes their eyes after the audio output condition is set, for example by falling asleep, the resulting image change triggers the face detection range setting process and the line-of-sight detection process. The sleeping listener MA is excluded, the face detection range is set again, the audio output condition (angles θ1 and θ2) is reset accordingly, and an optimal sound field is formed for the listeners MA who are awake.

Although embodiments of the present invention have been described using examples, the present invention is not limited to these examples, and various modifications and substitutions can be made without departing from the gist of the present invention.

For example, although the drive stages 103, 103 are rotated horizontally in the embodiments, they may also be rotated vertically.
In the embodiments, the AF processing uses so-called hill-climbing AF, which scans for the peak focus value. If instead a method that yields a distance directly is used, for example calculating the distance from stereo parallax images obtained with a two-lens configuration, the distance to objects in the listening space SP can be obtained faster.
Although the embodiments use two speakers 101, 101, the number of speakers is not limited to two; three or more speakers, as in a 5.1ch system, may be used.
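The stereo-parallax alternative mentioned above rests on the standard relation for a rectified stereo pair, Z = f·B/disparity, where f is the focal length in pixels, B the baseline between the two lenses, and the disparity the horizontal pixel shift of the same point between the two images. A minimal sketch; the function name and example camera values are hypothetical.

```python
def stereo_distance(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Distance (m) to a point seen by a rectified two-lens camera: Z = f * B / disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (point must appear in both images)")
    return focal_px * baseline_m / disparity_px

# Hypothetical camera: 800 px focal length, 6 cm baseline, 16 px disparity -> about 3 m.
print(stereo_distance(800.0, 0.06, 16.0))
```

Unlike hill-climbing AF, which must sweep the lens through its range to find the focus peak, this needs only one image pair per measurement, which is consistent with the speed advantage noted above.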

Although the chairs CH1 and CH2 are shown as stationary objects in the embodiments, stationary objects are not limited to chairs.

In the embodiments, the audio output condition is also set automatically when the audio output condition setting process starts. However, as long as the setting is updated at least when the relative relationship of objects in the space to the speakers changes, the initial setting may be made automatically based on other factors or made manually.

In the embodiments, the audio output condition is set by rotating the speakers, but it is not limited to this; any audio output condition related to the listening sound may be set, such as speaker position, audio filter processing, sound pressure, or output timing.
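For the output-timing and sound-pressure conditions mentioned above, one conventional approach (not described in the text, shown here only as an assumption) is to delay the nearer speakers so that all wavefronts reach the listening position together, and to attenuate them according to the free-field inverse-distance law so they arrive at equal level. The speaker-to-listener distances would have to come from the distance detection described in the embodiments; the function name and example distances are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

SPEED_OF_SOUND = 343.0  # m/s, dry air at about 20 °C

def align_speakers(distances_m: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (delay_s, gain_db) per speaker so all arrive in step and at equal level.

    Assumes free-field point sources: level falls by 20*log10(distance ratio) dB.
    """
    farthest = max(distances_m)
    settings = []
    for d in distances_m:
        delay = (farthest - d) / SPEED_OF_SOUND    # hold back nearer speakers
        gain_db = 20.0 * math.log10(d / farthest)  # <= 0 dB: nearer speakers turned down
        settings.append((delay, gain_db))
    return settings

# Hypothetical distances: left speaker 2.0 m, right speaker 3.0 m from the listener.
for delay, gain in align_speakers([2.0, 3.0]):
    print(f"delay {delay * 1000:.2f} ms, gain {gain:.2f} dB")
# delay 2.92 ms, gain -3.52 dB
# delay 0.00 ms, gain 0.00 dB
```

Only attenuation is applied, never boost, so the farthest speaker acts as the reference and no channel is driven harder than before.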

In the embodiments, the distance detection means of the listening space environment detecting means uses the autofocus function of the imaging means. However, any other means capable of measuring the distance to a stationary object may be used, such as one that measures the distance from the reflection time of sound waves or light.
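Measuring distance from the reflection time of sound or light comes down to halving the round trip: distance = v·t/2, where v is the propagation speed and t the time from emission to the returning echo. A minimal sketch; the function name and example timings are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0          # air at about 20 °C
SPEED_OF_LIGHT_M_S = 299_792_458.0  # exact, by definition of the metre

def echo_distance(round_trip_s: float, wave_speed_m_s: float) -> float:
    """Distance to a reflector from the round-trip time of an emitted pulse."""
    return wave_speed_m_s * round_trip_s / 2.0  # the pulse travels there and back

print(echo_distance(17.5e-3, SPEED_OF_SOUND_M_S))  # ultrasonic echo after 17.5 ms: about 3.0 m
print(echo_distance(20e-9, SPEED_OF_LIGHT_M_S))    # light pulse after 20 ns: about 3.0 m
```

The same 3 m range corresponds to milliseconds for sound but nanoseconds for light, which is why optical ranging needs much finer timing hardware than acoustic ranging.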

In the embodiments, every object that is captured in the image data and whose distance is measured by AF processing is treated as a stationary object. Alternatively, processing may be added that uses the image-change determination means of the embodiments to treat as stationary only those objects that do not change across a preset set of image data.
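The optional stationarity check could be sketched as follows, assuming each preset frame has already been reduced to a grid of per-area values (for example, the mean luminance of each AF area). An area counts as stationary if its value stays within a tolerance across all frames; the function name, the per-area representation and the tolerance are hypothetical.

```python
from typing import List, Sequence

Grid = Sequence[Sequence[float]]  # per-area values for one frame (hypothetical representation)

def stationary_areas(frames: Sequence[Grid], tolerance: float = 2.0) -> List[List[bool]]:
    """True for each grid area whose value varies by at most `tolerance` across all frames."""
    if not frames:
        raise ValueError("need at least one frame")
    rows, cols = len(frames[0]), len(frames[0][0])
    mask = []
    for r in range(rows):
        row = []
        for c in range(cols):
            values = [f[r][c] for f in frames]
            row.append(max(values) - min(values) <= tolerance)
        mask.append(row)
    return mask

# Two hypothetical 2x2 frames: only the top-right area changes noticeably.
frames = [[[10, 10], [10, 10]],
          [[10, 50], [11, 10]]]
print(stationary_areas(frames))  # [[True, False], [True, True]]
```

Only areas marked True would then be passed on as stationary objects; areas that change between frames would be left to the listener detection path.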

The present invention can be used in audio equipment and in televisions, projectors, and similar devices equipped with audio equipment.

30 Sound output unit (audio output means)
100 Sound system (audio output condition setting means)
101 Speaker
102 Camera unit (listening space environment detection means)
CH1 Chair (stationary object)
CH2 Chair (stationary object)
SP Listening space

WO2006057131

Claims

1. A sound setting device for setting an audio output condition relating to the output state of a listening sound that is output from a plurality of speakers toward a listening space in which the speakers are installed and that is to be heard by a listener, the sound setting device comprising:
audio output means for driving the plurality of speakers based on an audio signal;
listening space environment detecting means for detecting a stationary object present in the listening space; and
audio output condition setting means for setting the audio output condition to a setting corresponding to a change when the listening space environment detecting means detects that the relative relationship of the stationary object to the speakers has changed.

2. The sound setting device according to claim 1, wherein the listening space environment detecting means includes imaging means for obtaining image data of the listening space, and detects a change in the relative relationship based on differences between image data that vary over time.

3. The sound setting device according to claim 1, wherein the listening space environment detecting means includes imaging means for obtaining image data of the listening space, and distance detection means for dividing the entire image data area into subdivided areas and acquiring, for each area, relative distance information between the speakers and the stationary object.

4. The sound setting device according to claim 3, wherein the distance detection means detects the relative distance using an autofocus mechanism of the imaging means.

5. The sound setting device according to any one of claims 2 to 4, wherein the listening space environment detecting means includes listener detection means for detecting the listener present in the listening space by detecting features of the listener's face in the image data, and
the audio output condition setting means sets the audio output condition to a setting corresponding to a change when the listening space environment detecting means detects that the relative relationship to the speakers of either the stationary object or the listener has changed.

6. The sound setting device according to claim 5, wherein, when a plurality of listeners are detected, the listener detection means detects whether the eyes in each listener's face image are open, and excludes from the listeners any listener whose face image shows no open eyes.

7. The sound setting device according to any one of claims 1 to 6, wherein the audio output condition setting means includes means for changing the direction of the speakers with respect to the listening space.

8. The sound setting device according to claim 7, wherein the listening space environment detecting means includes imaging means for obtaining image data of the listening space, divides the image data equally into a plurality of areas vertically and horizontally, detects the in-focus position in each area, groups the areas according to their in-focus positions, identifies the horizontal coordinate positions of both ends of the grouped area, and sets the direction of the speakers based on these horizontal coordinate positions and the distance to the stationary object.

9. A sound setting method for setting an audio output condition relating to the output state of a listening sound that is output from a plurality of speakers toward a listening space in which the speakers are installed and that is to be heard by a listener, the method comprising:
an audio output step of driving the plurality of speakers based on an audio signal;
a listening space environment detection step of detecting a stationary object present in the listening space; and
an audio output condition setting step of setting the audio output condition to a setting corresponding to a change when it is detected in the listening space environment detection step that the relative relationship of the stationary object to the speakers has changed.

10. The sound setting method according to claim 9, wherein the listening space environment detection step includes an imaging step of obtaining image data of the listening space, and a distance detection step of dividing the entire image data area into subdivided areas and acquiring, for each area, relative distance information between the speakers and the stationary object.

11. The sound setting method according to claim 10, wherein the listening space environment detection step includes a listener detection step of detecting the listener present in the listening space by detecting features of the listener's face in the image data, and
the audio output condition setting step sets the audio output condition to a setting corresponding to a change when it is detected that the relative relationship to the speakers of either the stationary object or the listener has changed.

12. The sound setting method according to claim 11, wherein, in the listener detection step, when a plurality of listeners are detected, it is detected whether the eyes in each listener's face image are open, and any listener whose face image shows no open eyes is excluded from the listeners.