JP2004514359A

JP2004514359A - Automatic tuning sound system

Info

Publication number: JP2004514359A
Application number: JP2002543259A
Authority: JP
Inventors: Trajkovic, Miroslav; Gutta, Srinivas; Colmenarez, Antonio
Original assignee: Koninklijke Philips Electronics NV
Current assignee: Koninklijke Philips NV
Priority date: 2000-11-16
Filing date: 2001-11-14
Publication date: 2004-05-13
Also published as: WO2002041664A3; WO2002041664A2; EP1393591A2

Abstract

２以上のスピーカを通じて音を出力する音発生システムである。２以上のスピーカの夫々の音出力は、２以上のスピーカの位置に対するユーザの位置に基づいて調整される。システムは、視聴領域において学習可能であり画像認識ソフトウエアを有する処理部に結合される（ビデオカメラといった）少なくとも１つの画像捕捉装置を含む。処理部は、画像捕捉装置によって発生される画像中でユーザを識別するために画像認識ソフトウエアを使用する。処理部はまた、画像中のユーザの位置に基づいてユーザの位置の少なくとも１つの測定値を発生するソフトウエアを有する。

A sound generation system that outputs sound through two or more speakers. The sound output of each speaker is adjusted based on the position of the user relative to the positions of the speakers. The system includes at least one image capture device (such as a video camera) that can be trained on (aimed at) the viewing area and is coupled to a processing unit having image recognition software. The processing unit uses the image recognition software to identify the user in the image generated by the image capture device. The processing unit also has software that generates at least one measurement of the user's position based on the position of the user in the image.

Description

【０００１】
［発明の分野］
本発明は、ステレオシステム、テレビジョンオーディオシステム、及びホームシアターシステムといった音響システムに関連する。特に、本発明は音響システムを調整するシステム及び方法に関連する。
【０００２】
［発明の背景］
聴取者（「ユーザ」）の位置に基づいて種々の音響システムの出力を調整する幾つかのシステムが知られている。例えば、英国特許出願ＧＢ２，２２８，３２４号は、聴取者に対するステレオ効果を維持するためにユーザが動くにつれステレオシステムのバランスを調整するシステムについて記載している。ユーザによって持ち運ばれる発信器は、２つのステレオスピーカに隣接した２つの別々の受信器へ信号を発する。発せられる信号は、超音波信号、赤外線信号、又は無線信号でありえ、開始信号に応じて発せられうる（有線の電気信号であってもよい）。システムは、ユーザとスピーカとの間の距離を決定するために、（スピーカに隣接する）受信器が発信器からの信号を受信するまでの時間を使用する。ユーザと２つのスピーカの夫々との間の距離はこのように計算される。音は音源からの距離の３乗で減少するという原理に基づき、システムは、各スピーカからユーザへ略等しい音強度が与えられるよう各スピーカを調整するために各スピーカとユーザとの間の距離を使用する。
【０００３】
ＧＢ２，２２８，３２４号は、各スピーカからのユーザの距離が重なり合う点を決定することによってユーザの位置を決定するシステムについて記載しているが、位置の決定はステレオバランスの調整には必要でないとしている。
【０００４】
日本国公開特許英文抄録第５−１３７２００号は、テレビジョンの正面に対する５つの角度的なゾーンのうちの１つにいる聴取者の位置を各ゾーンにある別々の赤外線検出器を指すことにより検出する。テレビジョン画面の側面に位置するステレオスピーカのバランスは聴取者がいるゾーンに基づいて調整されると述べられている。
【０００５】
日本国公開特許英文抄録第４−１３０９００号は、聴取者と２つの発光・光検出部との間の距離を計算するために光伝送にかかる時間を使用する。ユーザと２つの部分との間の距離と、２つの部分間の距離は、聴取者の位置を計算し、オーディオ信号のバランスを調整するために使用される。
【０００６】
同様に、日本国公開特許英文抄録第７−３０２２１０号は、視聴位置と一連のスピーカとの間の距離を測定し、スピーカと視聴位置との間の距離に基づいて各スピーカについての適切な遅延時間を調整するために赤外線信号を使用する。
【０００７】
［発明の概要］
従来技術のシステムの１つの明らかな困難性は、ステレオシステムのバランスの自動調整を楽しむためには、（ＧＢ２，２２８，３２４号のように）ユーザが発信器を装着するか持ち運ぶこと、又は、そうでなければ、聴取者の位置の検出の信頼性が低い及び／又は粗い（赤外線センサといった）センサに依存することが必要とされることである。例えば、赤外線検出器を使用すると聴取者の検出に失敗する場合があり、結果として上述のシステムはユーザの位置に対する正しいバランス調整に失敗する。更に、センサによって他の人々（又はペットといった他のもの）が感知されることがあり、その場合は結果として聴取者ではなく他の人物又は他の物に対してバランス調整がされてしまう。
【０００８】
更に、上述のシステムは、例えばホームシアターシステムといった単純なステレオシステムよりも複雑な音響システムにはあまり適していない。ホームシアターシステムは、一般的には、音響効果を含む音を聴取者に対して投射するために使用される部屋の周りに配置される多数のスピーカを有する。音は、単純にスピーカの間で「バランス調整」されるのではない。むしろ、特定のスピーカ位置の出力は、自分の場所にいる聴取者に対して与えられるべき音響効果に基づいて上下されるか又は他の方法で整合される。例えば、多数のスピーカのうちの２つのスピーカは、特定の音響効果を聴取者の位置にいる聴取者へ与えるために、位相を合わせて又は位相をずらして駆動されうる。
【０００９】
従って、聴取者の位置に対する多数のスピーカの夫々の位置の正確な決定は、幾つかのエンターテインメント体験のためには非常に重要である。更に、多数のスピーカの必要とされる出力を聴取者の変化した位置又は変化している位置に対して調整するために、聴取者の位置をより高い信頼性で正確に決定することが必要とされる。
【００１０】
従って、本発明は、システムの聴取者又はユーザの位置に対して自動的に調整しうる、ユーザの位置の変化を含む、音響システム（オーディオビジュアルシステムを含む）を提供する。システムは、人間の体（例えばユーザ）の輪郭の幾つか又は一部を認識する画像捕捉及び認識を用いる。視野の中のユーザの位置に基づいて、システムはユーザの位置情報を決定する。システムの１つの実施例では、例えば、ユーザの角度的な位置は、撮像捕捉装置の視野の中のユーザの画像の位置に基づいて決定され、システムは決定された角度に基づいて２つ以上のスピーカの出力を調整しうる。
【００１１】
画像捕捉装置は、例えば、人間の体の形状の全て又は一部を認識するようプログラムされる画像認識ソフトウエアを有する制御ユニット又はＣＰＵに接続されるビデオカメラでありうる。人間の体といった活動的な輪郭を検出し追跡する種々の方法が開発されてきた。例えば、人間の体（又は例えば頭部又は手）をビデオ画像中で見つけ追跡する「ｐｅｒｓｏｎｆｉｎｄｅｒ」は、ここに参照として組み入れられるＩＥＥＥＴｒａｎｓａｃｔｉｏｎｓｏｎＰａｔｔｅｒｎＡｎａｌｙｓｉｓａｎｄＭａｃｈｉｎｅＩｎｔｅｌｌｉｇｅｎｃｅ，ｖｏｌ．１９，ｎｏ．７，ｐｐ７８０−８５（Ｊｕｌｙ１９９７）で発表されたＭ．Ｉ．Ｔ．ＭｅｄｉａＬａｂｏｒａｔｏｒｙＰｅｒｃｅｐｔｕａｌＣｏｍｐｕｔｉｎｇＳｅｃｔｉｏｎＴｅｃｈｎｉｃａｌＲｅｐｏｒｔＮｏ．３５３のＷｒｅｎ外による”Ｐｆｉｎｄｅｒ：Ｒｅａｌ−ＴｉｍｅＴｒａｃｋｉｎｇｏｆｔｈｅＨｕｍａｎＢｏｄｙ”に記載されている。
【００１２】
テンプレートマッチング法を用いた画像中の人（歩行者）の検出は、ここに参照として組み入れられるＰｒｏｃｅｅｄｉｎｇｓｏｆｔｈｅＥｕｒｏｐｅａｎＣｏｎｆｅｒｅｎｃｅｏｎＣｏｍｐｕｔｅｒＶｉｓｉｏｎ，２０００（ｗｗｗ．ｇｒａｖｉｌａ．ｎｅｔで入手可能）のＤ．Ｍ．Ｇａｖｒｉｌａ（ＩｍａｇｅＵｎｄｅｒｓｔａｎｄｉｎｇＳｙｓｔｅｍｓ，ＤａｉｍｌｅｒＣｈｒｙｓｌｅｒＲｅｓｅａｒｃｈ）による”ＰｅｄｅｓｔｒｉａｎＤｅｔｅｃｔｉｏｎＦｒｏｍＡＭｏｖｉｎｇＶｅｈｉｃｌｅ”に記載されている。
【００１３】
画像中の静的な対象の検出のための統計的サンプリングアルゴリズム及び対象の動きの検出の確率論的モデルは、ここに参照として組み入れられるＩｎｔ．Ｊ．ＣｏｍｐｕｔｅｒＶｉｓｉｏｎ，ｖｏｌ．２９，１９９８（ｗｗｗ．ｄａｉ．ｅｄ．ａｃ．ｕｋ／ＣＶｏｎｌｉｎｅ／ＬＯＣＡＬ／ＣＯＰＩＥＳ／ＩＳＡＲＤ１／ｃｏｎｄｅｎｓａｔｉｏｎ．ｈｔｍｌで”Ｃｏｎｄｅｎｓａｔｉｏｎ”ソースコードと共に入手可能）のＩｓａｒｄ及び及びＢｌａｃｋ（ＯｘｆｏｒｄＵｎｉｖ．Ｄｅｐｔ．ｏｆＥｎｇｉｎｅｅｒｉｎｇＳｃｉｅｎｃｅ）による”Ｃｏｎｄｅｎｓａｔｉｏｎ−ＣｏｎｄｉｔｉｏｎａｌＤｅｎｓｉｔｙＰｒｏｐａｇａｔｉｏｎＦｏｒＶｉｓｕａｌＴｒａｃｋｉｎｇ”に記載されている。
【００１４】
或いは、制御ユニット又はＣＰＵは、人間の頭部の輪郭、又は、特定のユーザの顔の輪郭を認識するようプログラムされうる。画像（ディジタル画像を含む）中の顔を認識しうるソフトウエアは市販されており、例えばＶｉｓｉｏｎｉｃｓ社から販売されｗｗｗ．ｆａｃｅｉｔ．ｃｏｍに記載されている「ＦａｃｅＩｔ」のようなものがある。人間の体、顔等を検出するために使用されうるアルゴリズムを組み込んだソフトウエアは、一般的に、以下の説明では、画像認識ソフトウエア、画像認識アルゴリズム等と称するものとする。カメラの視野に対する認識された体又は頭の位置は、例えば、カメラに対するユーザの位置の角度を決定するために使用されうる。決定された角度は、各スピーカによってユーザの位置へ与えられる音の出力及び音響効果をバランス調整又は他の方法で調整するために使用されうる。
【００１５】
人間の体又は特定の顔の輪郭を識別する画像捕捉装置及び関連する画像感知ソフトウエアは、ユーザの検出をより正確で信頼性の高いものとする。
【００１６】
重なり合う視野を有する２以上のかかるプログラムされた画像捕捉装置は、ユーザの位置を正確に決定するために使用されうる。例えば、上述のような２つの別個のカメラは、別々に配置されてもよく、夫々が基準座標系におけるユーザの位置を決定するために使用されうる。ユーザの位置は、例えば、ユーザの現在の場所と基準座標系における各スピーカの固定の（既知の）位置との間の距離を決定するために、また、ホームシアターシステムにおける音響効果のようにユーザの位置に対して正しいオーディオミックスを与えるためにスピーカ出力を正しく調整するために、音響システムによって使用されうる。
【００１７】
従って、一般的に、本発明は２以上のスピーカを通じて音を出力する音発生システムを含む。２以上のスピーカの夫々の音出力は、２以上のスピーカの位置に対してユーザの位置に基づいて調整可能である。システムは、聴取領域に対して学習可能であり、画像認識ソフトウエアを有する処理部に結合される少なくとも１つの（ビデオカメラといった）画像捕捉装置を含む。処理部は、画像捕捉装置によって発生された画像中でユーザを認識するために画像認識ソフトウエアを使用する。処理部はまた、画像中のユーザの位置に基づいてユーザの位置の少なくとも１つの測定値を発生するソフトウエアを有する。
【００１８】
［詳細な説明］
図１を参照するに、ユーザ１０はホームシアターシステムのオーディオコンポーネント及びビジュアルコンポーネントの間にいるものとして示される。ホームシアターシステムは、ビデオディスプレイ画面１４と、ディスプレイ画面１４のための快適な視聴領域の周囲を囲む一連のオーディオスピーカ１８ａ−ｅとを含む。システムは更に、図１中はディスプレイ画面１４の上に載るものとして示される制御ユニット２２を含む。もちろん、制御ユニット２２はどの場所に置かれてもよく、ディスプレイユニット１４自体の中に組み込まれてもよい。制御ユニット２２、ディスプレイ画面１４、及びスピーカ１８ａ−ｅは、全て、電気的なワイヤ及びコネクタによって電気的に接続される。ワイヤは一般的には室内のカーペットの下側を通されるか又は隣接する壁の中を通されるため、図１には示さない。
【００１９】
図１のホームシアターシステムは、ディスプレイ画面１４から視覚出力を生成しスピーカ１８ａ−ｅから対応する音出力を生成する電気的コンポーネントを含む。ホームシアター出力のためのオーディオ及びビデオ処理は、一般的には、プロセッサ、メモリ、及び関連する処理ソフトウエアを含みうる制御ユニット２２の中で生ずる。このような制御ユニット及び関連する処理コンポーネントは公知であり種々の市販の形態で入手可能である。制御ユニット２２へ与えられるオーディオ及びビデオ入力は、テレビジョン信号、ケーブル信号、衛星信号、ＤＶＤ及びＶＣＲから入来しうる。制御ユニット２２は、図１Ａに示すように、入力信号を処理し、ディスプレイ画面１４の駆動回路への適当な信号を与え、結果としてビデオ表示を生じさせ、また、入力信号を処理し、スピーカ１８ａ−ｅへの適当な駆動信号を与える。
【００２０】
制御ユニット２２への信号入力のオーディオ部分は、立体音響信号であってもよく、又は、制御ユニット２２によって処理される音響効果といったより複雑な音響処理をサポートしてもよい。例えば、制御ユニット２２は、ディスプレイの右側の部分を通過する車をまねるために重なり合うシーケンスでスピーカ１８ｂ，１８ｃ，１８ｄを駆動しうる。各スピーカ１８ｂ，１８ｃ，１８ｄの振幅及び位相は、制御ユニット２２により受信されたオーディオ信号に基づいて、また、制御ユニット２２のメモリに格納されるようにユーザ１０に対するスピーカ１８ｂ，１８ｃ，１８ｄの位置に基づいて駆動される。
【００２１】
制御ユニット２２は、例えば図１中で原点Ｏと単位ベクトル（ｘ，ｙ，ｚ）によって定義されるような共通基準系に対するスピーカ１８ａ−ｅの位置及びユーザ１０の位置を受信し記憶しうる。基準座標系における各スピーカ１８ａ−ｅ及びユーザ１０のｘ，ｙ及びｚ座標は、物理的に測定されるか他の方法で決定され、制御ユニット２２へ入力される。
【００２２】
図１中のユーザ１０の位置は、基準座標系において座標（Ｘ_Ｐ，Ｙ_Ｐ，Ｚ_Ｐ）を有するとして示される。一般的には基準座標系は、図１に示される以外の場所に配置されうる。（以下更に詳述するように、図１に示す基準座標系は、本発明によるユーザ１０の自動測位を容易とするために、カメラの位置にあるよう選択される）。基準座標系におけるスピーカ１８ａ−ｅ及びユーザ１０の座標が制御ユニット２２によって受信されると、制御ユニット２２は代わりに座標を内部基準座標系へ平行移動しうる。
【００２３】
かかる共通基準座標系におけるユーザ１０及びスピーカ１８ａ−ｅの位置は、制御ユニット１０が各スピーカ１８ａ−ｅに対するユーザ１０の位置を決定することを可能とする。（ユーザ１０の座標をスピーカ１８ａの座標から差し引くことにより、基準座標系におけるそれらの相対的な位置が決まることは周知である）。制御ユニット２２の中のソフトウエアは、受信されたオーディオ信号と、スピーカに対するユーザ１０の位置とに基づいて各スピーカの音出力（例えば音量、周波数、位相）のための駆動信号を電気的に調整する。ユーザ１０に対するスピーカ１８ａ−ｅの相対的な位置に基づく制御ユニット２２による音出力の電子的な調整は、従来技術で周知である。或いは、制御システムは、ユーザが各スピーカ１８ａ−ｅの音出力を手動で調整することを可能としても良い。このような制御ユニット２２を介したオーディオコンポーネントの手動制御もまた従来技術で周知である。いずれの場合も、入力は、制御ユニット２２と無線でインタフェース接続されディスプレイ画面１４上にメニューを投写する、例えば位置データの入力を可能とする遠隔制御器によって与えられうる。
【００２４】
図１に示すホームシアターシステムは、また、基準座標系におけるユーザとユーザの位置を自動的に識別しうる。上述の説明では、原点Ｏに置かれる基準座標系におけるユーザ１０及びスピーカ１８ａ−ｅの位置は、例えばユーザによって与えられる手動入力に基づいて知られていると仮定される。ユーザ１０の位置が知られていないか変化するとき、又は、ユーザの位置の自動的な検出及び決定が他の方法で所望であるとき、スピーカ１８ａ−ｅの位置は、通常は配置された後は固定されたままであるため、制御ユニット２２によって通常は既知のままである。従って、基準座標系におけるスピーカ１８ａ−ｅの位置は、初期システムセットアップ中に制御システム２２へ夫々手動で入力され、一般的にはその後は固定されたままである。（もちろんスピーカの位置は変更されえ、新しい位置が入力されうるが、これはシステムの通常の使用では生じない）。ユーザの位置がシステムによって自動的に決定されると、以下詳述するように、制御ユニット２２は、上述のように手動で位置を入力する場合のように、ユーザ及びスピーカ１８ａ−ｅの位置に基づいて各スピーカ１８ａ−ｅへの音出力を調整する。
【００２５】
図１中のユーザ１０が存在するか否かを自動的に検出し、存在する場合はその位置を検出するために、システムは更にディスプレイ画面１４の上に載せられ、ディスプレイ画面１４の通常の視聴領域に向けられる２つのビデオカメラ２６ａ，２６ｂを含む。カメラ２６ａは、共通基準座標系の原点Ｏに配置される。以下の説明から明らかであるように、ビデオカメラ２６ａ，２６ｂは他の場所に配置されうる；基準座標系はカメラ２６ａの異なる場所又は他の場所へ場所を変更されうる。ビデオカメラ２６ａ，２６ｂは、制御ユニット２２とインタフェース接続され、視聴領域中で捕捉される画像を制御ユニット２２に与える。画像認識ソフトウエアは制御ユニット２２にロードされ、カメラ２６ａ，２６ｂから受信されるビデオ画像を処理するためにその中のプロセッサによって処理される。画像認識に使用される制御ユニット２２のメモリを含むコンポーネントは、別々であるか、図１Ａに示すように、制御ユニット２２の他の機能と共用される。或いは、画像認識は別個のユニットにおいて行われうる。
【００２６】
図２Ａは、図１のディスプレイ画面の一方の側のカメラ２６ａの視野内の画像を示す図である。図２Ａの画像は、制御ユニット２２へ送信され、制御ユニット２２において例えばその中にロードされた周知の画像認識ソフトウエアを用いて処理される。画像認識アルゴリズムは、ユーザ１０といった人間の体の輪郭を認識するために使用されうる。或いは、顔を認識する、又は、例えばユーザ１０の顔といった特定の１以上の顔を認識するようプログラムされうる画像認識ソフトウエアが使用されうる。
【００２７】
画像認識ソフトウエアが人間の体の輪郭又は特定の顔を識別すると、制御ユニット２２は、画像中のユーザ１０の頭部の中心の点Ｐ_ｉ’と、画像の左上の隅の点Ｏ_ｉ’に対する座標（ｘ’，ｙ’）とを決定するようプログラムされる。図２Ａの画像中の点Ｏ_ｉ’は、図１の基準座標系における点（０，０，Ｚ_Ｐ）に略対応する。
【００２８】
同様に、図２Ｂは、図１のディスプレイ画面の他方の側のカメラ２６ｂの視野内の画像を示す図である。同様に、図２Ｂの画像は、制御ユニット２２へ送信され、制御ユニット２２においてユーザ１０又はユーザの顔の画像を認識するために画像認識ソフトウエアを用いて処理される。カメラ２６ｂはディスプレイ画面の他方の側に配置されるため、ユーザ１０の画像は図２Ａと比較すると視野の異なる部分に配置される。制御ユニットは、図２Ｂの画像中のユーザの頭部の中心の点Ｐ_ｉ’’と画像の左上の隅の点Ｏ_ｉ’’に対する座標（ｘ’’，ｙ’’）とを決定する。
【００２９】
図２Ａ及び図２Ｂに示すカメラ画像中でユーザ１０の位置Ｐ_ｉ’及びＰ_ｉ’’が夫々画像座標（ｘ’，ｙ’）及び（ｘ’’，ｙ’’）を有するものであると識別すると、図１の基準座標系におけるユーザ１０の位置Ｐの座標（Ｘ_Ｐ，Ｙ_Ｐ，Ｚ_Ｐ）は、「ステレオ問題」として知られるコンピュータビジョンの標準的な技術を用いて一意に決定されうる。３次元コンピュータビジョンの基本的なステレオ技術は、例えば、ここに参照として組み入れられるＴｒｕｃｃｏ及び及びＶｅｒｒｉによる”ＩｎｔｒｏｄｕｃｔｏｒｙＴｅｃｈｎｉｑｕｅｓｆｏｒ３−ＤＣｏｍｐｕｔｅｒＶｉｓｉｏｎ” （ＰｒｅｎｔｉｃｅＨａｌｌ，１９９８）と、特にそのうちの第７章”Ｓｔｅｒｅｏｐｓｉｓ”に記載されている。このような周知の技術を用いて、図１中のユーザの位置Ｐ（未知の座標（Ｘ_Ｐ，Ｙ_Ｐ，Ｚ_Ｐ）を有する）と、図２Ａ中のユーザの画像位置Ｐ_ｉ’（既知の画像座標（ｘ’，ｙ’）を有する）は、以下の式、
ｘ’＝Ｘ_ｐ／Ｚ_Ｐ　　　　　　　　　（式１）
ｙ’＝Ｙ_ｐ／Ｚ_Ｐ　　　　　　　　　（式２）
によって与えられる。同様に、図１中のユーザの位置Ｐと、図２Ｂ中のユーザの画像位置Ｐ_ｉ’’（既知の画像座標（ｘ’’，ｙ’’）を有する）は、以下の式、
ｘ’’＝（Ｘ_ｐ−Ｄ）／Ｚ_Ｐ　　　　　（式３）
ｙ’’＝Ｙ_ｐ／Ｚ_Ｐ　　　　　　　　　（式４）
によって与えられ、但し、Ｄはカメラ２６ａ，２６ｂの間の距離である。当業者は、式１乃至４で与えられる項はカメラの幾何学形状によって決められる線形変換によることを認識するであろう。
【００３０】
式１乃至式４は、３つの未知の変数（座標Ｘ_Ｐ，Ｙ_Ｐ，Ｚ_Ｐ）を有するため、連立方程式の解によりＸ_Ｐ，Ｙ_Ｐ及びＺ_Ｐの値が得られ、従って、図１の基準座標系におけるユーザ１０の位置が与えられる。
【００３１】
必要であれば、座標（Ｘ_Ｐ，Ｙ_Ｐ，Ｚ_Ｐ）は制御ユニット２２の他の内部座標系へ平行移動されうる。ユーザの位置（Ｘ_Ｐ，Ｙ_Ｐ，Ｚ_Ｐ）を決定し、必要であれば放射座標を他の基準座標へ平行移動するために必要な処理は、制御ユニット２２以外の処理ユニットにおいても行われうる。例えば、この処理は、画像認識処理をサポートし、従って画像検出及び測位のタスクのみを行う別個の処理ユニットを含みうる。
【００３２】
上述のように、スピーカ１８ａ−ｅの固定の位置は、以前の入力に基づいて制御ユニット２２において知られている。例えば、各スピーカ１８ａ−ｅは図１に示すように室内に配置されると、基準座標系における各スピーカ１８ａ−ｅの座標（ｘ，ｙ，ｚ）、及び、カメラ２６ａ，２６ｂの間の距離Ｄは、測定され、制御ユニット２２においてメモリへ入力されうる。画像認識ソフトウエアを用いて（上述のステレオ問題の後認識処理と共に）決定されるユーザ１０の座標（Ｘ_Ｐ，Ｙ_Ｐ，Ｚ_Ｐ）と、各スピーカの予め記憶された座標は、各スピーカ１８ａ−ｅに対するユーザ１０の位置を決定するために使用されうる。上述のように、制御ユニット２２の音響処理は、入力オーディオ信号及びスピーカ１８ａ−ｅに対するユーザ１０の位置に基づいて、各スピーカ１８ａ−ｅの出力（振幅、周波数、及び位相を含む）を適当に調整しうる。
【００３３】
従って、ビデオカメラ２６ａ，２６ｂ、画像認識ソフトウエア、及び検出されたユーザの位置を決定するための後認識処理を使用することにより、図１のホームシアターシステムのユーザの位置が自動的に検出され決定されることが可能となる。ユーザが動いた場合、処理は繰り返され、ユーザの新しい位置が決定され、制御ユニット２２はスピーカ１８ａ−ｅによって出力されるオーディオ信号を調整するために新しい位置を使用する。
【００３４】
自動検出特徴は、スピーカの出力がユーザ１０の位置のデフォルト入力又はマニュアル入力に基づくようオフとされうる。画像認識ソフトウエアは、例えば多数の異なる顔を認識するようプログラムされえ、特定のユーザの顔は認識及び自動調整のために選択されうる。このように、システムは視聴領域にいる特定のユーザの位置を調整しうる。或いは、画像認識ソフトウエアは、視聴領域にある全ての顔又は人間の体を検出するために使用されえ、処理は夫々の場所を自動的に決定しうる。各スピーカ１８ａ−ｅの音出力の調整は、各検出されたユーザの場所における聞き取り経験の最適化を試みるアルゴリズムによって決定されうる。
【００３５】
図１の実施例はホームシアターシステムを示すが、自動検出及び調整は他のオーディオビジュアルシステム又は他の純粋な音響システムによって使用されうる。例えば、ユーザの位置において立体音響音のバランスを正しく（又は予め決められたように）維持するためにスピーカに対するユーザの決定された位置に基づいて各スピーカ位置において音量を調整するために多数のスピーカを有するステレオシステムと共に使用されうる。
【００３６】
従って、図３中、２つのスピーカを有するステレオシステムに適用される本発明のより簡単な実施例を示す。ステレオシステムの基本的な構成要素は、２つのスピーカ１００ａ，１００ｂに取り付けられたステレオ増幅器１３０を含む。カメラ１１０は、視聴領域にいる聴取者１４０の画像を含む視聴領域の画像を検出するために使用される。スピーカ１００ａ，１００ｂ、カメラ１１０、及びユーザ１４０の相対的な位置は、上から見た、又は床の平面へ投影されたものとして示される。図３はまた、平面上の簡単な基準座標系を示し、この座標系は、カメラの位置に原点Ｏを有し、カメラ１００の軸Ａに対する対象の角度から構成される。従って、角度βはスピーカ１００ａの角度的な位置であり、角度φはスピーカ１００ｂの角度的な位置であり、角度θはユーザ１４０の角度的な位置である（図３は、ユーザの頭部の一番上を示す）。
【００３７】
図３のシステムでは、ユーザ１４０は原点Ｏから適当な距離Ｄにおいて図３の中央領域においてステレオを聴くと想定される。スピーカ１００ａ，１００ｂは、視聴領域の略中央である軸Ａに沿った位置Ｄにおいてデフォルトバランスを有する。
【００３８】
スピーカ１００ａ，１００ｂの位置の角度β及びφは、測定され処理ユニット１２０の中に予め格納される。カメラ１１０によって捕捉される画像は、上述の実施例に記載されるように人間の体、特に顔等の輪郭を検出する画像認識ソフトウエアを含む処理ユニット１２０へ転送される。画像中の検出された体又は顔の位置は、基準座標系におけるユーザ１４０の位置に対応する角度θを決定するために処理ユニットによって使用される。例えば、図３Ａを参照するに、角度θの一次の決定は、以下の式、
θ＝（ｘ／Ｗ）（Ｐ）
で表わされ、但しｘは処理ユニット１２０によって測定される画像の中心Ｃからの水平方向の像距離、Ｗは画像の水平方向の全幅、Ｐは視野、又は、カメラによって固定されるシーンの角度的な幅である。
【００３９】
処理ユニット１２０は、ユーザ１４０とスピーカ１００ａ，１００ｂの相対的な角度的な位置に基づいてスピーカ１００ａ，１００ｂのバランスを調整する信号を順に増幅器へ送信する。例えば、スピーカ１００ａの出力は係数（β−θ）を用いて調整され、スピーカ１００ｂの出力は係数（φ＋θ）を用いて調整される。このように、スピーカ１００ａ，１００ｂのバランスは、スピーカ１００ａ，１００ｂに対するユーザ１４０の位置に基づいて自動的に調整される。上述したように、図４のシステムでは、ユーザ１４０は図３の中央視聴領域にそのまま、原点Ｏから略距離Ｄにいると想定される。従って、ユーザの角度的な位置θに基づくバランスの調整は許容可能な１次の調整である。
【００４０】
本発明の例示的な実施例は、添付の図面を参照して説明されたが、本発明はこれらの厳密な実施例に限られるものではなく、本発明の範囲は請求項の範囲によって決められることが意図されることが理解されるべきである。
【図面の簡単な説明】
【図１】
本発明の第１の実施例によるユーザの自動検出及び測位と出力の調整を含むホームシアターシステムを示す斜視図である。
【図１Ａ】
図１のシステムの制御システムの部分を示す図である。
【図２Ａ】
図１のシステムの第１のカメラによって捕捉されるユーザの画像を含む画像を示す図である。
【図２Ｂ】
図１のシステムの第２のカメラによって捕捉されるユーザの画像を含む画像を示す図である。
【図３】
本発明の第１の実施例によるユーザの自動検出及び測位と出力の調整を含むステレオシステムを示す斜視図である。
【図３Ａ】
図３のシステムのカメラによって捕捉されるユーザの画像を含む画像を示す図である。

[0001]
[Field of the Invention]
The present invention relates to audio systems such as stereo systems, television audio systems, and home theater systems. In particular, the present invention relates to systems and methods for adjusting an acoustic system.
[0002]
[Background of the Invention]
Several systems are known that adjust the output of various sound systems based on the location of the listener (the "user"). For example, UK Patent Application GB 2,228,324 describes a system that adjusts the balance of a stereo system as the user moves, to maintain the stereo effect for the listener. A transmitter carried by the user emits a signal to two separate receivers, each adjacent to one of the two stereo speakers. The emitted signal may be an ultrasonic signal, an infrared signal, or a radio signal, and may be emitted in response to a start signal (which may be a wired electrical signal). The system uses the time taken for each receiver (adjacent to a speaker) to receive the signal from the transmitter to determine the distance between the user and that speaker. The distance between the user and each of the two speakers is calculated in this way. Based on the principle that sound decreases with the cube of the distance from the sound source, the system uses the distance between each speaker and the user to adjust each speaker so that approximately equal sound intensity reaches the user from each speaker.
[0003]
GB 2,228,324 also describes determining the position of the user by finding the point at which the user's distances from each speaker intersect, but states that determining the position is not necessary for adjusting the stereo balance.
[0004]
Japanese Published Patent Application No. 5-137200 detects the position of a listener in one of five angular zones relative to the front of a television by aiming a separate infrared detector at each zone. The balance of the stereo speakers located at the sides of the television screen is said to be adjusted based on the zone in which the listener is located.
[0005]
Japanese Patent Publication No. 4-130900 uses the time taken for light transmission to calculate the distance between the listener and each of two light-emitting and light-detecting units. The distances between the user and the two units, together with the distance between the two units, are used to calculate the position of the listener and to adjust the balance of the audio signal.
[0006]
Similarly, Japanese Published Patent Application No. 7-302210 uses infrared signals to measure the distance between a listening position and each of a set of speakers, and to set an appropriate delay time for each speaker based on the distance between that speaker and the listening position.
[0007]
[Summary of the Invention]
One obvious difficulty with prior art systems is that, to enjoy automatic adjustment of the stereo balance, the user must wear or carry a transmitter (as in GB 2,228,324), or else the system must rely on sensors (such as infrared sensors) whose detection of the listener's position is unreliable and/or coarse. For example, an infrared detector may fail to detect the listener, so that the systems described above fail to balance correctly for the user's position. In addition, the sensor may pick up other people (or other things, such as pets), in which case the balance is adjusted for another person or object rather than for the listener.
[0008]
Furthermore, the systems described above are poorly suited to sound systems more complex than a simple stereo system, such as home theater systems. A home theater system typically has many speakers placed around the room to project sound, including sound effects, to the listener. The sound is not simply "balanced" between the speakers. Rather, the output at a particular speaker position is raised, lowered, or otherwise coordinated based on the sound effect to be delivered to the listener at his or her location. For example, two of the speakers may be driven in phase or out of phase to deliver a particular sound effect to a listener at the listener's position.
[0009]
Thus, accurately determining the position of each of the speakers relative to the position of the listener is very important for some entertainment experiences. Furthermore, to adjust the required output of the speakers to a changed or changing listener position, the listener's position must be determined more reliably and accurately.
[0010]
Accordingly, the present invention provides a sound system (including an audio-visual system) that can adjust automatically to the position of the listener or user of the system, including changes in that position. The system uses image capture and recognition to recognize at least part of the contours of a human body (for example, the user). Based on the user's position in the field of view, the system determines position information for the user. In one embodiment of the system, for example, the angular position of the user is determined from the position of the user's image in the field of view of the image capture device, and the system can adjust the output of two or more speakers based on the determined angle.
[0011]
The image capture device may be, for example, a video camera connected to a control unit or CPU having image recognition software programmed to recognize all or part of the shape of the human body. Various methods have been developed to detect and track active contours such as the human body. For example, a "person finder" that locates and tracks a human body (or, for example, a head or hand) in a video image is described in Wren et al., "Pfinder: Real-Time Tracking of the Human Body," M.I.T. Media Laboratory Perceptual Computing Section Technical Report No. 353, published in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 780-85 (July 1997), which is incorporated herein by reference.
[0012]
The detection of people (pedestrians) in images using a template matching method is described in D. M. Gavrila (Image Understanding Systems, DaimlerChrysler Research), "Pedestrian Detection From A Moving Vehicle," Proceedings of the European Conference on Computer Vision, 2000 (available at www.gravila.net), which is incorporated herein by reference.
[0013]
Statistical sampling algorithms for detecting static objects in images, and probabilistic models for detecting object motion, are described in Isard and Blake (Oxford Univ. Dept. of Engineering Science), "Condensation - Conditional Density Propagation For Visual Tracking," Int. J. Computer Vision, vol. 29, 1998 (available with the "Condensation" source code at www.dai.ed.ac.uk/CVonline/LOCAL/COPIES/ISARD1/condensation.html), which is incorporated herein by reference.
[0014]
Alternatively, the control unit or CPU can be programmed to recognize the contour of a human head or the contour of a particular user's face. Software that can recognize faces in images (including digital images) is commercially available, for example "FaceIt," sold by Visionics and described at www.faceit.com. Software incorporating algorithms that can be used to detect human bodies, faces, and the like is referred to generally in the following description as image recognition software, image recognition algorithms, and so on. The position of the recognized body or head relative to the camera's field of view can be used, for example, to determine the angle of the user's position relative to the camera. The determined angle can be used to balance or otherwise adjust the sound output and sound effects delivered by each speaker to the user's position.
[0015]
Image capture devices and associated image sensing software that identify the contours of the human body or a particular face make user detection more accurate and reliable.
[0016]
Two or more such programmed image capture devices with overlapping fields of view can be used to determine the position of the user accurately. For example, two separate cameras of the kind described above may be placed apart from each other, and each may be used in determining the user's position in a reference coordinate system. The user's position can be used by the sound system, for example, to determine the distance between the user's current location and the fixed (known) position of each speaker in the reference coordinate system, and to adjust the speaker outputs so as to give the correct audio mix for the user's position, such as the sound effects in a home theater system.
[0017]
Thus, in general, the present invention includes a sound generation system that outputs sound through two or more speakers. The sound output of each of the two or more speakers is adjustable based on the position of the user relative to the positions of the speakers. The system includes at least one image capture device (such as a video camera) that can be trained on (aimed at) a listening area and is coupled to a processing unit having image recognition software. The processing unit uses the image recognition software to recognize the user in the image generated by the image capture device. The processing unit also has software that generates at least one measurement of the user's position based on the position of the user in the image.
[0018]
[Detailed description]
Referring to FIG. 1, a user 10 is shown positioned among the audio and visual components of a home theater system. The home theater system includes a video display screen 14 and a series of audio speakers 18a-e surrounding a comfortable viewing area for the display screen 14. The system further includes a control unit 22, shown in FIG. 1 resting on top of the display screen 14. Of course, the control unit 22 may be located anywhere and may be incorporated into the display unit 14 itself. The control unit 22, the display screen 14, and the speakers 18a-e are all electrically connected by electrical wires and connectors. The wires are not shown in FIG. 1 because they are typically run under the carpet in the room or through adjacent walls.
[0019]
The home theater system of FIG. 1 includes electrical components that generate a visual output from the display screen 14 and a corresponding sound output from the speakers 18a-e. Audio and video processing for the home theater output typically occurs in the control unit 22, which may include a processor, memory, and associated processing software. Such control units and associated processing components are known and are available in various commercial forms. The audio and video inputs provided to the control unit 22 can come from television signals, cable signals, satellite signals, DVDs, and VCRs. As shown in FIG. 1A, the control unit 22 processes the input signal and provides appropriate signals to the drive circuitry of the display screen 14, resulting in a video display, and also processes the input signal to provide appropriate drive signals to the speakers 18a-e.
[0020]
The audio portion of the signal input to the control unit 22 may be a stereophonic signal, or may support more complex sound processing, such as sound effects processed by the control unit 22. For example, the control unit 22 may drive the speakers 18b, 18c, 18d in an overlapping sequence to mimic a car passing across the right-hand portion of the display. The amplitude and phase of each speaker 18b, 18c, 18d are driven based on the audio signal received by the control unit 22 and on the position of the speakers 18b, 18c, 18d relative to the user 10, as stored in the memory of the control unit 22.
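As a rough illustration of driving speakers "in an overlapping sequence," the sketch below pans a sound across three speakers using overlapping triangular gain envelopes. The function name `pan_gains`, the normalized time parameter `t`, and the triangular envelope shape are assumptions made for illustration; the description does not specify how the control unit 22 shapes the amplitude or phase of each speaker.

```python
from typing import List


def pan_gains(t: float, n_speakers: int = 3) -> List[float]:
    """Return one gain per speaker for a sound moving across the speakers.

    t runs from 0.0 (sound at the first speaker) to 1.0 (sound at the last).
    Each speaker's envelope is a triangle that overlaps its neighbours, so
    adjacent speakers fade in and out together (an assumed envelope shape).
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    position = t * (n_speakers - 1)  # fractional speaker index of the sound
    return [max(0.0, 1.0 - abs(position - k)) for k in range(n_speakers)]


# Step the simulated car past speakers 18b, 18c, 18d.
for step in range(5):
    t = step / 4
    gains = dict(zip(("18b", "18c", "18d"), pan_gains(t)))
    print(f"t={t:.2f}", gains)
```

Midway between two speakers both receive half gain, which is one simple way to obtain the overlap; a real system would presumably also account for the user's position and for phase.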
[0021]
The control unit 22 may receive and store the positions of the speakers 18a-e and the position of the user 10 with respect to a common reference system, such as defined by the origin O and the unit vector (x, y, z) in FIG. The x, y and z coordinates of each speaker 18a-e and user 10 in the reference coordinate system are physically measured or otherwise determined and input to the control unit 22.
[0022]
The position of the user 10 in FIG. 1 is shown as having coordinates (X_P, Y_P, Z_P) in the reference coordinate system. In general, the reference coordinate system can be located somewhere other than where it is shown in FIG. 1. (As described in further detail below, the reference coordinate system shown in FIG. 1 is chosen to lie at the camera position to facilitate automatic positioning of the user 10 according to the invention.) Once the coordinates of the speakers 18a-e and the user 10 in the reference coordinate system have been received by the control unit 22, the control unit 22 may alternatively translate them into an internal reference coordinate system.
[0023]
Knowing the positions of the user 10 and the speakers 18a-e in such a common reference coordinate system allows the control unit 22 to determine the position of the user 10 relative to each speaker 18a-e. (It is well known that subtracting the coordinates of the user 10 from the coordinates of the speaker 18a gives their relative position in the reference coordinate system.) Software in the control unit 22 electronically adjusts the drive signals for the sound output (for example, volume, frequency, phase) of each speaker based on the received audio signal and the position of the user 10 relative to the speakers. Electronic adjustment of the sound output by the control unit 22 based on the positions of the speakers 18a-e relative to the user 10 is well known in the prior art. Alternatively, the control system may allow the user to adjust the sound output of each speaker 18a-e manually. Such manual control of audio components via the control unit 22 is also well known in the prior art. In either case, the input may be provided by a remote control that interfaces wirelessly with the control unit 22 and projects a menu on the display screen 14, allowing entry of, for example, position data.
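The coordinate subtraction mentioned above can be sketched as follows. The speaker and user coordinates are made-up values in metres, and the helper names `relative_position` and `distance` are hypothetical; only the subtraction itself comes from the description.

```python
import math
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


def relative_position(speaker: Vec3, user: Vec3) -> Vec3:
    """Speaker coordinates minus user coordinates, component by component."""
    return tuple(s - u for s, u in zip(speaker, user))


def distance(v: Vec3) -> float:
    """Euclidean length of a relative-position vector."""
    return math.sqrt(sum(c * c for c in v))


# Hypothetical positions in the common reference coordinate system (metres).
speakers: Dict[str, Vec3] = {
    "18a": (-2.0, 0.0, 1.0),
    "18b": (2.0, 0.0, 1.0),
    "18c": (2.0, 0.0, 4.0),
}
user: Vec3 = (0.5, 0.0, 3.0)

relative = {name: relative_position(pos, user) for name, pos in speakers.items()}
distances = {name: distance(vec) for name, vec in relative.items()}

for name in speakers:
    print(name, relative[name], round(distances[name], 3))
```

Because every position is expressed in the same coordinate system, moving the origin (for example, to the camera) changes none of the relative vectors.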
[0024]
The home theater system shown in FIG. 1 can also automatically identify the user and the user's position in the reference coordinate system. The description above assumed that the positions of the user 10 and the speakers 18a-e in the reference coordinate system with origin O are known, for example from manual input provided by the user. When the position of the user 10 is unknown or changes, or when automatic detection and determination of the user's position is otherwise desired, the positions of the speakers 18a-e typically remain known to the control unit 22, because the speakers usually stay fixed once placed. Accordingly, the positions of the speakers 18a-e in the reference coordinate system are each entered manually into the control unit 22 during initial system setup and generally remain fixed thereafter. (Of course, a speaker can be moved and its new position entered, but this does not occur in normal use of the system.) Once the user's position has been determined automatically by the system, as described in detail below, the control unit 22 adjusts the sound output of each speaker 18a-e based on the positions of the user and the speakers 18a-e, just as when the position is entered manually as described above.
[0025]
To automatically detect whether the user 10 of FIG. 1 is present and, if so, where, the system further includes two video cameras 26a, 26b mounted on top of the display screen 14 and directed at the normal viewing area of the display screen 14. The camera 26a is located at the origin O of the common reference coordinate system. As will be apparent from the following description, the video cameras 26a, 26b may be located elsewhere, and the reference coordinate system may be moved to a different location of camera 26a or to some other location. The video cameras 26a, 26b interface with the control unit 22 and supply it with the images captured of the viewing area. Image recognition software is loaded into the control unit 22 and run by its processor to process the video images received from the cameras 26a, 26b. The components used for image recognition, including the memory of the control unit 22, may be separate or, as shown in FIG. 1A, shared with other functions of the control unit 22. Alternatively, the image recognition can be performed in a separate unit.
[0026]
FIG. 2A is a diagram showing an image in the field of view of the camera 26a on one side of the display screen of FIG. The image of FIG. 2A is transmitted to the control unit 22, where it is processed using, for example, well-known image recognition software loaded therein. The image recognition algorithm can be used to recognize the contour of a human body, such as the user 10. Alternatively, image recognition software can be used that can be programmed to recognize faces or to recognize one or more specific faces, for example, the face of the user 10.
[0027]
When the image recognition software identifies the contours of a human body or a particular face, the control unit 22 is programmed to determine the point P_i' at the center of the head of the user 10 in the image, and its coordinates (x', y') relative to the point O_i' in the upper-left corner of the image. The point O_i' in the image of FIG. 2A corresponds approximately to the point (0, 0, Z_P) in the reference coordinate system of FIG. 1.
[0028]
Similarly, FIG. 2B shows an image within the field of view of the camera 26b on the other side of the display screen of FIG. 1. The image of FIG. 2B is likewise transmitted to the control unit 22, where it is processed using image recognition software to recognize the image of the user 10 or the user's face. Because the camera 26b is located on the other side of the display screen, the image of the user 10 appears in a different part of the field of view than in FIG. 2A. The control unit determines the point P_i'' at the center of the user's head in the image of FIG. 2B, and its coordinates (x'', y'') relative to the point O_i'' in the upper-left corner of the image.
[0029]
Once the positions P_i' and P_i'' of the user 10 in the camera images of FIGS. 2A and 2B have been identified as having image coordinates (x', y') and (x'', y''), respectively, the coordinates (X_P, Y_P, Z_P) of the position P of the user 10 in the reference coordinate system of FIG. 1 can be uniquely determined using standard computer vision techniques for what is known as the "stereo problem". Basic stereo techniques of three-dimensional computer vision are described, for example, in Trucco and Verri, "Introductory Techniques for 3-D Computer Vision" (Prentice Hall, 1998), and in particular Chapter 7 thereof, "Stereopsis", which is incorporated herein by reference. Using such well-known techniques, the user's position P in FIG. 1 (with unknown coordinates (X_P, Y_P, Z_P)) and the user's image position P_i' in FIG. 2A (with known image coordinates (x', y')) are related by the following equations:
x' = X_P / Z_P          (Equation 1)
y' = Y_P / Z_P          (Equation 2)
Similarly, the user's position P in FIG. 1 and the user's image position P_i'' in FIG. 2B (with known image coordinates (x'', y'')) are related by the following equations:
x'' = (X_P - D) / Z_P   (Equation 3)
y'' = Y_P / Z_P         (Equation 4)
where D is the distance between the cameras 26a, 26b. Those skilled in the art will recognize that the terms given in Equations 1 to 4 hold up to a linear transformation determined by the camera geometry.
[0030]
Equations 1 to 4 contain three unknown variables (the coordinates X_P, Y_P, Z_P), so solving them simultaneously yields the values of X_P, Y_P, and Z_P, and hence the position of the user 10 in the reference coordinate system of FIG. 1.
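The system of Equations 1 to 4 can be solved in closed form. Subtracting Equation 3 from Equation 1 gives x' - x'' = D / Z_P, so Z_P = D / (x' - x''), after which X_P = x' Z_P and Y_P = y' Z_P. A minimal sketch follows, assuming the image coordinates are already expressed in the normalized units that the equations imply (any scale factors from the camera geometry are ignored). The function name `locate_user` and the averaging of y' and y'' are illustrative choices, not taken from the description.

```python
from typing import Tuple


def locate_user(x1: float, y1: float, x2: float, y2: float,
                baseline: float) -> Tuple[float, float, float]:
    """Solve Equations 1 to 4 for (X_P, Y_P, Z_P).

    (x1, y1) are the image coordinates from camera 26a, (x2, y2) those from
    camera 26b, and baseline is the camera separation D.
    """
    disparity = x1 - x2              # equals D / Z_P by Equations 1 and 3
    if abs(disparity) < 1e-12:
        raise ValueError("zero disparity: depth cannot be determined")
    z = baseline / disparity
    x = x1 * z                       # Equation 1 rearranged
    y = 0.5 * (y1 + y2) * z          # Equations 2 and 4; averaged to damp noise
    return x, y, z


# Round trip with a made-up user position and camera separation (metres).
true_x, true_y, true_z, d = 0.5, -0.2, 3.0, 0.6
x1, y1 = true_x / true_z, true_y / true_z
x2, y2 = (true_x - d) / true_z, true_y / true_z
print(locate_user(x1, y1, x2, y2, d))
```

Equations 2 and 4 are redundant in the ideal case (y' equals y''), which is why four equations suffice for three unknowns; with noisy detections the two values differ slightly, and averaging them is one simple way to use both.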
[0031]
If necessary, the coordinates $(X_P, Y_P, Z_P)$ can be translated to another internal coordinate system of the control unit 22. The processing required to determine the user's position $(X_P, Y_P, Z_P)$ and, if necessary, to translate these coordinates into another reference coordinate system can also be performed in a processing unit other than the control unit 22. For example, a separate processing unit may be dedicated to supporting the image recognition processing and thus perform only the image detection and position determination tasks.
[0032]
As described above, the fixed positions of the speakers 18a-e are known to the control unit 22 from previous input. For example, once the speakers 18a-e are arranged in the room as shown in FIG. 1, the coordinates (x, y, z) of each speaker 18a-e in the reference coordinate system and the distance D between the cameras 26a, 26b can be measured and entered into memory at the control unit 22. The coordinates $(X_P, Y_P, Z_P)$ of the user 10, determined using the image recognition software together with the post-recognition processing of the stereo problem described above, and the pre-stored coordinates of each speaker can then be used to determine the position of the user 10 with respect to each speaker 18a-e. As described above, the acoustic processing of the control unit 22 can then appropriately adjust the output (including amplitude, frequency, and phase) of each speaker 18a-e based on the input audio signal and the position of the user 10 with respect to the speakers 18a-e.
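By way of a hedged example, the position of the user relative to each speaker can be reduced to a set of user-to-speaker distances computed from the stored speaker coordinates. The sketch below also derives relative amplitude gains from those distances. The gain rule is an assumption made purely for illustration (intensity falling off as 1 / d^exponent, with the exponent left as a parameter); the description above does not specify how the control unit 22 computes its adjustment. The function names and the example coordinates are hypothetical.

```python
import math


def speaker_distances(user, speakers):
    """Distance from the user position (X_P, Y_P, Z_P) to each speaker.

    speakers maps a speaker label to its stored (x, y, z) coordinates.
    """
    return {name: math.dist(user, pos) for name, pos in speakers.items()}


def distance_gains(distances, exponent=2.0):
    """Relative amplitude gains that equalize intensity at the user.

    Assumption (not from the source): intensity falls off as
    1 / d**exponent, so the amplitude gain scales as d**(exponent / 2).
    Gains are normalized so the farthest speaker has gain 1.0.
    """
    d_max = max(distances.values())
    return {name: (d / d_max) ** (exponent / 2.0)
            for name, d in distances.items()}


if __name__ == "__main__":
    user = (0.0, 0.0, 0.0)
    speakers = {"18a": (3.0, 4.0, 0.0), "18b": (0.0, 0.0, 10.0)}
    d = speaker_distances(user, speakers)
    print(d, distance_gains(d))
```

Under this assumed rule, a speaker at half the distance of the farthest speaker is driven at half the amplitude, so that both deliver roughly equal intensity at the user's position.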
[0033]
Thus, by using the video cameras 26a, 26b, image recognition software, and post-recognition processing to determine the position of the detected user, the position of the user of the home theater system of FIG. 1 can be detected and determined automatically. If the user moves, the process is repeated, the new position of the user is determined, and the control unit 22 uses the new position to adjust the audio signals output by the speakers 18a-e.
[0034]
The automatic detection feature may be turned off so that the speaker output is based on a default or manually entered position of the user 10. The image recognition software can also be programmed, for example, to recognize a number of different faces, and the face of a particular user can be selected for recognition and automatic adjustment. In this way, the system may adjust to the position of a particular user in the viewing area. Alternatively, the image recognition software may be used to detect all faces or human bodies in the viewing area, and the processing may automatically determine their respective positions. The adjustment of the sound output of each speaker 18a-e may then be determined by an algorithm that attempts to optimize the listening experience at each detected user position.
[0035]
Although the embodiment of FIG. 1 shows a home theater system, the automatic detection and adjustment may be used with other audiovisual systems or with purely audio systems. For example, it can be used with a stereo system having multiple speakers to adjust the volume of each speaker based on the determined position of the user relative to the speakers, so that the stereophonic balance at the user's position is maintained correctly (or as predetermined).
[0036]
Accordingly, FIG. 3 shows a simpler embodiment of the present invention applied to a stereo system having two speakers. The basic components of the stereo system include a stereo amplifier 130 connected to two speakers 100a, 100b. A camera 110 is used to capture an image of the viewing area, including an image of the listener 140 in the viewing area. The relative positions of the speakers 100a, 100b, the camera 110, and the user 140 are shown as viewed from above or projected onto the floor plane. FIG. 3 also shows a simple reference coordinate system in that plane, which has its origin O at the position of the camera and consists of the angle of an object with respect to the axis A of the camera 110. Accordingly, the angle β is the angular position of the speaker 100a, the angle φ is the angular position of the speaker 100b, and the angle θ is the angular position of the user 140 (as noted, FIG. 3 is a view from above).
[0037]
In the system of FIG. 3, it is assumed that the user 140 listens to the stereo in the central region of FIG. 3. The speakers 100a and 100b have a default balance at a position D along the axis A, substantially at the center of the viewing area.
[0038]
The angles β and φ of the positions of the speakers 100a and 100b are measured and stored in the processing unit 120 in advance. The image captured by the camera 110 is transferred to the processing unit 120, which includes image recognition software for detecting the contours of a human body, or in particular a face, as described in the embodiment above. The position of the detected body or face in the image is used by the processing unit to determine the angle θ corresponding to the position of the user 140 in the reference coordinate system. For example, referring to FIG. 3A, a first-order determination of the angle θ is given by
$\theta = (x / W)\,P$
where x is the horizontal image distance from the center C of the image as measured by the processing unit 120, W is the full horizontal width of the image, and P is the field of view, that is, the angular width of the scene as fixed by the camera.
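The first-order angle determination θ = (x / W) P can be illustrated with a short sketch. The function names are hypothetical, and the helper that converts a pixel column into the signed offset x assumes that columns are counted from the left edge of the image, which the description does not state.

```python
def user_angle(x, image_width, field_of_view):
    """First-order estimate of the user's angle: theta = (x / W) * P.

    x             : signed horizontal distance from the image center C
    image_width   : full horizontal width W of the image (same units as x)
    field_of_view : angular width P of the scene (result has the same unit)
    """
    return (x / image_width) * field_of_view


def offset_from_column(column, image_width):
    """Signed distance x from the image center for a pixel column.

    Assumption: columns are counted from the left edge of the image.
    """
    return column - image_width / 2.0


if __name__ == "__main__":
    # A face centered at column 400 of a 640-pixel-wide image, with a
    # 60-degree field of view, lies 80 pixels to one side of the center.
    x = offset_from_column(400, 640)
    print(user_angle(x, 640, 60.0))
```

Because the estimate is linear in x, it is only a first-order approximation; it treats equal pixel offsets as equal angular offsets across the whole image.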
[0039]
The processing unit 120 then transmits signals to the amplifier for adjusting the balance of the speakers 100a and 100b based on the relative angular positions of the user 140 and the speakers 100a and 100b. For example, the output of the speaker 100a is adjusted using a coefficient (β − θ), and the output of the speaker 100b is adjusted using a coefficient (φ + θ). In this way, the balance between the speakers 100a and 100b is automatically adjusted based on the position of the user 140 with respect to the speakers 100a and 100b. As described above, in the system of FIG. 3 it is assumed that the user 140 is in the central viewing area of FIG. 3. Therefore, adjusting the balance based on the angular position θ of the user is an acceptable first-order adjustment.
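The two coefficients can be computed directly from the stored speaker angles and the measured user angle, as in the sketch below. How the amplifier applies the coefficients is not specified in the description; the normalization into balance fractions shown here is an assumption made only to give a concrete example, and the function names are hypothetical.

```python
def balance_coefficients(beta, phi, theta):
    """Coefficients for speakers 100a and 100b: (beta - theta, phi + theta)."""
    return beta - theta, phi + theta


def normalized_balance(beta, phi, theta):
    """Express the coefficients as fractions of their sum.

    Assumption (not from the source): the amplifier uses the coefficients
    as relative weights. Their sum, beta + phi, does not depend on theta.
    """
    coeff_a, coeff_b = balance_coefficients(beta, phi, theta)
    total = coeff_a + coeff_b
    return coeff_a / total, coeff_b / total


if __name__ == "__main__":
    # Speakers at 30 degrees on either side of the axis A, user at 7.5 degrees.
    print(balance_coefficients(30.0, 30.0, 7.5))
    print(normalized_balance(30.0, 30.0, 7.5))
```

With θ = 0 and β = φ, the two coefficients are equal, which matches the default balance on the axis A; as θ grows, the coefficient for the speaker 100a decreases and the coefficient for the speaker 100b increases by the same amount.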
[0040]
While exemplary embodiments of the present invention have been described with reference to the accompanying drawings, it is to be understood that the present invention is not limited to these exact embodiments and that the scope of the present invention is intended to be defined by the appended claims.
[Brief description of the drawings]
FIG. 1
A perspective view illustrating a home theater system including automatic detection and positioning of a user and adjustment of output according to a first embodiment of the present invention.
FIG. 1A
A diagram illustrating a part of the control system of the system of FIG. 1.
FIG. 2A
An image, including an image of a user, captured by a first camera of the system of FIG. 1.
FIG. 2B
An image, including an image of a user, captured by a second camera of the system of FIG. 1.
FIG. 3
A view showing a stereo system including automatic detection and positioning of a user and adjustment of output according to a second embodiment of the present invention.
FIG. 3A
An image, including an image of a user, captured by the camera of the system of FIG. 3.

Claims

1. A sound generation system in which sound is output through two or more speakers, the sound output of each of the two or more speakers being adjustable based on a position of a user relative to the positions of the two or more speakers, wherein the system includes at least one image capture device that can be trained on a viewing area and is coupled to a processing unit having image recognition software for identifying the user in an image generated by the image capture device, the processing unit further having software for generating at least one measurement of the user's position based on the user's position in the image.

2. The sound generation system according to claim 1, wherein the system is part of an audiovisual system.

3. The sound generation system according to claim 2, wherein the audiovisual system is a home theater system.

4. The sound generation system according to claim 1, wherein the processing unit adjusts a sound output of at least one of the speakers based on the at least one measurement of the position of the user.

5. The sound generation system according to claim 4, wherein the processing unit comprises a single processing unit that identifies the user in the image, generates the at least one measurement of the position of the user, and adjusts the sound output of at least one of the speakers based on the at least one measurement of the position of the user.

6. The sound generation system according to claim 4, wherein the processing unit comprises a first processing unit that identifies the user in the image and generates the at least one measurement of the position of the user, and a second processing unit that adjusts the sound output of at least one of the speakers based on the at least one measurement of the position of the user.

7. The sound generation system according to claim 1, wherein the at least one image capture device is a video camera.

8. The sound generation system according to claim 7, wherein the at least one measurement of the position of the user is an angle in a reference coordinate system.

9. The sound generation system according to claim 7, wherein the processing unit uses the angle to adjust an output of at least one speaker.

10. The sound generation system according to claim 1, wherein the at least one image capture device is two or more video cameras.

11. The sound generation system according to claim 10, wherein the processing unit determines a position of the user in a reference coordinate system using a position of the user in an image generated by each of the two or more video cameras.

12. The sound generation system according to claim 11, wherein the processing unit uses three-dimensional computer vision stereo techniques to determine the user's position in the reference coordinate system from the user's position in the image generated by each of the two or more video cameras.

13. The sound generation system according to claim 11, wherein the processing unit uses the position of the user in the reference coordinate system and the positions of the two or more speakers in the reference coordinate system to determine a distance between the user and each of the two or more speakers.

14. The sound generation system according to claim 13, wherein the distance between the user and each of the two or more speakers is used to adjust a sound output of at least one of the two or more speakers.