JP4244416B2 - Information processing apparatus and method, and recording medium

Info

Publication number: JP4244416B2
Application number: JP31076698A
Authority: JP
Inventors: 哲二郎近藤; 知之大月; 淳一石橋
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1998-10-30
Filing date: 1998-10-30
Publication date: 2009-03-25
Anticipated expiration: 2018-10-30
Also published as: JP2000138913A

Description

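[0001]
[Technical field of the invention]
The present invention relates to an information processing apparatus and method, and a recording medium. In particular, it relates to an information processing apparatus and method, and a recording medium, that detect the direction in which a participant in a video conference system is facing and, according to the detected direction, control the volume output from the speakers installed in each conference room and the images shown on the displays, so that a participant can recognize which other participants are paying attention to him or her.
[0002]
[Prior art]
Remote conference systems have been proposed in which a plurality of conference rooms are connected over a network so that the participants can confer as if seated around a single table. Each conference room in such a system is equipped with displays that show video of the participants other than the local one (referred to as participant A) and speakers that output the speech of the participants shown on those displays. The number of displays and speakers installed equals the number of participants excluding participant A.
[0003]
Each conference room is also equipped with video cameras for capturing the participant in that room and microphones for picking up audio. The video cameras and microphones are installed near (mainly on top of) the displays in each conference room. The video of participant A captured by a video camera and the audio picked up by a microphone are output to the display and speaker corresponding to participant A at each of the other sites.
[0004]
[Problems to be solved by the invention]
With the conference room configuration described above, when several participants speak at the same time, each utterance is output from the speaker corresponding to the participant who made it. As a result, it becomes difficult to hear the participant one is paying attention to.
[0005]
In addition, with the above configuration the video on every display is shown in the same way, so a participant cannot tell apart the participants who are paying attention to his or her remarks, or who are speaking to him or her, from those who are not. It is therefore difficult to recognize who is paying attention to oneself.
[0006]
The present invention has been made in view of these circumstances. It detects the direction in which a participant is facing and, according to the detected direction, controls the volume output from the speakers and the images shown on the displays, thereby allowing a participant to recognize the participants who are paying attention to him or her.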
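[0010]
[Means for solving the problems]
The information processing apparatus of the present invention comprises: imaging means for capturing an image of a subject; capturing means for capturing sound uttered by the subject; detection means for detecting the angle of the subject; transmission means for transmitting, to another information processing apparatus, the respective data of the image captured by the imaging means, the sound captured by the capturing means, and the angle detected by the detection means; reception means for receiving, from the other information processing apparatus, image data of an image of another subject, audio data of sound uttered by the other subject, and angle data indicating the direction in which the other subject is facing; and control means for determining whether the angle indicated by the angle data received by the reception means satisfies a predetermined condition and, when it is thereby determined that the other subject is not facing the direction of the subject, performing at least one of control for displaying the image based on the image data received by the reception means at a reduced resolution and control for outputting the sound based on the audio data received by the reception means at a reduced volume level.
[0011]
The information processing method of the present invention includes: an imaging step of capturing an image of a subject; a capturing step of capturing sound uttered by the subject; a detection step of detecting the angle of the subject; a transmission step of transmitting, to another information processing apparatus, the respective data of the image captured in the imaging step, the sound captured in the capturing step, and the angle detected in the detection step; a reception step of receiving, from the other information processing apparatus, image data of an image of another subject, audio data of sound uttered by the other subject, and angle data indicating the direction in which the other subject is facing; and a control step of determining whether the angle indicated by the angle data received in the reception step satisfies a predetermined condition and, when it is thereby determined that the other subject is not facing the direction of the subject, performing at least one of control for displaying the image based on the image data received in the reception step at a reduced resolution and control for outputting the sound based on the audio data received in the reception step at a reduced volume level.
[0012]
The recording medium of the present invention records a computer-readable program that causes an information processing apparatus to execute processing including: an imaging step of capturing an image of a subject; a capturing step of capturing sound uttered by the subject; a detection step of detecting the angle of the subject; a transmission step of transmitting, to another information processing apparatus, the respective data of the image captured in the imaging step, the sound captured in the capturing step, and the angle detected in the detection step; a reception step of receiving, from the other information processing apparatus, image data of an image of another subject, audio data of sound uttered by the other subject, and angle data indicating the direction in which the other subject is facing; and a control step of determining whether the angle indicated by the angle data received in the reception step satisfies a predetermined condition and, when it is thereby determined that the other subject is not facing the direction of the subject, performing at least one of control for displaying the image based on the image data received in the reception step at a reduced resolution and control for outputting the sound based on the audio data received in the reception step at a reduced volume level.
[0014]
In the information processing apparatus and method, and the recording medium, of the present invention, an image of a subject is captured, sound uttered by the subject is captured, and the angle of the subject is detected. The respective data of the captured image, the captured sound, and the detected angle are transmitted to another information processing apparatus. Image data of an image of another subject, audio data of sound uttered by the other subject, and angle data indicating the direction in which the other subject is facing are received from the other information processing apparatus. Whether the angle indicated by the received angle data satisfies a predetermined condition is determined and, when it is thereby determined that the other subject is not facing the direction of the subject, at least one of control for displaying the image based on the received image data at a reduced resolution and control for outputting the sound based on the received audio data at a reduced volume level is performed.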
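[0015]
[Embodiments of the invention]
Embodiments of the present invention are described below. To clarify the correspondence between each means of the invention recited in the claims and the embodiments, the features of the invention are first described with the corresponding embodiment (one example only) added in parentheses after each means. This description does not, of course, limit each means to what is stated. Parts corresponding to those of the conventional case are given the same reference numerals, and their description is omitted where appropriate.
[0017]
The information processing apparatus according to claim 1 comprises: imaging means (for example, the front video camera 31-21 in FIG. 9) for capturing an image of a subject; capturing means (for example, the microphone 32 in FIG. 9) for capturing sound uttered by the subject; detection means (for example, step S22 in FIG. 10) for detecting the angle of the subject; transmission means (for example, step S23 in FIG. 10) for transmitting, to another information processing apparatus, the respective data of the image captured by the imaging means, the sound captured by the capturing means, and the angle detected by the detection means; reception means (for example, the transmission/reception device 43 in FIG. 9) for receiving, from the other information processing apparatus, image data of an image of another subject, audio data of sound uttered by the other subject, and angle data indicating the direction in which the other subject is facing; and control means (for example, steps S33 and S34 in FIG. 11) for determining whether the angle indicated by the angle data received by the reception means satisfies a predetermined condition and, when it is thereby determined that the other subject is not facing the direction of the subject, performing at least one of control for displaying the image based on the image data received by the reception means at a reduced resolution and control for outputting the sound based on the audio data received by the reception means at a reduced volume level.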
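[0018]
FIG. 1 shows the configuration of a video conference system to which the information processing apparatus of the present invention is applied. In this specification, a system means an overall apparatus made up of a plurality of devices. As shown in FIG. 1, a plurality of (four, in this embodiment) communication centers 1-1 to 1-4 are interconnected via a network 2 such as an ISDN (Integrated Services Digital Network). Each communication center has, for example, one conference room as shown in FIG. 2.
[0019]
The conference room shown in FIG. 2 contains one table 10, one chair, and three display devices. For example, in the conference room of communication center 1-4, the chair is placed at position number 4 in FIG. 2 and display devices are placed at positions 1 to 3. In the conference room of communication center 1-3, the chair is placed at position 3 and display devices at positions 1, 2, and 4. In the conference room of communication center 1-2, the chair is placed at position 2 and display devices at positions 1, 3, and 4. In the conference room of communication center 1-1, the chair is placed at position 1 and display devices at positions 2 to 4.
[0020]
Among the display devices in the conference room of communication center 1-4, the one at position 1 in FIG. 2 shows an image of the participant at communication center 1-1, the one at position 2 shows an image of the participant at communication center 1-2, and the one at position 3 shows an image of the participant at communication center 1-3. Likewise at the other communication centers, the display devices placed at positions other than the participant's chair show video of the participants at the corresponding communication centers.
[0021]
In this way, the conference room of each communication center has a chair for its participant at the position specific to that center, and display devices showing the participants of the other communication centers at the remaining positions. Configuring the rooms this way gives the same arrangement of participants in every conference room. That is, it is as though four participants were actually seated at fixed positions around table 10. Although in each conference room everyone other than the physically present participant appears on a display, the same conference situation is realized in every room.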
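[0022]
Next, the details of each communication center are described with reference to FIG. 3. The communication centers have nearly the same configuration, although the arrangement of the display devices differs somewhat, so only communication center 1-4 is described here and the description of communication centers 1-1 to 1-3 is omitted.
[0023]
In the conference room of communication center 1-4, as shown in FIG. 2, the chair is placed at position 4 and display devices are placed at positions 1 to 3. Accordingly, the participant 24 shown in FIG. 3 sits in the chair at position 4 in FIG. 2. To capture video of participant 24, display devices 21 to 23 are provided with the left-side video camera 31-22 installed to the left of participant 24, the front video camera 31-21 installed in front of participant 24, and the right-side video camera 31-23 installed to the right of participant 24. They are further provided with microphones 32-21 to 32-23 that pick up the speech of participant 24 (hereinafter, when microphones 32-21 to 32-23 need not be distinguished individually, they are referred to simply as microphone 32; the same convention applies to the other devices), speaker units 33-21 to 33-23 that output the audio supplied from the other communication centers, and display units 34-21 to 34-23 that display the images corresponding to that audio.
[0024]
Speaker units 33-21 to 33-23 and display units 34-21 to 34-23 output the images transmitted from communication centers 1-1 to 1-3 and the audio corresponding to those images. For example, display unit 34-21 of display device 21 shows the image of the participant at communication center 1-1, and speaker unit 33-21 outputs that participant's speech. Display unit 34-22 of display device 22 shows the image of the participant at communication center 1-2, and speaker unit 33-22 outputs that participant's speech. Display unit 34-23 of display device 23 shows the image of the participant at communication center 1-3, and speaker unit 33-23 outputs that participant's speech.
[0025]
The front video camera 31-21 mounted on display device 21 captures participant 24 of communication center 1-4, microphone 32-21 picks up the speech of participant 24, and that image and speech are supplied to communication center 1-1. The left-side video camera 31-22 mounted on display device 22 captures participant 24, microphone 32-22 picks up the speech of participant 24, and that image and speech are supplied to communication center 1-2. The right-side video camera 31-23 mounted on display device 23 captures participant 24, microphone 32-23 picks up the speech of participant 24, and that image and speech are supplied to communication center 1-3.
[0026]
As shown in FIG. 3, display devices 21 to 23 are placed at the predetermined positions shown in FIG. 2 so that participant 24 can see display units 34-21 to 34-23.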
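[0027]
The following describes how, in a conference room configured in this way, the participant to whom participant 24 is paying attention is detected, and how the speech output from the speaker unit 33, or the video shown on the display unit 34, of the corresponding display device 21 to 23 is emphasized. The case of emphasizing speech is described first.
[0028]
FIG. 4 is a block diagram of the devices needed to determine which participant participant 24 is paying attention to, in other words which of display devices 21 to 23 he or she is looking at, and to emphasize that participant's speech. Participant 24 wears, for example on the head, an angle detection device 41 for detecting the direction (angle) in which he or she is facing. The angle detection device 41 consists of, for example, a magnetic sensor, a gyro, or an angular velocity sensor, and detects the horizontal orientation of the face of participant 24.
[0029]
The computing device 42 is connected to each of speaker units 33-21 to 33-23 and, based on the face orientation of participant 24 detected by the angle detection device 41, controls the volume at which the audio data received by the transmission/reception device 43 is output from each speaker unit 33. Specifically, the volume of the speaker unit 33 that participant 24 is judged to be facing is made louder than that of the other speaker units 33, or conversely the volume of the other speaker units 33 is made lower than that of the speaker unit 33 that participant 24 is judged to be facing. Alternatively, the sound quality heard may be changed according to the face orientation of participant 24 by changing the frequency characteristics. Sound arriving from directions other than the front is heard after being reflected by walls and the like, so, although it depends on the wall material, its high-band components are generally heard attenuated. Taking this into account, the sound output from the speaker units 33 other than the one participant 24 is facing may be output with its high-band components attenuated.
[0030]
Here, the description takes as an example the case where the volume of the speaker unit 33 that participant 24 is judged to be facing is raised so that it can be distinguished from the sound output from the other speaker units 33.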
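[0031]
The arrangement of the front video camera 31-21, the left-side video camera 31-22, and the right-side video camera 31-23 is described with reference to FIG. 5. Angles dI₁, dI₂, and dI₃ represent the angles of the front video camera 31-21, the left-side video camera 31-22, and the right-side video camera 31-23, respectively. Here, the angle dI₁ of the front video camera 31-21 is taken as 0 degrees. The angle dI₂ between the left-side video camera 31-22 and the front video camera 31-21 is minus 45 degrees, and the angle dI₃ between the right-side video camera 31-23 and the front video camera 31-21 is 45 degrees. The angle of participant 24 detected by the angle detection device 41, measured relative to angle dI₁, is denoted angle d (minus 45 degrees to 45 degrees).
[0032]
Angles Th₂ and Th₃ are each an angle of predetermined magnitude formed with angle dI₁. They are used as thresholds for deciding which video camera 31 participant 24 is facing.
[0033]
FIG. 6 is a flowchart explaining the operation of the computing device 42. In step S1, the user sits facing the front video camera 31-21, and the computing device 42 sets the angle detected by the angle detection device 41 at that moment as the initial value of 0 degrees (angle dI₁). Once the initial value has been set, processing proceeds to step S2, where the angle detection device 41 starts detecting the face orientation of participant 24.
[0034]
In step S3, the computing device 42 determines whether the angle d detected by the angle detection device 41 is smaller than the threshold angle Th₂. If angle d is smaller than angle Th₂, participant 24 is judged to be facing the left-side video camera 31-22, and in step S4 the computing device 42 controls the output so that speaker unit 33-22 is louder than speaker units 33-21 and 33-23.
[0035]
If, on the other hand, angle d is judged in step S3 to be equal to or greater than angle Th₂, processing proceeds to step S5, where it is determined whether angle d is greater than angle Th₃. If angle d is greater than angle Th₃, participant 24 is judged to be facing the right-side video camera 31-23, and in step S6 the computing device 42 controls the output so that speaker unit 33-23 is louder than speaker units 33-21 and 33-22.
[0036]
If angle d is judged in step S5 to be equal to or smaller than angle Th₃, in other words if Th₂ ≦ d ≦ Th₃, participant 24 is judged to be facing the front video camera 31-21, and in step S7 the computing device 42 controls the output so that speaker unit 33-21 is louder than speaker units 33-22 and 33-23.
[0037]
Steps S2 to S7 are repeated throughout the conference, so that the volume output from the speaker units 33 is controlled according to the direction in which participant 24 is facing. The processing of this flowchart is terminated, as interrupt processing, when the conference ends.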
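The threshold test of steps S3 to S7 can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the names `Facing`, `classify_facing`, and `speaker_gains`, the gain values, and the default thresholds of ±22.5 degrees (halfway between the cameras at 0 and ±45 degrees) are all assumptions, since the text only says that Th₂ and Th₃ have a predetermined magnitude.

```python
from enum import Enum


class Facing(Enum):
    LEFT = "left"    # left-side video camera 31-22 / speaker unit 33-22
    FRONT = "front"  # front video camera 31-21 / speaker unit 33-21
    RIGHT = "right"  # right-side video camera 31-23 / speaker unit 33-23


def classify_facing(d, th2, th3):
    """Map the detected angle d (degrees) to a direction (steps S3 and S5)."""
    if d < th2:          # step S3: smaller than Th2 -> facing left
        return Facing.LEFT
    if d > th3:          # step S5: greater than Th3 -> facing right
        return Facing.RIGHT
    return Facing.FRONT  # Th2 <= d <= Th3 -> facing front


def speaker_gains(d, th2=-22.5, th3=22.5, loud=1.0, quiet=0.5):
    """Return one gain per speaker; the faced speaker gets the louder gain.

    th2, th3, loud and quiet are assumed example values (steps S4, S6, S7).
    """
    facing = classify_facing(d, th2, th3)
    return {f: (loud if f is facing else quiet) for f in Facing}
```

Calling `speaker_gains(-30.0)` would give the left speaker the louder gain and the other two the quieter one; repeating the call for every new angle sample corresponds to the loop over steps S2 to S7.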
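[0038]
FIG. 7 is a block diagram showing a configuration example for controlling the video shown on the display units 34 according to the face orientation of participant 24 detected by the angle detection device 41. In this configuration, the image data received by the transmission/reception device 43 is supplied via the computing device 42 to the corresponding display units 34-21 to 34-23.
[0039]
The operation of the computing device 42 shown in FIG. 7 is described with reference to the flowchart of FIG. 8. Steps S11 to S13 and step S15 are the same as steps S1 to S3 and step S5 in FIG. 6, so their description is omitted.
[0040]
If angle d is judged in step S13 to be smaller than angle Th₂, in other words if participant 24 is judged to be facing the participant shown on display unit 34-22, processing proceeds to step S14. In step S14, the computing device 42 controls the output so that the video shown on display unit 34-22 is emphasized relative to the video shown on display units 34-21 and 34-23. Specifically, processing such as lowering the luminance or the resolution of the video shown on display units 34-21 and 34-23 is performed.
[0041]
If angle d is judged in step S15 to be greater than angle Th₃, in other words if participant 24 is judged to be facing the participant shown on display unit 34-23, processing proceeds to step S16, and the video shown on display unit 34-23 is emphasized relative to the video shown on display units 34-21 and 34-22. If angle d is judged in step S15 to be equal to or smaller than angle Th₃, in other words if participant 24 is judged to be facing display unit 34-21, processing proceeds to step S17, and the video shown on display unit 34-21 is emphasized relative to the video shown on display units 34-22 and 34-23.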
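One way to picture the de-emphasis applied in steps S14, S16, and S17 is the following Python sketch. It assumes grayscale frames stored as lists of rows of integers, lowers resolution by block averaging, and lowers luminance by scaling; the function names, the block size `factor`, and the luminance `scale` are hypothetical, and applying both operations at once is a choice made here for illustration (the text offers them as alternatives).

```python
def reduce_resolution(image, factor):
    """Replace each factor x factor block with its average (size is kept)."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for by in range(0, h, factor):
        for bx in range(0, w, factor):
            ys = range(by, min(by + factor, h))
            xs = range(bx, min(bx + factor, w))
            block = [image[y][x] for y in ys for x in xs]
            avg = sum(block) // len(block)
            for y in ys:
                for x in xs:
                    out[y][x] = avg
    return out


def dim(image, scale):
    """Lower the luminance of every pixel by a constant scale factor."""
    return [[int(p * scale) for p in row] for row in image]


def emphasize(frames, faced, factor=2, scale=0.5):
    """Leave the faced display untouched and degrade all the others.

    frames maps a display name (e.g. "34-22") to its image.
    """
    return {
        name: img if name == faced else dim(reduce_resolution(img, factor), scale)
        for name, img in frames.items()
    }
```

With three entries keyed "34-21", "34-22", and "34-23", `emphasize(frames, "34-22")` would correspond to step S14.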
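[0042]
In the description above, either the audio or the video is controlled, but both may be controlled. Also, in the description above, the speaker units 33 and display units 34 installed in the conference room of participant 24 are controlled, in other words the data received by the transmission/reception device 43 is controlled, but the same effect can be obtained by controlling the data to be transmitted. The case of controlling the data to be transmitted is described below with reference to FIG. 9.
[0043]
In FIG. 9, the front video camera 31-21, the left-side video camera 31-22, the right-side video camera 31-23, and the microphone 32 are connected to the computing device 42. In FIG. 3 a microphone 32 was installed for each of display devices 21 to 23, but in the following embodiment a single microphone 32 is assumed to be placed in front of participant 24. If the microphone 32 is of a type worn on the ear, participant 24 and the microphone 32 always keep a fixed positional relationship, which is preferable to fixing the microphone 32 on a desk or the like. Furthermore, when such a microphone 32 is used, attaching the angle detection device 41 to part of it, for example the arm that hooks over the ear, makes it possible to detect the angle less conspicuously than when the device is worn on top of the head.
[0044]
The operation of the computing device 42 shown in FIG. 9 is described with reference to the flowchart of FIG. 10. In step S21, the angle detection device 41 is initialized in the same way as in step S1 of FIG. 6. In step S22, the angle detection device 41 starts angle detection relative to the initialized angle. In step S23, the computing device 42 outputs the detected angle, together with the images captured by the video cameras 31 and the audio picked up by the microphone 32, to the transmission/reception device 43, which transmits them to the corresponding conference rooms (communication centers 1-1 to 1-3).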
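[0045]
The operation of the computing device 42 when it receives data transmitted from another communication center is described with reference to the flowchart of FIG. 11. In step S31, it is determined whether the data received by the transmission/reception device 43 was transmitted from communication center 1-1. If so, processing proceeds to step S32. In step S32, it is determined whether the angle d in the received angle information is at least angle Th₂ and at most angle Th₃. If Th₂ ≦ d ≦ Th₃, in other words if the participant at communication center 1-1 is judged to be facing the display device showing participant 24, processing proceeds to step S33, and the computing device 42 outputs the received image data to display unit 34-21 without any processing of resolution or the like.
[0046]
If, on the other hand, it is judged in step S32 that Th₂ ≦ d ≦ Th₃ does not hold, processing proceeds to step S34. In step S34, the computing device 42 processes the received image data so that it is displayed at a reduced resolution and outputs it to display unit 34-21.
[0047]
If it is judged in step S31 that the received data was not transmitted from communication center 1-1, processing proceeds to step S35, where it is determined whether the data was transmitted from communication center 1-2. If so, processing proceeds to step S36. In step S36, it is determined whether the angle d in the received angle information satisfies d > Th₃. If d > Th₃, in other words if the participant at communication center 1-2 is judged to be facing the display device showing participant 24, processing proceeds to step S33, and the computing device 42 outputs the received image data to display unit 34-22 without any processing of resolution or the like.
[0048]
If, on the other hand, it is judged in step S36 that d > Th₃ does not hold, processing proceeds to step S34. In step S34, the computing device 42 processes the received image data so that it is displayed at a reduced resolution and outputs it to display unit 34-22.
[0049]
If it is judged in step S35 that the received data was not transmitted from communication center 1-2, processing proceeds to step S37, where it is determined whether the data was transmitted from communication center 1-3. If so, processing proceeds to step S38. In step S38, it is determined whether the angle d in the received angle information satisfies d < Th₂. If d < Th₂, in other words if the participant at communication center 1-3 is judged to be facing the display device showing participant 24, processing proceeds to step S33, and the computing device 42 outputs the received image data to display unit 34-23 without any processing of resolution or the like.
[0050]
If, on the other hand, it is judged in step S38 that d < Th₂ does not hold, processing proceeds to step S34. In step S34, the computing device 42 processes the received image data so that it is displayed at a reduced resolution and outputs it to display unit 34-23.
[0051]
If it is judged in step S37 that the received data was not transmitted from communication center 1-3, processing proceeds to step S39. Reaching step S39 means the data was judged not to have been transmitted from any of communication centers 1-1 to 1-3, so an error is judged to have occurred and error processing is performed. Error processing consists of, for example, discarding the received data.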
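The receiving-side decision of FIG. 11, as seen from communication center 1-4, can be summarized by the sketch below. The packet layout (a dict with `"sender"` and `"angle"` keys), the function names, and the default thresholds are assumptions introduced for illustration; only the per-sender conditions and the discard-on-error behaviour come from the description.

```python
# Display unit that shows each remote communication center (from FIG. 3).
DISPLAY_FOR = {"1-1": "34-21", "1-2": "34-22", "1-3": "34-23"}


def sender_faces_me(sender, d, th2, th3):
    """Return True if the remote participant is facing participant 24."""
    if sender == "1-1":
        return th2 <= d <= th3   # step S32
    if sender == "1-2":
        return d > th3           # step S36
    if sender == "1-3":
        return d < th2           # step S38
    raise ValueError(f"unknown sender: {sender!r}")  # step S39


def handle_packet(packet, th2=-22.5, th3=22.5):
    """Decide how to show one received packet; None means it was discarded."""
    try:
        facing = sender_faces_me(packet["sender"], packet["angle"], th2, th3)
    except ValueError:
        return None  # error processing: discard the received data
    return {
        "display": DISPLAY_FOR[packet["sender"]],
        "full_resolution": facing,  # True -> step S33, False -> step S34
    }
```

The thresholds are passed in rather than hard-coded because the description leaves their magnitude open.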
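[0052]
In the description above, the image shown on the display unit 34 is controlled, but the audio output from the speaker unit 33 may be controlled instead.
[0053]
By transmitting and receiving the image data and audio data together with information on which display device the participant is paying attention to (angle information) in this way, a participant can recognize who is paying attention to him or her.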
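[0054]
FIG. 12 is a flowchart explaining another operation of the computing device 42 shown in FIG. 9. Description of processing that is the same as in the flowchart of FIG. 6 is omitted.
[0055]
After steps S41 and S42 are completed, if the computing device 42 determines in step S43 that angle d is smaller than angle Th₂, processing proceeds to step S44. In this case participant 24 is judged to be facing the left-side video camera 31-22, so in step S44 the computing device 42 applies processing such as lowering the luminance or the resolution to the image data output from the front video camera 31-21 and the right-side video camera 31-23, and outputs the result to the transmission/reception device 43. At this time, when the data output from the microphone 32 is sent together with the data output from the front video camera 31-21 and the right-side video camera 31-23, the computing device 42 may control the audio data so that its volume is reduced before outputting it to the transmission/reception device 43.
[0056]
If angle d is judged in step S45 to be greater than angle Th₃, processing proceeds to step S46. In step S46, the data output from the front video camera 31-21 and the left-side video camera 31-22 is controlled and output to the transmission/reception device 43.
[0057]
If angle d is judged in step S45 to be equal to or smaller than angle Th₃, processing proceeds to step S47. In step S47, the data output from the left-side video camera 31-22 and the right-side video camera 31-23 is controlled and output to the transmission/reception device 43.
[0058]
The data controlled in this way and output to the transmission/reception device 43 is transmitted in step S48 to the corresponding communication centers 1-1 to 1-3.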
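The transmit-side control of FIG. 12 amounts to choosing, for each destination, whether its stream is sent as captured or degraded first. A minimal sketch, assuming the camera-to-destination mapping described for FIG. 3 and hypothetical names and threshold defaults:

```python
# Which communication center receives each camera's stream (from FIG. 3).
DESTINATION_FOR_CAMERA = {"front": "1-1", "left": "1-2", "right": "1-3"}


def plan_transmission(d, th2=-22.5, th3=22.5):
    """Return {destination: "full" | "reduced"} for the detected angle d.

    The stream from the camera participant 24 is facing is sent as is;
    the other two are marked for reduced luminance/resolution (and,
    optionally, reduced audio volume) before transmission in step S48.
    """
    if d < th2:        # step S43 -> S44: facing the left-side camera
        facing = "left"
    elif d > th3:      # step S45 -> S46: facing the right-side camera
        facing = "right"
    else:              # step S45 -> S47: facing the front camera
        facing = "front"
    return {
        dest: ("full" if cam == facing else "reduced")
        for cam, dest in DESTINATION_FOR_CAMERA.items()
    }
```

For example, `plan_transmission(-30.0)` marks only the stream bound for communication center 1-2 as "full".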
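[0059]
Controlling the data to be transmitted in this way reduces the amount of data transmitted and, by reproducing the received data, makes it possible to hold a video conference with a sense of presence.
[0060]
In the description above, the angle detection device 41 is used to detect the face orientation of participant 24, but the same effect can be obtained by storing, in advance, images of the face of participant 24 in association with angles. In the configuration example shown in FIG. 13, a storage unit 51 is newly provided, and the front video camera 31-21, the transmission/reception device 43, and the storage unit 51 are connected to the computing device 42.
[0061]
Participant 24 sits in front of the front video camera 31-21, pauses at each fixed angular increment, and has the front video camera 31-21 capture an image at that moment. Each captured image is associated with the angle at which it was captured and stored in the storage unit 51 via the computing device 42. In this way, the storage unit 51 holds images of the face of participant 24 obtained at fixed angular intervals.
[0062]
In this configuration, when the volume output from the speaker units 33 is controlled from the obtained angle, speaker units 33-21 to 33-23 are connected to the computing device 42 as described with reference to FIG. 4. When the video on the display units 34 is controlled from the obtained angle, display units 34-21 to 34-23 are connected to the computing device 42 as described with reference to FIG. 7. When the image data and audio data to be transmitted are controlled from the obtained angle, the left-side video camera 31-22 and the right-side video camera 31-23, in addition to the front video camera 31-21, as well as the microphone 32, are connected to the computing device 42 as described with reference to FIG. 9.
[0063]
The operation of the computing device 42 is described with reference to the flowchart of FIG. 14. In step S51, the computing device 42 stores the images of the face of participant 24 obtained from the front video camera 31-21 in the storage unit 51 in association with angles. Once a predetermined number of images have been stored, processing proceeds to step S52. In step S52, to determine the direction in which the face of participant 24 is turned, an image of the face of participant 24 is first captured by the front video camera 31-21.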
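[0064]
The computing device 42 searches the images stored in the storage unit 51 for an image that approximates the captured image. For this search, a normalized correlation function is used, for example. The normalized correlation function is shown below.
[Equation 1]

In this equation, R denotes the pixel values of a reference image stored in the storage unit 51 in step S51, C denotes the pixel values of the image captured by the front video camera 31-21 in step S52, the subscripts indicate the position of the pixel within the image, and a bar over a letter indicates the mean of the pixel values of that image.
[0065]
The closer the value obtained from equation (1) is to 1, the higher the correlation between the images. The correlation may be obtained using the mean pixel value of each image as a reference, as in equation (1), or using an expression that does not use the mean pixel values, as in the following equation (2).
[Equation 2]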
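The formula images for equations (1) and (2) are not reproduced in this text. As a hedged reconstruction only, the standard normalized correlation that matches the symbol definitions given here (reference pixel values R, captured pixel values C, positional subscripts, overbars for per-image means) would read as follows; the index names i, j are an assumption.

```latex
% (1) normalized correlation using the per-image mean pixel values
\frac{\sum_{i,j}\left(R_{ij}-\bar{R}\right)\left(C_{ij}-\bar{C}\right)}
     {\sqrt{\sum_{i,j}\left(R_{ij}-\bar{R}\right)^{2}}\,
      \sqrt{\sum_{i,j}\left(C_{ij}-\bar{C}\right)^{2}}}
\tag{1}

% (2) the same measure without subtracting the mean pixel values
\frac{\sum_{i,j} R_{ij}\,C_{ij}}
     {\sqrt{\sum_{i,j} R_{ij}^{2}}\,\sqrt{\sum_{i,j} C_{ij}^{2}}}
\tag{2}
```

Both expressions equal 1 when the two images are identical, which agrees with the statement that values closer to 1 indicate higher correlation.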

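[0066]
In step S53, equation (1) or equation (2) is used to search the images stored in the storage unit 51 for an image that approximates the captured image (an image with a high degree of correlation). Once such an image is found, the angle information associated with it is acquired in step S54. As described above, the reference images in the storage unit 51 are stored in association with angle information, so the angle information is obtained by looking up the angle associated with the reference image judged to have a high degree of correlation.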
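Steps S53 and S54 can be sketched as a search for the stored reference image with the highest normalized correlation. The sketch below assumes images flattened to equal-length lists of pixel values and a mean-subtracted correlation of the kind described for equation (1); the function names and the dictionary layout of the stored references are hypothetical.

```python
import math


def ncc(ref, cap):
    """Mean-subtracted normalized correlation of two equal-length pixel lists."""
    n = len(ref)
    mean_r = sum(ref) / n
    mean_c = sum(cap) / n
    num = sum((r - mean_r) * (c - mean_c) for r, c in zip(ref, cap))
    den = math.sqrt(
        sum((r - mean_r) ** 2 for r in ref) * sum((c - mean_c) ** 2 for c in cap)
    )
    return num / den if den else 0.0  # flat images carry no correlation


def estimate_angle(references, captured):
    """Return the angle whose stored reference image correlates best.

    references maps an angle in degrees to the image stored for it (S51).
    """
    return max(references, key=lambda angle: ncc(references[angle], captured))
```

The returned angle can then be fed to the same threshold comparison used with the angle detection device 41.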
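[0067]
Using the acquired angle information, the computing device 42 performs the processing from step S3 of FIG. 6 onward when it is configured to control the speaker units 33 as shown in FIG. 4. Likewise, when it is configured to control the display units 34 as shown in FIG. 7, it performs the processing from step S13 of FIG. 8 onward.
[0068]
In this way, even if participant 24 does not wear the angle detection device 41, the angle information can be obtained by calculating the degree of correlation with images stored in advance.
[0069]
In the embodiments described above, only the horizontal orientation is detected, but the vertical orientation may also be detected. When the vertical orientation is also detected, then, for example, while participant 24 is looking down or up by a predetermined angle or more, the horizontal angle is not detected even if the face moves horizontally. In other words, the volume and video are not controlled, and the data to be transmitted by the transmission/reception device 43, or the data it has received, is passed on as is. In this way, for example, when participant 24 looks down to read a document and moves his or her face left and right, the audio and video do not change, which avoids the inconvenience of the audio or video changing when the participant is not actually paying attention to anyone.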
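The vertical gating described above can be expressed as a small guard placed in front of the horizontal threshold test. This is a sketch under assumptions: the name `horizontal_angle_for_control` and the limit of 30 degrees are hypothetical, since the text only speaks of a predetermined angle.

```python
def horizontal_angle_for_control(yaw_deg, pitch_deg, pitch_limit=30.0):
    """Return the horizontal angle to use for control, or None.

    None means participant 24 is looking up or down by pitch_limit
    degrees or more, so no volume/video control is applied and the
    data is passed through unchanged.
    """
    if abs(pitch_deg) >= pitch_limit:
        return None
    return yaw_deg
```

A caller would skip the threshold comparison whenever `None` is returned and forward the audio and video data untouched.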
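[0070]
In this specification, the provision media that supply the user with a computer program for executing the above processing include information recording media such as magnetic disks and CD-ROMs, as well as transmission media over networks such as the Internet and digital satellite.
[0071]
[Effect of the invention]
As described above, the present invention makes it possible to provide video conferences and the like with a sense of presence.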
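[Brief description of the drawings]
[FIG. 1] A diagram showing a configuration example of a video conference system to which the information processing apparatus of the present invention is applied.
[FIG. 2] A diagram showing the layout of the conference room of each communication center in the video conference system.
[FIG. 3] A diagram showing the arrangement of the display devices at a communication center in FIG. 1.
[FIG. 4] A block diagram showing the configuration of a device that detects the orientation of a participant.
[FIG. 5] A diagram explaining the arrangement of the video cameras.
[FIG. 6] A flowchart explaining the operation of the computing device shown in FIG. 4.
[FIG. 7] A block diagram showing another configuration of a device that detects the orientation of a participant.
[FIG. 8] A flowchart explaining the operation of the computing device shown in FIG. 7.
[FIG. 9] A block diagram showing yet another configuration of a device that detects the orientation of a participant.
[FIG. 10] A flowchart explaining the operation of the computing device shown in FIG. 9.
[FIG. 11] A flowchart explaining the operation of a computing device that has received information transmitted from the transmission/reception device of FIG. 9.
[FIG. 12] A flowchart explaining another operation of the computing device shown in FIG. 9.
[FIG. 13] A block diagram showing yet another configuration of a device that detects the orientation of a participant.
[FIG. 14] A flowchart explaining the operation of the computing device shown in FIG. 13.
[Description of reference numerals]
1-1 to 1-4 communication centers, 2 network, 21 to 23 display devices, 31-21 front video camera, 31-22 left-side video camera, 31-23 right-side video camera, 32-21 to 32-23 microphones, 33-21 to 33-23 speaker units, 34-21 to 34-23 display units, 41 angle detection device, 42 computing device, 43 transmission/reception device, 51 storage unit
[0001]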
BACKGROUND OF THE INVENTION
The present invention relates to an information processing apparatus and method, and a recording medium. In particular, it relates to an information processing apparatus and method, and a recording medium, that detect the direction in which a participant in a video conference system is facing and, according to the detected direction, control the volume output from the speakers installed in each conference room and the images shown on the displays, so that a participant can recognize which other participants are paying attention to him or her.
[0002]
[Prior art]
Remote conference systems have been proposed in which a plurality of conference rooms are connected over a network so that a conference can be held as if the participants were seated around a single table. Each conference room in such a system is equipped with displays that show video of the participants other than the local one (referred to as participant A) and speakers that output the speech of the participants shown on those displays. The number of displays and speakers installed equals the number of participants excluding participant A.
[0003]
Each conference room is also equipped with a video camera for capturing the participants in the conference room and a microphone for capturing audio. The video camera and microphone are installed in the vicinity (mainly at the top) of the display provided in each conference room. Participant A's video imaged by the video camera and audio captured by the microphone are output to a display and a speaker corresponding to the participant A provided at each venue.
[0004]
[Problems to be solved by the invention]
In the configuration of the conference room described above, when a plurality of participants speak at the same time, each utterance is output from the speaker corresponding to the participant who made it. As a result, it becomes difficult to hear the remarks of the participant one is paying attention to.
[0005]
Also, in the conference room configuration described above, the video on every display is shown in the same way, so a participant cannot tell apart the participants who are paying attention to his or her remarks, or who are speaking to him or her, from those who are not. It is therefore difficult to recognize who is paying attention to oneself.
[0006]
The present invention has been made in view of such circumstances. It detects the direction in which a participant is facing and controls the volume output from the speakers and the images shown on the displays according to the detected direction, thereby allowing a participant to recognize the participants who are paying attention to him or her.
[0010]
[Means for Solving the Problems]
The information processing apparatus according to the present invention comprises: imaging means for capturing an image of a subject; capturing means for capturing sound uttered by the subject; detection means for detecting the angle of the subject; transmission means for transmitting, to another information processing apparatus, the respective data of the image captured by the imaging means, the sound captured by the capturing means, and the angle detected by the detection means; reception means for receiving, from the other information processing apparatus, image data of an image of another subject, audio data of sound uttered by the other subject, and angle data indicating the direction in which the other subject is facing; and control means for determining whether the angle indicated by the angle data received by the reception means satisfies a predetermined condition and, when it is thereby determined that the other subject is not facing the direction of the subject, performing at least one of control for displaying the image based on the image data received by the reception means at a reduced resolution and control for outputting the sound based on the audio data received by the reception means at a reduced volume level.
[0011]
The information processing method according to the present invention includes: an imaging step of capturing an image of a subject; a capturing step of capturing sound uttered by the subject; a detection step of detecting the angle of the subject; a transmission step of transmitting, to another information processing apparatus, the respective data of the image captured in the imaging step, the sound captured in the capturing step, and the angle detected in the detection step; a reception step of receiving, from the other information processing apparatus, image data of an image of another subject, audio data of sound uttered by the other subject, and angle data indicating the direction in which the other subject is facing; and a control step of determining whether the angle indicated by the angle data received in the reception step satisfies a predetermined condition and, when it is thereby determined that the other subject is not facing the direction of the subject, performing at least one of control for displaying the image based on the image data received in the reception step at a reduced resolution and control for outputting the sound based on the audio data received in the reception step at a reduced volume level.
[0012]
The recording medium of the present invention records a computer-readable program that causes an information processing apparatus to execute processing including: an imaging step of capturing an image of a subject; a capturing step of capturing sound uttered by the subject; a detection step of detecting the angle of the subject; a transmission step of transmitting, to another information processing apparatus, the respective data of the image captured in the imaging step, the sound captured in the capturing step, and the angle detected in the detection step; a reception step of receiving, from the other information processing apparatus, image data of an image of another subject, audio data of sound uttered by the other subject, and angle data indicating the direction in which the other subject is facing; and a control step of determining whether the angle indicated by the angle data received in the reception step satisfies a predetermined condition and, when it is thereby determined that the other subject is not facing the direction of the subject, performing at least one of control for displaying the image based on the image data received in the reception step at a reduced resolution and control for outputting the sound based on the audio data received in the reception step at a reduced volume level.
[0014]
In the information processing apparatus and method and the recording medium of the present invention, an image of a subject is captured, sound uttered by the subject is captured, and the angle of the subject is detected. The respective data of the captured image, the captured sound, and the detected angle are transmitted to another information processing apparatus. Image data of an image of another subject, audio data of sound uttered by the other subject, and angle data indicating the direction in which the other subject is facing are received from the other information processing apparatus. Whether the angle indicated by the received angle data satisfies a predetermined condition is determined and, when it is thereby determined that the other subject is not facing the direction of the subject, at least one of control for displaying the image based on the received image data at a reduced resolution and control for outputting the sound based on the received audio data at a reduced volume level is performed.
[0015]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described below, but in order to clarify the correspondence between each means of the invention described in the claims and the following embodiments, in parentheses after each means, The features of the present invention will be described with the corresponding embodiment (however, an example) added. However, of course, this description does not mean that each means is limited to the description. In addition, parts corresponding to those in the conventional case are denoted by the same reference numerals, and description thereof will be omitted as appropriate.
[0017]
The information processing apparatus according to claim 1 comprises: imaging means (for example, the front video camera 31-21 in FIG. 9) for capturing an image of a subject; capturing means (for example, the microphone 32 in FIG. 9) for capturing sound uttered by the subject; detection means (for example, step S22 in FIG. 10) for detecting the angle of the subject; transmission means (for example, step S23 in FIG. 10) for transmitting, to another information processing apparatus, the respective data of the image captured by the imaging means, the sound captured by the capturing means, and the angle detected by the detection means; reception means (for example, the transmission/reception device 43 in FIG. 9) for receiving, from the other information processing apparatus, image data of an image of another subject, audio data of sound uttered by the other subject, and angle data indicating the direction in which the other subject is facing; and control means (for example, steps S33 and S34 in FIG. 11) for determining whether the angle indicated by the angle data received by the reception means satisfies a predetermined condition and, when it is thereby determined that the other subject is not facing the direction of the subject, performing at least one of control for displaying the image based on the image data received by the reception means at a reduced resolution and control for outputting the sound based on the audio data received by the reception means at a reduced volume level.
[0018]
FIG. 1 shows the configuration of a video conference system to which the information processing apparatus of the present invention is applied. In this specification, a system means an overall apparatus made up of a plurality of devices. As shown in FIG. 1, a plurality of (four, in this embodiment) communication centers 1-1 to 1-4 are interconnected via a network 2 such as an ISDN (Integrated Services Digital Network). Each communication center has, for example, one conference room as shown in FIG. 2.
[0019]
In the conference room shown in FIG. 2, one table 10, one chair, and three display devices are provided. For example, in the conference room of the communication center 1-4, a chair is arranged at the position of the number 4 shown in FIG. 2, and display devices are arranged at the positions of the numbers 1 to 3. Further, in the conference room of the communication center 1-3, a chair is arranged at the position of the number 3 shown in FIG. 2, and display devices are arranged at the positions of the numbers 1, 2, and 4. Further, in the conference room of the communication center 1-2, a chair is arranged at the position of the number 2 shown in FIG. 2, and display devices are arranged at the positions of the numbers 1, 3, and 4. Further, in the conference room of the communication center 1-1, a chair is arranged at the position of the number 1 shown in FIG. 2, and display devices are arranged at the positions of the numbers 2 to 4.
[0020]
Among the display devices arranged in the conference room of the communication center 1-4, the display device at the position of the number 1 shown in FIG. 2 displays an image of the participant at the communication center 1-1, the display device at the position of the number 2 displays an image of the participant at the communication center 1-2, and the display device at the position of the number 3 displays an image of the participant at the communication center 1-3. Similarly, at the other communication centers, the display devices arranged at positions other than the chair where the participant sits display video of the participants at the corresponding communication centers.
[0021]
As described above, the conference room of each communication center is provided with a chair for its participant at the position specific to that communication center, and display devices that show the participants of the other communication centers are arranged at the remaining positions. By configuring the conference rooms in this way, the participants are arranged in the same positions in the conference room of every communication center. That is, it is as if four participants were actually seated at fixed positions around the table 10. Although in each conference room everyone other than the physically present participant appears on a display, the same conference situation is realized in every conference room.
[0022]
Next, details of each communication center will be described with reference to FIG. 3. The communication centers have almost the same configuration, although the arrangement of the display devices differs somewhat. Therefore, only the communication center 1-4 is described here, and the description of the other communication centers 1-1 to 1-3 is omitted.
[0023]
First, in the conference room of the communication center 1-4, as shown in FIG. 2, a chair is arranged at the position of the number 4, and a display device is arranged at the positions of the numbers 1 to 3, respectively. Therefore, the participant 24 shown in FIG. 3 is seated in the chair arranged at the position of the number 4 in FIG. In addition, in each display device 21 to 23, in order to capture the video of the participant 24, the left side video camera 31-22 installed on the left side of the participant 24 and the front side installed in front of the participant 24. A video camera 31-21 and a right side video camera 31-23 installed on the right side of the participant 24 are provided. Furthermore, microphones 32-21 to 32-23 that capture the remarks of the participant 24 (hereinafter, when it is not necessary to individually distinguish the microphones 32-21 to 32-23, they are simply referred to as microphones 32. Other devices) In the same manner, the speaker units 33-21 to 33-23 that output audio supplied from other communication centers and the display units 34-21 to 34- that display images corresponding to the audio are also described. 23 is provided.
[0024]
The speaker units 33-21 to 33-23 and the display units 34-21 to 34-23 output the images transmitted from the communication centers 1-1 to 1-3 and the audio corresponding to the images, respectively. ing. That is, for example, an image of a participant of the communication center 1-1 is displayed on the display unit 34-21 of the display device 21, and a speech of the participant is output from the speaker unit 33-21. ing. Further, an image of the participant of the communication center 1-2 is displayed on the display unit 34-22 of the display device 22, and a speech of the participant is output from the speaker unit 33-22. . Furthermore, an image of the participant of the communication center 1-3 is displayed on the display unit 34-23 of the display device 23, and the speech of the participant is output from the speaker unit 33-23. .
[0025]
Further, the front video camera 31-21 arranged in the display device 21 captures the participant 24 of the communication center 1-4, and the microphone 32-21 captures the speech of the participant 24, and the participant 24 The images and remarks are supplied to the communication center 1-1. The left side video camera 31-22 installed in the display device 22 captures the participant 24 of the communication center 1-4, and the microphone 32-22 captures the speech of the participant 24, and the participant 24 images and messages are supplied to the communication center 1-2. Furthermore, the video camera 31-23 installed in the display device 23 captures the participant 24 of the communication center 1-4, and the microphone 32-23 captures the speech of the participant 24, and the participant 24 Images and speech are supplied to the communication center 1-3.
[0026]
As shown in FIG. 3, the display devices 21 to 23 are arranged at the predetermined positions shown in FIG. 2 so that the participant 24 can see the display units 34-21 to 34-23 of the display devices 21 to 23.
[0027]
The following describes how, in a conference room configured in this way, the participant to whom the participant 24 is paying attention is detected, and how the speech output from the speaker unit 33 of the corresponding display device 21 to 23, or the video displayed on its display unit 34, is emphasized. First, the case where speech is emphasized will be described.
[0028]
FIG. 4 is a block diagram showing the devices needed to determine which participant the participant 24 is paying attention to, in other words, which of the display devices 21 to 23 the participant 24 is looking at, and to perform processing that emphasizes that participant's speech. The participant 24 wears, for example on the head, an angle detection device 41 for detecting the direction (angle) in which the participant 24 is facing. The angle detection device 41 includes, for example, a magnetic sensor, a gyroscope, an angular velocity sensor, or the like, and detects the horizontal orientation of the face of the participant 24.
[0029]
The arithmetic device 42 is connected to the speaker units 33-21 to 33-23 and, based on the face orientation of the participant 24 detected by the angle detection device 41, controls the volume at which the audio data received by the transmission/reception device 43 is output from each speaker unit 33. Specifically, the volume of the speaker unit 33 that the participant 24 is determined to be facing is made larger than that of the other speaker units 33 or, conversely, the volume of the other speaker units 33 is made smaller than that of the speaker unit 33 that the participant 24 is determined to be facing. Alternatively, the perceived sound quality may be changed according to the face orientation of the participant 24 by changing the frequency characteristics. Sound that does not arrive from the front is heard after being reflected by walls, so its high-band components are generally attenuated, although the degree depends on the wall material. In consideration of this, the sound output from the speaker units 33 other than the one that the participant 24 is facing may be output with the high-band audio signal attenuated.
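The high-band attenuation mentioned above could be approximated with a simple low-pass filter. The following is a minimal sketch, assuming a first-order (one-pole) filter; the function name `attenuate_high_band` and the coefficient `alpha` are illustrative assumptions and do not appear in the original description.

```python
def attenuate_high_band(samples, alpha=0.2):
    """One-pole low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1]).

    A smaller alpha attenuates high frequencies more strongly.
    `alpha` is a hypothetical tuning parameter with 0 < alpha <= 1.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    out = []
    prev = 0.0
    for x in samples:
        prev = prev + alpha * (x - prev)
        out.append(prev)
    return out


# A rapidly alternating (high-frequency) signal is strongly reduced,
# while a constant signal converges to its original level.
high = attenuate_high_band([1.0, -1.0] * 50)
low = attenuate_high_band([1.0] * 100)
```

Such a filter would be applied only to the audio routed to the speaker units that the participant is not facing, leaving the attended speaker unit unfiltered.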
[0030]
Here, an example will be described in which the volume of the speaker unit 33 that the participant 24 is determined to be facing is increased so that its sound can be distinguished from the sound output from the other speaker units 33.
[0031]
Here, the arrangement of the front video camera 31-21, the left side video camera 31-22, and the right side video camera 31-23 will be described with reference to FIG. 5. The angles dI₁, dI₂, and dI₃ are the angles of the front video camera 31-21, the left side video camera 31-22, and the right side video camera 31-23, respectively. The angle dI₁ of the front video camera 31-21 is 0 degrees. The angle dI₂ between the left side video camera 31-22 and the front video camera 31-21 is 45 degrees, and the angle dI₃ between the right side video camera 31-23 and the front video camera 31-21 is 45 degrees. The angle of the participant 24 detected by the angle detection device 41, measured relative to the angle dI₁, is the angle d (−45 degrees to 45 degrees).
[0032]
The angle Th₂ and the angle Th₃ are each an angle of a predetermined size relative to the angle dI₁. These angles are used as thresholds for determining which video camera 31 the participant 24 is facing.
[0033]
FIG. 6 is a flowchart for explaining the operation of the arithmetic device 42. In step S1, the participant 24 sits facing the front video camera 31-21, and the angle detected by the angle detection device 41 at that time is set as the initial value of 0 degrees (angle dI₁). When the initial value has been set, the process proceeds to step S2, and detection of the face orientation of the participant 24 by the angle detection device 41 is started.
[0034]
In step S3, the arithmetic device 42 determines whether the angle d detected by the angle detection device 41 is smaller than the threshold angle Th₂. If the angle d is determined to be smaller than the angle Th₂, the participant 24 is determined to be facing the left side video camera 31-22, and in step S4 the arithmetic device 42 performs control so that the speaker unit 33-22 outputs at a louder volume than the speaker unit 33-21 and the speaker unit 33-23.
[0035]
On the other hand, if it is determined in step S3 that the angle d is equal to or greater than the angle Th₂, the process proceeds to step S5, where it is determined whether the angle d is larger than the angle Th₃. If the angle d is determined to be larger than the angle Th₃, the participant 24 is determined to be facing the right side video camera 31-23, and in step S6 the arithmetic device 42 performs control so that the speaker unit 33-23 outputs at a louder volume than the speaker unit 33-21 and the speaker unit 33-22.
[0036]
If it is determined in step S5 that the angle d is equal to or smaller than the angle Th₃, in other words, that the relationship angle Th₂ ≦ angle d ≦ angle Th₃ holds, the participant 24 is determined to be facing the front video camera 31-21, and in step S7 the arithmetic device 42 performs control so that the speaker unit 33-21 outputs at a louder volume than the speaker unit 33-22 and the speaker unit 33-23.
[0037]
By repeating the processing of steps S2 to S7 during the conference, the volume output from each speaker unit 33 is controlled according to the direction in which the participant 24 is facing. The processing of this flowchart is ended by an interrupt when the conference ends.
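The decision logic of steps S3 to S7 can be sketched as follows. This is a minimal illustration rather than the implementation of the arithmetic device 42: the threshold values of ±22.5 degrees, the gain values, and all function and variable names are assumptions introduced here, since the description only states that Th₂ and Th₃ are angles of a predetermined size.

```python
FRONT, LEFT, RIGHT = "front", "left", "right"

# Mapping from facing direction to speaker unit reference numeral.
SPEAKER_FOR = {FRONT: "33-21", LEFT: "33-22", RIGHT: "33-23"}


def facing_direction(d, th2, th3):
    """Classify angle d (degrees, front camera = 0 degrees).

    Steps S3 and S5: d < Th2 -> left, d > Th3 -> right,
    otherwise (Th2 <= d <= Th3) -> front.
    """
    if d < th2:
        return LEFT
    if d > th3:
        return RIGHT
    return FRONT


def speaker_gains(d, th2=-22.5, th3=22.5, emphasized=1.0, others=0.5):
    """Return a gain per speaker unit (steps S4, S6, S7).

    The default thresholds and gains are hypothetical example values.
    """
    target = SPEAKER_FOR[facing_direction(d, th2, th3)]
    return {unit: (emphasized if unit == target else others)
            for unit in SPEAKER_FOR.values()}
```

Calling `speaker_gains` each time a new angle is detected corresponds to repeating steps S2 to S7 during the conference.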
[0038]
FIG. 7 is a block diagram illustrating a configuration example for controlling the images displayed on the display units 34 according to the face orientation of the participant 24 detected by the angle detection device 41. In this configuration, the image data received by the transmission/reception device 43 is supplied to the corresponding display units 34-21 to 34-23 via the arithmetic device 42.
[0039]
The operation of the arithmetic device 42 shown in FIG. 7 will be described with reference to the flowchart of FIG. 8. The processing in steps S11 to S13 and S15 is the same as the processing in steps S1 to S3 and S5 in FIG. 6, so its description is omitted.
[0040]
If it is determined in step S13 that the angle d is smaller than the angle Th₂, in other words, that the participant 24 is facing the participant shown on the display unit 34-22, the process proceeds to step S14. In step S14, the arithmetic device 42 performs control so that the video displayed on the display unit 34-22 is emphasized over the video displayed on the other display units 34-21 and 34-23. Specifically, processing such as lowering the luminance or lowering the resolution of the video displayed on the display unit 34-21 and the display unit 34-23 is performed.
[0041]
If it is determined in step S15 that the angle d is larger than the angle Th₃, in other words, that the participant 24 is facing the participant shown on the display unit 34-23, the process proceeds to step S16, and control is performed so that the video displayed on the display unit 34-23 is emphasized over the video displayed on the other display units 34-21 and 34-22. If it is determined in step S15 that the angle d is equal to or smaller than the angle Th₃, in other words, that the participant 24 is facing the display unit 34-21, the process proceeds to step S17, and control is performed so that the video displayed on the display unit 34-21 is emphasized over the video displayed on the other display units 34-22 and 34-23.
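As one way to picture the emphasis processing of steps S14, S16, and S17, the sketch below lowers the luminance and the resolution of the frames shown on the display units that are not being watched. It assumes grayscale frames represented as lists of rows of integers; the helper names, the luminance factor, and the block size are hypothetical choices, not values taken from the description.

```python
def lower_luminance(frame, factor=0.5):
    """Scale every pixel value of a grayscale frame (list of rows) by factor."""
    return [[int(p * factor) for p in row] for row in frame]


def lower_resolution(frame, block=2):
    """Replace each block x block tile by its average, keeping the frame size."""
    h, w = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    for y0 in range(0, h, block):
        for x0 in range(0, w, block):
            ys = range(y0, min(y0 + block, h))
            xs = range(x0, min(x0 + block, w))
            tile = [frame[y][x] for y in ys for x in xs]
            avg = sum(tile) // len(tile)
            for y in ys:
                for x in xs:
                    out[y][x] = avg
    return out


def emphasize(frames, attended):
    """Leave the attended display's frame untouched and degrade the others."""
    return {unit: (f if unit == attended
                   else lower_resolution(lower_luminance(f)))
            for unit, f in frames.items()}
```

Degrading the unattended frames, rather than enhancing the attended one, mirrors the wording of step S14, which describes lowering the luminance or resolution of the other display units.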
[0042]
In the above description, either the audio or the video is controlled, but both may be controlled. Also, in the above description, the speaker units 33 and the display units 34 provided in the conference room of the participant 24 are controlled; in other words, the data received by the transmission/reception device 43 is controlled. However, the same effect can be obtained by controlling the data to be transmitted. The case of controlling the data to be transmitted will be described below with reference to FIG. 9.
[0043]
In FIG. 9, the front video camera 31-21, the left side video camera 31-22, the right side video camera 31-23, and the microphone 32 are connected to the arithmetic device 42. In FIG. 3, a microphone 32 is installed for each of the display devices 21 to 23; in the following embodiment, however, it is assumed that a single microphone 32 is installed in front of the participant 24. A microphone 32 of a type that hooks over the ear of the participant 24 is preferable to one fixed on a desk or the like, because the participant 24 and the microphone 32 then always remain in a fixed positional relationship. Further, when such a microphone 32 is used, the angle detection device 41 can be attached to part of the microphone 32, for example to part of the arm that hooks over the ear, so that the angle can be detected less conspicuously than when the device is worn on the head.
[0044]
The operation of the arithmetic device 42 shown in FIG. 9 will be described with reference to the flowchart of FIG. 10. In step S21, the angle detection device 41 is initialized in the same way as described for step S1 of FIG. 6. In step S22, the angle detection device 41 starts detecting the angle with the initialized angle as a reference. In step S23, the arithmetic device 42 outputs the detected angle to the transmission/reception device 43 together with the images captured by the video cameras 31 and the sound captured by the microphone 32, and the data is transmitted to the corresponding conference rooms (communication centers 1-1 to 1-3).
[0045]
With reference to the flowchart of FIG. 11, the operation when the arithmetic device 42 receives data transmitted from another communication center will be described. In step S31, it is determined whether the data received by the transmission/reception device 43 is data transmitted from the communication center 1-1. If it is determined that the data was transmitted from the communication center 1-1, the process proceeds to step S32. In step S32, it is determined whether the angle d of the received angle information is equal to or greater than the angle Th₂ and equal to or smaller than the angle Th₃. If it is determined that the relationship angle Th₂ ≦ angle d ≦ angle Th₃ holds, in other words, that the participant of the communication center 1-1 is facing the display device on which the participant 24 is shown, the process proceeds to step S33. The arithmetic device 42 outputs the received image data to the display unit 34-21 without applying any processing to the resolution or the like.
[0046]
On the other hand, if it is determined in step S32 that the relationship angle Th₂ ≦ angle d ≦ angle Th₃ does not hold, the process proceeds to step S34. In step S34, the arithmetic device 42 processes the received image data so that it is displayed at a reduced resolution, and outputs the processed data to the display unit 34-21.
[0047]
If it is determined in step S31 that the received data is not data transmitted from the communication center 1-1, the process proceeds to step S35, and it is determined whether the received data is data transmitted from the communication center 1-2. If it is determined that the received data was transmitted from the communication center 1-2, the process proceeds to step S36. In step S36, it is determined whether the angle d of the received angle information satisfies the relationship angle d > angle Th₃. If it is determined that angle d > angle Th₃ holds, in other words, that the participant of the communication center 1-2 is facing the display device on which the participant 24 is shown, the process proceeds to step S33. The arithmetic device 42 outputs the received image data to the display unit 34-22 without applying any processing to the resolution or the like.
[0048]
On the other hand, if it is determined in step S36 that the relationship angle d > angle Th₃ does not hold, the process proceeds to step S34. In step S34, the arithmetic device 42 processes the received image data so that it is displayed at a reduced resolution, and outputs the processed image data to the display unit 34-22.
[0049]
If it is determined in step S35 that the received data is not data transmitted from the communication center 1-2, the process proceeds to step S37, and it is determined whether the received data is data transmitted from the communication center 1-3. If it is determined that the received data was transmitted from the communication center 1-3, the process proceeds to step S38. In step S38, it is determined whether the angle d of the received angle information satisfies the relationship angle d < angle Th₂. If it is determined that angle d < angle Th₂ holds, in other words, that the participant of the communication center 1-3 is facing the display device on which the participant 24 is shown, the process proceeds to step S33. The arithmetic device 42 outputs the received image data to the display unit 34-23 without applying any processing to the resolution or the like.
[0050]
On the other hand, if it is determined in step S38 that the relationship angle d < angle Th₂ does not hold, the process proceeds to step S34. In step S34, the arithmetic device 42 processes the received image data so that it is displayed at a reduced resolution, and outputs the processed image data to the display unit 34-23.
[0051]
If it is determined in step S37 that the received data is not data transmitted from the communication center 1-3, the process proceeds to step S39. The process reaches step S39 only when the data is determined not to have been transmitted from any of the communication centers 1-1 to 1-3, so it is determined that an error has occurred, and error processing is performed. As the error processing, for example, the received data is discarded.
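The receive-side branching of FIG. 11 (steps S31 to S39) can be summarized as a lookup from the sending communication center to a facing condition and a display unit. The following is a sketch under stated assumptions: the threshold defaults of ±22.5 degrees, the `handle_received` function, and the placeholder `degrade` callback are hypothetical and are used only to make the control flow concrete.

```python
# Per-source rule taken from steps S32, S36 and S38: the condition under which
# the remote participant is judged to be facing participant 24, and the
# display unit that shows that remote participant.
RULES = {
    "1-1": (lambda d, th2, th3: th2 <= d <= th3, "34-21"),
    "1-2": (lambda d, th2, th3: d > th3, "34-22"),
    "1-3": (lambda d, th2, th3: d < th2, "34-23"),
}


def handle_received(source, d, image, th2=-22.5, th3=22.5, degrade=None):
    """Return (display_unit, image_to_show), or None for an unknown source.

    `degrade` is a hypothetical callback that reduces the resolution; the
    default merely tags the image so the branch taken is visible.
    """
    if source not in RULES:
        return None  # step S39: error processing, e.g. discard the data
    is_facing, display = RULES[source]
    if is_facing(d, th2, th3):
        return display, image           # step S33: output unchanged
    degrade = degrade or (lambda img: ("low-res", img))
    return display, degrade(image)      # step S34: reduce the resolution
```

For example, `handle_received("1-2", 0.0, frame)` takes the step S34 branch, because an angle of 0 degrees does not satisfy angle d > angle Th₃.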
[0052]
In the above description, the image displayed on the display unit 34 is controlled. However, the sound output from the speaker unit 33 may be controlled.
[0053]
In this way, by transmitting and receiving, together with the image data and audio data, information (angle information) indicating which display device a participant is paying attention to, each participant can recognize the participants who are paying attention to him or her.
[0054]
FIG. 12 is a flowchart for explaining another operation of the arithmetic device 42 shown in FIG. 9. In describing this flowchart, processing that is the same as in the flowchart of FIG. 6 is not described again.
[0055]
After the processing of steps S41 and S42 is completed, if the arithmetic device 42 determines in step S43 that the angle d is smaller than the angle Th₂, the process proceeds to step S44. When the angle d is determined to be smaller than the angle Th₂, the participant 24 is determined to be facing the left side video camera 31-22. The arithmetic device 42 therefore applies processing such as lowering the luminance or lowering the resolution to the image data output from the front video camera 31-21 and the right side video camera 31-23, and outputs the processed data to the transmission/reception device 43. At this time, when the arithmetic device 42 sends the data output from the microphone 32 together with the data output from the front video camera 31-21 and the right side video camera 31-23, it may reduce the volume of the audio data before outputting it to the transmission/reception device 43.
[0056]
If it is determined in step S45 that the angle d is larger than the angle Th₃, the process proceeds to step S46. In step S46, the data output from the front video camera 31-21 and the left side video camera 31-22 is controlled and output to the transmission/reception device 43.
[0057]
If it is determined in step S45 that the angle d is equal to or smaller than the angle Th₃, the process proceeds to step S47. In step S47, the data output from the left side video camera 31-22 and the right side video camera 31-23 is controlled and output to the transmission/reception device 43.
[0058]
In this manner, the data controlled and output to the transmission / reception device 43 is transmitted to the corresponding communication centers 1-1 to 1-3 in step S48.
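The transmit-side selection of steps S43 to S47 amounts to choosing which camera outputs are reduced before transmission. A minimal sketch follows, assuming hypothetical thresholds of ±22.5 degrees and a helper named `cameras_to_degrade`; neither comes from the description.

```python
CAMERAS = {"left": "31-22", "front": "31-21", "right": "31-23"}


def cameras_to_degrade(d, th2=-22.5, th3=22.5):
    """Return the cameras whose output is reduced before transmission."""
    if d < th2:        # step S43 -> S44: facing the left side video camera
        facing = "left"
    elif d > th3:      # step S45 -> S46: facing the right side video camera
        facing = "right"
    else:              # step S47: facing the front video camera
        facing = "front"
    return sorted(cam for side, cam in CAMERAS.items() if side != facing)
```

Only the stream from the camera that the participant is facing is sent at full quality, which is what allows the amount of transmitted data to be reduced.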
[0059]
By controlling the data to be transmitted in this way, the amount of transmitted data can be reduced, and a realistic video conference can be held simply by reproducing the received data.
[0060]
In the above description, the angle detection device 41 is used to detect the face orientation of the participant 24. However, if images of the face of the participant 24 are stored in advance in association with angles, the same effect can be obtained without using the angle detection device 41. In the configuration example illustrated in FIG. 13, a storage unit 51 is newly provided, and the front video camera 31-21, the transmission/reception device 43, and the storage unit 51 are connected to the arithmetic device 42.
[0061]
The participant 24 sits in front of the front video camera 31-21, turns his or her face in steps of a fixed angle, holds still at each step, and has the front video camera 31-21 capture an image at that time. Each captured image is associated with the angle at which it was captured and is stored in the storage unit 51 via the arithmetic device 42. In this manner, the storage unit 51 stores images of the face of the participant 24 obtained at fixed angular intervals.
[0062]
In such a configuration, when the volume output from the speaker units 33 is controlled based on the obtained angle, the speaker units 33-21 to 33-23 are connected to the arithmetic device 42 as described with reference to FIG. 4. When the video on the display units 34 is controlled based on the obtained angle, the display units 34-21 to 34-23 are connected to the arithmetic device 42 as described with reference to FIG. 7. Further, when the image data and audio data to be transmitted are controlled based on the obtained angle, the left side video camera 31-22, the right side video camera 31-23, and the microphone 32 are connected in addition to the front video camera 31-21, as described with reference to FIG. 9.
[0063]
The operation of the arithmetic device 42 will be described with reference to the flowchart of FIG. 14. In step S51, the arithmetic device 42 stores the face images of the participant 24 obtained from the front video camera 31-21 in the storage unit 51 in association with their angles. When a predetermined number of images have been stored, the process proceeds to step S52. In step S52, to determine the direction in which the face of the participant 24 is facing, an image of the face of the participant 24 is first captured by the front video camera 31-21.
[0064]
The arithmetic device 42 searches the images stored in the storage unit 51 for an image that approximates the captured image. As a search method, for example, a normalized correlation function is used. The normalized correlation function is shown below.
[Expression 1]

In this equation, R denotes a pixel value of the reference image stored in the storage unit 51 in step S51, and C denotes a pixel value of the image captured by the front video camera 31-21 in step S52. The subscript indicates the position of the pixel in the image, and a bar over a symbol indicates the average of the pixel values of that image.
[0065]
The closer the value obtained by Expression (1) is to 1, the higher the correlation. The degree of correlation may be obtained using the average pixel value of each image as a reference value, as in Expression (1), or it may be obtained by an expression that does not use the average pixel value of each image, as in Expression (2).
[Expression 2]

[0066]
In step S53, when an image that approximates the captured image (an image with a high degree of correlation) is retrieved from the images stored in the storage unit 51 using Expression (1) or Expression (2), the angle information associated with that image is acquired in step S54. As described above, each reference image stored in the storage unit 51 is stored in association with angle information, so the angle information can be obtained by looking up the angle associated with the reference image determined to have a high degree of correlation.
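The search of steps S52 to S54 can be illustrated with the sketch below. The formulas used are the standard zero-mean and plain normalized cross-correlation, chosen as an assumption because they match the descriptions given for Expression (1) and Expression (2); they are not confirmed to be the exact expressions of the original. Images are assumed to be flat lists of pixel values of equal length, and the function names are hypothetical.

```python
import math


def normalized_correlation(ref, cap, subtract_mean=True):
    """Correlation between two equally sized images given as flat pixel lists.

    subtract_mean=True removes each image's mean first (the variant described
    for Expression (1)); False uses raw pixel values (the variant described
    for Expression (2)). Both forms are assumed standard definitions.
    """
    if len(ref) != len(cap) or not ref:
        raise ValueError("images must be non-empty and the same size")
    r_mean = sum(ref) / len(ref) if subtract_mean else 0.0
    c_mean = sum(cap) / len(cap) if subtract_mean else 0.0
    num = sum((r - r_mean) * (c - c_mean) for r, c in zip(ref, cap))
    den = math.sqrt(sum((r - r_mean) ** 2 for r in ref)
                    * sum((c - c_mean) ** 2 for c in cap))
    return num / den if den else 0.0


def estimate_angle(captured, references):
    """Return the angle whose stored reference image correlates best.

    `references` maps an angle in degrees to a reference image (step S51).
    """
    return max(references,
               key=lambda a: normalized_correlation(references[a], captured))
```

With reference images stored at, say, −45, 0, and 45 degrees, `estimate_angle` returns the angle of the best-matching image, which then plays the role of the angle d in the processing from step S3 or step S13 onward.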
[0067]
For example, when the arithmetic device 42 is configured to control the speaker units 33 as illustrated in FIG. 4 using the acquired angle information, it performs the processing from step S3 onward in FIG. 6. Similarly, when it is configured to control the display units 34 as shown in FIG. 7, it performs the processing from step S13 onward in FIG. 8.
[0068]
In this way, even if the participant 24 does not wear the angle detection device 41, the angle information can be obtained by calculating the degree of correlation with the images stored in advance.
[0069]
In the above-described embodiment, only the horizontal direction is detected, but the vertical direction may also be detected. When the vertical direction is also detected, for example, if the participant 24 faces downward or upward beyond a predetermined angle, the angle is treated as not detected even if the face moves horizontally, and the data transmitted or received by the transmission/reception device 43 is passed through as is, without controlling the volume or video. In this way, for example, when the participant 24 is looking down at a document, the audio and video do not change even if the face moves to the left or right, which resolves the inconvenience of the audio and video changing even though the participant's attention has not shifted.
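The vertical-direction gating described above can be sketched as a small guard placed in front of the horizontal processing. The limit of 20 degrees and the function name are hypothetical; the description only refers to a predetermined angle.

```python
def horizontal_angle_or_none(horizontal, vertical, max_tilt=20.0):
    """Return the horizontal angle only while the head is roughly level.

    `max_tilt` (degrees) is a hypothetical limit. When the face is tilted up
    or down beyond it, None signals that volume and video stay unchanged.
    """
    if abs(vertical) > max_tilt:
        return None
    return horizontal
```

A caller would skip the volume or video control whenever `None` is returned, for example while the participant is looking down at a document.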
[0070]
In this specification, the media for providing the user with a computer program that executes the above processing include not only information recording media such as magnetic disks and CD-ROMs, but also transmission media over networks such as the Internet and digital satellite links.
[0071]
[Effect of the invention]
As described above, according to the present invention, it is possible to provide a realistic video conference.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating a configuration example of a video conference system to which an information processing apparatus of the present invention is applied.
FIG. 2 is a diagram illustrating a state of a conference room of each communication center in the video conference system.
FIG. 3 is a diagram showing an arrangement state of display devices in the communication center in FIG. 1.
FIG. 4 is a block diagram showing a configuration of an apparatus for detecting the orientation of a participant.
FIG. 5 is a diagram illustrating the arrangement of video cameras.
FIG. 6 is a flowchart for explaining the operation of the arithmetic unit shown in FIG. 4.
FIG. 7 is a block diagram showing another configuration of an apparatus for detecting the orientation of a participant.
FIG. 8 is a flowchart for explaining the operation of the arithmetic device shown in FIG. 7.
FIG. 9 is a block diagram showing still another configuration of an apparatus for detecting the orientation of a participant.
FIG. 10 is a flowchart for explaining the operation of the arithmetic device shown in FIG. 9.
FIG. 11 is a flowchart for explaining the operation of the arithmetic device that has received the information transmitted from the transmission/reception device of FIG. 9.
FIG. 12 is a flowchart for explaining another operation of the arithmetic unit shown in FIG. 9.
FIG. 13 is a block diagram showing still another configuration of an apparatus for detecting the orientation of a participant.
FIG. 14 is a flowchart for explaining the operation of the arithmetic unit shown in FIG. 13.
[Explanation of symbols]
1-1 to 1-6 communication center, 2 network, 21 to 23 display device, 35-21 front video camera, 35-22 left side video camera, 35-23 right side video camera, 36-21 to 36-23 microphone, 37-21 to 37-23 display unit, 38-21 to 38-23 speaker unit, 41 angle detection device, 42 arithmetic unit, 43 transmission/reception device, 51 storage unit

Claims

An information processing apparatus that exchanges data with another information processing apparatus via a network, comprising:
imaging means for capturing an image of a subject;
capturing means for capturing sound uttered by the subject;
detecting means for detecting an angle of the subject;
transmitting means for transmitting, to the other information processing apparatus, respective data of the image captured by the imaging means, the sound captured by the capturing means, and the angle detected by the detecting means;
receiving means for receiving image data of an image of another subject, audio data of sound uttered by the other subject, and angle data indicating a direction in which the other subject is facing, each transmitted from the other information processing apparatus; and
control means for performing, when it is determined that the other subject is not facing the direction of the subject by judging whether or not the angle indicated by the angle data received by the receiving means satisfies a predetermined condition, at least one of control for displaying the image based on the image data received by the receiving means at a reduced resolution and control for outputting the sound based on the audio data received by the receiving means at a reduced volume level.

An information processing method of an information processing apparatus that exchanges data with another information processing apparatus via a network, the method comprising:
an imaging step of capturing an image of a subject;
a capturing step of capturing sound uttered by the subject;
a detection step of detecting an angle of the subject;
a transmission step of transmitting, to the other information processing apparatus, respective data of the image captured in the imaging step, the sound captured in the capturing step, and the angle detected in the detection step;
a receiving step of receiving image data of an image of another subject, audio data of sound uttered by the other subject, and angle data indicating a direction in which the other subject is facing, each transmitted from the other information processing apparatus; and
a control step of performing, when it is determined that the other subject is not facing the direction of the subject by judging whether or not the angle indicated by the angle data received in the receiving step satisfies a predetermined condition, at least one of control for displaying the image based on the image data received in the processing of the receiving step at a reduced resolution and control for outputting the sound based on the audio data received in the processing of the receiving step at a reduced volume level.

A recording medium storing a computer-readable program that causes an information processing apparatus, which exchanges data with another information processing apparatus via a network, to execute processing comprising:
an imaging step of capturing an image of a subject;
a capturing step of capturing sound uttered by the subject;
a detection step of detecting an angle of the subject;
a transmission step of transmitting, to the other information processing apparatus, respective data of the image captured in the imaging step, the sound captured in the capturing step, and the angle detected in the detection step;
a receiving step of receiving image data of an image of another subject, audio data of sound uttered by the other subject, and angle data indicating a direction in which the other subject is facing, each transmitted from the other information processing apparatus; and
a control step of performing, when it is determined that the other subject is not facing the direction of the subject by judging whether or not the angle indicated by the angle data received in the receiving step satisfies a predetermined condition, at least one of control for displaying the image based on the image data received in the processing of the receiving step at a reduced resolution and control for outputting the sound based on the audio data received in the processing of the receiving step at a reduced volume level.