JP4055539B2 - Interactive communication system

Info

Publication number: JP4055539B2
Application number: JP2002292858A
Authority: JP
Inventors: 隆郎福井
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2002-10-04
Filing date: 2002-10-04
Publication date: 2008-03-05
Anticipated expiration: 2022-10-04
Also published as: JP2004129071A

Description

【０００１】
【発明の属する技術分野】
本発明は、例えばテレビ会議システムのような双方向コミュニケーションシステムに関する。
【０００２】
【従来の技術】
互いに離れた複数の地点の間において、個々の地点で撮影した映像をそれぞれネットワーク経由で残りの地点に送信することにより、遠隔地にいる人同士が相手の顔を見ながら双方向にコミュニケーションを行えるようにしたシステムが普及しつつある。そうした双方向コミュニケーションシステムの例としては、テレビ会議システムが挙げられる。
【０００３】
図１は、従来のテレビ会議システムにおける映像送受信用の機器や回路を示すブロック図である。地点Ａの会議室内に、ビデオカメラ５１，ＡＤコンバータ５２，映像コーデック５３，ネットワーク通信用の通信インタフェース５４，ＤＡコンバータ５５及びモニター５６が設けられる。
【０００４】
地点Ａとは離れた地点Ｂの会議室内にも、同じくビデオカメラ５１，ＡＤコンバータ５２，映像コーデック５３，通信インタフェース５４，ＤＡコンバータ５５及びモニター５６が設けられる。
【０００５】
地点Ａにおいて、ビデオカメラ５１で撮影された映像（会議参加者の映像）は、ＡＤコンバータ５２でデジタル変換され、映像コーデック５３で符号化（圧縮）されて、通信インタフェース５４からネットワーク１０１経由で地点Ｂの通信インタフェース５４に送信される。そして、地点Ｂにおいて、通信インタフェース５４で受信した映像が、映像コーデック５３で復号（伸長）され、ＤＡコンバータ５５でアナログ変換されて、モニター５６に表示される。
【０００６】
地点Ｂからも全く同様にして地点Ａに映像が送信され、地点Ａにおいてモニター５６にその映像が表示される。
【０００７】
図２及び図３は、図１のモニター５６に表示される映像を例示する図である。図２のように、遠隔地（地点Ａにとっては地点Ｂ、地点Ｂにとっては地点Ａ）の会議参加者全員の映像がモニター５６に表示されたり、あるいは、遠隔地のビデオカメラ５１のズーム量やパンチルト量が制御される（図１ではビデオカメラ５１の制御系は図示を省略している）ことにより、図３のように、会議参加者のうち現在発言中の１人の映像がモニター５６にアップで表示される（このように、テレビ会議システムにおいて遠隔地の会議参加者の映像がモニターに表示されることについては、例えば特許文献１及び２参照。）。
【０００８】
【特許文献１】
特開２００２−２３７９９１号公報（段落番号０００７、図１）
【特許文献２】
特開２００２−１７１４９９号公報（段落番号０００２〜０００４）
【０００９】
【発明が解決しようとする課題】
しかし、こうした従来のテレビ会議システムでは、遠隔地の会議参加者の中に面識のない人が含まれている場合、モニターを見てもその人がどんな人物であるか（どんな肩書きで何という名前の人か）を把握することができなかった。
【００１０】
また、遠隔地の会議参加予定者が全員面識のある人であっても、参加人数が多い場合には、モニターを見ても遠隔地において実際に誰々が会議に参加しているのかを把握しにくいことが少なくなかった（図２のように遠隔地の会議参加者全員の映像がモニターに表示される際には、映像コーデックでの圧縮率の関係で個々の参加者の顔があまり鮮明でなかったりするので、実際に誰々が会議に参加しているのかを把握しにくいし、図３のように現在発言中の１人の映像がモニターに表示される際には、残りの参加者は表示されないのでやはり実際に誰々が会議に参加しているのかを把握しにくい）。
【００１１】
本発明は、上述の点に鑑み、テレビ会議システムのような双方向コミュニケーションシステムにおいて、モニターを見ることにより遠隔地の参加者を容易に把握できるようにすることを課題としてなされたものである。
【００１２】
【課題を解決するための手段】
この課題を解決するために、本出願人は、互いに離れた複数の地点の間において、個々の地点で撮影された映像がそれぞれネットワーク経由で残りの地点に送信される双方向コミュニケーションシステムにおいて、双方向コミュニケーションへの参加予定者を識別するための顔特徴データと、この参加予定者の属性を示す属性情報とを記憶したデータベースと、個々の地点で撮影された映像から、このデータベースに記憶されている顔特徴データを用いて顔認識を行うことによって双方向コミュニケーションへの実際の参加者を認証する第１の処理、第１の処理で認証した各参加者について、顔認識において検出したその参加者の顔領域の位置をその参加者の位置として検出する第２の処理、今回の第１の処理で新たな参加者が認証された場合、今回の第２の処理で検出したいずれかの参加者の位置が前回の第２の処理で検出した位置から変化した場合、前回の第２の処理で位置が検出されたいずれかの参加者が今回の第２の処理で位置検出されなかった場合、または前回の第２の処理で位置が検出されなかった参加者が今回の第２の処理で位置検出された場合に、個々の地点で撮影された映像のうち、今回の第２の処理で検出した各参加者の位置の近傍であって他の参加者とは重ならない位置に、このデータベースに記憶されているその参加者の属性情報を視覚的に表示するためのデータを付加する第３の処理、を繰り返し実行する処理手段とを備えたものを提案する。
【００１３】
この双方向コミュニケーションシステムでは、双方向コミュニケーションへの参加予定者を識別するための顔特徴データと、この参加予定者の属性を示す属性情報とがデータベースに記憶されている。そして、個々の地点において、処理手段が、次の第１〜第３の処理を繰り返し実行する。
その地点で撮影された映像から、このデータベースに記憶されている顔特徴データを用いて顔認識を行うことによって双方向コミュニケーションへの実際の参加者を認証する第１の処理。
第１の処理で認証した各参加者について、顔認識において検出したその参加者の顔領域の位置をその参加者の位置として検出する第２の処理。
今回の第１の処理で新たな参加者が認証された場合、今回の第２の処理で検出したいずれかの参加者の位置が前回の第２の処理で検出した位置から変化した場合、前回の第２の処理で位置が検出されたいずれかの参加者が今回の第２の処理で位置検出されなかった場合、または前回の第２の処理で位置が検出されなかった参加者が今回の第２の処理で位置検出された場合に、その地点で撮影された映像のうち、今回の第２の処理で検出した各参加者の位置の近傍であって他の参加者とは重ならない位置に、このデータベースに記憶されているその参加者の属性情報を視覚的に表示するためのデータを付加する第３の処理。
【００１４】
その結果、その地点からは、ビデオカメラのズーム量・パンチルト量が変化したり、参加者の席の移動や退席があったり、途中から参加した者がいたりしても、常に、その地点で撮影された映像であって、現在表示されている各参加者の顔の近傍にその参加者の属性情報を視覚的に表示するためのデータを付加したものが、残りの地点に送信される。
【００１５】
これにより、その地点において、ビデオカメラのズーム量・パンチルト量が変化したり、参加者の席の移動や退席があったり、途中から参加した者がいたりしても、残りの地点のモニターには、常に、現在表示されている各参加者（遠隔地の参加者）の顔の近傍にその参加者の属性情報が表示される。
【００１６】
このように、遠隔地における双方向コミュニケーションへの各参加者の属性情報がその参加者の映像とともにモニターに表示されるので、モニターを見ただけで遠隔地の参加者を容易に把握することができるようになる。
【００１８】
また、参加者に認証や位置検出のための専用の媒体（認証用のＩＤカードや、位置検出用の電波または赤外線の発生器等）を所持させたり、その媒体から認証や位置検出のための情報を取得する機器（ＩＤカードの読取装置や、電波または赤外線の受信装置等）を設けたりすることなく、参加者の認証や位置検出を行うことができるようになる。
【００１９】
なお、この双方向コミュニケーションシステムをテレビ会議システムに適用するような場合には、一例として、データベースに、属性情報として肩書き及び名前の情報を記憶させ、処理手段を、第３の処理において、この肩書き及び名前のキャラクタデータを付加するように構成することが好適である。
【００２０】
それにより、テレビ会議を行うような際に、遠隔地の会議参加者の中に面識のない人が含まれている場合にも、モニターを見ただけでその人がどんな人物であるか（どんな肩書きで何という名前の人か）を容易に把握できるようになり、また遠隔地の会議参加予定者が全員面識のある人であるが参加人数が多い場合にも、モニターを見ただけで実際に誰々が会議に参加しているのかを容易に把握できるようになる。
【００２１】
【発明の実施の形態】
以下、互いに離れた２つの地点を結ぶテレビ会議システムに本発明を適用した例について、図面を用いて説明する。
【００２２】
図４は、本発明を適用したテレビ会議システムにおける各地点（地点Ａ，Ｂ）の会議室を、会議参加者の視点で示す図である。会議参加者はテーブル４の手前に着席するようになっており、テーブル４の向こう側には、テレビ会議装置１及びスピーカ内蔵型のモニター２が正面に設置されるとともに、パーソナルコンピュータ（以下単にコンピュータと呼ぶ）３が設置されている。
【００２３】
テレビ会議装置１は、ビデオカメラ，マイクロホン，コーデック，ネットワーク通信用の通信インタフェース等が一体となった装置であり、ビデオカメラ及びマイクロホンは装置の正面方向（テーブル４の方向）に向けられている。
【００２４】
テレビ会議装置１の映像出力端子，音声出力端子は、それぞれモニター２の映像入力端子，音声入力端子にケーブルで接続されている。また、テレビ会議装置１の別の映像出力端子がコンピュータ３の映像入力端子にケーブルで接続されるとともに、テレビ会議装置１の映像入力端子がコンピュータ３の映像出力端子にケーブルで接続されている。
【００２５】
図５は、テレビ会議装置１の構成を、テレビ会議システムの全体構成とともに示すブロック図である。テレビ会議装置１は、ビデオカメラ１１，ミキシング回路１２，ＡＤコンバータ１３，映像コーデック１４，通信インタフェース１５，ＤＡコンバータ１６，マイクロホン１７，ＡＤコンバータ１８，エコーキャンセラ１９，音声コーデック２０，ＤＡコンバータ２１，ＣＰＵ２２，カメラコントローラ２３を含んでいる。
【００２６】
地点Ａにおいて、テレビ会議装置１のビデオカメラ１１から出力した映像信号は、テレビ会議装置１からコンピュータ３に送られるとともに、ミキシング回路１２に送られる。ミキシング回路１２では、ビデオカメラ１１からの映像信号と、コンピュータ３からテレビ会議装置１に送られた映像信号とがミキシングされる。ミキシング回路１２から出力した映像信号は、ＡＤコンバータ１３でデジタル変換され、映像コーデック１４で符号化（圧縮）され、通信インタフェース１５からネットワーク１０１経由で地点Ｂのテレビ会議装置１の通信インタフェース１５に送信される。そして、地点Ｂにおいて、テレビ会議装置１の通信インタフェース１５で受信した映像データが、映像コーデック１４で復号（伸長）され、ＤＡコンバータ１６でアナログ変換され、テレビ会議装置１からモニター２に送られてモニター２に表示される。
【００２７】
また、地点Ａにおいて、テレビ会議装置１のマイクロホン１７から出力した音声信号は、ＡＤコンバータ１８でデジタル変換され、エコーキャンセラ１９を経て音声コーデック２０で符号化され、通信インタフェース１５からネットワーク１０１経由で地点Ｂのテレビ会議装置１の通信インタフェース１５に送信される。そして、地点Ｂにおいて、テレビ会議装置１の通信インタフェース１５で受信した音声データが、音声コーデック２０で復号され、エコーキャンセラ１９を経てＤＡコンバータ２１でアナログ変換され、テレビ会議装置１からモニター２に送られてモニター２の内蔵スピーカで再生される。
【００２８】
地点Ｂからも全く同様にして地点Ａに映像，音声が送信され、地点Ａにおいてモニター２にその映像が表示されるとともにモニター２の内蔵スピーカでその音声が再生される。
【００２９】
カメラコントローラ２３は、ＣＰＵ２２の制御のもとで、ビデオカメラ１１のズーム量やパンチルト量を調整する。ＣＰＵ２２は、自分の地点のリモートコントローラ（図示略）のズームやパンチルト用の操作キーの操作による信号や、あるいはネットワーク１０１経由で相手の地点のテレビ会議装置１から送信された制御データに基づいて、カメラコントローラ２３に制御信号を与える。
【００３０】
コンピュータ３内には、会議への参加予定者１人１人についてのデータを登録するためのデータベースが格納されている。図６は、このデータベースに登録されるデータを示す。１人１人の参加予定者Ａ，Ｂ，Ｃ…，について、顔認識を行うための顔特徴データと、「○○部○○課長」というような肩書き及び名前を示す肩書き・名前データとがそれぞれ登録される。
【００３１】
また、コンピュータ３には、図７に示すような処理を実行するためのプログラムが格納されている。この処理では、最初に、テレビ会議装置１からコンピュータ３に入力した映像信号から、データベース内の顔特徴データ（図６）を用いて顔認識を行うことにより、会議への実際の参加者を認証する（ステップＳ１）。
【００３２】
この顔認識は、次の（ａ）〜（ｄ）のような過程の既存の顔認識処理によって行う。
（ａ）入力映像からの顔領域の検出
（ｂ）顔領域の検出結果に基づく入力映像からの顔領域の切り出しと、切り出した顔領域の大きさや輝度等のばらつきの正規化
（ｃ）顔領域からの顔特徴の抽出
（ｄ）抽出した顔特徴と、データベース内の顔特徴データとの照合
【００３３】
続いて、テレビ会議装置１から入力した映像信号から、いままでステップＳ１で認証済みの各参加者の画面内（１フレーム分の映像内）の位置を検出する（ステップＳ２）。この位置検出は、認証済みの各参加者について今回のステップＳ１の前述の（ａ）の処理で検出した顔領域の位置をそのままその参加者の位置として決定するという方法で行う。
【００３４】
続いて、まだ認証済みでない新たな参加者が今回のステップＳ１で認証されたか否かを判断する（ステップＳ３）。
【００３５】
イエスであれば、データベース内の肩書き・名前データを用いて、今回のステップＳ２で検出した各参加者の位置の近傍であって他の参加者とは重ならない位置にその参加者の肩書き及び名前を表示させるキャラクタデータを生成する（ステップＳ４）。そして、そのキャラクタデータをコンピュータ３からテレビ会議装置１に送る（ステップＳ５）。そして、ステップＳ１に戻ってステップＳ１以下を繰り返す。
【００３６】
ステップＳ３でノーであった場合には、今回のステップＳ２で検出されたいずれかの参加者の位置が、前回のステップＳ２で検出した位置から変化したり、あるいは前回位置検出された参加者が今回位置検出されなかったり、前回位置検出されなかった参加者が今回位置検出されたりしているか否かを判断する（ステップＳ６）。
【００３７】
イエスであれば、ステップＳ４に進む。他方ノーであれば、ステップＳ１に戻ってステップＳ１以下を繰り返す。
【００３８】
次に、このテレビ会議システムを用いた会議の様子について説明する。例えば、地点Ａの会議参加予定者が、或る事業所の次の７名であったとする。
・木下所長
・営業１部：佐藤部長・山田課長
・営業２部：鈴木課長・上田係長
・設計１部：吉田課長
・設計２部：田中課長
【００３９】
地点Ａのコンピュータ３内のデータベース（図６）には、会議開始前に、上記７名の参加予定者についての顔特徴データ及び肩書き・名前データをそれぞれ登録しておく。
【００４０】
会議の開始時刻になり、地点Ａでは、上記７名の参加予定者のうちの営業２部鈴木課長が急用で参加できなくなったが残りの６名が予定通り会議に参加したとする。
【００４１】
会議が始まり、地点Ａにおいて、テレビ会議装置１で、まずこの６名全員を画角に収めるようにカメラコントローラ２３（図５）でビデオカメラ１１（図５）のズーム量やパンチルト量をコントロールして撮影が行われたとする。すると、その映像が、テレビ会議装置１からネットワーク１０１経由で地点Ｂのテレビ会議装置に送信されるとともに、テレビ会議装置１からコンピュータ３に送られる。
【００４２】
そして、コンピュータ３で図７の処理のステップＳ１〜Ｓ５が実行されることにより、画面内のこの６名のそれぞれの位置の近傍にその人物の肩書き及び名前を表示させるキャラクタデータが、コンピュータ３からテレビ会議装置１に送られる。
【００４３】
そして、このキャラクタデータが、テレビ会議装置１内のミキシング回路１２（図５）で、ビデオカメラ１１からの映像とミキシングされる。その結果、テレビ会議装置１からは、このキャラクタデータを付加した映像が地点Ｂのテレビ会議装置に送信される。
【００４４】
これにより、地点Ｂのモニター２には、地点Ａのこの６名の参加者の映像が表示されるだけでなく、この６名の参加者１人１人の顔の近傍にその参加者の肩書き及び名前のテロップが表示される。図８は、このとき地点Ｂのモニター２に表示される映像を例示する図である。
【００４５】
地点Ｂの会議参加者にとって、地点Ａの会議参加者のうち木下所長，設計１課吉田課長の２名とは面識がなく、残りの５名とは面識があるとする。地点Ｂの会議参加者は、木下所長や設計１課吉田課長とは面識がないにもかかわらず、モニター２に表示されるこのテロップを見ただけで、図８の例では、画面の左端の人物が設計１課の吉田課長であって画面の左から３番目の人物が木下所長であるということを容易に把握できる。
【００４６】
また、地点Ｂの会議参加者は、地点Ａの参加予定者が面識のある５名を含めて合計７名であることを予め知らされていたが、営業２部鈴木課長に急用ができたことまでは知らされていなかったとする。地点Ｂの会議参加者は、テレビ会議装置１の映像コーデック１４（図５）での圧縮率の関係でモニター２に人物の顔があまり鮮明に表示されないような場合でも、モニター２に表示されるこのテロップを見ただけで、地点Ａにおいて実際に誰々が会議に参加しているのか（営業２部鈴木課長が参加しておらず、残りの６名が参加していること）を容易に把握することができる。
【００４７】
続いて、木下所長が発言を始め、地点Ａにおいて、テレビ会議装置１で、木下所長をアップにするようにカメラコントローラ２３（図５）でビデオカメラ１１（図５）のズーム量やパンチルト量をコントロールして撮影が行われたとする。
【００４８】
すると、その映像に基づき、コンピュータ３では、図７の処理のステップＳ１〜Ｓ３が実行され、新たな参加者は認証されていない（今回認証されたのは木下所長のみであり、木下所長は前回も認証されている）のでステップＳ３でノーとなってステップＳ６に進み、画面内での木下所長の位置が変化している（前回とは顔領域の大きさが違う）とともに残りの参加者が位置検出されないのでステップＳ６でイエスとなってステップＳ４及びＳ５が実行される。したがって、今度は、画面内の木下所長の位置の近傍にその肩書き及び名前を表示させるキャラクタデータが、コンピュータ３からテレビ会議装置１に送られる。
【００４９】
これにより、地点Ｂのモニター２には、今度は木下所長の映像がアップで表示されるとともに、木下所長の顔の近傍にその肩書き及び名前のテロップが表示される。図９は、このとき地点Ｂのモニター２に表示される映像を例示する図である。
【００５０】
また、例えば、地点Ａにおいて、テレビ会議装置１で参加者６名全員を画角に収めて撮影が行われている最中に、参加者が席を移動したり、一部の参加者が退席したりしたとする。
【００５１】
すると、やはりコンピュータ３で図７の処理のステップＳ１〜Ｓ３，Ｓ６，Ｓ４，Ｓ５が実行されるので、地点Ｂのモニター２には、移動後の各参加者の顔の近傍や退席者を除く残りの参加者の顔の近傍にその参加者の肩書き及び名前のテロップが表示される。
【００５２】
また、例えば、地点Ａにおいて、急用ができた営業２部鈴木課長が用件を済ませて会議に途中から参加し、テレビ会議装置１で営業２部鈴木課長も画角に収めて撮影が行われたとする。すると、コンピュータ３では、図７の処理のステップＳ１〜Ｓ３が実行され、新たな参加者（営業２部鈴木課長）が認証されたのでステップＳ３でイエスとなってステップＳ４及びＳ５が実行される。したがって、今度は、画面内の営業２部鈴木課長を含む各参加者の位置の近傍にその肩書き及び名前を表示させるキャラクタデータが、コンピュータ３からテレビ会議装置１に送られる。
【００５３】
これにより、地点Ｂのモニター２には、途中から会議に参加した営業２部鈴木課長の顔の近傍にもその肩書き及び名前のテロップが表示される。
【００５４】
このようにして、地点Ａにおいて、テレビ会議装置１のビデオカメラ１１のズーム量・パンチルト量が変化したり、参加者の席の移動や退席があったり、途中から会議に参加した者がいたりしても、地点Ｂのモニター２には、現在表示されている各参加者の顔の近傍に、常にその参加者の肩書き及び名前のテロップが表示される。
【００５５】
ここでは地点Ｂの会議参加者からみた会議の様子（地点Ｂのモニター２の表示）を説明したが、地点Ａのモニター２にも、全く同様にして、現在表示されている地点Ｂの各参加者の顔の近傍に、常にその参加者の肩書き及び名前のテロップが表示される。
【００５６】
以上のようにして、このテレビ会議システムによれば、遠隔地（地点Ａにとっては地点Ｂ、地点Ｂにとっては地点Ａ）における各会議参加者の肩書き及び名前がその参加者の映像とともにモニターに表示されるので、モニターを見ただけで遠隔地の参加者を容易に把握することができる。
【００５７】
そして、顔認識処理に基づいて参加者の認証及び位置検出を行うので、参加者に認証や位置検出のための専用の媒体（認証用のＩＤカードや、位置検出用の電波または赤外線の発生器等）を所持させたり、その媒体から認証や位置検出のための情報を取得する機器（ＩＤカードの読取装置や、電波または赤外線の受信装置等）を会議室内に設けたりすることなく、参加者の認証や位置検出を行うことができる。
【００５８】
なお、以上の例では、参加者の認証及び位置検出を、顔認識処理に基づいて行っている。しかし、別の例として、参加者認証を音声認識処理，指紋認識処理，網膜認識処理等の個人認識処理によって行ったり、参加者の位置検出を音声認識処理によって行うようにしてもよい。
【００５９】
あるいは、必要に応じて、参加者に認証や位置検出のための専用の媒体（認証用のＩＤカードや、位置検出用の電波または赤外線の発生器等）を所持させ、その媒体から認証や位置検出のための情報を取得する機器（ＩＤカードの読取装置や、電波または赤外線の受信装置等）を会議室内に設けるようにしてもよい。
【００６０】
あるいはまた、例えば図４のテーブル４上の所定の位置に参加予定の人数分の台数のマイクロホンが配置される（それらのマイクロホンからテレビ会議装置１に音声信号が送られる）ような場合には、例えば、各マイクロホンに、そのマイクロホンを使用する参加者のＩＤ情報をコンピュータ３に送るための操作器や回路を組み込むとともに、コンピュータ３に各マイクロホン３の位置を記憶させておき、そのＩＤ情報に基づいて参加者認証を行うとともに、そのＩＤ情報を送ったマイクロホンの位置をその認証した参加者の位置として決定するようにしてもよい。
【００６１】
また、以上の例では、地点Ａ，地点Ｂにそれぞれ１台ずつコンピュータ３を設けている。しかし、別の例として、地点Ａ，地点Ｂのうちの１つの地点（あるいはネットワーク１０１経由で地点Ａ，地点Ｂに結ばれた別の１つの地点）に１台だけコンピュータ３を設け、そのコンピュータ３に、地点Ａ及び地点Ｂの両方の参加予定者についての顔特徴データ及び肩書き・名前データをデータベースに登録させて、この両方の参加者について図７の処理を実行させるようにしてもよい。
【００６２】
また、以上の例では、テレビ会議装置１とは別に、データベースを格納して図７の処理を実行するコンピュータ３を設けている。しかし、別の例として、テレビ会議装置１そのものを、データベースを格納して図７の処理を実行するように構成してもよい。
【００６３】
また、以上の例では、コンピュータ３内のデータベースに肩書き・名前データを登録する（図６）ことにより、モニター２に表示される参加者の顔の近傍に、その参加者の肩書き及び名前のテロップが表示されるようにしている。しかし、これに限らず、参加予定者の適宜の属性を示す属性情報をこのデータベースに登録することにより、モニター２に表示される参加者の顔の近傍にその属性情報が表示されるようにしてよい。一つの例としては、肩書き・名前データに加え、あるいは肩書き・名前データに代えて、その参加予定者の過去の会議での主張（或るプロジェクトに賛成か反対かの見解等）を要約したデータをこのデータベースに登録することにより、モニター２に表示される参加者の顔の近傍に、そうした主張の要約のテロップも表示されるようにすることが考えられる。
【００６４】
また、以上の例では、地点Ａ，地点Ｂという２地点を結ぶテレビ会議システムに本発明を適用している。しかし、これに限らず、３地点以上を結ぶテレビ会議システムや、テレビ会議システム以外の適宜の双方向コミュニケーションシステムにも本発明を適用してよい。
【００６５】
エンターテイメント系の双方向コミュニケーションシステムに本発明を適用する場合には、例えば参加予定者の好きなアニメーションの画像データを属性情報としてデータベースに登録することにより、モニターに表示される参加者の顔の近傍にそのアニメーションの画像が表示されるようにしたり、モニターに表示される参加者の顔の上にそのアニメーションの画像が表示されるようにしてもよい。
【００６６】
また、参加予定者のうちモニターに顔を表示することが好ましくない人物がいるような双方向コミュニケーションシステムに本発明を適用する場合には、その人物についての属性情報としてモザイクをかけることを指示する情報をデータベースに登録することにより、モニターに表示されるその人物の顔にモザイクがかかるようにしてもよい。
【００６７】
また、本発明は、以上の例に限らず、本発明の要旨を逸脱することなく、その他様々の構成をとりうることはもちろんである。
【００６８】
【発明の効果】
以上のように、本発明に係る双方向コミュニケーションシステムによれば、いずれかの地点において、ビデオカメラのズーム量・パンチルト量が変化したり、参加者の席の移動や退席があったり、途中から参加した者がいたりしても、残りの地点のモニターには、常に、現在表示されている各参加者（遠隔地の参加者）の顔の近傍にその参加者の属性情報が表示される。このように、遠隔地における双方向コミュニケーションへの各参加者の属性情報がその参加者の映像とともにモニターに表示されるので、モニターを見ただけで遠隔地の参加者を容易に把握することができるという効果が得られる。
【００６９】
また、参加者に認証や位置検出のための専用の媒体を所持させたり、その媒体から認証や位置検出のための情報を取得する機器を設けたりすることなく、参加者の認証や位置検出を行うことができるという効果が得られる。
【００７０】
また、テレビ会議を行う際に、遠隔地の会議参加者の中に面識のない人が含まれている場合にも、モニターを見ただけでその人がどんな人物であるか（どんな肩書きで何という名前の人か）を容易に把握でき、遠隔地の会議参加予定者が全員面識のある人であるが参加人数が多い場合にも、モニターを見ただけで実際に誰々が会議に参加しているのかを容易に把握できるという効果が得られる。
【図面の簡単な説明】
【図１】従来のテレビ会議システムにおける映像送受信用の機器を示す図である。
【図２】図１のモニターに表示される映像を例示する図である。
【図３】図１のモニターに表示される映像を例示する図である。
【図４】本発明を適用したテレビ会議システムにおける会議室を示す図である。
【図５】図４のテレビ会議装置の構成を示すブロック図である。
【図６】図４のコンピュータ内のデータベースを示す図である。
【図７】図４のコンピュータが実行する処理を示すフローチャートである。
【図８】図４のモニターに表示される映像を例示する図である。
【図９】図４のモニターに表示される映像を例示する図である。
【符号の説明】
１テレビ会議装置、２モニター、３コンピュータ、４テーブル、１１ビデオカメラ、１２ミキシング回路、１３，１８ＡＤコンバータ、１４映像コーデック、１５通信インタフェース、１６，２１ＤＡコンバータ、１７マイクロホン、１９エコーキャンセラ、２０音声コーデック、１０１ネットワーク
[0001]
[Technical field to which the invention belongs]
The present invention relates to an interactive communication system such as a video conference system.
[0002]
[Prior art]
Systems are becoming widespread in which, among multiple points remote from one another, video shot at each point is sent over a network to the remaining points, so that people in remote locations can communicate interactively while seeing each other's faces. One example of such an interactive communication system is a video conference system.
[0003]
FIG. 1 is a block diagram showing devices and circuits for video transmission / reception in a conventional video conference system. In the meeting room at the point A, a video camera 51, an AD converter 52, a video codec 53, a communication interface 54 for network communication, a DA converter 55, and a monitor 56 are provided.
[0004]
Similarly, a video camera 51, an AD converter 52, a video codec 53, a communication interface 54, a DA converter 55, and a monitor 56 are also provided in the conference room at a point B away from the point A.
[0005]
At point A, the video captured by the video camera 51 (video of the conference participants) is digitized by the AD converter 52, encoded (compressed) by the video codec 53, and transmitted from the communication interface 54 via the network 101 to the communication interface 54 at point B. At point B, the video received by the communication interface 54 is decoded (expanded) by the video codec 53, converted to analog by the DA converter 55, and displayed on the monitor 56.
[0006]
The video is transmitted from the point B to the point A in the same manner, and the video is displayed on the monitor 56 at the point A.
[0007]
FIGS. 2 and 3 illustrate images displayed on the monitor 56 of FIG. 1. As shown in FIG. 2, video of all the conference participants at the remote location (point B for point A, and point A for point B) may be displayed on the monitor 56. Alternatively, by controlling the zoom amount and pan/tilt amount of the remote video camera 51 (the control system of the video camera 51 is omitted from FIG. 1), a close-up of the one participant who is currently speaking may be displayed on the monitor 56, as shown in FIG. 3. (For the display of remote conference participants on a monitor in a video conference system, see, for example, Patent Documents 1 and 2.)
[0008]
[Patent Document 1]
JP 2002-237991 A (paragraph number 0007, FIG. 1)
[Patent Document 2]
JP 2002-171499 A (paragraph numbers 0002 to 0004)
[0009]
[Problems to be solved by the invention]
However, in such conventional video conference systems, when the remote conference participants include someone the viewer has never met, looking at the monitor does not reveal who that person is (what title the person holds and what the person's name is).
[0010]
Moreover, even when the viewer is acquainted with everyone scheduled to attend at the remote location, if the number of participants is large it is often difficult to tell from the monitor who is actually attending there. When video of all the remote participants is displayed as in FIG. 2, individual faces are often not very clear because of the compression ratio of the video codec, so it is hard to tell who is actually attending. When video of the one person currently speaking is displayed as in FIG. 3, the remaining participants are not shown, so it is again hard to tell who is actually attending.
[0011]
In view of the above, an object of the present invention is to make it easy, in an interactive communication system such as a video conference system, to identify the remote participants by looking at the monitor.
[0012]
[Means for Solving the Problems]
To solve this problem, the applicant proposes an interactive communication system in which, among a plurality of points remote from one another, video captured at each point is transmitted via a network to the remaining points, the system comprising: a database that stores facial feature data for identifying prospective participants in the interactive communication and attribute information indicating attributes of those prospective participants; and processing means that repeatedly executes a first process of authenticating the actual participants in the interactive communication by performing face recognition on the video captured at each point using the facial feature data stored in the database, a second process of detecting, for each participant authenticated in the first process, the position of that participant's face area found during face recognition as the position of that participant, and a third process of adding, to the video captured at each point, data for visually displaying the attribute information of each participant stored in the database at a position that is near the position of that participant detected in the current second process and does not overlap other participants, the third process being performed when a new participant is authenticated in the current first process, when the position of any participant detected in the current second process has changed from the position detected in the previous second process, when any participant whose position was detected in the previous second process is not detected in the current second process, or when a participant whose position was not detected in the previous second process is detected in the current second process.
[0013]
In this interactive communication system, facial feature data for identifying prospective participants in the interactive communication and attribute information indicating attributes of those prospective participants are stored in the database. At each point, the processing means repeatedly executes the following first to third processes.
A first process of authenticating the actual participants in the interactive communication by performing face recognition on the video captured at that point using the facial feature data stored in the database.
A second process of detecting, for each participant authenticated in the first process, the position of that participant's face area found during face recognition as the position of that participant.
A third process of adding, to the video captured at that point, data for visually displaying the attribute information of each participant stored in the database at a position that is near the position of that participant detected in the current second process and does not overlap other participants. The third process is performed when a new participant is authenticated in the current first process, when the position of any participant detected in the current second process has changed from the position detected in the previous second process, when any participant whose position was detected in the previous second process is not detected in the current second process, or when a participant whose position was not detected in the previous second process is detected in the current second process.
[0014]
As a result, even if the zoom amount or pan/tilt amount of the video camera changes, participants change seats or leave, or someone joins partway through, that point always transmits to the remaining points the video captured there with data added for visually displaying each participant's attribute information near the face of each participant currently shown.
[0015]
Consequently, even if at that point the zoom amount or pan/tilt amount of the video camera changes, participants change seats or leave, or someone joins partway through, the monitors at the remaining points always display each participant's attribute information near the face of each (remote) participant currently shown.
[0016]
Because the attribute information of each remote participant in the interactive communication is displayed on the monitor together with that participant's video, the remote participants can be identified easily just by looking at the monitor.
[0018]
In addition, participants can be authenticated and their positions detected without having them carry a dedicated medium for authentication or position detection (such as an ID card for authentication, or a radio-wave or infrared emitter for position detection) and without installing equipment that obtains authentication or position information from such a medium (such as an ID card reader, or a radio-wave or infrared receiver).
[0019]
When this interactive communication system is applied to a video conference system, it is preferable, as one example, to store title and name information in the database as the attribute information and to configure the processing means to add character data of the title and name in the third process.
[0020]
As a result, when holding a video conference, even if the remote participants include someone the viewer has never met, the viewer can easily tell who that person is (title and name) just by looking at the monitor. Likewise, even if the viewer is acquainted with everyone scheduled to attend remotely but the number of participants is large, the viewer can easily tell who is actually attending just by looking at the monitor.
[0021]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, an example in which the present invention is applied to a video conference system that connects two points separated from each other will be described with reference to the drawings.
[0022]
FIG. 4 shows the conference room at each point (points A and B) in a video conference system to which the present invention is applied, as seen from the conference participants' viewpoint. The participants sit on the near side of a table 4. On the far side of the table 4, a video conference device 1 and a monitor 2 with a built-in speaker are installed facing the participants, and a personal computer (hereinafter simply referred to as a computer) 3 is also installed.
[0023]
The video conference device 1 is a device in which a video camera, a microphone, a codec, a communication interface for network communication, and the like are integrated, and the video camera and the microphone are directed in the front direction of the device (the direction of the table 4).
[0024]
The video output terminal and the audio output terminal of the video conference apparatus 1 are connected to the video input terminal and the audio input terminal of the monitor 2 by cables, respectively. Further, another video output terminal of the video conference apparatus 1 is connected to the video input terminal of the computer 3 with a cable, and the video input terminal of the video conference apparatus 1 is connected to the video output terminal of the computer 3 with a cable.
[0025]
FIG. 5 is a block diagram showing the configuration of the video conference device 1 together with the overall configuration of the video conference system. The video conference device 1 includes a video camera 11, a mixing circuit 12, an AD converter 13, a video codec 14, a communication interface 15, a DA converter 16, a microphone 17, an AD converter 18, an echo canceller 19, an audio codec 20, a DA converter 21, a CPU 22, and a camera controller 23.
[0026]
At point A, the video signal output from the video camera 11 of the video conference device 1 is sent from the video conference device 1 to the computer 3 and is also sent to the mixing circuit 12. The mixing circuit 12 mixes the video signal from the video camera 11 with the video signal sent from the computer 3 to the video conference device 1. The video signal output from the mixing circuit 12 is digitized by the AD converter 13, encoded (compressed) by the video codec 14, and transmitted from the communication interface 15 via the network 101 to the communication interface 15 of the video conference device 1 at point B. At point B, the video data received by the communication interface 15 of the video conference device 1 is decoded (expanded) by the video codec 14, converted to analog by the DA converter 16, sent from the video conference device 1 to the monitor 2, and displayed on the monitor 2.
[0027]
Also at point A, the audio signal output from the microphone 17 of the video conference device 1 is digitized by the AD converter 18, passed through the echo canceller 19, encoded by the audio codec 20, and transmitted from the communication interface 15 via the network 101 to the communication interface 15 of the video conference device 1 at point B. At point B, the audio data received by the communication interface 15 of the video conference device 1 is decoded by the audio codec 20, passed through the echo canceller 19, converted to analog by the DA converter 21, sent from the video conference device 1 to the monitor 2, and reproduced by the built-in speaker of the monitor 2.
[0028]
Video and audio are transmitted from the point B to the point A in the same manner, and the video is displayed on the monitor 2 at the point A and the sound is reproduced by the built-in speaker of the monitor 2.
[0029]
The camera controller 23 adjusts the zoom amount and pan/tilt amount of the video camera 11 under the control of the CPU 22. The CPU 22 gives control signals to the camera controller 23 based on signals from the zoom and pan/tilt operation keys of a remote controller (not shown) at its own point, or on control data transmitted via the network 101 from the video conference device 1 at the other point.
[0030]
The computer 3 stores a database for registering data on each prospective conference participant. FIG. 6 shows the data registered in this database. For each prospective participant A, B, C, and so on, facial feature data for face recognition and title/name data indicating the person's title (such as "Manager, XX Section, XX Department") and name are registered.
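The database of FIG. 6 can be pictured as a simple keyed table. The following is only a minimal sketch: the class name `ParticipantRecord`, its field names, the numeric feature vectors, and the sample entries are hypothetical illustrations, since the description does not specify how the facial feature data or the title/name data are encoded.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ParticipantRecord:
    """One row of a FIG. 6 style database (field names are hypothetical)."""
    face_features: Tuple[float, ...]  # facial feature data used for recognition
    title: str                        # title part of the title/name data
    name: str                         # name part of the title/name data

    def caption(self) -> str:
        # Text that would be shown near the participant's face.
        return f"{self.title} {self.name}".strip()


def build_database() -> Dict[str, ParticipantRecord]:
    # Keys play the role of "prospective participant A, B, C, ...".
    # The feature values are made-up placeholders, not real measurements.
    return {
        "A": ParticipantRecord((0.12, 0.80, 0.33), "Director", "Kinoshita"),
        "B": ParticipantRecord((0.90, 0.10, 0.45), "Manager", "Yamada"),
    }
```

Registering the records before the conference starts corresponds to the preparation step described later for point A.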
[0031]
The computer 3 also stores a program for executing the process shown in FIG. 7. In this process, the actual participants in the conference are first authenticated by performing face recognition on the video signal input from the video conference device 1 to the computer 3, using the facial feature data in the database (FIG. 6) (step S1).
[0032]
This face recognition is performed by existing face recognition processing in the following processes (a) to (d).
(a) Detection of the face area in the input video
(b) Cropping of the face area from the input video based on the detection result, and normalization of variations in the size, brightness, and the like of the cropped face area
(c) Extraction of facial features from the face area
(d) Matching of the extracted facial features against the facial feature data in the database
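As an illustration of the matching in step (d) only, the sketch below compares an extracted feature vector with each registered vector and accepts the closest one if it is near enough. The description relies on existing face recognition processing and does not name a method, so the Euclidean distance, the threshold `max_distance`, and the function name `match_face` are all assumptions made here; steps (a) to (c) are taken as already done.

```python
import math
from typing import Dict, Optional, Sequence


def match_face(features: Sequence[float],
               database: Dict[str, Sequence[float]],
               max_distance: float = 0.5) -> Optional[str]:
    """Return the ID of the closest registered person, or None if no one is close enough.

    Assumption: features are fixed-length numeric vectors and similarity is
    measured by Euclidean distance. This is an illustrative choice only.
    """
    best_id: Optional[str] = None
    best_dist = max_distance
    for person_id, registered in database.items():
        dist = math.dist(features, registered)
        if dist <= best_dist:
            best_id, best_dist = person_id, dist
    return best_id
```

A face that matches no registered entry returns `None`, which corresponds to a person who is in the picture but is not authenticated as a participant.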
[0033]
Next, from the video signal input from the video conference device 1, the position within the screen (within one frame of video) of each participant authenticated so far in step S1 is detected (step S2). For each authenticated participant, the position of the face area detected in process (a) of the current step S1 is used directly as that participant's position.
[0034]
Subsequently, it is determined whether or not a new participant who has not been authenticated yet has been authenticated in the current step S1 (step S3).
[0035]
If yes, the title/name data in the database is used to generate character data that displays each participant's title and name at a position near that participant's position detected in the current step S2 and not overlapping other participants (step S4). The character data is then sent from the computer 3 to the video conference device 1 (step S5). The process then returns to step S1 and repeats from step S1.
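Step S4 requires each caption to sit near its participant's face without covering another participant. The description does not say how that position is chosen, so the following is one possible sketch under stated assumptions: faces and captions are axis-aligned boxes in pixel coordinates, and the candidate order (below, above, right, left of the face) and the names `overlaps` and `place_caption` are hypothetical.

```python
from typing import Dict, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def overlaps(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes share any area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def place_caption(person_id: str,
                  faces: Dict[str, Box],
                  caption_size: Tuple[int, int],
                  frame_size: Tuple[int, int]) -> Box:
    """Pick a caption box adjacent to the person's face that avoids other faces."""
    x, y, w, h = faces[person_id]
    cw, ch = caption_size
    fw, fh = frame_size
    candidates = [
        (x, y + h, cw, ch),   # directly below the face
        (x, y - ch, cw, ch),  # directly above the face
        (x + w, y, cw, ch),   # to the right of the face
        (x - cw, y, cw, ch),  # to the left of the face
    ]
    others = [box for pid, box in faces.items() if pid != person_id]
    for cand in candidates:
        cx, cy, _, _ = cand
        inside = cx >= 0 and cy >= 0 and cx + cw <= fw and cy + ch <= fh
        if inside and not any(overlaps(cand, other) for other in others):
            return cand
    # Fallback when every candidate is blocked: keep the caption below the face.
    return candidates[0]
```

This sketch only avoids other participants' face areas. Avoiding other captions, or participants' bodies, would need additional boxes in the `others` list.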
[0036]
If no in step S3, it is determined whether the position of any participant detected in the current step S2 has changed from the position detected in the previous step S2, whether a participant whose position was detected previously is not detected this time, or whether a participant whose position was not detected previously is detected this time (step S6).
[0037]
If yes, the process proceeds to step S4. If no, the process returns to step S1 and repeats from step S1.
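The decision logic of steps S3 and S6 can be sketched as a small state holder that is fed the result of steps S1 and S2 for each frame. This is an illustrative sketch, not the program of FIG. 7 itself: the class name `CaptionUpdater`, the box representation, and the exact-equality comparison of positions are assumptions introduced here.

```python
from typing import Dict, Optional, Set, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) of a face area


class CaptionUpdater:
    """Tracks authenticated participants and their last detected positions."""

    def __init__(self) -> None:
        self.authenticated: Set[str] = set()
        self.previous: Dict[str, Box] = {}

    def step(self, recognized: Dict[str, Box]) -> Optional[Dict[str, Box]]:
        """Process one frame.

        `recognized` maps each participant recognized in this frame (S1) to
        the position of that participant's face area (S2). Returns the
        positions to caption when S4 and S5 should run, otherwise None.
        """
        # S3: was anyone authenticated for the first time in this frame?
        new_participant = any(pid not in self.authenticated for pid in recognized)
        self.authenticated.update(recognized)

        # S6: a moved, disappeared, or reappeared participant all make the
        # current mapping differ from the previous one.
        changed = recognized != self.previous
        self.previous = dict(recognized)

        if new_participant or changed:
            return dict(recognized)  # S4/S5: regenerate and send captions
        return None                  # nothing changed: go back to S1
```

In practice detected face positions jitter slightly from frame to frame, so a real implementation would probably compare positions with a tolerance rather than exact equality. That tolerance is not part of the description.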
[0038]
Next, the state of the conference using this video conference system will be described. For example, it is assumed that the scheduled attendees at the point A are the following seven people at a certain business establishment.
・ Director Kinoshita
・ Sales Department 1: General Manager Sato, Manager Yamada
・ Sales Department 2: Manager Suzuki, Assistant Manager Ueda
・ Design Department 1: Manager Yoshida
・ Design Department 2: Manager Tanaka
[0039]
In the database (FIG. 6) in the computer 3 at the point A, the face feature data and the title / name data for the seven prospective participants are registered before the start of the conference.
[0040]
Suppose that at the conference start time, Manager Suzuki of Sales Department 2, one of the seven prospective participants at point A, cannot attend because of urgent business, while the remaining six attend as scheduled.
[0041]
Suppose the conference begins and, at point A, the video conference device 1 first shoots with the zoom amount and pan/tilt amount of the video camera 11 (FIG. 5) controlled by the camera controller 23 (FIG. 5) so that all six are within the angle of view. That video is transmitted from the video conference device 1 to the video conference device at point B via the network 101 and is also sent from the video conference device 1 to the computer 3.
[0042]
The computer 3 then executes steps S1 to S5 of the process of FIG. 7, so that character data for displaying each person's title and name near the position of each of the six people on the screen is sent from the computer 3 to the video conference device 1.
[0043]
Then, this character data is mixed with the video from the video camera 11 by the mixing circuit 12 (FIG. 5) in the video conference apparatus 1. As a result, the video conference device 1 transmits the video with the character data added to the video conference device at the point B.
[0044]
As a result, not only the images of the six participants at the point A are displayed on the monitor 2 at the point B, but also the titles of the participants in the vicinity of the faces of the six participants. And a name telop is displayed. FIG. 8 is a diagram illustrating an image displayed on the monitor 2 at the point B at this time.
[0045]
It is assumed that the meeting participant at the point B has no contact with the two directors, the director Kinoshita and the design 1 section, Yoshida, among the participants at the point A, and the other 5 members. The conference participant at point B has no knowledge of Director Kinoshita or Manager of Design 1 Section Yoshida, but only by looking at this telop displayed on monitor 2, in the example of FIG. It can be easily understood that the person is the manager Yoshida of Design 1 and the third person from the left of the screen is Director Kinoshita.
[0046]
In addition, assume that, up to that point, the conference participants at point B had only been informed in advance that a total of seven people, including the five with whom they are acquainted, were scheduled to participate at point A. Even when the faces are not displayed very clearly on the monitor 2 because of the compression ratio of the video codec 14 (FIG. 5) of the video conference apparatus 1, the conference participants at point B can easily grasp, just by looking at the telop displayed on the monitor 2, who is actually participating in the meeting at point A (that Manager Suzuki of Sales Section 2 is not participating and the remaining six are).
[0047]
Subsequently, assume that Director Kinoshita starts speaking and that, at point A, the video conference apparatus 1 shoots with the zoom amount and pan/tilt amount of the video camera 11 (FIG. 5) controlled by the camera controller 23 (FIG. 5) so as to show the director in close-up.
[0048]
Then, based on this video, the computer 3 executes steps S1 to S3 of the process of FIG. 7. Since no new participant has been authenticated (this time only Director Kinoshita has been authenticated), the answer in step S3 is no and the process proceeds to step S6. Since the position of Director Kinoshita in the screen has changed (the size of the face area differs from the previous one) and the positions of the remaining participants are no longer detected, the answer in step S6 is yes and steps S4 and S5 are executed. Therefore, this time, character data for displaying the title and name in the vicinity of the position of Director Kinoshita in the screen is sent from the computer 3 to the video conference apparatus 1.
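The branching at steps S3 and S6 described above can be sketched as a single decision function. This is a minimal sketch under stated assumptions, not the actual program of the computer 3: the function name `telop_update_needed`, the dictionary representation of detection results, and the exact-equality comparison of face areas are all hypothetical.

```python
from typing import Dict, Tuple

Rect = Tuple[int, int, int, int]  # face area as (x, y, width, height)


def telop_update_needed(previous: Dict[str, Rect], current: Dict[str, Rect]) -> bool:
    """Decide whether the character data has to be regenerated.

    `previous` and `current` map an authenticated participant's ID to the
    face area detected for that participant in the last and the present pass.
    """
    # Roughly the yes branch of step S3: someone not seen in the last pass.
    if any(pid not in previous for pid in current):
        return True
    # Roughly the yes branch of step S6: someone is no longer detected...
    if any(pid not in current for pid in previous):
        return True
    # ...or a face area moved or changed size (for example after zooming in).
    return any(current[pid] != previous[pid] for pid in current)
```

In the close-up case above, the previous pass contains six face areas and the current pass contains only an enlarged one for Director Kinoshita, so the function returns `True`. A practical version would probably compare positions with some tolerance rather than exact equality, so that small detection jitter does not trigger an update on every pass.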
[0049]
As a result, on the monitor 2 at point B, the video of Director Kinoshita is displayed in close-up, and the title and name telop is displayed in the vicinity of Director Kinoshita's face. FIG. 9 is a diagram illustrating the image displayed on the monitor 2 at point B at this time.
[0050]
Also, suppose for example that, at point A, while the video conference apparatus 1 is shooting with all six participants within the angle of view, participants change seats or some participants leave their seats.
[0051]
Then, steps S1 to S3, S6, S4, and S5 of the process of FIG. 7 are likewise executed by the computer 3, so that, on the monitor 2 at point B, the title and name telop of each participant is displayed in the vicinity of each participant's face after the move, or in the vicinity of the faces of the remaining participants other than those who have left.
[0052]
In addition, suppose for example that, at point A, Manager Suzuki of Sales Section 2, who had urgent business, finishes that business and joins the meeting partway through. Then, in the computer 3, steps S1 to S3 of the process of FIG. 7 are executed, and since a new participant (Manager Suzuki of Sales Section 2) is authenticated, the answer in step S3 is yes and steps S4 and S5 are executed. Therefore, this time, character data for displaying the title and name in the vicinity of the position in the screen of each participant, including Manager Suzuki of Sales Section 2, is sent from the computer 3 to the video conference apparatus 1.
[0053]
As a result, on the monitor 2 at point B, the title and name telop is also displayed in the vicinity of the face of Manager Suzuki of Sales Section 2, who joined the meeting partway through.
[0054]
In this way, even if, at point A, the zoom amount or pan/tilt amount of the video camera 11 of the video conference apparatus 1 changes, participants change seats or leave, or someone joins the conference partway through, the title and name telop of each participant is always displayed on the monitor 2 at point B near the face of each participant currently displayed.
[0055]
The above describes how the conference appears to the conference participants at point B (the display on the monitor 2 at point B). In exactly the same manner, the title and name telop of each participant at point B is always displayed on the monitor 2 at point A near the face of each participant currently displayed.
[0056]
As described above, according to this video conference system, the title and name of each conference participant in a remote place (point B for point A and point A for point B) are displayed on the monitor together with the video of the participant. Therefore, it is possible to easily grasp the participants in the remote place just by looking at the monitor.
[0057]
Since participant authentication and position detection are performed based on face recognition processing, they can be performed without having the participants carry a dedicated medium (such as an ID card for authentication or a radio wave or infrared generator for position detection) and without providing in the conference room a device (such as an ID card reader or a radio wave or infrared receiver) that acquires information for authentication or position detection from such a medium.
[0058]
In the above example, participant authentication and position detection are performed based on face recognition processing. However, as another example, participant authentication may be performed by personal recognition processing such as voice recognition processing, fingerprint recognition processing, and retina recognition processing, or participant position detection may be performed by voice recognition processing.
[0059]
Alternatively, if necessary, the participants may be made to carry a dedicated medium for authentication or position detection (such as an ID card for authentication or a radio wave or infrared generator for position detection), and a device (such as an ID card reader or a radio wave or infrared receiver) that acquires information for authentication or position detection from that medium may be provided in the conference room.
[0060]
Alternatively, for example, in the case where microphones corresponding to the number of prospective participants are arranged at predetermined positions on the table 4 in FIG. 4 (audio signals are sent from these microphones to the video conference apparatus 1), an operation device or circuit for sending the ID information of the participant who uses the microphone to the computer 3 may be incorporated in each microphone, and the position of each microphone may be stored in the computer 3. In that case, in addition to performing participant authentication based on the ID information, the position of the microphone that sent the ID information may be determined as the position of the authenticated participant.
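The microphone-based variant above amounts to two table lookups, which can be sketched as follows. The table contents, the ID format, and the name `locate_by_microphone` are placeholders invented for illustration; the source does not specify how the ID information or the microphone positions are represented.

```python
from typing import Dict, Tuple

# Hypothetical table of microphone positions, as would be stored in the computer.
MIC_POSITIONS: Dict[int, Tuple[int, int]] = {
    1: (120, 300),
    2: (320, 300),
    3: (520, 300),
}

# Hypothetical set of registered participant IDs (placeholder values).
REGISTERED_IDS = {"P001", "P002", "P003"}


def locate_by_microphone(mic_id: int, participant_id: str) -> Tuple[str, Tuple[int, int]]:
    """Authenticate by ID information and use the sending microphone's position."""
    if participant_id not in REGISTERED_IDS:
        raise KeyError(f"unregistered participant ID: {participant_id}")
    # The position of the microphone that sent the ID stands in for the
    # position of the authenticated participant.
    return participant_id, MIC_POSITIONS[mic_id]
```

With this arrangement no image analysis is needed for authentication or positioning, at the cost of assuming that each participant stays at the microphone from which the ID information was sent.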
[0061]
In the above example, one computer 3 is provided at each of the points A and B. However, as another example, only one computer 3 may be provided at one of the points A and B (or at another point connected to the points A and B via the network 101), the face feature data and title/name data for the prospective participants at both points A and B may be registered in the database of that computer 3, and the process of FIG. 7 may be executed for the participants at both points.
[0062]
Moreover, in the above example, the computer 3, which stores the database and performs the process of FIG. 7, is provided separately from the video conference apparatus 1. However, as another example, the video conference apparatus 1 itself may be configured to store the database and execute the process of FIG. 7.
[0063]
In the above example, title/name data is registered in the database in the computer 3 (FIG. 6), so that the title and name of each participant are displayed in the vicinity of the participant's face displayed on the monitor 2. However, the present invention is not limited to this, and attribute information indicating any appropriate attribute of a prospective participant may be registered in this database so that the attribute information is displayed in the vicinity of the participant's face displayed on the monitor 2. As an example, in addition to or instead of the title/name data, data summarizing the positions taken by the prospective participants at the previous meeting (for example, whether they agree or disagree with a given project) may be registered in this database, so that a telop summarizing those positions is displayed in the vicinity of the participant's face displayed on the monitor 2.
[0064]
In the above example, the present invention is applied to a video conference system that connects two points, point A and point B. However, the present invention is not limited to this and may also be applied to a video conference system that connects three or more points, or to any appropriate two-way communication system other than a video conference system.
[0065]
When the present invention is applied to a two-way communication system for entertainment, for example, image data of a prospective participant's favorite animation may be registered in the database as attribute information, so that the animation image is displayed in the vicinity of the participant's face displayed on the monitor, or is displayed over the participant's face displayed on the monitor.
[0066]
In addition, when the present invention is applied to a two-way communication system in which the prospective participants include a person whose face should preferably not be displayed on the monitor, information instructing that a mosaic be applied may be registered in the database as attribute information for that person, so that a mosaic is applied to the face of that person displayed on the monitor.
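The different kinds of attribute information discussed above (title/name telop, animation image, mosaic instruction) could be handled by a small dispatch on the type of the registered attribute. The following is only a sketch under assumptions: the record layout, the kind labels, and the returned instruction tuples are hypothetical and do not come from the database of FIG. 6.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Attribute:
    kind: str                      # "telop", "image" or "mosaic" (hypothetical labels)
    payload: Optional[str] = None  # telop text or image file name; unused for mosaic


def overlay_instruction(attr: Attribute) -> Tuple[str, str, Optional[str]]:
    """Translate a registered attribute into (action, target, payload)."""
    if attr.kind == "telop":
        # Text such as a title and name, shown next to the face.
        return ("draw_text", "near_face", attr.payload)
    if attr.kind == "image":
        # An image such as a favorite animation, shown next to the face.
        return ("draw_image", "near_face", attr.payload)
    if attr.kind == "mosaic":
        # Hide the face itself; there is nothing to draw beside it.
        return ("mosaic", "on_face", None)
    raise ValueError(f"unknown attribute kind: {attr.kind}")
```

For simplicity the sketch always places an image near the face; displaying it over the face, as also mentioned above, would need one more target value.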
[0067]
Further, the present invention is not limited to the above examples, and it is needless to say that various other configurations can be taken without departing from the gist of the present invention.
[0068]
[Effects of the invention]
As described above, according to the interactive communication system of the present invention, even if the zoom or pan/tilt amount of the video camera changes at any point, a participant changes seats or leaves, or a participant joins partway through, the attribute information of each participant is always displayed on the monitor in the vicinity of the face of each (remote) participant currently displayed. Since the attribute information of each remote participant in the two-way communication is thus displayed on the monitor together with the video of that participant, the remote participants can be easily grasped just by looking at the monitor.
[0069]
In addition, participant authentication and position detection can be performed without having the participants carry a dedicated medium for authentication or position detection and without providing a device that acquires information for authentication or position detection from such a medium.
[0070]
Also, when a video conference is held, even if the remote conference participants include a person with whom one is not acquainted, what kind of person that is (what title and what name) can be easily grasped just by looking at the monitor. Likewise, even if all the prospective participants at the remote location are acquaintances but the number of participants is large, who is actually participating in the conference at the remote location can be easily grasped just by looking at the monitor.
[Brief description of the drawings]
FIG. 1 is a diagram showing a device for video transmission / reception in a conventional video conference system.
FIG. 2 is a diagram illustrating an image displayed on the monitor of FIG. 1.
FIG. 3 is a diagram illustrating an image displayed on the monitor of FIG. 1.
FIG. 4 is a diagram showing a conference room in a video conference system to which the present invention is applied.
FIG. 5 is a block diagram illustrating a configuration of the video conference apparatus in FIG. 4.
FIG. 6 is a diagram showing a database in the computer of FIG. 4.
FIG. 7 is a flowchart showing processing executed by the computer shown in FIG. 4.
FIG. 8 is a diagram illustrating an image displayed on the monitor of FIG. 4.
FIG. 9 is a diagram illustrating an image displayed on the monitor of FIG. 4.
[Explanation of symbols]
1 video conference device, 2 monitor, 3 computer, 4 table, 11 video camera, 12 mixing circuit, 13, 18 AD converter, 14 video codec, 15 communication interface, 16, 21 DA converter, 17 microphone, 19 echo canceller, 20 voice codec, 101 network

Claims

A two-way communication system in which videos shot at individual points among a plurality of points separated from each other are transmitted to the remaining points via a network, the system comprising:
a database storing face feature data for identifying prospective participants in the two-way communication and attribute information indicating attributes of the prospective participants; and
processing means for repeatedly executing:
a first process of authenticating actual participants in the two-way communication by performing face recognition on the videos shot at the individual points using the face feature data stored in the database;
a second process of detecting, for each participant authenticated in the first process, the position of the face area of the participant detected in the face recognition as the position of the participant; and
a third process of, when a new participant is authenticated in the current first process, when the position of any participant detected in the current second process has changed from the position detected in the previous second process, when any participant whose position was detected in the previous second process is not detected in the current second process, or when the position of a participant whose position was not detected in the previous second process is detected in the current second process, adding, to the videos shot at the individual points, data for visually displaying the attribute information of each participant stored in the database at a position that is in the vicinity of the position of that participant detected in the current second process and does not overlap other participants.

The two-way communication system according to claim 1, wherein
title information and name information are stored in the database as the attribute information, and
the processing means adds character data of the title and the name in the third process.