JP2004129071A

JP2004129071A - Two-way communication system

Info

Publication number: JP2004129071A
Application number: JP2002292858A
Authority: JP
Inventors: Takao Fukui; 福井　隆郎
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2002-10-04
Filing date: 2002-10-04
Publication date: 2004-04-22
Anticipated expiration: 2022-10-04
Also published as: JP4055539B2

Abstract

PROBLEM TO BE SOLVED: To make it easy to identify remote participants by looking at a monitor in a two-way communication system.
SOLUTION: The system comprises a database 3 that stores identification information for identifying prospective participants in the two-way communication and attribute information indicating attributes of those participants; authentication means 3 for authenticating an actual participant in the two-way communication at each point; detection means 3 for detecting the position, at each point, of the participant authenticated by the authentication means 3; and adding means 3, 12 for adding, to the portion of the video captured at each point that corresponds to the position of the participant detected by the detection means 3, data for visually displaying the attribute information of that participant stored in the database 3.
COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、例えばテレビ会議システムのような双方向コミュニケーションシステムに関する。
【０００２】
【従来の技術】
互いに離れた複数の地点の間において、個々の地点で撮影した映像をそれぞれネットワーク経由で残りの地点に送信することにより、遠隔地にいる人同士が相手の顔を見ながら双方向にコミュニケーションを行えるようにしたシステムが普及しつつある。そうした双方向コミュニケーションシステムの例としては、テレビ会議システムが挙げられる。
【０００３】
図１は、従来のテレビ会議システムにおける映像送受信用の機器や回路を示すブロック図である。地点Ａの会議室内に、ビデオカメラ５１，ＡＤコンバータ５２，映像コーデック５３，ネットワーク通信用の通信インタフェース５４，ＤＡコンバータ５５及びモニター５６が設けられる。
【０００４】
地点Ａとは離れた地点Ｂの会議室内にも、同じくビデオカメラ５１，ＡＤコンバータ５２，映像コーデック５３，通信インタフェース５４，ＤＡコンバータ５５及びモニター５６が設けられる。
【０００５】
地点Ａにおいて、ビデオカメラ５１で撮影された映像（会議参加者の映像）は、ＡＤコンバータ５２でデジタル変換され、映像コーデック５３で符号化（圧縮）されて、通信インタフェース５４からネットワーク１０１経由で地点Ｂの通信インタフェース５４に送信される。そして、地点Ｂにおいて、通信インタフェース５４で受信した映像が、映像コーデック５３で復号（伸長）され、ＤＡコンバータ５５でアナログ変換されて、モニター５６に表示される。
【０００６】
地点Ｂからも全く同様にして地点Ａに映像が送信され、地点Ａにおいてモニター５６にその映像が表示される。
【０００７】
図２及び図３は、図１のモニター５６に表示される映像を例示する図である。図２のように、遠隔地（地点Ａにとっては地点Ｂ、地点Ｂにとっては地点Ａ）の会議参加者全員の映像がモニター５６に表示されたり、あるいは、遠隔地のビデオカメラ５１のズーム量やパンチルト量が制御される（図１ではビデオカメラ５１の制御系は図示を省略している）ことにより、図３のように、会議参加者のうち現在発言中の１人の映像がモニター５６にアップで表示される（このように、テレビ会議システムにおいて遠隔地の会議参加者の映像がモニターに表示されることについては、例えば特許文献１及び２参照。）。
【０００８】
【特許文献１】
特開２００２−２３７９９１号公報（段落番号０００７、図１）
【特許文献２】
特開２００２−１７１４９９号公報（段落番号０００２〜０００４）
【０００９】
【発明が解決しようとする課題】
しかし、こうした従来のテレビ会議システムでは、遠隔地の会議参加者の中に面識のない人が含まれている場合、モニターを見てもその人がどんな人物であるか（どんな肩書きで何という名前の人か）を把握することができなかった。
【００１０】
また、遠隔地の会議参加予定者が全員面識のある人であっても、参加人数が多い場合には、モニターを見ても遠隔地において実際に誰々が会議に参加しているのかを把握しにいことが少なくなかった（図２のように遠隔地の会議参加者全員の映像がモニターに表示される際には、映像コーデックでの圧縮率の関係で個々の参加者の顔があまり鮮明でなかったりするので、実際に誰々が会議に参加しているのかを把握しにくいし、図３のように現在発言中の１人の映像がモニターに表示される際には、残りの参加者は表示されないのでやはり実際に誰々が会議に参加しているのかを把握しにくい）。
【００１１】
本発明は、上述の点に鑑み、テレビ会議システムのような双方向コミュニケーションシステムにおいて、モニターを見ることにより遠隔地の参加者を容易に把握できるようにすることを課題としてなされたものである。
【００１２】
【課題を解決するための手段】
この課題を解決するために、本出願人は、互いに離れた複数の地点の間において、個々の地点で撮影された映像がそれぞれネットワーク経由で残りの地点に送信される双方向コミュニケーションシステムにおいて、双方向コミュニケーションへの参加予定者を識別するための識別情報と、この参加予定者の属性を示す属性情報とを記憶したデータベースと、このデータベースに記憶されている識別情報を用いて、個々の地点における双方向コミュニケーションへの実際の参加者を認証する認証手段と、個々の地点における、この認証手段で認証された参加者の位置を検出する検出手段と、個々の地点で撮影された映像のうち、この検出手段で検出された参加者の位置に対応する部分に、このデータベースに記憶されているその参加者の属性情報を視覚的に表示するためのデータを付加する付加手段とを備えたものを提案する。
【００１３】
この双方向コミュニケーションシステムでは、双方向コミュニケーションへの参加予定者を識別するための識別情報と、この参加予定者の属性を示す属性情報とがデータベースに記憶されている。この参加予定者のうちのいずれかの人が、いずれかの地点で実際に双方向コミュニケーションに参加すると、認証手段により、このデータベースに記憶されている識別情報を用いてその参加者が認証された後、検出手段により、その地点におけるその認証された参加者の位置が検出される。
【００１４】
そして、その地点で撮影された映像のうち、この検出された参加者の位置に対応する部分に、付加手段により、このデータベースに記憶されているその参加者の属性情報を視覚的に表示するためのデータが付加される。その結果、その地点からは、その地点で撮影された映像であって、この検出された参加者の位置に対応する部分にその参加者の属性情報を視覚的に表示するためのデータを付加したものが、残りの地点に送信される。
【００１５】
これにより、残りの地点のモニターには、この参加者（遠隔地の参加者）の映像が表示されるだけでなく、この参加者の位置に対応する部分にその参加者の属性情報が表示される。
【００１６】
このように、遠隔地における双方向コミュニケーションへの参加者の属性情報がその参加者の映像とともにモニターに表示されるので、モニターを見ただけで遠隔地の参加者を容易に把握することができるようになる。
【００１７】
なお、この双方向コミュニケーションシステムにおいて、一例として、データベースに、識別情報として参加予定者の顔特徴データを記憶させ、認識手段を、個々の地点で撮影された映像からこの顔特徴データを用いて顔認識によって認証を行うように構成し、検出手段を、個々の地点で撮影された映像から、認証手段で認証された参加者の画面内の位置を検出するように構成することが好適である。
【００１８】
それにより、参加者に認証や位置検出のための専用の媒体（認証用のＩＤカードや、位置検出用の電波または赤外線の発生器等）を所持させたり、その媒体から認証や位置検出のための情報を取得する機器（ＩＤカードの読取装置や、電波または赤外線の受信装置等）を設けたりすることなく、参加者の認証や位置検出を行うことができるようになる。
【００１９】
また、この双方向コミュニケーションシステムをテレビ会議システムに適用するような場合には、一例として、データベースに、属性情報として肩書き及び名前の情報を記憶させ、付加手段を、この肩書き及び名前のキャラクタデータを付加するように構成することが好適である。
【００２０】
それにより、テレビ会議を行うような際に、遠隔地の会議参加者の中に面識のない人が含まれている場合にも、モニターを見ただけでその人がどんな人物であるか（どんな肩書きで何という名前の人か）を容易に把握できるようになり、また遠隔地の会議参加予定者が全員面識のある人であるが参加人数が多い場合にも、モニターを見ただけで実際に誰々が会議に参加しているのかを容易に把握できるようになる。
【００２１】
【発明の実施の形態】
以下、互いに離れた２つの地点を結ぶテレビ会議システムに本発明を適用した例について、図面を用いて説明する。
【００２２】
図４は、本発明を適用したテレビ会議システムにおける各地点（地点Ａ，Ｂ）の会議室を、会議参加者の視点で示す図である。会議参加者はテーブル４の手前に着席するようになっており、テーブル４の向こう側には、テレビ会議装置１及びスピーカ内蔵型のモニター２が正面に設置されるとともに、パーソナルコンピュータ（以下単にコンピュータと呼ぶ）３が設置されている。
【００２３】
テレビ会議装置１は、ビデオカメラ，マイクロホン，コーデック，ネットワーク通信用の通信インタフェース等が一体となった装置であり、ビデオカメラ及びマイクロホンは装置の正面方向（テーブル４の方向）に向けられている。
【００２４】
テレビ会議装置１の映像出力端子，音声出力端子は、それぞれモニター２の映像入力端子，音声入力端子にケーブルで接続されている。また、テレビ会議装置１の別の映像出力端子がコンピュータ３の映像入力端子にケーブルで接続されるとともに、テレビ会議装置１の映像入力端子がコンピュータ３の映像出力端子にケーブルで接続されている。
【００２５】
図５は、テレビ会議装置１の構成を、テレビ会議システムの全体構成とともに示すブロック図である。テレビ会議装置１は、ビデオカメラ１１，ミキシング回路１２，ＡＤコンバータ１３，映像コーデック１４，通信インタフェース１５，ＤＡコンバータ１６，マイクロホン１７，ＡＤコンバータ１８，エコーキャンセラ１９，音声コーデック２０，ＤＡコンバータ２１，ＣＰＵ２２，カメラコントローラ２３を含んでいる。
【００２６】
地点Ａにおいて、テレビ会議装置１のビデオカメラ１１から出力した映像信号は、テレビ会議装置１からコンピュータ３に送られるとともに、ミキシング回路１２に送られる。ミキシング回路１２では、ビデオカメラ１１からの映像信号と、コンピュータ３からテレビ会議装置１に送られた映像信号とがミキシングされる。ミキシング回路１２から出力した映像信号は、ＡＤコンバータ１３でデジタル変換され、映像コーデック１４で符号化（圧縮）され、通信インタフェース１５からネットワーク１０１経由で地点Ｂのテレビ会議装置１の通信インタフェース１５に送信される。そして、地点Ｂにおいて、テレビ会議装置１の通信インタフェース１５で受信した映像データが、映像コーデック１４で復号（伸長）され、ＤＡコンバータ１６でアナログ変換され、テレビ会議装置１からモニター２に送られてモニター２に表示される。
【００２７】
また、地点Ａにおいて、テレビ会議装置１のマイクロホン１７から出力した音声信号は、ＡＤコンバータ１８でデジタル変換され、エコーキャンセラ１９を経て音声コーデック２０で符号化され、通信インタフェース１５からネットワーク１０１経由で地点Ｂのテレビ会議装置１の通信インタフェース１５に送信される。そして、地点Ｂにおいて、テレビ会議装置１の通信インタフェース１５で受信した音声データが、音声コーデック１４で復号され、エコーキャンセラ１９を経てＤＡコンバータ２１でアナログ変換され、テレビ会議装置１からモニター２に送られてモニター２の内蔵スピーカで再生される。
【００２８】
地点Ｂからも全く同様にして地点Ａに映像，音声が送信され、地点Ａにおいてモニター２にその映像が表示されるとともにモニター２の内蔵スピーカでその音声が再生される。
【００２９】
カメラコントローラ２３は、ＣＰＵ２２の制御のもとで、ビデオカメラ１１のズーム量やパンチルト量を調整する。ＣＰＵ２２は、自分の地点のリモートコントローラ（図示略）のズームやパンチルト用の操作キーの操作による信号や、あるいはネットワーク１０１経由で相手の地点のテレビ会議装置１から送信された制御データに基づいて、カメラコントローラ２３に制御信号を与える。
【００３０】
コンピュータ３内には、会議への参加予定者１人１人についてのデータを登録するためのデータベースが格納されている。図６は、このデータベースに登録されるデータを示す。１人１人の参加予定者Ａ，Ｂ，Ｃ…，について、顔認識を行うための顔特徴データと、「○○部○○課長」というような肩書き及び名前を示す肩書き・名前データとがそれぞれ登録される。
【００３１】
また、コンピュータ３には、図７に示すような処理を実行するためのプログラムが格納されている。この処理では、最初に、テレビ会議装置１からコンピュータ３に入力した映像信号から、データベース内の顔特徴データ（図６）を用いて顔認識を行うことにより、会議への実際の参加者を認証する（ステップＳ１）。
【００３２】
この顔認識は、次の（ａ）〜（ｄ）のような過程の既存の顔認識処理によって行う。
（ａ）入力映像からの顔領域の検出
（ｂ）顔領域の検出結果に基づく入力映像からの顔領域の切り出しと、切り出した顔領域の大きさや輝度等のばらつきの正規化
（ｃ）顔領域からの顔特徴の抽出
（ｄ）抽出した顔特徴と、データベース内の顔特徴データとの照合
【００３３】
続いて、テレビ会議装置１から入力した映像信号から、いままでステップＳ１で認証済みの各参加者の画面内（１フレーム分の映像内）の位置を検出する（ステップＳ２）。この位置検出は、認証済みの各参加者について今回のステップＳ１の前述の（ａ）の処理で検出した顔領域の位置をそのままその参加者の位置として決定するという方法で行う。
【００３４】
続いて、まだ認証済みでない新たな参加者が今回のステップＳ１で認証されたか否かを判断する（ステップＳ３）。
【００３５】
イエスであれば、データベース内の肩書き・名前データを用いて、今回のステップＳ２で検出した各参加者の位置の近傍であって他の参加者とは重ならない位置にその参加者の肩書き及び名前を表示させるキャラクタデータを生成する（ステップＳ４）。そして、そのキャラクタデータをコンピュータ３からテレビ会議装置１に送る（ステップＳ５）。そして、ステップＳ１に戻ってステップＳ１以下を繰り返す。
【００３６】
ステップＳ３でノーであった場合には、今回のステップＳ２で検出されたいずれかの参加者の位置が、前回のステップＳ２で検出した位置から変化したり、あるいは前回位置検出された参加者が今回位置検出されなかったり、前回位置検出されなかった参加者が今回位置検出されたりしているか否かを判断する（ステップＳ６）。
【００３７】
イエスであれば、ステップＳ４に進む。他方ノーであれば、ステップＳ１に戻ってステップＳ１以下を繰り返す。
【００３８】
次に、このテレビ会議システムを用いた会議の様子について説明する。例えば、地点Ａの会議参加予定者が、或る事業所の次の７名であったとする。
・木下所長
・営業１部：佐藤部長・山田課長
・営業２部：鈴木課長・上田係長
・設計１部：吉田課長
・設計２部：田中課長
【００３９】
地点Ａのコンピュータ３内のデータベース（図６）には、会議開始前に、上記７名の参加予定者についての顔特徴データ及び肩書き・名前データをそれぞれ登録しておく。
【００４０】
会議の開始時刻になり、地点Ａでは、上記７名の参加予定者のうちの営業２部鈴木課長が急用で参加できなくなったが残りの６名が予定通り会議に参加したとする。
【００４１】
会議が始まり、地点Ａにおいて、テレビ会議装置１で、まずこの６名全員を画角に収めるようにカメラコントローラ２３（図５）でビデオカメラ１１（図５）のズーム量やパンチルト量をコントロールして撮影が行われたとする。すると、その映像が、テレビ会議装置１からネットワーク１０１経由で地点Ｂのテレビ会議装置に送信されるとともに、テレビ会議装置１からコンピュータ３に送られる。
【００４２】
そして、コンピュータ３で図７の処理のステップＳ１〜Ｓ５が実行されることにより、画面内のこの６名のそれぞれの位置の近傍にその人物の肩書き及び名前を表示させるキャラクタデータが、コンピュータ３からテレビ会議装置１に送られる。
【００４３】
そして、このキャラクタデータが、テレビ会議装置１内のミキシング回路１２（図５）で、ビデオカメラ１１からの映像とミキシングされる。その結果、テレビ会議装置１からは、このキャラクタデータを付加した映像が地点Ｂのテレビ会議装置に送信される。
【００４４】
これにより、地点Ｂのモニター２には、地点Ａのこの６名の参加者の映像が表示されるだけでなく、この６名の参加者１人１人の顔の近傍にその参加者の肩書き及び名前のテロップが表示される。図８は、このとき地点Ｂのモニター２に表示される映像を例示する図である。
【００４５】
地点Ｂの会議参加者にとって、地点Ａの会議参加者のうち木下所長，設計１課吉田課長の２名とは面識がなく、残りの５名とは面識があるとする。地点Ｂの会議参加者は、木下所長や設計１課吉田課長とは面識がないにもかかわらず、モニター２に表示されるこのテロップを見ただけで、図８の例では、画面の左端の人物が設計１課の吉田課長であって画面の左から３番目の人物が木下所長であるということを容易に把握できる。
【００４６】
また、地点Ｂの会議参加者は、地点Ａの参加予定者が面識のある５名を含めて合計７名であることを予め知らされていたが、営業２部鈴木課長に急用ができたことまでは知らされていなかったとする。地点Ｂの会議参加者は、テレビ会議装置１の映像コーデック１４（図５）での圧縮率の関係でモニター２に人物の顔があまり鮮明に表示されないような場合でも、モニター２に表示されるこのテロップを見ただけで、地点Ａにおいて実際に誰々が会議に参加しているのか（営業２部鈴木課長が参加しておらず、残りの６名が参加していること）を容易に把握することができる。
【００４７】
続いて、木下所長が発言を始め、地点Ａにおいて、テレビ会議装置１で、木下所長をアップにするようにカメラコントローラ２３（図５）でビデオカメラ１１（図５）のズーム量やパンチルト量をコントロールして撮影が行われたとする。
【００４８】
すると、その映像に基づき、コンピュータ３では、図７の処理のステップＳ１〜Ｓ３が実行され、新たな参加者は認証されていない（今回認証されたのは木下所長のみであり、木下所長は前回も認証されている）のでステップＳ３でノーとなってステップＳ６に進み、画面内での木下所長の位置が変化している（前回とは顔領域の大きさが違う）とともに残りの参加者が位置検出されないのでステップＳ６でイエスとなってステップＳ４及びＳ５が実行される。したがって、今度は、画面内の木下所長の位置の近傍にその肩書き及び名前を表示させるキャラクタデータが、コンピュータ３からテレビ会議装置１に送られる。
【００４９】
これにより、地点Ｂのモニター２には、今度は木下所長の映像がアップで表示されるとともに、木下所長の顔の近傍にその肩書き及び名前のテロップが表示される。図９は、このとき地点Ｂのモニター２に表示される映像を例示する図である。
【００５０】
また、例えば、地点Ａにおいて、テレビ会議装置１で参加者６名全員を画角に収めて撮影が行われている最中に、参加者が席を移動したり、一部の参加者が退席したりしたとする。
【００５１】
すると、やはりコンピュータ３で図７の処理のステップＳ１〜Ｓ３，Ｓ６，Ｓ４，Ｓ５が実行されるので、地点Ｂのモニター２には、移動後後の各参加者の顔の近傍や退席者を除く残りの参加者の顔の近傍にその参加者の肩書き及び名前のテロップが表示される。
【００５２】
また、例えば、地点Ａにおいて、急用ができた営業２部鈴木課長が用件を済ませて会議に途中から参加し、テレビ会議装置１で営業２部鈴木課長も画角に収めて撮影が行われたとする。すると、コンピュータ３では、図７の処理のステップＳ１〜Ｓ３が実行され、新たな参加者（営業２部鈴木課長）が認証されたのでステップＳ３でイエスとなってステップＳ４及びＳ５が実行される。したがって、今度は、画面内の営業２部鈴木課長を含む各参加者の位置の近傍にその肩書き及び名前を表示させるキャラクタデータが、コンピュータ３からテレビ会議装置１に送られる。
【００５３】
これにより、地点Ｂのモニター２には、途中から会議に参加した営業２部鈴木課長の顔の近傍にもその肩書き及び名前のテロップが表示される。
【００５４】
このようにして、地点Ａにおいて、テレビ会議装置１のビデオカメラ１１のズーム量・パンチルト量が変化したり、参加者の席の移動や退席があったり、途中から会議に参加した者がいたりしても、地点Ｂのモニター２には、現在表示されている各参加者の顔の近傍に、常にその参加者の肩書き及び名前のテロップが表示される。
【００５５】
ここでは地点Ｂの会議参加者からみた会議の様子（地点Ｂのモニター２の表示）を説明したが、地点Ａのモニター２にも、全く同様にして、現在表示されている地点Ｂの各参加者の顔の近傍に、常にその参加者の肩書き及び名前のテロップが表示される。
【００５６】
以上のようにして、このテレビ会議システムによれば、遠隔地（地点Ａにとっては地点Ｂ、地点Ｂにとっては地点Ａ）における各会議参加者の肩書き及び名前がその参加者の映像とともにモニターに表示されるので、モニターを見ただけで遠隔地の参加者を容易に把握することができる。
【００５７】
そして、顔認識処理に基づいて参加者の認証及び位置検出を行うので、参加者に認証や位置検出のための専用の媒体（認証用のＩＤカードや、位置検出用の電波または赤外線の発生器等）を所持させたり、その媒体から認証や位置検出のための情報を取得する機器（ＩＤカードの読取装置や、電波または赤外線の受信装置等）を会議室内に設けたりすることなく、参加者の認証や位置検出を行うことができる。
【００５８】
なお、以上の例では、参加者の認証及び位置検出を、顔認識処理に基づいて行っている。しかし、別の例として、参加者認証を音声認識処理，指紋認識処理，網膜認識処理等の個人認識処理によって行ったり、参加者の位置検出を音声認識処理によって行うようにしてもよい。
【００５９】
あるいは、必要に応じて、参加者に認証や位置検出のための専用の媒体（認証用のＩＤカードや、位置検出用の電波または赤外線の発生器等）を所持させ、その媒体から認証や位置検出のための情報を取得する機器（ＩＤカードの読取装置や、電波または赤外線の受信装置等）を会議室内に設けるようにしてもよい。
【００６０】
あるいはまた、例えば図４のテーブル４上の所定の位置に参加予定の人数分の台数のマイクロホンが配置される（それらのマイクロホンからテレビ会議装置１に音声信号が送られる）ような場合には、例えば、各マイクロホンに、そのマイクロホンを使用する参加者のＩＤ情報をコンピュータ３に送るための操作器や回路を組み込むとともに、コンピュータ３に各マイクロホン３の位置を記憶させておき、そのＩＤ情報に基づいて参加者認証を行うとともに、そのＩＤ情報を送ったマイクロホンの位置をその認証した参加者の位置として決定するようにしてもよい。
【００６１】
また、以上の例では、地点Ａ，地点Ｂにそれぞれ１台ずつコンピュータ３を設けている。しかし、別の例として、地点Ａ，地点Ｂのうちの１つの地点（あるいはネットワーク１０１経由で地点Ａ，地点Ｂに結ばれた別の１つの地点）に１台だけコンピュータ３を設け、そのコンピュータ３に、地点Ａ及び地点Ｂの両方の参加予定者についての顔特徴データ及び肩書き・名前データをデータベースに登録させて、この両方の参加者について図７の処理を実行させるようにしてもよい。
【００６２】
また、以上の例では、テレビ会議装置１とは別に、データベースを格納して図７の処理を実行するコンピュータ３を設けている。しかし、別の例として、テレビ会議装置１そのものを、データベースを格納して図７の処理を実行するように構成してもよい。
【００６３】
また、以上の例では、コンピュータ３内のデータベースに肩書き・名前データを登録する（図６）ことにより、モニター２に表示される参加者の顔の近傍に、その参加者の肩書き及び名前のテロップが表示されるようにしている。しかし、これに限らず、参加予定者の適宜の属性を示す属性情報をこのデータベースに登録することにより、モニター２に表示される参加者の顔の近傍にその属性情報が表示されるようにしてよい。一つの例としては、肩書き・名前データに加え、あるいは肩書き・名前データに代えて、その参加予定者の過去の会議での主張（或るプロジェクトに賛成か反対かの見解等）を要約したデータをこのデータベースに登録することにより、モニター２に表示される参加者の顔の近傍に、そうした主張の要約のテロップも表示されるようにすることが考えられる。
【００６４】
また、以上の例では、地点Ａ，地点Ｂという２地点を結ぶテレビ会議システムに本発明を適用している。しかし、これに限らず、３地点以上を結ぶテレビ会議システムや、テレビ会議システム以外の適宜の双方向コミュニケーションシステムにも本発明を適用してよい。
【００６５】
エンターテイメント系の双方向コミュニケーションシステムに本発明を適用する場合には、例えば参加予定者の好きなアニメーションの画像データを属性情報としてデータベースに登録することにより、モニターに表示される参加者の顔の近傍にそのアニメーションの画像が表示されるようにしたり、モニターに表示される参加者の顔の上にそのアニメーションの画像が表示されるようにしてもよい。
【００６６】
また、参加予定者のうちモニターに顔を表示することが好ましくない人物がいるような双方向コミュニケーションシステムに本発明を適用する場合には、その人物についての属性情報としてモザイクをかけることを指示する情報をデータベースに登録することにより、モニターに表示されるその人物の顔にモザイクがかかるようにしてもよい。
【００６７】
また、本発明は、以上の例に限らず、本発明の要旨を逸脱することなく、その他様々の構成をとりうることはもちろんである。
【００６８】
【発明の効果】
以上のように、本発明に係る双方向コミュニケーションシステムによれば、遠隔地における双方向コミュニケーションへの参加者の属性情報がその参加者の映像とともにモニターに表示されるので、モニターを見ただけで遠隔地の参加者を容易に把握することができるという効果が得られる。
【００６９】
また、参加者に認証や位置検出のための専用の媒体を所持させたり、その媒体から認証や位置検出のための情報を取得する機器を設けたりすることなく、参加者の認証や位置検出を行うことができるという効果が得られる。
【００７０】
また、テレビ会議を行う際に、遠隔地の会議参加者の中に面識のない人が含まれている場合にも、モニターを見ただけでその人がどんな人物であるか（どんな肩書きで何という名前の人か）を容易に把握でき、遠隔地の会議参加予定者が全員面識のある人であるが参加人数が多い場合にも、モニターを見ただけで実際に誰々が会議に参加しているのかを容易に把握できるという効果が得られる。
【図面の簡単な説明】
【図１】従来のテレビ会議システムにおける映像送受信用の機器を示す図である。
【図２】図１のモニターに表示される映像を例示する図である。
【図３】図１のモニターに表示される映像を例示する図である。
【図４】本発明を適用したテレビ会議システムにおける会議室を示す図である。
【図５】図４のテレビ会議装置の構成を示すブロック図である。
【図６】図４のコンピュータ内のデータベースを示す図である。
【図７】図４のコンピュータが実行する処理を示すフローチャートである。
【図８】図４のモニターに表示される映像を例示する図である。
【図９】図４のモニターに表示される映像を例示する図である。
【符号の説明】
１テレビ会議装置、　２　モニター、　３　コンピュータ、　４　テーブル、１１　ビデオカメラ、　１２　ミキシング回路、　１３，１８　ＡＤコンバータ、　１４　映像コーデック、　１５　通信インタフェース、　１６，　２１　ＤＡコンバータ、　１７　マイクロホン、　１９　エコーキャンセラ、　２０　音声コーデック、　１０１　ネットワーク
[0001]
[Technical field of the invention]
The present invention relates to a two-way communication system such as a video conference system.
[0002]
[Prior art]
Systems are becoming widespread in which, among a plurality of points remote from one another, video captured at each point is transmitted over a network to the remaining points, so that people at remote locations can communicate with each other in both directions while seeing each other's faces. A video conference system is one example of such a two-way communication system.
[0003]
FIG. 1 is a block diagram showing devices and circuits for transmitting and receiving video in a conventional video conference system. A video camera 51, an AD converter 52, a video codec 53, a communication interface 54 for network communication, a DA converter 55, and a monitor 56 are provided in the conference room at the point A.
[0004]
A video camera 51, an AD converter 52, a video codec 53, a communication interface 54, a DA converter 55, and a monitor 56 are also provided in a conference room at a point B remote from the point A.
[0005]
At the point A, the video captured by the video camera 51 (video of the conference participants) is converted to digital form by the AD converter 52, encoded (compressed) by the video codec 53, and transmitted from the communication interface 54 over the network 101 to the communication interface 54 at the point B. At the point B, the video received by the communication interface 54 is decoded (expanded) by the video codec 53, converted to analog form by the DA converter 55, and displayed on the monitor 56.
[0006]
Video is transmitted from the point B to the point A in exactly the same manner and is displayed on the monitor 56 at the point A.
[0007]
FIGS. 2 and 3 are diagrams illustrating video displayed on the monitor 56 of FIG. 1. As shown in FIG. 2, video of all the conference participants at the remote location (the point B for the point A, and the point A for the point B) is displayed on the monitor 56. Alternatively, the zoom amount and pan/tilt amount of the video camera 51 at the remote location are controlled (the control system of the video camera 51 is omitted from FIG. 1) so that, as shown in FIG. 3, a close-up of the one conference participant who is currently speaking is displayed on the monitor 56. (For the display of remote conference participants on a monitor in a video conference system, see, for example, Patent Documents 1 and 2.)
[0008]
[Patent Document 1]
JP-A-2002-237991 (paragraph number 0007, FIG. 1)
[Patent Document 2]
JP-A-2002-171499 (paragraph numbers 0002 to 0004)
[0009]
[Problems to be solved by the invention]
However, in such a conventional video conference system, when the conference participants at the remote location include someone the viewer has never met, the viewer cannot tell from the monitor who that person is (what title the person holds and what the person's name is).
[0010]
Moreover, even when all the prospective participants at the remote location are acquaintances, if there are many participants it is often difficult to tell from the monitor who is actually attending the conference at the remote location. (When video of all the remote participants is displayed on the monitor as in FIG. 2, the individual faces may not be very clear because of the compression ratio of the video codec, so it is hard to tell who is actually attending. When video of only the current speaker is displayed as in FIG. 3, the remaining participants are not shown, so again it is hard to tell who is actually attending.)
[0011]
The present invention has been made in view of the above points, and its object is to make it possible, in a two-way communication system such as a video conference system, to easily identify remote participants by looking at a monitor.
[0012]
[Means for Solving the Problems]
In order to solve this problem, the present applicant proposes a two-way communication system in which, among a plurality of points remote from one another, video captured at each point is transmitted over a network to the remaining points, the system comprising: a database storing identification information for identifying prospective participants in the two-way communication and attribute information indicating attributes of the prospective participants; authentication means for authenticating, using the identification information stored in the database, an actual participant in the two-way communication at each point; detection means for detecting the position, at each point, of the participant authenticated by the authentication means; and adding means for adding, to the portion of the video captured at each point that corresponds to the position of the participant detected by the detection means, data for visually displaying the attribute information of that participant stored in the database.
[0013]
In this two-way communication system, identification information for identifying prospective participants in the two-way communication and attribute information indicating attributes of the prospective participants are stored in a database. When one of the prospective participants actually takes part in the two-way communication at one of the points, the authentication means authenticates that participant using the identification information stored in the database, after which the detection means detects the position of the authenticated participant at that point.
[0014]
Then, the adding means adds, to the portion of the video captured at that point that corresponds to the detected position of the participant, data for visually displaying the attribute information of that participant stored in the database. As a result, that point transmits to the remaining points the video captured there, with the data for visually displaying the participant's attribute information added to the portion corresponding to the detected position of the participant.
[0015]
As a result, the monitors at the remaining points display not only the video of this participant (the remote participant) but also the participant's attribute information in the portion corresponding to the participant's position.
[0016]
Since the attribute information of a remote participant in the two-way communication is thus displayed on the monitor together with the video of that participant, remote participants can be easily identified just by looking at the monitor.
[0017]
In this two-way communication system, as one example, it is preferable that the database store face feature data of the prospective participants as the identification information, that the authentication means be configured to perform authentication by face recognition on the video captured at each point using the face feature data, and that the detection means be configured to detect, from the video captured at each point, the position within the screen of the participant authenticated by the authentication means.
[0018]
This makes it possible to authenticate participants and detect their positions without having the participants carry a dedicated medium for authentication or position detection (such as an ID card for authentication or a radio-wave or infrared emitter for position detection), and without providing equipment for acquiring authentication or position-detection information from such a medium (such as an ID card reader or a radio-wave or infrared receiver).
[0019]
Further, when this two-way communication system is applied to a video conference system, as one example, it is preferable that the database store title and name information as the attribute information and that the adding means be configured to add character data of the title and name.
[0020]
As a result, when a video conference is held, even if the conference participants at the remote location include someone the viewer has never met, the viewer can easily tell who that person is (what title the person holds and what the person's name is) just by looking at the monitor. Likewise, even if all the prospective remote participants are acquaintances but there are many of them, the viewer can easily tell who is actually attending the conference just by looking at the monitor.
[0021]
[Embodiments of the invention]
Hereinafter, an example in which the present invention is applied to a video conference system connecting two points separated from each other will be described with reference to the drawings.
[0022]
FIG. 4 is a diagram showing the conference room at each point (the points A and B) in a video conference system to which the present invention is applied, as seen from the viewpoint of the conference participants. The conference participants sit on the near side of the table 4. On the far side of the table 4, a video conference device 1 and a monitor 2 with a built-in speaker are installed facing the participants, together with a personal computer (hereinafter simply called the computer) 3.
[0023]
The video conference device 1 is a device in which a video camera, a microphone, a codec, a communication interface for network communication, and the like are integrated, and the video camera and the microphone are directed toward the front of the device (the direction of the table 4).
[0024]
The video output terminal and the audio output terminal of the video conference device 1 are connected to the video input terminal and the audio input terminal of the monitor 2 by cables, respectively. Further, another video output terminal of the video conference device 1 is connected to a video input terminal of the computer 3 by a cable, and a video input terminal of the video conference device 1 is connected to a video output terminal of the computer 3 by a cable.
[0025]
FIG. 5 is a block diagram showing the configuration of the video conference device 1 together with the overall configuration of the video conference system. The video conference device 1 includes a video camera 11, a mixing circuit 12, an AD converter 13, a video codec 14, a communication interface 15, a DA converter 16, a microphone 17, an AD converter 18, an echo canceller 19, an audio codec 20, a DA converter 21, a CPU 22, and a camera controller 23.
[0026]
At the point A, the video signal output from the video camera 11 of the video conference device 1 is sent from the video conference device 1 to the computer 3 and also to the mixing circuit 12. The mixing circuit 12 mixes the video signal from the video camera 11 with the video signal sent from the computer 3 to the video conference device 1. The video signal output from the mixing circuit 12 is converted to digital form by the AD converter 13, encoded (compressed) by the video codec 14, and transmitted from the communication interface 15 over the network 101 to the communication interface 15 of the video conference device 1 at the point B. At the point B, the video data received by the communication interface 15 of the video conference device 1 is decoded (expanded) by the video codec 14, converted to analog form by the DA converter 16, sent from the video conference device 1 to the monitor 2, and displayed on the monitor 2.
[0027]
Also at the point A, the audio signal output from the microphone 17 of the video conference device 1 is converted to digital form by the AD converter 18, passed through the echo canceller 19, encoded by the audio codec 20, and transmitted from the communication interface 15 over the network 101 to the communication interface 15 of the video conference device 1 at the point B. At the point B, the audio data received by the communication interface 15 of the video conference device 1 is decoded by the audio codec 20, passed through the echo canceller 19, converted to analog form by the DA converter 21, sent from the video conference device 1 to the monitor 2, and reproduced by the built-in speaker of the monitor 2.
[0028]
Video and sound are transmitted from the point B to the point A in exactly the same manner. At the point A, the video is displayed on the monitor 2 and the sound is reproduced by the built-in speaker of the monitor 2.
[0029]
The camera controller 23 adjusts the zoom amount and the pan/tilt amount of the video camera 11 under the control of the CPU 22. The CPU 22 gives a control signal to the camera controller 23 based on a signal produced by operating the zoom or pan/tilt keys of a remote controller (not shown) at its own point, or on control data transmitted over the network 101 from the video conference device 1 at the other point.
[0030]
The computer 3 stores a database for registering data on each prospective participant in the conference. FIG. 6 shows the data registered in this database. For each prospective participant A, B, C, and so on, face feature data for performing face recognition and title/name data indicating a title and name, such as "Manager, XX Section, XX Department", are registered.
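The registration table of FIG. 6 can be pictured as a keyed collection of records. The following is a minimal sketch under stated assumptions, not the implementation described in this publication: the class name `ParticipantRecord`, its field names, the helper functions, and the use of a list of floats for the face feature data are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ParticipantRecord:
    """One row of the registration database (hypothetical layout)."""
    participant_id: str          # e.g. "A", "B", "C" as in FIG. 6
    face_features: List[float]   # assumed encoding of the face feature data
    title: str                   # e.g. "Section Manager, Sales Department 1"
    name: str


def build_database(records: List[ParticipantRecord]) -> Dict[str, ParticipantRecord]:
    """Index records by participant ID, rejecting duplicate registrations."""
    database: Dict[str, ParticipantRecord] = {}
    for record in records:
        if record.participant_id in database:
            raise ValueError(f"duplicate participant ID: {record.participant_id}")
        database[record.participant_id] = record
    return database


def caption_for(database: Dict[str, ParticipantRecord], participant_id: str) -> str:
    """Return the title/name text to be shown next to a participant's face."""
    record = database[participant_id]
    return f"{record.title} {record.name}"
```

Keeping the face feature data and the title/name data in the same record mirrors the description above, where one registration per prospective participant serves both authentication and caption generation.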
[0031]
The computer 3 also stores a program for executing the process shown in FIG. 7. In this process, first, the actual participants in the conference are authenticated by performing face recognition on the video signal input from the video conference device 1 to the computer 3, using the face feature data (FIG. 6) in the database (step S1).
[0032]
This face recognition is performed by existing face recognition processing consisting of the following steps (a) to (d).
(a) Detection of face areas in the input video
(b) Cropping of the face areas from the input video based on the detection result, and normalization of variations in the size, luminance, and the like of the cropped face areas
(c) Extraction of face features from the face areas
(d) Matching of the extracted face features against the face feature data in the database
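The publication relies on existing face recognition processing and does not say how the matching in step (d) is carried out. As a rough illustration only, the sketch below assumes that face features are fixed-length numeric vectors and that matching uses cosine similarity with an acceptance threshold. The function names, the vector representation, and the threshold value 0.8 are assumptions, not details from the source.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_face(
    extracted: Sequence[float],
    registered: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # assumed acceptance threshold, not from the source
) -> Optional[str]:
    """Return the ID of the best-matching registered participant, or None."""
    best_id: Optional[str] = None
    best_score = threshold
    for participant_id, features in registered.items():
        score = cosine_similarity(extracted, features)
        if score >= best_score:
            best_id, best_score = participant_id, score
    return best_id
```

Returning `None` when no registered entry reaches the threshold corresponds to a face that is not authenticated, so no caption would be generated for it.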
[0033]
Subsequently, from the video signal input from the video conference device 1, the position within the screen (within one frame of video) of each participant authenticated in step S1 so far is detected (step S2). This position detection is performed by taking the position of the face area detected in process (a) of the current step S1 for each authenticated participant directly as the position of that participant.
[0034]
Subsequently, it is determined whether a new participant, not previously authenticated, has been authenticated in the current step S1 (step S3).
[0035]
If yes, the title/name data in the database is used to generate character data that displays each participant's title and name at a position that is near the position of that participant detected in the current step S2 and that does not overlap other participants (step S4). The character data is then sent from the computer 3 to the video conference device 1 (step S5). The process then returns to step S1 and repeats from step S1.
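Step S4 requires a caption position that is near a participant and does not overlap other participants, but the publication does not describe how that position is chosen. One possible approach, sketched here purely as an assumption, treats face areas and captions as axis-aligned rectangles and tries a fixed list of candidate positions around the face. The names `Box`, `overlaps`, and `place_label`, the candidate order, and the fixed caption size are all hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height) in pixels


def overlaps(a: Box, b: Box) -> bool:
    """True if two axis-aligned rectangles share any area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def place_label(
    face: Box,
    other_faces: List[Box],
    label_size: Tuple[int, int],
    frame_size: Tuple[int, int],
) -> Optional[Box]:
    """Pick a caption rectangle adjacent to `face` that avoids other faces."""
    x, y, w, h = face
    lw, lh = label_size
    fw, fh = frame_size
    candidates = [
        (x, y + h, lw, lh),   # directly below the face
        (x, y - lh, lw, lh),  # directly above
        (x + w, y, lw, lh),   # to the right
        (x - lw, y, lw, lh),  # to the left
    ]
    for cand in candidates:
        cx, cy, cw, ch = cand
        inside = cx >= 0 and cy >= 0 and cx + cw <= fw and cy + ch <= fh
        if inside and not any(overlaps(cand, other) for other in other_faces):
            return cand
    return None  # no free spot found; the caller decides what to do
```

A fixed candidate order keeps captions in a predictable place from frame to frame, which may matter when the captions are regenerated repeatedly as described in the flow of FIG. 7.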
[0036]
If the result of step S3 is no, it is determined whether the position of any participant detected in the current step S2 has changed from the position detected in the previous step S2, whether a participant whose position was detected previously was not detected this time, or whether a participant whose position was not detected previously was detected this time (step S6).
[0037]
If yes, the process proceeds to step S4. If no, the process returns to step S1 and repeats from step S1.
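The branching in steps S3 and S6 reduces to one question per pass: do the captions need to be regenerated? The sketch below is one way to express that decision, assuming that each pass yields a set of authenticated participant IDs and a mapping from ID to detected face area. The function name `needs_caption_update` and the data shapes are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

Position = Tuple[int, int, int, int]  # face area as (left, top, width, height)


def needs_caption_update(
    authenticated_before: Set[str],
    authenticated_now: Set[str],
    previous_positions: Dict[str, Position],
    current_positions: Dict[str, Position],
) -> bool:
    """Decide whether steps S4 and S5 should run for this pass."""
    # Step S3: was anyone authenticated for the first time in this pass?
    if authenticated_now - authenticated_before:
        return True
    # Step S6: any changed position, or any participant who appeared in or
    # disappeared from the detected set, compared with the previous pass?
    return current_positions != previous_positions
```

Comparing the two dictionaries covers all three conditions of step S6 at once: a changed value is a moved participant, a missing key is a participant no longer detected, and an extra key is a participant newly detected.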
[0038]
Next, how a conference proceeds using this video conference system will be described. For example, suppose that the prospective conference participants at the point A are the following seven people from a certain business establishment.
・ Director Kinoshita
・ Sales Department 1: Department Manager Sato, Section Manager Yamada
・ Sales Department 2: Section Manager Suzuki, Subsection Chief Ueda
・ Design Department 1: Section Manager Yoshida
・ Design Department 2: Section Manager Tanaka
[0039]
In the database (FIG. 6) in the computer 3 at the point A, the facial feature data and the title / name data of the seven potential participants are registered before the conference starts.
[0040]
Suppose that, when the conference start time arrives, Section Manager Suzuki of Sales Department 2, one of the seven prospective participants at the point A, is unable to attend because of urgent business, but the remaining six attend as scheduled.
[0041]
Suppose that the conference begins and that, at the point A, the video conference device 1 first shoots with the zoom amount and pan/tilt amount of the video camera 11 (FIG. 5) controlled by the camera controller 23 (FIG. 5) so that all six people fit within the angle of view. That video is then transmitted from the video conference device 1 over the network 101 to the video conference device at the point B, and is also sent from the video conference device 1 to the computer 3.
[0042]
Then, the computer 3 executes steps S1 to S5 of the process of FIG. 7, whereby character data for displaying each person's title and name near the position of each of the six people on the screen is sent from the computer 3 to the video conference device 1.
[0043]
Then, the character data is mixed with the video from the video camera 11 by the mixing circuit 12 (FIG. 5) in the video conference device 1. As a result, the video to which the character data is added is transmitted from the video conference device 1 to the video conference device at the point B.
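In the device described here, the mixing circuit 12 combines video signals ahead of the AD converter 13. As a software analogy only, the effect can be sketched as overlaying a caption layer on a camera frame, where transparent overlay pixels let the camera image show through. The frame representation (nested lists of grayscale values), the use of `None` for transparency, and the function name `mix` are all assumptions for illustration and do not describe the actual circuit.

```python
from typing import List, Optional

Frame = List[List[int]]              # grayscale camera frame, row-major
Overlay = List[List[Optional[int]]]  # None marks a transparent pixel


def mix(camera: Frame, overlay: Overlay) -> Frame:
    """Return a new frame with every non-transparent overlay pixel on top."""
    if len(camera) != len(overlay) or any(
        len(c_row) != len(o_row) for c_row, o_row in zip(camera, overlay)
    ):
        raise ValueError("camera frame and overlay must have the same size")
    return [
        [c if o is None else o for c, o in zip(c_row, o_row)]
        for c_row, o_row in zip(camera, overlay)
    ]
```

Because the captions are combined with the video before encoding and transmission, the receiving point needs no extra processing to show them, which matches the description that the video with character data added is what gets sent to the point B.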
[0044]
As a result, the monitor 2 at the point B displays not only the video of the six participants at the point A but also, near the face of each of the six, a caption giving that participant's title and name. FIG. 8 is a diagram illustrating the video displayed on the monitor 2 at the point B at this time.
[0045]
Suppose that the conference participants at the point B have never met two of the participants at the point A, namely Director Kinoshita and Section Manager Yoshida of Design Section 1, and are acquainted with the remaining five. Despite never having met Director Kinoshita or Section Manager Yoshida of Design Section 1, the participants at the point B can easily tell, just by looking at the captions displayed on the monitor 2, that in the example of FIG. 8 the person at the left edge of the screen is Section Manager Yoshida of Design Section 1 and the third person from the left is Director Kinoshita.
[0046]
Suppose also that the conference participants at the point B had been told in advance that there would be seven prospective participants at the point A in total, including the five acquaintances, but had not been told that Section Manager Suzuki of Sales Department 2 had been called away on urgent business. Even when faces are not displayed very clearly on the monitor 2 because of the compression ratio of the video codec 14 (FIG. 5) of the video conference device 1, the participants at the point B can easily tell, just by looking at the captions displayed on the monitor 2, who is actually attending the conference at the point A (that Section Manager Suzuki of Sales Department 2 is absent and the remaining six are attending).
[0047]
Suppose next that Director Kinoshita begins to speak and that, at point A, the video conference device 1 adjusts the zoom amount and the pan/tilt amount of the video camera 11 (FIG. 5) under the control of the camera controller 23 (FIG. 5) so as to shoot a close-up of Director Kinoshita.
[0048]
Then, based on that video, the computer 3 executes steps S1 to S3 of the processing in FIG. 7. Since no new participant is authenticated (Director Kinoshita was also authenticated the previous time), the result in step S3 is NO and the process proceeds to step S6. Since the on-screen position of Director Kinoshita has changed (the size of the face region differs from the previous time) and the positions of the remaining participants are no longer detected, the result in step S6 is YES, and steps S4 and S5 are executed. Consequently, this time character data for displaying the title and name near the on-screen position of Director Kinoshita is sent from the computer 3 to the video conference device 1.
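The branching at steps S3 and S6 described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the flowchart of FIG. 7 itself: the function name `telop_update`, the representation of positions as `(x, y, width, height)` tuples keyed by a participant identifier, and the behaviour of steps S4 and S5 (building character data for every participant currently on screen) are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # assumed (x, y, width, height) of a face region


def telop_update(previous: Dict[str, Box],
                 current: Dict[str, Box],
                 attributes: Dict[str, str]) -> Optional[List[Tuple[str, Box]]]:
    """Return new character data, or None when nothing needs to be sent.

    previous / current map each authenticated participant to the detected
    on-screen position in the previous and the current pass.
    """
    # Step S3: has anyone been authenticated who was not authenticated last time?
    newly_authenticated = set(current) - set(previous)
    # Step S6: has any position changed, or is anyone no longer detected?
    positions_changed = current != previous
    if not newly_authenticated and not positions_changed:
        return None  # assumed: the character data sent earlier remains in use
    # Steps S4 and S5 (assumed): character data for everyone now on screen.
    return [(attributes[pid], box) for pid, box in current.items()]
```

In the close-up case above, `current` would contain only Director Kinoshita with an enlarged face region, so the sketch returns character data for Director Kinoshita alone.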
[0049]
As a result, on the monitor 2 at point B, Director Kinoshita is displayed in close-up, and a telop of the title and name is displayed near Director Kinoshita's face. FIG. 9 is a diagram illustrating an example of an image displayed on the monitor 2 at point B at this time.
[0050]
Suppose also, for example, that at point A, while the video conference device 1 is shooting with all six participants within the angle of view, participants change seats or some of the participants leave.
[0051]
In that case as well, the computer 3 executes steps S1 to S3, S6, S4, and S5 of the processing in FIG. 7, and on the monitor 2 at point B a telop of the title and name is displayed near the face of each participant after the move, or near the faces of the remaining participants excluding those who left.
[0052]
Suppose further, for example, that at point A Section Manager Suzuki of Sales Section 2, who had been occupied with urgent business, finishes that business and joins the conference partway through, and that the video conference device 1 now shoots with Section Manager Suzuki also within the angle of view. Then the computer 3 executes steps S1 to S3 of the processing in FIG. 7, and since a new participant (Section Manager Suzuki of Sales Section 2) is authenticated, the result in step S3 is YES and steps S4 and S5 are executed. Consequently, this time character data for displaying the title and name near the on-screen position of each participant, including Section Manager Suzuki of Sales Section 2, is sent from the computer 3 to the video conference device 1.
[0053]
As a result, on the monitor 2 at point B, a telop of the title and name is also displayed near the face of Section Manager Suzuki of Sales Section 2, who joined the conference partway through.
[0054]
In this way, even if at point A the zoom amount or the pan/tilt amount of the video camera 11 of the video conference device 1 changes, a participant moves or leaves, or a participant joins the conference partway through, a telop of the title and name is always displayed on the monitor 2 at point B near the face of each participant currently displayed.
[0055]
The above has described the conference as seen by the conference participant at point B (the display on the monitor 2 at point B). The same applies when the conference is seen by the conference participants at point A: a telop of the title and name is always displayed near the face of each point B participant shown on the monitor 2 at point A.
[0056]
As described above, according to this video conference system, the title and name of each conference participant at a remote location (point B for point A and point A for point B) are displayed on the monitor together with the video of the participant. Therefore, it is possible to easily grasp the remote participants simply by looking at the monitor.
[0057]
Moreover, since the participants are authenticated and their positions are detected based on face recognition processing, the authentication and position detection can be performed without having the participants carry a dedicated medium for authentication or position detection (an ID card for authentication, a radio wave or infrared generator for position detection, or the like) and without providing in the conference room a device that acquires information for authentication or position detection from such a medium (an ID card reader, a radio wave or infrared receiver, or the like).
[0058]
In the above example, the authentication and the position detection of the participant are performed based on the face recognition processing. However, as another example, the participant authentication may be performed by a personal recognition process such as a voice recognition process, a fingerprint recognition process, or a retinal recognition process, or the position of the participant may be detected by a voice recognition process.
[0059]
Alternatively, if necessary, the participants may carry a dedicated medium for authentication and position detection (an ID card for authentication, a radio wave or infrared generator for position detection, or the like), and a device that acquires information for authentication and position detection from the medium (an ID card reader, a radio wave or infrared receiver, or the like) may be provided in the conference room.
[0060]
Alternatively, for example, where as many microphones as there are prospective participants are arranged at predetermined positions on the table 4 in FIG. 4 (with audio signals sent from these microphones to the video conference device 1), each microphone may incorporate an operating element and a circuit for sending the ID information of the participant using that microphone to the computer 3, and the position of each microphone may be stored in the computer 3. The computer 3 may then authenticate the participant based on the ID information and determine the position of the microphone that sent the ID information to be the position of the authenticated participant.
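The microphone-based variation can be sketched as a pair of lookups. This is a sketch under stated assumptions only: the microphone numbers, the coordinates, the ID strings, and the function name `authenticate_by_microphone` are hypothetical and do not appear in the specification.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stored positions of the microphones on the table 4
# (microphone number -> assumed on-screen coordinates).
MIC_POSITIONS: Dict[int, Tuple[int, int]] = {
    1: (80, 300),
    2: (240, 300),
    3: (400, 300),
}

# Hypothetical ID information registered for prospective participants
# (ID information -> title and name).
REGISTERED: Dict[str, Tuple[str, str]] = {
    "id-kinoshita": ("Director", "Kinoshita"),
    "id-yoshida": ("Section Manager", "Yoshida"),
}


def authenticate_by_microphone(
    mic_no: int, id_info: str
) -> Optional[Tuple[Tuple[str, str], Tuple[int, int]]]:
    """Authenticate by ID information and use the microphone position."""
    attrs = REGISTERED.get(id_info)
    if attrs is None or mic_no not in MIC_POSITIONS:
        return None  # unknown ID information or unknown microphone
    # The position of the sending microphone is taken as the participant's position.
    return attrs, MIC_POSITIONS[mic_no]
```

With this arrangement no image analysis is needed to locate a participant, at the cost of assuming that each participant stays at the microphone from which the ID information was sent.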
[0061]
In the above example, one computer 3 is provided at each of points A and B. However, as another example, a single computer 3 may be provided at only one of points A and B (or at another location connected to points A and B via the network 101), and that computer 3 may register in its database the face feature data and the title and name data of the prospective participants at both point A and point B and execute the processing of FIG. 7 for the participants at both points.
[0062]
Further, in the above example, the computer 3 that stores the database and executes the processing in FIG. 7 is provided separately from the video conference device 1. However, as another example, the video conference device 1 itself may be configured to store the database and execute the processing in FIG. 7.
[0063]
In the above example, registering title and name data in the database in the computer 3 (FIG. 6) causes a telop of the title and name to be displayed near the participant's face displayed on the monitor 2. However, the present invention is not limited to this; attribute information indicating any appropriate attribute of the prospective participants may be registered in this database so that the attribute information is displayed near the participant's face displayed on the monitor 2. As one example, in addition to or instead of the title and name data, data summarizing the positions each prospective participant has taken at past meetings (for example, support for or opposition to a project) may be registered in this database so that a telop summarizing those positions is also displayed near the participant's face displayed on the monitor 2.
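One possible shape for such a database entry is sketched below. It is illustrative only: the record layout, the field names, the format of the face feature data, and the `telop_text` function are assumptions and are not the database of FIG. 6.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ParticipantRecord:
    """Hypothetical database entry for one prospective participant."""
    face_features: Tuple[float, ...]      # identification information (format assumed)
    title: str                            # attribute information
    name: str                             # attribute information
    past_positions: Optional[str] = None  # optional summary of past positions


def telop_text(record: ParticipantRecord,
               show_title_name: bool = True,
               show_past_positions: bool = False) -> str:
    """Build telop text from whichever attributes are selected."""
    parts = []
    if show_title_name:
        parts.append(f"{record.title} {record.name}")
    if show_past_positions and record.past_positions:
        parts.append(record.past_positions)
    return " / ".join(parts)
```

The two flags correspond to showing the summary in addition to, or instead of, the title and name.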
[0064]
In the above example, the present invention is applied to a video conference system connecting two points, point A and point B. However, the present invention is not limited to this, and the present invention may be applied to a video conference system connecting three or more points or an appropriate two-way communication system other than the video conference system.
[0065]
When the present invention is applied to a two-way communication system for entertainment, for example, image data of a prospective participant's favorite animation may be registered in the database as attribute information, so that the animation image is displayed near the participant's face displayed on the monitor or is superimposed on the participant's face displayed on the monitor.
[0066]
When the present invention is applied to a two-way communication system in which some of the prospective participants do not want their faces displayed on the monitor, information instructing that a mosaic be applied may be registered in the database as attribute information for such a person, so that a mosaic is applied to that person's face displayed on the monitor.
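Applying a mosaic to a detected face region can be sketched as block averaging. This is a minimal sketch, assuming a grayscale frame held as a list of rows of integers and a face region given as `(x, y, width, height)`; the function name `apply_mosaic` and the block size are hypothetical, and the specification does not prescribe any particular mosaic method.

```python
from typing import List, Tuple


def apply_mosaic(frame: List[List[int]],
                 box: Tuple[int, int, int, int],
                 block: int = 2) -> List[List[int]]:
    """Return a copy of the frame with the given region pixelated."""
    x0, y0, w, h = box
    out = [row[:] for row in frame]  # leave the input frame untouched
    for by in range(y0, y0 + h, block):
        for bx in range(x0, x0 + w, block):
            # Pixels of this block, clipped to the face region.
            cells = [(y, x)
                     for y in range(by, min(by + block, y0 + h))
                     for x in range(bx, min(bx + block, x0 + w))]
            # Replace every pixel in the block with the block's average value.
            avg = sum(frame[y][x] for y, x in cells) // len(cells)
            for y, x in cells:
                out[y][x] = avg
    return out
```

A larger block size hides more facial detail; only the region passed in is altered, so telops for other participants would be unaffected.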
[0067]
Further, the present invention is not limited to the above-described example, and it is needless to say that various other configurations can be adopted without departing from the gist of the present invention.
[0068]
[Effects of the invention]
As described above, according to the two-way communication system of the present invention, the attribute information of each participant in the two-way communication at a remote location is displayed on the monitor together with the video of that participant, so that the remote participants can be easily grasped simply by looking at the monitor.
[0069]
In addition, the authentication and position detection of the participants can be performed without having the participants carry a dedicated medium for authentication or position detection and without providing a device that acquires information for authentication or position detection from such a medium.
[0070]
Further, when a video conference is held, even if the remote conference participants include someone the user is not acquainted with, the user can easily grasp who that person is just by looking at the monitor. Even if all the prospective participants at the remote location are acquaintances but there are many of them, the user can easily grasp who is actually participating in the conference just by looking at the monitor.
[Brief description of the drawings]
FIG. 1 is a diagram showing a device for transmitting and receiving video in a conventional video conference system.
FIG. 2 is a diagram illustrating an example of an image displayed on the monitor of FIG. 1.
FIG. 3 is a diagram illustrating an example of an image displayed on the monitor of FIG. 1.
FIG. 4 is a diagram showing a conference room in a video conference system to which the present invention is applied.
FIG. 5 is a block diagram illustrating a configuration of the video conference device in FIG. 4.
FIG. 6 is a diagram showing a database in the computer of FIG. 4.
FIG. 7 is a flowchart showing a process executed by the computer of FIG. 4.
FIG. 8 is a diagram illustrating an example of an image displayed on the monitor of FIG. 4.
FIG. 9 is a diagram illustrating an example of an image displayed on the monitor of FIG. 4.
[Explanation of symbols]
1 video conference device, 2 monitor, 3 computer, 4 table, 11 video camera, 12 mixing circuit, 13, 18 AD converter, 14 video codec, 15 communication interface, 16, 21 DA converter, 17 microphone, 19 echo canceller, 20 voice codec, 101 network

Claims

A two-way communication system in which, among a plurality of points apart from each other, video taken at each of the points is transmitted to the remaining points via a network, the system comprising:
a database storing identification information for identifying prospective participants in the two-way communication and attribute information indicating attributes of the prospective participants;
authentication means for authenticating, using the identification information stored in the database, actual participants in the two-way communication at each of the points;
detecting means for detecting the position of each participant authenticated by the authentication means at each of the points; and
adding means for adding, to a part of the video taken at each of the points that corresponds to the position of the participant detected by the detecting means, data for visually displaying the attribute information of the participant stored in the database.

The two-way communication system according to claim 1, wherein
the database stores face feature data of the prospective participants as the identification information,
the authentication means performs authentication by face recognition, using the face feature data, on the video taken at each of the points, and
the detecting means detects, from the video taken at each of the points, the on-screen position of each participant authenticated by the authentication means.

The two-way communication system according to claim 1, wherein
the database stores title and name information as the attribute information, and
the adding means adds character data of the title and name.