JP4373645B2 - Video distribution system, program, and recording medium

Info

Publication number: JP4373645B2
Application number: JP2002160646A
Authority: JP
Inventors: 村田 憲彦 (Norihiko Murata); 青木 伸 (Shin Aoki)
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2002-05-31
Filing date: 2002-05-31
Publication date: 2009-11-25
Anticipated expiration: 2022-05-31
Also published as: JP2004007283A

Description

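[0001]
[Technical field of the invention]
The present invention relates to a system that distributes video of a wide-range scene acquired with imaging means having a wide-angle field of view. Specifically, it is used in applications such as surveillance systems, remote conference systems, and remote education systems.
[0002]
[Prior art]
With advances in telecommunications technology, remote monitoring systems and video conferencing systems that transfer video of a scene to a remote location and display it there have come into use in many settings. To make such systems more convenient, many camera devices have been proposed that can capture a wide-range scene that an ordinary camera cannot, together with systems that display partial video in which only the needed part of the scene is cut out.
[0003]
For example, Japanese Unexamined Patent Application Publication No. H05-122689 proposes a camera device and video conferencing system that detect the voice input from microphones to determine the speaker and, based on that determination, have a camera control unit steer the camera automatically to capture the speaker. There are also camera devices in which a plurality of cameras are arranged so as to form an aggregate field of view.
[0004]
However, the invention disclosed in Japanese Unexamined Patent Application Publication No. H05-122689 has the problems that steering the camera to capture the speaker takes time and that the drive unit that rotates the camera breaks easily. In addition, neither of the two conventional techniques above can acquire video that records the entire scene. The applicant presented test subjects with three display formats, (1) showing only partial video of the part of the scene that is of interest, (2) showing only video of the entire scene, and (3) showing the video of (1) and (2) simultaneously, and ran a test to evaluate which was most preferable. The result was that format (3) was overwhelmingly preferred, followed by format (2). The importance of transferring video that includes the entire scene is thus recognized for such systems, but the conventional techniques above do not address it.
[0005]
Meanwhile, various wide-angle imaging devices have been proposed for acquiring a single image that records the entire scene. For example, Japanese Patent No. 2939087 and Japanese Patent No. 3054146 disclose inventions relating to a 360-degree panoramic camera that uses a hyperbolic dome-shaped mirror. According to these publications, photographing the image reflected in the dome-shaped mirror captures a 360-degree (hemispherical) scene with higher lateral resolution than a fisheye camera. Installed on a desk or a ceiling, such a camera can observe an entire room, which makes it suitable for surveillance and similar uses. The captured image is distorted into a circular shape as shown in Fig. 7, and Japanese Patent No. 3054146 describes means for transforming it so that it looks as if taken with an ordinary camera. According to these publications, it is possible to generate not only an image recording the entire scene but also partial video by applying a cut-out process. However, because these wide-angle imaging devices acquire the omnidirectional view as a single circular image, wasted areas in which nothing is imaged arise, like the blacked-out portions of Fig. 7, and the image quality of the resulting partial video is insufficient.
[0006]
In addition, Japanese Unexamined Patent Application Publication No. 2001-94857 discloses an invention that generates a single seamless wide-angle image by warping a set of images captured synchronously by a camera array into a common coordinate system. According to that publication, combining images captured by ordinary cameras into one yields a wide-angle image with high resolution across the whole frame. However, warping a plurality of video signals in real time incurs an enormous processing cost, and the camera array must be calibrated in advance.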
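[0007]
[Problems to be solved by the invention]
The present invention has been made in view of the problems described above. Its first object is to provide a video distribution system that, with a simple configuration and simple processing, acquires and distributes wide-range video while at the same time acquiring and distributing video of a desired scene at high resolution, together with a program and a recording medium for executing the processing of each part of the system.
[0008]
A second object of the present invention is to provide a video distribution system that lets a viewer select high-resolution video of a desired scene more easily.
[0009]
A third object of the present invention is to provide a video distribution system that acquires and distributes video of a desired scene at high resolution without forcing troublesome operations on the user.
[0010]
A fourth object of the present invention is to provide a video distribution system that displays the acquired wide-range video in a form that is easier for the viewer to observe.
A fifth object of the present invention is to provide a video distribution system that acquires and displays high-resolution video of a desired scene without omission.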
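[0011]
[Means for solving the problems]
The invention according to claim 1 is a video distribution system comprising: first imaging means for acquiring wide-angle video; second imaging means for synchronously acquiring a plurality of videos of mutually different predetermined regions; specifying means for specifying the correspondence between the wide-angle video and each video acquired by the second imaging means; and distribution means for distributing at least one of the videos acquired by the second imaging means, the wide-angle video, and the correspondence specified by the specifying means. The second imaging means consists of a plurality of cameras, each of which carries an identifier; the first imaging means includes the identifiers attached to the plurality of cameras within its imaging range; and the specifying means specifies the correspondence based on the positions at which the identifiers appear in the wide-angle video.
[0015]
The invention according to claim 2 is a video distribution system in which the specifying means specifies the correspondence based on the similarity between the wide-angle video and each video acquired by the second imaging means.
[0016]
According to the inventions of claims 1 and 2, providing a specifying unit that specifies the correspondence between the wide-angle video acquired by the first imaging unit and each partial video acquired by the second imaging unit lets the viewer select high-resolution video of a desired scene more easily.
[0017]
The invention according to claim 3 is a video distribution system comprising: first imaging means for acquiring wide-angle video; second imaging means for synchronously acquiring a plurality of videos of mutually different predetermined regions; video selection means for selecting a predetermined video from the plurality of videos acquired by the second imaging means; distribution means for distributing the predetermined video selected by the video selection means and the wide-angle video; a plurality of audio acquisition means for acquiring audio intensity; and sound source detection means for detecting the position or direction of a sound source based on the audio acquisition means with the maximum audio intensity among the plurality of intensities acquired and on the relative intensity differences of the remaining audio acquisition means. The video selection means selects the predetermined video based on the position or direction of the sound source output by the sound source detection means.
[0019]
The invention according to claim 4 is a video distribution system further comprising motion detection means for detecting the motion of a subject in the wide-angle video or in the plurality of videos acquired by the second imaging means, in which the video selection means selects the predetermined video based on the subject motion output by the motion detection means.
[0020]
According to the inventions of claims 3 and 4, providing a video selection unit that selects a predetermined video according to the position or direction of the sound source output by the sound source detection unit makes it possible to acquire and distribute video of a desired scene at high resolution without forcing troublesome operations on the user.
[0021]
The invention according to claim 5 is a program that causes a computer to execute the processing of each means of the video distribution system.
[0022]
Furthermore, providing a transformation unit that transforms the wide-angle video makes it possible to display the acquired wide-range video in a form that is easier for the viewer to observe.
[0023]
In addition, because each video acquired by the second imaging unit shares part of its region with at least one other video, high-resolution video of a desired scene can be acquired and displayed without omission.
[0024]
According to the inventions of claims 5 and 6, providing a specifying unit that specifies the correspondence between the wide-angle video acquired by the first imaging unit and each partial video acquired by the second imaging unit lets the viewer select high-resolution video of a desired scene more easily.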
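[0025]
[Embodiments of the invention]
First, usage examples of the video distribution system are outlined briefly; embodiments of the system are then described in detail. For each embodiment, the constituent elements and their operation are described first, followed by the flow of processing.
[0026]
Usage examples of the video distribution system are described first.
[0027]
Fig. 1 outlines a usage example in which the present invention is installed in a meeting. The video distribution system has a wide-angle camera 200 that acquires wide-angle video; a camera array 400 consisting of a plurality of cameras 401-1 to 401-4 with an ordinary angle of view; a microphone 501 that acquires the audio of the meeting; and a server 300 that takes in and distributes the video data acquired by the wide-angle camera 200 and the camera array 400 and the audio data acquired by the microphone 501.
[0028]
As shown in Fig. 1, the wide-angle camera 200 is placed on a table 1 and captures, in a single shot, an image of the entire surroundings in the directions of the meeting participants (speakers) 2-1 to 2-4, for example across the horizontal plane. Each of the cameras 401-1 to 401-4 of the camera array 400 is placed, for example, in front of one participant and photographs that participant. The video acquired by these cameras 401-1 to 401-4 is hereinafter called "partial video". The server 300 is housed in a cabinet 3 and acquires the video data from the wide-angle camera 200 and the camera array 400 and the audio data acquired by the microphone 501. In response to requests from clients, the server 300 also distributes the acquired video data over a telecommunication line to client terminals such as a PC (Personal Computer) or PDA (Personal Digital Assistant). (The telecommunication line and client terminals are omitted from Fig. 1.) Fig. 2 shows the display screen of a client PC used as a video display terminal. As shown in Fig. 2, at least one partial video 601 acquired by the camera array 400 and the video 602 acquired by the wide-angle camera are displayed in a display window 600. When the video selection button 603 is operated, selection information for the partial video being requested is sent to the server 300, and the video 601 from the camera array 400 delivered through the server 300 is switched.
[0029]
Fig. 3 outlines an example in which the present invention is applied to a system that distributes video of a venue such as an art museum. The video distribution system 100 has a wide-angle camera 200 that acquires wide-angle video; a camera array 400 consisting of at least one camera with an ordinary angle of view (401-1, 401-2); and a server 300 that takes in the video data acquired by the wide-angle camera 200 and the camera array 400. (The server 300 is omitted from Fig. 3.)
[0030]
As shown in Fig. 3, the wide-angle camera 200 is installed on the ceiling and captures, in a single shot, an image of the entire surroundings overlooking the whole venue. The cameras 401-1 and 401-2 of the camera array 400 are installed at various places in the venue and fixed so that they hang from the ceiling, for example to show the front of each painting on display. The server 300 is installed somewhere out of sight in the venue and acquires the video data from the wide-angle camera 200 and the camera array 400. In response to requests from clients, the server 300 also distributes the acquired video data over a telecommunication line to client terminals. (The telecommunication line and client terminals are omitted from the figure.)
[0031]
The embodiments below describe the case in which the video distribution system of the present invention is applied to recording a meeting and distributing its video.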
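[0032]
1. Embodiment 1
Embodiment 1 of the present invention is described first.
[0033]
1.1 Configuration
Fig. 4 shows the configuration of the video distribution system according to Embodiment 1. The wide-angle camera 200 and the camera array 400 are connected to the server 300 through a USB hub 320 and a bus 310, and the server acquires the wide-angle video data and the plurality of partial video data captured by the camera array 400. The video data acquired by the server 300 is distributed over the Internet 330 and displayed on client PCs connected to the Internet.
[0034]
The configuration of each part is described next.
[0035]
1.1.1 Wide-angle camera
Fig. 5 shows the configuration of the wide-angle camera 200, which serves as the first imaging means in Embodiment 1. The wide-angle camera 200 comprises a mirror 211 with a curved surface of a predetermined shape; a lens 212; an aperture 213; an image sensor 214 such as a CCD (Charge Coupled Device); a drive unit 215 that controls the timing of the image sensor 214 and digitizes the video signal obtained from it, for example by analog-to-digital conversion; a preprocessing circuit 216 that applies preprocessing such as edge enhancement and gamma correction to the digital signal from the drive unit 215; and a motor drive unit 217 that drives the aperture 213 to control the iris.
[0036]
The mirror 211 enables wide-angle imaging by reflecting the light entering the optical system. Here a hyperboloidal mirror is used as the mirror with a curved surface of a predetermined shape. Fig. 6 illustrates the optical path when the hyperboloidal mirror 211 of this embodiment is used, and Fig. 7 shows the wide-angle image that the hyperboloidal mirror 211 forms on the surface of the image sensor 214. As shown in Fig. 7, the image reflected from the hyperboloidal mirror 211 and captured by the image sensor 214 is donut-shaped (this video is hereinafter called the "donut video"). The donut video is formed on the image sensor 214, digitized by the drive unit 215, and sent through the preprocessing circuit 216 to the server 300 described later. The central part of the image in Fig. 7 shows the direction of the image sensor 214 itself and carries no important image information. The apex 218 of the hyperboloidal mirror 211 may therefore be painted black so that this region contains only black. Depending on the mode of use, a reference line may instead be drawn on the apex 218 and used for initial settings such as focus adjustment by driving the motor drive unit 217 when the wide-angle camera 200 starts up.
[0037]
As described above, combining an ordinary camera with a mirror makes it possible to capture wide-angle video with an inexpensive and simple configuration.
1.1.2 Camera array
The camera array 400 consists of at least one camera, and each camera captures part of the scene within the imaging range of the wide-angle camera 200 at higher resolution. The cameras 401 of the camera array 400 may be placed separately, as in Figs. 1 and 3, or the cameras 401-1 to 401-3 may be fixed to a housing 402, as in Fig. 8. Various kinds of image sensor can be used in the cameras 401, such as CCD or CMOS (Complementary Metal-Oxide Semiconductor) sensors. The video signal formed on the image sensor is digitized inside the camera and then sent to the server 300 described later.
[0038]
Providing at least one camera 401 with this configuration makes it possible to acquire partial video with an inexpensive and simple configuration.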
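[0039]
1.1.3 Server
Fig. 9 shows a configuration example of the server 300 in this embodiment. The server is built by connecting the following over a bus 313: a CPU (Central Processing Unit) 301 that performs the various control and processing tasks of the video distribution system 100; an SDRAM (Synchronous Dynamic Random Access Memory) 302; an HDD (Hard Disk Drive) 303; an input interface (hereinafter I/F) 304 for devices such as a pointing device (for example a mouse 311) and a keyboard 312; a power supply 305; a display I/F 306 for connecting a display such as a CRT (Cathode Ray Tube); and an external I/F 307 for connecting external devices such as the wide-angle camera 200 and the camera array 400.
[0040]
The components of the server 300 are described next. The CPU 301 follows predetermined programs stored on the HDD 303 to perform processing and control such as acquiring video from the wide-angle camera 200 and the camera array 400 and distributing the acquired video. The SDRAM 302 serves as the working area of the CPU 301 and as the storage area for the processing programs stored on the HDD 303 and for an OS (Operating System) such as Windows (registered trademark) NT Server (a registered trademark of Microsoft Corporation, USA).
[0041]
Examples of the external I/F 307 include various I/F boards, USB (Universal Serial Bus), IEEE 1394, and wireless I/Fs such as IrDA and Bluetooth. The wide-angle video data and the plurality of partial video data from the camera array 400 can be acquired synchronously by connecting the wide-angle camera 200 and the camera array 400 to the server 300 through a high-speed serial interface such as USB 2.0.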
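[0042]
1.2 Operation
Fig. 10 redraws the video distribution system of this embodiment shown in Fig. 4 as a functional block diagram. The operation of each part shown in Fig. 10 is described in detail below.
[0043]
1.2.1 First imaging unit
The first imaging unit 31 consists of the wide-angle camera 200 described in 1.1.1 and outputs the acquired, digitized wide-angle video data.
[0044]
1.2.2 Second imaging unit
The second imaging unit 32 consists of the camera array 400 described in 1.1.2 and outputs the acquired, digitized partial video data.
[0045]
1.2.3 Transformation unit
Fig. 11 illustrates the operation of the transformation unit 33 in Embodiment 1. As shown in Fig. 11, the transformation unit 33 transforms the wide-angle video data acquired by the first imaging unit 31 into video close to the perspective projection captured by an ordinary camera (hereinafter called panoramic video). In general, as noted above, video from a camera that can capture a wide angular range differs in shape from what the human eye sees and contains large distortion, so it is preferable to transform it for the convenience of later viewing. The method of transforming the wide-angle video data (the donut video shown in Fig. 7) into panoramic video is described below with reference to the method in the literature (A. M. Bruckstein and T. J. Richardson: "Omniview Cameras with Curved Surface Mirrors", Proc. of the IEEE Workshop on Omnidirectional Vision 2000, pp. 79-84).
[0046]
Fig. 12 illustrates the principle of transforming video from a camera that uses a hyperboloidal mirror. Fig. 12(a) shows an example of the operation of the transformation unit 33: the donut video is converted by coordinate transformation into panoramic video projected onto a cylindrical surface whose horizontal axis is the azimuth and whose vertical axis is the elevation. Fig. 12(b) shows the geometric structure of the wide-angle camera 200; the optical system of the camera in Fig. 12(b) follows the central projection model. The variables in the figure have the following meanings.
(u, v): coordinates in the donut video
(u₀, v₀): coordinates of the center of the hyperboloidal mirror in the donut video
(θ, φ): coordinates in the panoramic video
r: distance in pixels from (u₀, v₀) to (u, v)
r_max: radius in pixels of the hyperboloidal mirror in the donut video
θ: azimuth
φ: elevation
ψ: zenith angle measured from the optical axis of the camera
F: focal point of the hyperboloidal mirror
F': focal point of the hyperboloid paired with the hyperboloidal mirror, which coincides with the optical center of the camera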
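[0047]
The following relationship then holds between the zenith angle ψ and the elevation φ.
[0048]
[Equation 1]

where
[0049]
[Equation 2]

Here φ_min is the value of the elevation φ corresponding to radius r_max in the donut video and represents the limit of the camera's imaging range in the elevation direction. The values of r_max and φ_min are generally easy to obtain.
The transformation procedure is as follows.
(i) Obtain the polar coordinates (r, θ) corresponding to the point (u, v) by solving the following equation.
[0050]
[Equation 3]

(ii) Obtain the zenith angle ψ corresponding to the r calculated from Equation (3) using the following equation.
[0051]
[Equation 4]

where
[0052]
[Equation 5]

and ψ_max is the value of the zenith angle ψ corresponding to radius r_max in the donut video and to elevation φ_min. The value of ψ_max can be obtained by substituting φ_min into Equation (1).
[0053]
(iii) Obtain the elevation φ corresponding to the ψ calculated from Equation (4) using Equation (1).
[0054]
With this procedure, any point (u, v) in the donut video captured through the hyperboloidal mirror can be converted into a point (θ, φ) in the panoramic video. In other words, the donut video is transformed into panoramic video.
[0055]
Fig. 13 illustrates the coordinate conversion table used by the transformation unit 33. When capture and distribution of the panoramic video happen at the same time, the computation time of the transformation above becomes a problem, so it is preferable to build a coordinate conversion table in advance based on the procedure above, as in Fig. 13. The table in Fig. 13 stores, for each point (θ, φ), the corresponding donut-video coordinates (u, v).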
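The precomputed table described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: Equations (1), (2), (4) and (5) are not reproduced in this text, so the mapping from elevation φ to donut-image radius r is passed in as a caller-supplied function, and the `linear_radius` helper is a purely hypothetical placeholder for it. The function and parameter names are assumptions made for this example. Only the polar-to-Cartesian step (u = u₀ + r cos θ, v = v₀ + r sin θ) follows directly from the variable definitions.

```python
import math

def build_lookup_table(width, height, u0, v0, phi_min, phi_max, radius_for_elevation):
    """Return table[row][col] = (u, v), the donut-image pixel for each panorama pixel.

    Columns sweep the azimuth theta over [0, 2*pi); rows sweep the elevation phi
    from phi_max (top row) down to phi_min (bottom row). height must be >= 2.
    radius_for_elevation(phi) is assumed to return the donut radius r in pixels.
    """
    table = []
    for row in range(height):
        phi = phi_max - (phi_max - phi_min) * row / (height - 1)
        r = radius_for_elevation(phi)
        line = []
        for col in range(width):
            theta = 2.0 * math.pi * col / width
            u = u0 + r * math.cos(theta)  # polar -> Cartesian around the mirror center
            v = v0 + r * math.sin(theta)
            line.append((int(round(u)), int(round(v))))
        table.append(line)
    return table

def linear_radius(phi, r_max=100.0, phi_min=-0.5, phi_max=0.5):
    """Hypothetical placeholder: r grows linearly from 0 at phi_max to r_max at phi_min."""
    return r_max * (phi_max - phi) / (phi_max - phi_min)

def apply_table(donut, table):
    """Warp one frame by table lookup; donut[v][u] must exist for every table entry."""
    return [[donut[v][u] for (u, v) in line] for line in table]
```

Because the table depends only on the camera geometry, it can be built once at start-up; each frame then costs one lookup per output pixel, which is the saving the paragraph above is after.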
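[0056]
The transformation above is executed by the CPU 301 in the server 300. A predetermined program for performing the transformation is stored on the HDD 303 in advance.
[0057]
1.2.4 Encoding unit
The encoding unit 34 in Fig. 10 encodes at least one of the partial video data acquired by the second imaging unit 32 and the panoramic video data output by the transformation unit 33 into a format suitable for video distribution. The operation is described here for the case where RealVideo, provided by RealNetworks, is used as that format. Encoding from the acquired video data into the RealVideo format is performed by RealProducer, an encoding program from the same company. RealProducer keeps encoding video and audio for as long as video data continues to be acquired and continuously sends the encoded data to the distribution unit 35. This encoding is executed by the CPU 301 in the server 300, with RealProducer installed on the HDD 303 in advance.
[0058]
1.2.5 Distribution unit
In response to requests from a client's video display terminal, the distribution unit 35 distributes over the Internet the wide-angle video data encoded by the encoding unit 34 and at least one partial video data. An example of the operation is described here for the case where RealServer, a program provided by RealNetworks, performs the distribution.
[0059]
The client tells the server 300 over the Internet which partial video data it is requesting. The requested partial video data is selected by operating the video selection button 603 in Fig. 2. The Internet communication uses a TCP (Transmission Control Protocol) connection, and the video display terminal and the server 300 communicate through HTTP (HyperText Transfer Protocol) or RTSP (Real Time Streaming Protocol).
[0060]
On receiving a request from the client, the server 300 sends the wide-angle video data and the specified partial video data over the Internet in accordance with the request. This communication uses UDP (User Datagram Protocol), and the video data is sent through RDT (Real Data Transport).
[0061]
The video data distributed to the video display terminal in this way is displayed by the client's video display program (for example RealPlayer, provided by RealNetworks).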
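[0062]
2. Embodiment 2
A curved mirror with curvature in one direction can also be used in the wide-angle camera 200.
[0063]
2.1 Configuration
As in Embodiment 1, the configuration of Embodiment 2 is shown in Fig. 4.
[0064]
The configuration of the wide-angle camera 200 in this embodiment is described below. The server 300 and the camera array 400 are configured as described in Embodiment 1, so their description is omitted.
[0065]
2.1.1 Wide-angle camera
Fig. 14 shows the configuration of the wide-angle camera 200 in this embodiment. As shown in Fig. 14, the wide-angle camera 200 consists of a camera 219 with an ordinary angle of view and a mirror 211 with curvature in one direction. It cannot capture all directions, but it can capture a wide-range scene. Fig. 15 shows the wide-angle image seen by the camera 219 when this mirror 211 is used; the scene behind the camera 219 can be captured. As shown in Fig. 15, the captured image is compressed horizontally, with the horizontal coordinate of each image position proportional to the horizontal angle of the incident light. The design can also be refined to reduce the extent to which the camera 219 itself appears in the image.
[0066]
As described above, combining an ordinary camera with a mirror makes it possible to capture wide-angle video with an inexpensive and simple configuration.
[0067]
2.2 Operation
As in Embodiment 1, the functional block diagram of this embodiment is shown in Fig. 10. The operation of each part shown in Fig. 10 is described in detail below. The first imaging unit 31, the second imaging unit 32, and the distribution unit 35 operate as in Embodiment 1, so their description is omitted.
[0068]
2.2.1 Transformation unit
In Embodiment 2, panoramic video can be obtained simply by stretching the video data acquired by the wide-angle camera 200 uniformly in the horizontal direction. As with the hyperboloidal mirror, it suffices to build a coordinate conversion table like that of Fig. 13 and store, for each point of the panoramic video, the coordinates (u, v) in the video before transformation.
[0069]
With this wide-angle camera 200, panoramic video can also be displayed on the client side even if the video distribution system 100 has no transformation unit 33. Suppose the imaging range is 180 degrees horizontally and 60 degrees vertically, and the wide-angle camera 200 acquires video of 352 × 240 pixels. Panoramic video is then obtained by stretching the horizontal length by a factor of 3, to 1056 pixels. Suppose further that the machine name of the server 300 is "vidserv", that the wide-angle video data distributed by the system is named "movie.rm" (in the RealVideo data format), and that the video display terminal and the server 300 communicate through RTSP (Real Time Streaming Protocol). The stretching can then be described, as shown in Fig. 16, in SMIL (Synchronized Multimedia Integrated Language), a W3C (World Wide Web Consortium) recommendation. As shown in Fig. 16, when the size of the display region specified in the <region> tag differs from the image size of the associated video data "movie.rm", setting the fit attribute to "fill" scales the video data to the size of the display region. In other words, panoramic video can be displayed by specifying a display region with the desired magnification relative to the video data and setting the attribute value as above. This transformation is executed on the client's video display terminal at the same time as the video is displayed. The server 300 therefore does not need to perform the transformation, so wide-angle video data can be distributed at low processing cost.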
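The client-side scaling above is done declaratively through SMIL in the patent (Fig. 16, not reproduced here). As a rough illustration of the same resampling, the sketch below stretches a frame horizontally by an integer factor using nearest-neighbour repetition. The function name and the list-of-rows frame representation are assumptions for this example; a real player would interpolate rather than repeat pixels.

```python
def stretch_horizontally(frame, factor):
    """Nearest-neighbour horizontal stretch: repeat each source pixel `factor` times.

    `frame` is a list of rows, each row a list of pixel values.
    """
    return [[pixel for pixel in row for _ in range(factor)] for row in frame]

# A 352 x 240 frame stretched by 3 becomes 1056 x 240, as in the worked numbers above.
source = [[0] * 352 for _ in range(240)]
panorama = stretch_horizontally(source, 3)
```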
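[0070]
2.2.2 Encoding unit
The encoding unit 34 operates as in Embodiment 1: it encodes at least one of the partial video data acquired by the second imaging unit 32 and the panoramic video data output by the transformation unit 33, each into a format suitable for video distribution.
[0071]
If there is no transformation unit 33, the encoding unit 34 encodes the wide-angle video data acquired by the first imaging unit 31 instead of wide-angle video data transformed by the transformation unit 33.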
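[0072]
3. Embodiment 3
Embodiment 3 of the present invention relates to a video distribution system that specifies the correspondence between the wide-angle video data acquired by the wide-angle camera 200 and each partial video data acquired by the camera array 400. Examples of this "correspondence" include:
- the positional relationship between the wide-angle camera 200 and each camera 401 of the camera array 400
- the positional relationship between the wide-angle video data and each partial video data
If this correspondence is unknown, a client who requests a switch of partial video through the video selection button 603 in Fig. 2 has no guarantee that the server 300 will deliver the desired partial video data. One conceivable remedy is to install the camera array 400 so that pressing the left-arrow button of the video selection button 603 repeatedly switches the delivered partial video counterclockwise. However, the switching order of the partial videos must then match the physical order of the cameras, which makes installing the video distribution system 100 troublesome.
[0073]
Fig. 17 shows an example of a display screen on the client's video display terminal that uses the correspondence. In the figure, a bar 604 is placed below the wide-angle video 602, and the imaging range of the partial video 601 currently displayed is indicated by a black bar 605. The imaging ranges of the other partial videos are indicated by gray bars 606. When the client moves the cursor 607 with a mouse (not shown) and clicks the gray bar 606 for a given partial video, selection information for the partial video being requested is sent to the server 300, and the partial video 601 from the camera array 400 delivered through the server 300 is switched to the corresponding one. Specifying the correspondence thus makes installing the video distribution system 100 easier. The client can also select the desired video more easily and understand the scene being captured more deeply from the delivered video.
[0074]
Embodiment 3 of the present invention relates to a video distribution system that realizes this behavior.
[0075]
3.1 Configuration
As in Embodiment 1, the configuration of Embodiment 3 is shown in Figs. 4 to 9.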
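[0076]
3.2 Operation
Fig. 18 shows the video distribution system according to Embodiment 3 as functional blocks; it adds a specifying unit 36 to the block diagram of Embodiment 1 shown in Fig. 10. The operation of each part shown in Fig. 18 is described in detail below. The first imaging unit 31, the second imaging unit 32, the transformation unit 33, and the encoding unit 34 operate as in Embodiment 1, so their description is omitted.
[0077]
3.2.1 Specifying unit
The specifying unit 36 specifies the correspondence between the wide-angle video data acquired by the wide-angle camera 200 and each partial video data acquired by the camera array 400. This operation is described below.
[0078]
(1) Method using identifiers attached to the cameras 401 of the camera array 400
Fig. 19 shows an example of the operation of the specifying unit 36. As shown in Fig. 19(a), an identifier 403 is attached to each camera 401 of the camera array 400, and each camera 401 is placed where the wide-angle camera 200 can see it. The video data acquired by the wide-angle camera 200 then looks like Fig. 19(b). Detecting the image coordinates at which each identifier 403 appears in this video data specifies the positional relationship between the wide-angle camera 200 and each camera 401. The identifier 403 can be, for example,
- a sticker bearing an Arabic numeral,
- a barcode,
- a color code,
- a two-dimensional barcode,
and reading such identifiers from video data is already a well-known technique in the field of pattern recognition.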
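Once the identifiers have been located in the donut video, turning their image coordinates into a per-camera direction is a short calculation. The sketch below assumes that some recognizer (not shown, and not specified by the patent) has already produced a mapping from camera label to pixel position; the function name, the dictionary layout, and the choice of azimuth as the stored "positional relationship" are all assumptions for illustration.

```python
import math

def marker_azimuths(markers, u0, v0):
    """Map each camera label to its azimuth in degrees (0 <= angle < 360).

    markers: {camera_label: (u, v)} pixel positions of detected identifiers.
    (u0, v0): pixel position of the mirror center in the donut video.
    """
    result = {}
    for camera_label, (u, v) in markers.items():
        angle = math.degrees(math.atan2(v - v0, u - u0)) % 360.0
        result[camera_label] = angle
    return result

# Hypothetical detections around a mirror centered at (100, 100).
detected = {"401-1": (200, 100), "401-2": (100, 200), "401-3": (0, 100)}
azimuths = marker_azimuths(detected, 100, 100)
```

Because the azimuth is recomputed from each frame, this approach keeps working if a camera is moved during a session, which matches the remark later in this section about methods (1) and (2).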
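(2) Method using the video data acquired by the wide-angle camera 200 and the camera array 400
Fig. 20 shows another example of the operation of the specifying unit 36. In this example, the parts of the wide-angle video data and of each partial video data that are highly similar are detected.
[0079]
The operation is described here for the case where template matching is used to detect the highly similar parts. First, as in Fig. 20(a), a template 608 of size (2DX + 1) × (2DY + 1) is generated from each partial video acquired by the camera array 400. Next, as in Fig. 20(b), the template 608 is moved over the wide-angle video 602, and the normalized cross-correlation S between the template 608 and the point (m, n) in the wide-angle video 602 is calculated from the following equation.
[0080]
[Equation 6]

The symbols in Equation (6) have the following meanings.
- I₁(x, y): intensity at point (x, y) on the template
- I₂(x, y): intensity at point (x, y) on the wide-angle video
Based on this calculation, the point (m, n) in the wide-angle video 602 at which the normalized cross-correlation S is largest is found, and the camera 401 corresponding to that position is identified. Performing this for every partial video specifies the positional relationship between the wide-angle camera 200 and each camera 401.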
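The search for the best-matching point can be sketched as below. Equation (6) is not reproduced in this text, so the exact normalization used in the patent is unknown; this example assumes the common form S = Σ I₁·I₂ / sqrt(Σ I₁² · Σ I₂²) without mean subtraction. The function names and the list-of-rows image layout are likewise assumptions. In practice the template would also have to be rescaled and the panorama unwarped before matching, which is omitted here.

```python
import math

def ncc(template, image, m, n):
    """Normalized cross-correlation between `template` and the window of `image`
    centered at column m, row n. The template has odd size (2*DX+1) x (2*DY+1)."""
    dy = len(template) // 2
    dx = len(template[0]) // 2
    num = sum_t = sum_i = 0.0
    for y in range(-dy, dy + 1):
        for x in range(-dx, dx + 1):
            a = template[y + dy][x + dx]
            b = image[n + y][m + x]
            num += a * b
            sum_t += a * a
            sum_i += b * b
    denom = math.sqrt(sum_t * sum_i)
    return num / denom if denom > 0 else 0.0

def best_match(template, image):
    """Slide the template over every fully covered position; return ((m, n), S_max)."""
    dy = len(template) // 2
    dx = len(template[0]) // 2
    best_pos, best_s = None, -1.0
    for n in range(dy, len(image) - dy):
        for m in range(dx, len(image[0]) - dx):
            s = ncc(template, image, m, n)
            if s > best_s:
                best_pos, best_s = (m, n), s
    return best_pos, best_s
```

The exhaustive scan is quadratic in image size times template size, so a real deployment would run it once at set-up or at a low rate rather than per frame.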
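[0081]
Computing video similarity from the cross-correlation of intensity is only one example. Similarity may instead be computed from other features, such as the color space or contours of the video.
[0082]
(3) Manual method
Fig. 21 shows a display screen of the server 300 in Embodiment 3. This screen appears immediately after the video distribution system 100 is started and just before distribution begins. The user first operates the video selection button 603 to switch the partial video 601 being displayed. A message 609 then appears on the screen prompting the user to enter manually the positional relationship between the partial video 601 currently displayed and the wide-angle video 602. The user moves the cursor 607 with a mouse (not shown) and clicks a point on the wide-angle video 602 to enter the positional relationship. When the entry is complete, a cross-shaped pointer 610 is placed on the wide-angle video 602 at the position corresponding to that partial video 601. Performing this for every partial video specifies the positional relationship between the wide-angle video 602 and each partial video 601.
[0083]
This method is particularly effective when the positions of the wide-angle camera 200 and the camera array 400 do not change from the start to the end of distribution. Methods (1) and (2) above, by contrast, remain effective even if the position of a camera 401 is changed partway through.
[0084]
The processing above is executed by the CPU 301 in the server 300. A predetermined program for performing the specifying process is stored on the HDD 303 in advance.
[0085]
3.2.2 Distribution unit
In response to requests from a client's video display terminal, the distribution unit 35 distributes over the Internet the wide-angle video data encoded by the encoding unit 34 and at least one partial video data. It is preferable to deliver to the client not only the video data but also the correspondence specified by the specifying unit 36, because the video display terminal can then present the display screen shown in Fig. 17. The distribution itself operates as in Embodiment 1.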
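[0086]
4. Embodiment 4
Embodiment 4 of the present invention relates to a video distribution system that automatically selects among the partial video data acquired by the camera array 400.
[0087]
In Embodiments 1 to 3, the client selects the partial video it wants delivered. Selecting the partial video by hand every time, however, is tedious.
[0088]
Fig. 22 shows a display screen on the client's video display terminal in which the displayed partial video 601 is selected automatically. As shown, checking the check box 611 labeled "AUTO" switches to a mode in which the partial video 601 is selected and delivered automatically. In this mode the server 300 automatically selects the partial video showing an important scene, such as the current speaker, and delivers it to the client together with the wide-angle video. The client can thus understand the scene being captured more deeply from the delivered video without any troublesome operation.
[0089]
Embodiment 4 relates to a video distribution system that realizes this behavior.
[0090]
4.1 Configuration
Fig. 23 shows the configuration of the video distribution system according to Embodiment 4. The wide-angle camera 200, the camera array 400, and a microphone array 500 are connected to the server 300, which acquires the wide-angle video data, the plurality of partial video data, and a plurality of audio data. The video data acquired by the server 300 is distributed over the Internet and displayed on client PCs connected to the Internet. The audio data acquired by the server 300 is used for the video selection described later and, where needed, is also distributed over the Internet and played back on the client PCs.
[0091]
The configuration of each part is described next. The wide-angle camera 200, the camera array 400, and the server 300 are configured as in Embodiment 1, so their description is omitted.
[0092]
4.1.1 Microphone array
The microphone array 500 consists of at least two microphones 501-1 and 501-2. Various types of microphone can be used, such as piezoelectric or capacitive (condenser) microphones. Like the cameras 401, the microphones 501-1 and 501-2 may be placed apart from each other or fixed to a common housing. Fig. 24 illustrates the configuration of the wide-angle camera 200 and the microphone array 500 in Embodiment 4; as shown, the wide-angle camera 200 and the microphone array 500 may be integrated into one housing. In Fig. 24, the image sensor 214 of the camera unit 201 of the wide-angle camera 200 and the microphones 501-1 and 501-2 of the microphone array 500 are mounted on a base 202.
[0093]
The audio signals acquired by the microphones 501-1 and 501-2 are digitized inside the microphones and then sent to the server 300. As with the camera array 400, connecting the microphone array 500 to the external I/F 307 of the server 300, specifically through a high-speed serial interface such as USB 2.0, allows the partial video and the audio to be acquired synchronously.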
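[0094]
4.2 Operation
Fig. 25 shows the functional blocks of the video distribution system according to Embodiment 4. It adds an audio acquisition unit 37, a sound source detection unit 38, and a video selection unit 39 to the block diagram of Embodiment 3 shown in Fig. 18. The operation of each part shown in Fig. 25 is described in detail below. The first imaging unit 31, the second imaging unit 32, the transformation unit 33, and the specifying unit 36 operate as in the preceding embodiments, so their description is omitted.
[0095]
4.2.1 Audio acquisition unit
The audio acquisition unit 37 consists of the microphone array 500 described in 4.1.1 and outputs the acquired, digitized audio data.
[0096]
4.2.2 Sound source detection unit
The sound source detection unit 38 detects the position or direction of the speaker based on the audio data acquired by the audio acquisition unit 37. Examples of its operation are described below.
[0097]
(1) Method based on the arrival-time difference of sound at the microphone array 500
This method is effective when the microphones 501 are fixed at known positions on a housing. Fig. 26 illustrates the operating principle of the sound source detection unit 38 in Embodiment 4. As shown in Fig. 26, two microphones 501-1 and 501-2 (called mic 1 and mic 2 for convenience) are placed a distance l apart. When sound arrives from direction θ, the audio data s₁(t) output by mic 1 and the audio data s₂(t) output by mic 2 are related by
[0098]
[Equation 7]

v: speed of sound
so that the audio data of mic 1 leads the audio data of mic 2 by
[0099]
[Expression 1]

This principle is used to determine the direction of the speaker's voice as follows.
[0100]
First, the arrival-time difference between the audio data of mic 1 and mic 2 is detected. It is calculated, for example, from the cross-correlation between the audio data s₁(t) of mic 1 and the audio data s₂(t + dt) of mic 2. The cross-correlation C(t, dt) is given by the following equation.
[0101]
[Equation 8]

Here N is a positive integer giving the size of the correlation window, and Equation (8) indicates that the sum of products is taken over the N samples up to time t. The dt that maximizes C(t, dt) is the arrival-time difference.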
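The images for Equation 7, Expression 1, and Equation 8 are not reproduced in this text. One reconstruction that is consistent with the surrounding definitions (microphone spacing l, incidence angle θ, speed of sound v, window of N samples ending at time t) is given below. It is an assumption drawn from the prose, and the published equations may differ in details such as normalization or the direction of the summation index.

```latex
% Eq. 7 (assumed form): mic 1 hears the wavefront earlier than mic 2
s_1(t) = s_2\!\left(t + \frac{l\cos\theta}{v}\right)

% Expression 1 (assumed form): the lead time of mic 1 over mic 2
\Delta t = \frac{l\cos\theta}{v}

% Eq. 8 (assumed form): sum of products over the N samples up to time t
C(t, dt) = \sum_{i=0}^{N-1} s_1(t-i)\, s_2(t-i+dt)
```

Under this form, C(t, dt) peaks when dt equals the lead time Δt, which is why the maximizing dt is taken as the arrival-time difference.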
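[0102]
Next, the angle θ between the sound direction and the baseline of the microphones is calculated from the microphone spacing l, the arrival-time difference dt, and the speed of sound v.
[0103]
[Equation 9]

Here the range of θ is from 0° to 180° inclusive.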
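The two steps, finding the lag that maximizes the cross-correlation and converting it to an angle, can be sketched as follows. Equations (8) and (9) are not reproduced in this text; the code assumes C(t, dt) = Σ s₁(t−i)·s₂(t−i+dt) over N samples and θ = arccos(v·dt / l), which follow from the prose and from the stated 0° to 180° range. The function names, the sample-based lag, and the default speed of sound of 343 m/s (air at roughly 20 °C) are assumptions for this example.

```python
import math

def estimate_delay(s1, s2, t, window, max_lag):
    """Return the lag dt (in samples) that maximizes
    C(t, dt) = sum_{i=0}^{window-1} s1[t-i] * s2[t-i+dt].
    Terms whose index falls outside either signal are skipped."""
    best_dt, best_c = 0, float("-inf")
    for dt in range(-max_lag, max_lag + 1):
        c = 0.0
        for i in range(window):
            k = t - i
            j = k + dt
            if 0 <= k < len(s1) and 0 <= j < len(s2):
                c += s1[k] * s2[j]
        if c > best_c:
            best_dt, best_c = dt, c
    return best_dt

def arrival_angle(dt_samples, sample_rate, spacing_m, speed_of_sound=343.0):
    """Angle in degrees between the sound direction and the microphone baseline,
    theta = arccos(v * dt / l), with the ratio clamped to [-1, 1]."""
    ratio = speed_of_sound * (dt_samples / sample_rate) / spacing_m
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.acos(ratio))
```

The clamp guards against measured lags slightly larger than the physical maximum l / v, which would otherwise make `acos` raise a domain error. A zero lag means the source is broadside to the pair, giving 90°.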
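[0104]
With this procedure alone, the direction is detected only within the 180° range in front of the microphones 501-1 and 501-2, so the sound source direction is not uniquely determined. That is, the angle θ output by the sound source detection unit 38 is really the angle between the arrival direction of the sound and the baseline between the two microphones, and the actual direction of the sound lies somewhere on the surface of a cone of apex angle θ whose apex is the midpoint of the two microphones, as shown in Fig. 27.
[0105]
To resolve this, a correction is made using a second microphone pair that is not parallel to the pair formed by mic 1 and mic 2. Fig. 28 shows how four microphones 501-1, 501-2, 501-3, and 501-4 are divided into two pairs to detect the sound source direction. As shown in Fig. 28, each microphone 501-1 or 501-3 (mic 1 or mic 3) is paired with the microphone farthest from it, 501-2 or 501-4 (mic 2 or mic 4) respectively.
[0106]
Using the two microphones that are farthest apart maximizes the arrival-time difference and so improves the accuracy of direction detection. Although the microphone array 500 here has four microphones 501-1, 501-2, 501-3, and 501-4, the sound source direction can also be detected accurately with three. Fig. 29 illustrates how the pairs are chosen when the microphone array 500 consists of three microphones 501-1, 501-2, and 501-3. As shown, arranging the microphones in an equilateral triangle allows the sound source direction to be detected accurately whichever pair is used. In the example of Fig. 29, the first and second pairs suffice to detect sound sources in all directions, and the third pair may be used as a supplement.
[0107]
(2) Method using a directional microphone array
The direction of the speaker can also be detected using directional microphones, which accept sound only from a limited range. Fig. 30 illustrates the relationship between the microphone array 500 and the sound source direction in Embodiment 4. This microphone array 500 has four directional microphones 501-1, 501-2, 501-3, and 501-4 and determines the sound source direction from their audio intensities. For convenience the four microphones are called mics 1 to 4.
[0108]
Suppose the audio intensity is 20 at mic 1, 30 at mic 2, 20 at mic 3, and 5 at mic 4. The sound source is then judged to lie in the direction of mic 2. Comparing the intensities of mic 1 and mic 3 shows that both are 20, so the sound source direction is finally set to the direction of mic 2 (the direction marked θ = 45° in the figure).
Fig. 31 illustrates another example of the operation of the sound source detection unit 38 in Embodiment 4. Suppose the audio intensity is 15 at mic 1, 30 at mic 2, 25 at mic 3, and 5 at mic 4. The sound source is initially judged to lie in the direction of mic 2. Comparing mic 1 and mic 3 shows that mic 3 is stronger than mic 1, so the sound source direction is set to a direction shifted slightly from mic 2 toward mic 3 (the direction marked θ = 30° in the figure). The amount of this shift can be determined in advance from the characteristics of the directional microphones.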
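The intensity-based rule in the two worked examples can be sketched as below. The microphone azimuths used in the usage lines (135°, 45°, 315°, 225° for mics 1 to 4) and the fixed 15° shift are assumptions chosen so that the sketch reproduces the θ = 45° and θ = 30° results quoted above; Figs. 30 and 31 are not reproduced here, and the patent only says the shift is set in advance from the microphone characteristics. The function name is also hypothetical.

```python
def direction_from_intensities(intensities, mic_angles, shift_deg):
    """Pick the loudest microphone, then nudge the estimate toward the louder of
    its two neighbours by a fixed, pre-determined amount.

    intensities: per-microphone audio intensity, in ring order.
    mic_angles:  pointing direction of each microphone in degrees, same order.
    """
    n = len(intensities)
    loudest = max(range(n), key=lambda i: intensities[i])
    prev_i, next_i = (loudest - 1) % n, (loudest + 1) % n
    angle = mic_angles[loudest]
    if intensities[prev_i] == intensities[next_i]:
        return angle % 360.0  # neighbours balanced: source is straight ahead of this mic
    toward = next_i if intensities[next_i] > intensities[prev_i] else prev_i
    # Signed shortest rotation from the loudest mic toward the chosen neighbour.
    diff = (mic_angles[toward] - angle + 180.0) % 360.0 - 180.0
    step = shift_deg if diff > 0 else -shift_deg
    return (angle + step) % 360.0

MIC_ANGLES = [135.0, 45.0, 315.0, 225.0]  # assumed layout for mics 1-4
first = direction_from_intensities([20, 30, 20, 5], MIC_ANGLES, 15.0)
second = direction_from_intensities([15, 30, 25, 5], MIC_ANGLES, 15.0)
```

A refinement would scale the shift by the size of the intensity difference instead of using a constant; the text leaves that choice open.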
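[0109]
The function of the sound source detection unit 38 described above is executed by the CPU 301 in the server 300. A predetermined program for realizing this function is stored on the HDD 303 in advance.
[0110]
4.2.3 Video selection unit
The video selection unit 39 uses the correspondence specified by the specifying unit 36 and the position or direction of the speaker detected by the sound source detection unit 38 to select automatically the partial video delivered to the client.
[0111]
Fig. 32 shows an example of the operation of the video selection unit 39 in Embodiment 4, viewed from above: six participants 2, labeled A to F, are holding a meeting around the table 1. The wide-angle camera 200 and the microphone array 500 are placed on the table 1, and one camera 401 (not shown) is installed for each participant. Suppose the sound source direction detected by the sound source detection unit 38 is as indicated by the arrow 381 in the figure. The video selection unit 39 then selects the camera 401 placed closest to that direction, based on the sound source direction and on the correspondence between the wide-angle camera 200 and each camera 401 specified by the specifying unit 36. In the figure, this is the camera 401 photographing participant E.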
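The selection step reduces to finding the camera whose known azimuth is nearest, on the circle, to the detected sound direction. The sketch below assumes that the correspondence from the specifying unit is available as a mapping from camera label to azimuth in degrees; the labels, the evenly spaced seating, and the function name are hypothetical.

```python
def select_camera(source_deg, camera_azimuths):
    """Return the label of the camera whose azimuth is closest to source_deg.

    camera_azimuths: {label: azimuth in degrees}. Distances wrap around 360,
    so 350 degrees is 10 degrees away from a camera at 0 degrees.
    """
    def gap(a, b):
        return abs((a - b + 180.0) % 360.0 - 180.0)
    return min(camera_azimuths, key=lambda label: gap(camera_azimuths[label], source_deg))

# Six participants seated evenly around the table (assumed layout).
cameras = {"A": 0.0, "B": 60.0, "C": 120.0, "D": 180.0, "E": 240.0, "F": 300.0}
chosen = select_camera(250.0, cameras)
```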
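[0112]
The function of the video selection unit 39 described above can be realized by the CPU 301 in the server 300. A predetermined program for realizing this function is stored on the HDD 303 in advance.
[0113]
4.2.4 Encoding unit
The encoding unit 34 encodes the partial video data selected by the video selection unit 39 and the panoramic video data output by the transformation unit 33, each into a format suitable for video distribution. The encoding itself operates as in Embodiment 1.
[0114]
If there is no transformation unit 33, the encoding unit 34 encodes the wide-angle video data acquired by the first imaging unit 31 instead of wide-angle video data transformed by the transformation unit 33.
[0115]
4.2.5 Distribution unit
The distribution unit 35 distributes over the Internet the wide-angle video data and the partial video data encoded by the encoding unit 34. It is preferable to deliver to the client not only the video data but also the correspondence specified by the specifying unit 36 and the imaging range of the partial video selected by the video selection unit 39, because the video display terminal can then present the display screen shown in Fig. 22. The distribution itself operates as in Embodiment 1.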
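[0116]
5. Embodiment 5
Like Embodiment 4, Embodiment 5 of the present invention relates to a video distribution system that automatically selects among the partial video data acquired by the camera array 400. Here, the cameras 401 of the camera array 400 and the microphones 501 of the microphone array 500 are arranged in one-to-one correspondence. "One-to-one correspondence" is defined here as "for each camera 401, there is one microphone 501 placed at approximately the same position or bearing".
[0117]
5.1 Configuration
As in Embodiment 4, the configuration of the video distribution system 100 in this embodiment is shown in Fig. 23.
[0118]
The configuration of each part in that figure is described next. The wide-angle camera 200 and the server 300 are configured as in Embodiment 1, so their description is omitted.
[0119]
5.1.1 Camera array and microphone array
Fig. 33 shows the appearance of a camera 401 and a microphone 501 in Embodiment 5. As shown, the camera 401 and the microphone 501 are integrated into a common housing 502. The microphone 501 is directional and accepts sound only from a limited range. One such integrated camera 401 and microphone 501 is installed per participant.
[0120]
5.2 Operation
Fig. 34 redraws the video distribution system according to Embodiment 5 as a functional block diagram. It removes the sound source detection unit 38 from the block diagram of Embodiment 4 shown in Fig. 25 and also removes the connection from the specifying unit 36 to the video selection unit 39. The first imaging unit 31, the second imaging unit 32, the transformation unit 33, the encoding unit 34, the distribution unit 35, the specifying unit 36, and the audio acquisition unit 37 operate as in Embodiment 4.
[0121]
5.2.1 Video selection unit
Because the integrated camera 401 and microphone 501 described above are used, the correspondence between each camera 401 and each microphone 501 is known. The video selection unit 39 therefore only needs to select the partial video acquired by the camera 401 that corresponds to the microphone 501 with the largest signal amplitude.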
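[0122]
6. Embodiment 6
Like Embodiments 4 and 5, this embodiment relates to a video distribution system that automatically selects among the partial video data acquired by the camera array 400.
[0123]
6.1 Configuration
As in Embodiments 1 to 3, the configuration is shown in Figs. 4 to 9.
[0124]
6.2 Operation
Fig. 35 redraws the video distribution system according to Embodiment 6 as a functional block diagram. It adds a video selection unit 39 and a motion detection unit 40 to the block diagram of Embodiment 3 shown in Fig. 18. The operation of each part shown in Fig. 35 is described in detail below. The first imaging unit 31, the second imaging unit 32, the transformation unit 33, the encoding unit 34, the distribution unit 35, and the specifying unit 36 operate as in the preceding embodiments, so their description is omitted.
[0125]
6.2.1 Motion detection unit
The motion detection unit 40 detects the motion of subjects in the wide-angle video data and outputs a motion feature for each part of the video. Here the "motion feature" means the magnitude of the subject's motion.
[0126]
Motion in moving images can be detected with well-known techniques, such as taking the difference between the frame at the previous time and the frame at the current time, or using optical flow. These techniques can detect where in the wide-angle video data a subject has moved and how large the motion is. When the present invention is used as a remote monitoring system, this is advantageous because the partial video from the camera that captures the moving subject is delivered.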
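Of the two techniques named above, frame differencing is the simpler. The sketch below is one minimal form of it, assuming grayscale frames stored as lists of rows; the function name, the threshold parameter, and the output format are assumptions for illustration, and optical flow is not shown.

```python
def motion_map(prev_frame, curr_frame, threshold):
    """Frame differencing: list every pixel whose absolute intensity change
    between two consecutive frames exceeds `threshold`.

    Returns a list of (row, col, magnitude) tuples, where magnitude is the
    absolute difference and serves as a crude per-pixel motion feature.
    """
    moved = []
    for y, (prev_row, curr_row) in enumerate(zip(prev_frame, curr_frame)):
        for x, (p, c) in enumerate(zip(prev_row, curr_row)):
            diff = abs(c - p)
            if diff > threshold:
                moved.append((y, x, diff))
    return moved

previous = [[10, 10], [10, 10]]
current = [[10, 50], [12, 10]]
changes = motion_map(previous, current, 5)
```

The threshold suppresses sensor noise (the change of 2 in the second row is ignored here); the position of the largest remaining magnitude is what the video selection unit would use.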
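[0127]
When the video distribution system according to Embodiment 6 is used as a remote conference system, it is preferable to detect the position or direction of the speaker automatically by detecting the movement of the participants' lips. Lip movement can be detected with well-known techniques such as that of the literature (M. Kass, A. Witkin and D. Terzopoulos: "SNAKES: Active Contour Models", ICCV, pp. 259-268 (1987)). When microphones 501 are available, as in Embodiments 4 and 5, speaker detection accuracy can be improved further by combining lip-movement detection with the extraction of speech intervals from the audio data. For example, Japanese Unexamined Patent Application Publication No. H06-43897, filed by the present applicant, discloses a system that recognizes conversation using audio features extracted from audio data and dynamic visual features of the face extracted from video data. This makes it possible to detect the position or direction of the speaker more stably even when the audio data contains a large amount of noise other than speech.
[0128]
The function of the motion detection unit 40 described above may be implemented inside the wide-angle camera 200 or realized by the CPU 301 in the server 300. In the latter case, a predetermined program for realizing this function is stored on the HDD 303 in advance.
[0129]
6.2.2 Video selection unit
The video selection unit 39 in Embodiment 6 uses the correspondence specified by the specifying unit 36 and the motion features detected by the motion detection unit 40 to select automatically the partial video delivered to the client. The video selection unit 39 first identifies, from the motion features, the image position in the wide-angle video data at which the largest motion was detected. Then, based on that image position and on the correspondence between the wide-angle camera 200 and each camera 401 specified by the specifying unit 36, it selects the camera 401 placed closest to that position, following the same procedure as described in Embodiment 4. The partial video showing the subject with the largest detected motion is thereby selected automatically.
[0130]
The function of the video selection unit 39 described above is executed by the CPU 301 in the server 300. A predetermined program for realizing this function is stored on the HDD 303 in advance.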
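[0131]
7. Embodiment 7
In Embodiment 6, subject motion is detected in the wide-angle video. It may instead be detected in each of the partial video data acquired by the camera array 400.
[0132]
7.1 Configuration
As in Embodiments 1 to 3, the configuration of Embodiment 7 is shown in Figs. 4 to 9.
[0133]
7.2 Operation
Fig. 36 shows the functional blocks of the video distribution system according to Embodiment 7. The operation of each part shown in Fig. 36 is described in detail below. The first imaging unit 31, the second imaging unit 32, the transformation unit 33, the encoding unit 34, and the distribution unit 35 operate as in the preceding embodiments, so their description is omitted.
[0134]
7.2.1 Motion detection unit
The motion detection unit 40 in Embodiment 7 detects the motion of subjects in each partial video data and outputs a motion feature for each partial video data. As in Embodiment 6, the "motion feature" means the magnitude of the subject's motion. Subject motion in each partial video data is likewise detected with the well-known techniques described in Embodiment 6.
[0135]
When the video distribution system according to this embodiment is used as a remote conference system, it is preferable to detect the position or direction of the speaker automatically by detecting the lip movement of the participants in the partial video. This, too, can be realized with the well-known techniques described in Embodiment 6. In this embodiment each camera 401 captures a large image of each participant's face, so lip movement can be detected more stably than in Embodiment 6.
[0136]
The function of the motion detection unit 40 described above may be implemented inside the cameras 401 or realized by the CPU 301 in the server 300. In the latter case, a predetermined program for realizing this function is stored on the HDD 303 in advance.
[0137]
7.2.2 Video selection unit
The video selection unit 39 in this embodiment automatically selects the partial video delivered to the client based on the subject motion in the partial videos detected by the motion detection unit 40. Specifically, it identifies, from the motion features of the subjects in each partial video, the partial video in which the largest motion was detected and automatically selects it as the partial video delivered to the client. Because this embodiment does not necessarily require the specifying unit 36, an appropriate partial video can be selected with a simpler configuration and simpler processing than in Embodiment 6.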
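Because each partial video is scored on its own, no geometric correspondence is needed: the selection is an argmax over per-camera motion scores. The sketch below uses the summed absolute frame difference as the score, which is one possible "motion feature" and an assumption on my part; the function names and the dictionary-of-frames layout are also hypothetical.

```python
def motion_score(prev_frame, curr_frame):
    """Sum of absolute pixel differences between two consecutive grayscale frames."""
    return sum(
        abs(c - p)
        for prev_row, curr_row in zip(prev_frame, curr_frame)
        for p, c in zip(prev_row, curr_row)
    )

def select_most_active(frames_prev, frames_curr):
    """Return the label of the camera whose partial video changed the most.

    frames_prev / frames_curr: {camera_label: frame} for the previous and
    current time step, with the same labels in both dictionaries.
    """
    return max(
        frames_curr,
        key=lambda label: motion_score(frames_prev[label], frames_curr[label]),
    )

before = {"401-1": [[5, 5], [5, 5]], "401-2": [[5, 5], [5, 5]]}
after = {"401-1": [[5, 6], [5, 5]], "401-2": [[9, 9], [5, 0]]}
active = select_most_active(before, after)
```

In a real system the score would likely be smoothed over several frames so that the delivered video does not flicker between cameras on every small movement.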
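[0138]
The function of the video selection unit 39 described above is executed by the CPU 301 in the server 300. A predetermined program for realizing this function is stored on the HDD 303 in advance.
[0139]
7.3 Other remarks
In Embodiment 6 and in this embodiment, it is preferable that each camera 401 of the camera array 400 shares part of its imaging region with other cameras. Fig. 37(a) shows the screen of the video display terminal when the cameras share no imaging region. As shown in Fig. 37(a), when participant A stands up and moves, the video selection unit 39 automatically selects the partial video whose imaging region is closest to participant A (indicated by the black bar 605 in the figure). However, if participant A moves to a place that no camera 401 covers, a partial video showing no important subject is selected. The problem is thus that partial video that continuously tracks a moving subject cannot be delivered.
[0140]
This problem can be solved by arranging the cameras so that their imaging regions overlap, as shown in Fig. 37(b). The hatched bars in the figure indicate ranges covered by two or more cameras 401. When the camera array 400 is built by fixing the cameras 401 to a housing 402, as in Fig. 8, the cameras 401 should be fixed so that their imaging ranges partly overlap.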
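[0141]
8. Embodiment 8
The functions of the video distribution system 100 according to the present invention can be realized on a PC. In that case, the software implementing each of the units above is stored on the hard disk, and the functions are realized by running the processing programs as needed.
[0142]
9. Embodiment 9
The program can also be stored on a recording medium such as a CD-ROM. As shown in Fig. 38, the functions can be realized by loading a CD-ROM 308 containing the program into a PC and running the program as needed. Needless to say, the recording medium for the program is not limited to the CD-ROM 308 and may be another medium such as a DVD-ROM.
[0143]
The embodiments above describe only a few examples of the present invention, and the scope of the invention should not be limited or narrowed to them. For example, each embodiment describes distributing video data over the Internet, but another communication line such as satellite or terrestrial broadcasting may be used.
[0144]
The description uses a configuration in which the wide-angle camera 200, the camera array 400, and the microphone array 500 are connected to a USB hub, but the connection is not limited to this. Another interface such as a PCI bus, IEEE 1394, or Bluetooth may be used.
[0145]
The embodiments use a hyperboloidal mirror and a curved mirror with curvature in one direction as the mirror 211 of the wide-angle camera 200, but other forms such as a paraboloidal mirror or a conical mirror may be used.
[0146]
The description of the first imaging unit 31 states that the wide-angle camera 200 outputs digitized video data, but the wide-angle camera 200 may output an analog video signal. In that case, combining the wide-angle camera 200 with a video capture board that digitizes the analog video signal yields digital video data, so the same operation as the first imaging unit 31 described in the embodiments can be realized.
[0147]
The description of the second imaging unit 32 states that each camera 401 of the camera array 400 outputs digitized partial video data, but the cameras 401 may each output an analog video signal. In that case, combining the cameras 401 with a video capture board that digitizes multi-channel analog video signals yields digital partial video data, so the same operation as the second imaging unit 32 described in the embodiments can be realized.
[0148]
The description of the audio acquisition unit 37 states that each microphone 501 of the microphone array 500 outputs digitized audio data, but the microphones 501 may each output an analog audio signal. In that case, combining the microphones 501 with an audio capture board that digitizes multi-channel analog audio signals yields digital audio data, so the same operation as the audio acquisition unit 37 described in the embodiments can be realized.
[0149]
The encoding unit 34 and the distribution unit 35 are described as being implemented on the same server 300, but an encoding PC may be installed separately from the server 300. In that case the encoded data is transferred from the encoding PC to the server 300 over a telecommunication line.
[0150]
The encoding unit 34 is described as converting video data into the RealVideo format using RealProducer from RealNetworks, but the encoding unit is not limited to this. For example, the Windows (registered trademark) Media Encoder program from Microsoft may be used to convert video data into the Windows (registered trademark) Media Video format. The same applies to the distribution unit 35: another program such as Windows (registered trademark) Media Service may be used instead of RealServer.
[0151]
The distribution unit 35 is described as delivering video to a client PC, but the target may be a terminal such as a PDA (Personal Digital Assistant) or a mobile phone. With Windows (registered trademark) Media Player for PDAs from Microsoft, the PDA can play video data in the Windows (registered trademark) Media Video format mentioned above.
[0152]
The description of the motion detection unit 40 states that the "motion feature" means the magnitude of the subject's motion, but it may be something else, such as the shape of the subject's movement trajectory.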
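[0153]
[Effects of the invention]
According to the present invention, providing the specifying unit 36, which specifies the correspondence between the wide-angle video acquired by the first imaging unit 31 and each partial video acquired by the second imaging unit 32, lets the viewer select high-resolution video of a desired scene more easily.
[0155]
Further, according to the present invention, providing the video selection unit 39, which selects a predetermined video according to the position or direction of the sound source output by the sound source detection unit 38, makes it possible to acquire and distribute video of a desired scene at high resolution without forcing troublesome operations on the user.
[0156]
Further, providing the transformation unit 33, which transforms the wide-angle video, makes it possible to display the acquired wide-range video in a form that is easier for the viewer to observe.
[0157]
Further, because each video acquired by the second imaging unit 32 shares part of its region with at least one other video, high-resolution video of a desired scene can be acquired and displayed without omission.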
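[Brief description of the drawings]
[Fig. 1] A diagram showing a usage example of the video distribution system according to the present invention.
[Fig. 2] A diagram showing an example of the display screen of the video display terminal according to the present invention.
[Fig. 3] A diagram showing another usage example of the video distribution system according to the present invention.
[Fig. 4] A diagram showing the configuration of the video distribution system according to Embodiment 1.
[Fig. 5] A diagram showing the configuration of the wide-angle camera 200 according to Embodiment 1.
[Fig. 6] A diagram showing the structure of the wide-angle camera 200 according to Embodiment 1.
[Fig. 7] A diagram showing the video captured by the wide-angle camera 200 shown in Fig. 6.
[Fig. 8] A diagram showing an example of the camera array 400 according to Embodiment 1.
[Fig. 9] A diagram showing the configuration of the server 300 according to Embodiment 1.
[Fig. 10] A block diagram showing the operation according to Embodiment 1.
[Fig. 11] A diagram illustrating the operation of the transformation unit 33 in Embodiment 1.
[Fig. 12] A diagram illustrating the principle of the transformation unit 33.
[Fig. 13] A diagram illustrating the coordinate conversion table used by the transformation unit 33.
[Fig. 14] A diagram showing the configuration of the wide-angle camera 200 according to Embodiment 2.
[Fig. 15] A diagram showing the video captured by the wide-angle camera 200 shown in Fig. 14.
[Fig. 16] A diagram showing an example of performing the operation of the transformation unit 33 in Embodiment 2 at the same time as video display.
[Fig. 17] A diagram showing an example of the display screen of the video display terminal when the video distribution system according to Embodiment 3 is used.
[Fig. 18] A block diagram showing the operation according to Embodiment 3.
[Fig. 19] A diagram showing an example of the operation of the specifying unit 36 according to Embodiment 3.
[Fig. 20] A diagram showing an example of the operation of the specifying unit 36 according to Embodiment 3.
[Fig. 21] A diagram showing an example of the display screen of the server 300 according to Embodiment 3.
[Fig. 22] A diagram showing an example of the display screen of the video display terminal when the video distribution system according to Embodiment 4 is used.
[Fig. 23] A diagram showing the configuration of the video distribution system according to Embodiment 4.
[Fig. 24] A diagram illustrating the configuration of the wide-angle camera 200 and the microphone array 500 in Embodiment 4.
[Fig. 25] A block diagram showing the operation according to Embodiment 4.
[Fig. 26] A diagram illustrating the operating principle of the sound source detection unit 38 in Embodiment 4.
[Fig. 27] A diagram illustrating the problem of the sound source detection unit 38 in Embodiment 4.
[Fig. 28] A diagram illustrating an arrangement example of the microphones 501 in Embodiment 4.
[Fig. 29] A diagram illustrating another arrangement example of the microphones 501 in Embodiment 4.
[Fig. 30] A diagram illustrating the operation of the sound source detection unit 38 in Embodiment 4.
[Fig. 31] A diagram illustrating the operation of the sound source detection unit 38 in Embodiment 4.
[Fig. 32] A diagram illustrating the operation of the video selection unit 39 in Embodiment 4.
[Fig. 33] A diagram showing the configuration of the camera 401 in Embodiment 5.
[Fig. 34] A block diagram showing the operation according to Embodiment 5.
[Fig. 35] A block diagram showing the operation according to Embodiment 6.
[Fig. 36] A block diagram showing the operation according to Embodiment 7.
[Fig. 37] A diagram showing the problem when the video distribution system according to Embodiments 6 and 7 is used.
[Fig. 38] A diagram showing a configuration example according to Embodiment 9.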
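[Explanation of reference numerals]
1 Table
2 Participant
3 Cabinet
31 First imaging unit
32 Second imaging unit
33 Transformation unit
34 Encoding unit
35 Distribution unit
36 Specifying unit
37 Audio acquisition unit
38 Sound source detection unit
39 Video selection unit
40 Motion detection unit
200 Wide-angle camera
211 Mirror
212 Lens
213 Aperture
214 Image sensor
215 Drive unit
216 Preprocessing circuit
217 Motor drive unit
300 Server
310 Bus
320 USB hub
330 Internet
350 Client PC
400 Camera array
401-1 to 401-4 Cameras
500 Microphone array
501 Microphone
600 Display window
[0001]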
BACKGROUND OF THE INVENTION
The present invention relates to a system for distributing video of a wide range of scenes acquired using an imaging means having a wide-angle visual field. Specifically, it is used for applications such as a monitoring system, a remote conference system, and a remote education system.
[0002]
[Prior art]
With the development of telecommunications technology, remote monitoring systems and video conferencing systems for transferring and displaying images of a scene to a remote location have come to be used in many situations. In order to further improve the convenience of such a system, there have been proposed many camera devices capable of shooting a wide range of scenes that cannot be shot with a normal camera and systems for displaying partial videos obtained by cutting out only the necessary scenes. .
[0003]
For example, in Japanese Patent Application Laid-Open No. H5-122689, a camera device that detects a voice input from a microphone, determines a speaker, and automatically controls the camera based on the determination result to capture the speaker. And video conferencing systems have been proposed. There is also a camera device in which a plurality of cameras are arranged so as to form an intensive field of view.
[0004]
However, in the invention disclosed in Japanese Patent Application Laid-Open No. 5-122689, there are problems that it takes time to control the camera in order to catch the speaker and that the drive unit that rotates the camera is easily broken. In addition, the above two conventional techniques have a problem that it is impossible to obtain a video recording the entire scene. Applicant will (1) display only the partial video that shows the target part of the scene, (2) display only the video that shows the entire scene, (3) (1) and (2) We presented three types of video display formats, which simultaneously display the video of the above, and conducted a test to evaluate which is the most preferable. As a result, the evaluation result that the display form (3) is overwhelmingly desirable and then the display form (2) is desirable was obtained. As described above, in such a system, it is recognized that it is important to transfer a video including the entire scene. However, the above-described prior art does not solve this problem.
[0005]
On the other hand, various wide-angle photographing devices have been proposed in order to acquire a single image that records the entire scene. For example, Japanese Patent No. 2939087 and Japanese Patent No. 3054146 disclose inventions relating to a 360-degree panoramic camera using a hyperbolic dome-type mirror. According to these publications, by shooting an image reflected on a dome-shaped mirror, a 360-degree (hemispherical) scene can be captured with a higher lateral resolution than a fish-eye camera provides. If this camera is installed on a desk or ceiling, the entire room can be observed, which makes it suitable for monitoring purposes. The photographed image is distorted into a circular shape as shown in FIG. 7, and Japanese Patent No. 3054146 describes means for transforming it so that it appears to have been photographed with a normal camera. According to these publications, it is possible to generate not only an image in which the entire scene is recorded but also a partial video by applying a clipping process. However, because the wide-angle imaging devices described in these publications capture the omnidirectional view as a single circular image, a wasted area in which nothing is recorded arises, like the black-painted portion in FIG. 7, and as a result the image quality of the partial video obtained is insufficient.
[0006]
In addition, Japanese Patent Application Laid-Open No. 2001-94857 discloses an invention in which a single seamless wide-angle image is generated by warping a set of images synchronously captured by a camera array to a common coordinate system. According to this publication, a wide-angle image having high resolution over the entire screen can be acquired by combining images captured by normal cameras into one image. However, there are problems such as the huge processing cost required to warp a plurality of video signals in real time and the need to calibrate the camera array in advance.
[0007]
[Problems to be solved by the invention]
The present invention has been made in view of the above problems. A first object of the present invention is to provide a video distribution system that acquires and distributes wide-range video with a simple configuration and simple processing while also acquiring and distributing video of a desired scene at high resolution, together with a program and a recording medium for executing the processing of each unit of the system.
[0008]
A second object of the present invention is to provide a video distribution system that enables a viewer to select a video of a desired scene having a high resolution more easily.
[0009]
A third object of the present invention is to provide a video distribution system that makes it possible to acquire and distribute a video of a desired scene at a high resolution without forcing the user to perform troublesome operations.
[0010]
A fourth object of the present invention is to provide a video distribution system that makes it possible to display a wide range of acquired videos in a form that can be easily observed by a viewer.
A fifth object of the present invention is to provide a video distribution system that makes it possible to obtain and display a video of a desired scene with high resolution without omission.
[0011]
[Means for Solving the Problems]
  The invention according to claim 1 is a video distribution system comprising: first imaging means for acquiring a wide-angle video; second imaging means for synchronously acquiring videos in which a plurality of different predetermined areas are captured; specifying means for specifying the correspondence between the wide-angle video and each video acquired by the second imaging means; and distribution means for distributing at least one of the videos acquired by the second imaging means, the wide-angle video, and the correspondence specified by the specifying means. The second imaging means is composed of a plurality of cameras, each of which is provided with an identifier; the first imaging means includes the identifiers attached to the plurality of cameras in its imaging range; and the specifying means specifies the correspondence based on the position at which each identifier appears in the wide-angle video.
[0015]
  The invention according to claim 2 is the video distribution system according to claim 1, wherein the specifying means specifies the correspondence based on a similarity between the wide-angle video and each video acquired by the second imaging means.
[0016]
  According to the inventions described in claims 1 and 2, providing the specifying unit that specifies the correspondence between the wide-angle video acquired by the first imaging unit and each partial video acquired by the second imaging unit makes it possible to select a video of a desired scene having a high resolution more easily.
[0017]
  The invention according to claim 3 is a video distribution system comprising: first imaging means for acquiring a wide-angle video; second imaging means for synchronously acquiring videos in which a plurality of different predetermined areas are captured; video selection means for selecting a predetermined video from the plurality of videos acquired by the second imaging means; distribution means for distributing the predetermined video selected by the video selection means and the wide-angle video; a plurality of sound acquisition means for acquiring sound intensity; and sound source detection means for detecting the position or direction of a sound source based on the relative intensity difference between the sound acquisition means having the maximum sound intensity among the plurality of sound intensities acquired by the plurality of sound acquisition means and the remaining sound acquisition means, wherein the video selection means selects the predetermined video based on the position or direction of the sound source output by the sound source detection means.
[0019]
  The invention according to claim 4 is the video distribution system further comprising motion detection means for detecting a motion of a subject in the wide-angle video or in the plurality of videos acquired by the second imaging means, wherein the video selection means selects the predetermined video based on the motion of the subject output by the motion detection means.
[0020]
  According to the inventions described in claims 3 and 4, providing the video selection unit that selects a predetermined video according to the position or direction of the sound source output by the sound source detection unit makes it possible to acquire and distribute a video of a desired scene at high resolution without forcing the user to perform troublesome operations.
[0021]
  The invention according to claim 5 is a program that causes a computer to execute the processing of each means of the video distribution system.
[0022]
  Furthermore, by providing a deformation unit that deforms a wide-angle image, it is possible to display a wide range of acquired images in a form that is easier for a viewer to observe.
[0023]
  Further, because each video acquired by the second imaging unit shares a partially common area with at least one other video, a video of a desired scene can be acquired and displayed at a high resolution without omission.
[0024]
  According to the inventions described in claims 5 and 6, providing a specifying unit that specifies the correspondence between the wide-angle video acquired by the first imaging unit and each partial video acquired by the second imaging unit makes it possible to select a video of a desired scene having a high resolution more easily.
[0025]
DETAILED DESCRIPTION OF THE INVENTION
First, a usage example of how the video distribution system is used will be briefly outlined, and then an embodiment of the video distribution system will be specifically described. In each embodiment, the components and operations thereof are described, and finally the processing flow is described.
[0026]
First, a usage example of the video distribution system will be described.
[0027]
FIG. 1 is an explanatory diagram outlining an example of use in which the present invention is installed in a conference scene. The video distribution system includes a wide-angle camera 200 that acquires wide-angle video, a camera array 400 composed of a plurality of cameras 401-1 to 401-4 each having a normal angle of view, a microphone 501 that acquires audio during the conference, and a server 300 that captures and distributes the video data acquired by the wide-angle camera 200 and the camera array 400 and the audio data acquired by the microphone 501.
[0028]
As shown in FIG. 1, the wide-angle camera 200 is installed on the table 1 and captures, in a single image, the entire surroundings looking toward the conference participants (speakers) 2-1 to 2-4, for example in the horizontal direction. Each of the cameras 401-1 to 401-4 constituting the camera array 400 is placed, for example, in front of a conference participant and photographs that participant. The images acquired by these cameras 401-1 to 401-4 are hereinafter referred to as “partial images”. The server 300 is housed in the cabinet 3 and acquires the video data from the wide-angle camera 200 and the camera array 400 and the audio data acquired by the microphone 501. In addition, in response to a request from a client, the server 300 distributes the acquired video data via a telecommunication line to a terminal held by the client, such as a PC (Personal Computer) or PDA (Personal Digital Assistant) (the telecommunication line and the client's terminal are omitted from FIG. 1). FIG. 2 is a diagram showing a display screen on the video display terminal of a PC held by a client. As shown in FIG. 2, at least one partial image 601 acquired by the camera array 400 and an image 602 acquired by the wide-angle camera 200 are displayed in the display window 600. When the video selection button 603 is operated, selection information for the partial video requested for distribution is transmitted to the server 300, and the partial image 601 from the camera array 400 sent via the server 300 is switched accordingly.
[0029]
On the other hand, FIG. 3 is an explanatory diagram outlining an example in which the present invention is applied to a system that distributes video of a venue such as a museum. The video distribution system 100 includes a wide-angle camera 200 that acquires wide-angle video, a camera array 400 composed of at least one camera (here, cameras 401-1 and 401-2) having a normal angle of view, and a server 300 that captures the video data acquired by the wide-angle camera 200 and the camera array 400 (the server 300 is omitted from FIG. 3).
[0030]
As shown in FIG. 3, the wide-angle camera 200 is installed on the ceiling and captures, in a single image, the entire surroundings overlooking the whole venue. Each of the cameras 401-1 and 401-2 constituting the camera array 400 is installed at a different place in the venue, for example fixed in a state suspended from the ceiling so as to capture the front of a painting exhibited in the hall. The server 300 is installed in a place in the venue where it cannot be seen, and acquires the video data from the wide-angle camera 200 and the camera array 400. In addition, in response to a request from a client, the server 300 distributes the acquired video data via a telecommunication line to a terminal held by the client (the telecommunication line and the client's terminal are omitted from the figure).
[0031]
In each of the following embodiments, a case will be described in which the video distribution system of the present invention is applied to conference shooting and video distribution.
[0032]
1. Embodiment 1
First, Embodiment 1 of the present invention will be described.
[0033]
1.1 Configuration
FIG. 4 is a diagram showing the configuration of the video distribution system according to Embodiment 1 of the present invention. The server 300 is connected to the wide-angle camera 200 and the camera array 400 via the USB hub 320 and the bus 310, and acquires the wide-angle video data and the plurality of partial video data acquired by the camera array 400. The video data acquired by the server 300 is distributed via the Internet 330 and displayed on a client PC connected to the Internet.
[0034]
Next, the configuration of each part will be described.
[0035]
1.1.1 Wide angle camera
FIG. 5 is a diagram illustrating the configuration of the wide-angle camera 200 as the first imaging unit according to the first embodiment. The wide-angle camera 200 as the first imaging means includes a mirror 211 having a curved surface with a predetermined shape, a lens 212, an aperture 213, an imaging device 214 such as a CCD (Charge Coupled Device), a drive unit 215 that controls the timing of the imaging device 214 and performs digitization processing such as analog-to-digital conversion on the video signal obtained by the imaging device 214, a preprocessing circuit 216 that performs preprocessing such as edge enhancement and γ correction on the digital signal obtained by the drive unit 215, and a motor driving unit 217 that drives the aperture 213 to control the iris.
[0036]
The mirror 211 enables wide-angle shooting by reflecting light incident on the optical system. Here, a hyperboloid mirror is used as the mirror having a curved surface with a predetermined shape. FIG. 6 is a diagram for explaining the optical path when the hyperboloid mirror 211 of the present embodiment is used. FIG. 7 is a diagram showing the wide-angle image formed on the surface of the image sensor 214 by the hyperboloid mirror 211 of the present embodiment. As shown in FIG. 7, the image reflected from the hyperboloid mirror 211 and taken into the image sensor 214 has a donut shape (this donut-shaped image is hereinafter referred to as a “donut image”). The donut image is captured by the image sensor 214, digitized by the drive unit 215, and sent to the server 300 described later via the preprocessing circuit 216. Note that the central portion in FIG. 6 reflects the direction of the image sensor 214 itself and carries no important image information. Therefore, the head portion 218 of the hyperboloid mirror 211 may be painted black so that this region is recorded as black. Depending on the mode of use, a reference line may be drawn on the head portion 218 and used for initial settings such as focus adjustment by driving the motor drive unit 217 when the wide-angle camera 200 is started up.
[0037]
As described above, wide-angle images can be taken with an inexpensive and simple configuration by combining a normal camera and a mirror.
1.1.2 Camera array
The camera array 400 includes at least one camera, and each camera captures a portion of the scene within the shooting range of the wide-angle camera 200 at a higher resolution. The cameras 401 constituting the camera array 400 may be arranged separately as shown in FIGS. 1 and 3, or the cameras 401-1 to 401-3 may be fixed to a casing 402 as illustrated in the drawings. Various types of imaging device, such as a CCD or a CMOS (Complementary Metal-Oxide Semiconductor) sensor, can be used in the camera 401. The video signal captured by the image sensor is digitized inside the camera and then sent to the server 300 described later.
[0038]
By preparing at least one camera 401 having the above-described configuration, a partial video can be acquired with an inexpensive and simple configuration.
[0039]
1.1.3 Server
FIG. 9 is a diagram showing a configuration example of the server 300 in the present embodiment. The server 300 includes a CPU (Central Processing Unit) 301 that performs various controls and processes in the video distribution system 100, an SDRAM (Synchronous Dynamic Random Access Memory) 302, an HDD (Hard Disk Drive) 303, various input interfaces (hereinafter abbreviated as I/F) 304 for a pointing device such as a mouse 311 and for a keyboard 312, a power supply 305, a display I/F 306 for connecting a display such as a CRT (Cathode Ray Tube), and an external I/F 307 for connecting external devices such as the wide-angle camera 200 and the camera array 400, all of which are connected via a bus 313.
[0040]
Next, each component of the server 300 will be described. The CPU 301 performs various processes and controls, such as acquiring video from the wide-angle camera 200 and the camera array 400 and distributing the acquired video, according to a predetermined program stored in the HDD 303. The SDRAM 302 is used as a work area for the CPU 301, and also as a storage area for the processing programs stored in the HDD 303 and for an OS (Operating System) such as Windows NT Server (a registered trademark of Microsoft Corporation in the United States).
[0041]
Examples of the external I/F 307 include various I/F boards, USB (Universal Serial Bus), IEEE 1394, and wireless I/Fs such as IrDA and Bluetooth. By connecting the wide-angle camera 200 and the camera array 400 to the server 300 via a high-speed serial interface such as USB 2.0, the wide-angle video data and the plurality of partial video data acquired by the camera array 400 can be acquired synchronously.
[0042]
1.2 Operation
FIG. 10 is a diagram in which the video distribution system according to the present embodiment shown in FIG. 4 is rewritten into a functional block diagram. Hereinafter, the operation of each unit shown in FIG. 10 will be described in detail.
[0043]
1.2.1 First imaging unit
The first imaging unit 31 includes the wide-angle camera 200 described in 1.1.1 above, and performs an operation of outputting acquired and digitized wide-angle video data.
[0044]
1.2.2 Second imaging unit
The second imaging unit 32 is configured by the camera array 400 described in 1.1.2 above, and performs an operation of outputting the acquired and digitized partial video data.
[0045]
1.2.3 Deformation unit
FIG. 11 is a diagram illustrating the operation of the deforming unit 33 in the first embodiment. The deformation unit 33 transforms the wide-angle video data acquired by the first imaging unit 31 into a video close to a perspective-projection image captured by a normal camera (hereinafter referred to as a panoramic video), as shown in FIG. 11. In general, as described above, an image obtained by a camera capable of wide-angle shooting is heavily distorted and differs from the shape of the scene as seen by the human eye, so it is preferable to apply a deformation process to it to make later viewing convenient. Hereinafter, a method for transforming the wide-angle video data, that is, the donut image shown in FIG. 7, into a panoramic image will be described based on A. M. Bruckstein and T. J. Richardson: “Omniview Cameras with Curved Surface Mirrors”, Proc. of the IEEE Workshop on Omnidirectional Vision 2000, pp. 79-84.
[0046]
FIG. 12 is a diagram for explaining the principle of image deformation in a camera using a hyperboloidal mirror. FIG. 12(a) shows an example of the operation of the deforming unit 33: the donut image is coordinate-converted into a panoramic image projected onto a cylindrical surface whose horizontal axis is the azimuth and whose vertical axis is the elevation angle. FIG. 12(b) is a diagram showing the geometric structure of the wide-angle camera 200; the optical system of the camera in FIG. 12(b) is a central projection model. The meaning of each variable in the figure is as follows.
(u, v): coordinates in the donut image
(u₀, v₀): coordinates of the center of the hyperboloid mirror in the donut image
(θ, φ): coordinates in the panoramic image
r: distance from (u₀, v₀) to (u, v), in pixels
r_max: radius of the hyperboloid mirror in the donut image, in pixels
θ: azimuth
φ: elevation angle
ψ: apex angle measured from the optical axis of the camera
F: focal point of the hyperboloid mirror
F′: focal point of the hyperboloid paired with the hyperboloid mirror, which coincides with the optical center of the camera
[0047]
At this time, the following relationship is established between the apex angle ψ and the elevation angle φ.
[0048]
[Expression 1]

where
[0049]
[Expression 2]

Here, φ_min is the value of the elevation angle φ corresponding to the position of radius r_max on the donut image, and represents the shooting limit of the camera in the elevation direction. The values of r_max and φ_min can generally be determined easily.
Hereinafter, the deformation procedure will be described.
(i) Find polar coordinates (r, θ) corresponding to point (u, v) by solving the following equation.
[0050]
[Equation 3]

(ii) The apex angle ψ corresponding to r calculated by equation (3) is obtained by the following equation.
[0051]
[Expression 4]

where
[0052]
[Equation 5]

Here, ψ_max is the value of the apex angle ψ corresponding to the position of radius r_max on the donut image, that is, to the elevation angle φ_min. The value of ψ_max can be obtained by substituting φ_min into equation (1).
[0053]
(iii) The elevation angle φ corresponding to ψ calculated by equation (4) is obtained by equation (1).
[0054]
With the above procedure, an arbitrary point (u, v) in the donut image captured by the hyperboloid mirror can be transformed into a point (θ, φ) in the panoramic image. That is, the donut video is transformed into a panoramic video.
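Step (i) of the procedure can be sketched as follows. This is only an illustration under stated assumptions: Equation 3 is not legible in this text, so the ordinary Cartesian-to-polar conversion about the mirror center (u₀, v₀) is assumed, with the azimuth θ measured from the +u axis and wrapped to [0, 2π). The function name `donut_to_polar` is hypothetical. Steps (ii) and (iii) depend on Equations 1, 4 and 5 and are deliberately not sketched.

```python
import math


def donut_to_polar(u, v, u0, v0):
    """Return (r, theta) for donut-image pixel (u, v).

    Assumption: r is the pixel distance from the mirror center (u0, v0)
    and theta is the azimuth measured from the +u axis, in radians.
    """
    du, dv = u - u0, v - v0
    r = math.hypot(du, dv)
    theta = math.atan2(dv, du) % (2.0 * math.pi)
    return r, theta


# Example: a pixel 100 px to the right of an assumed center (320, 240).
r, theta = donut_to_polar(420, 240, 320, 240)
```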
[0055]
FIG. 13 is a diagram for explaining a coordinate conversion table used in the deforming unit 33. When panoramic video is distributed in real time as it is shot, the calculation time required for the deformation process becomes a problem. Therefore, it is preferable to create in advance a coordinate conversion table based on the above procedure, as shown in FIG. 13. In the coordinate conversion table of FIG. 13, the coordinates (u, v) of the donut image corresponding to each point (θ, φ) are stored.
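A coordinate conversion table of this kind might be built once and then applied to every frame by simple lookup, roughly as in the sketch below. All names (`build_table`, `apply_table`, `radius_of_phi`) are hypothetical. The mapping from elevation angle φ to radius r is passed in by the caller, because the hyperboloid relations (Equations 1, 4 and 5) are not legible in this text; the linear function used in the example is only a placeholder, not the mirror's actual relation.

```python
import math


def build_table(width, height, u0, v0, phi_min, phi_max, radius_of_phi):
    """Return table[row][col] = (u, v) in the donut image.

    Columns span azimuth 0..2*pi; rows run from phi_max (top) to
    phi_min (bottom). radius_of_phi is an assumed caller-supplied
    function mapping elevation angle to radius in pixels.
    """
    table = []
    for row in range(height):
        phi = phi_max - (phi_max - phi_min) * row / max(height - 1, 1)
        r = radius_of_phi(phi)
        line = []
        for col in range(width):
            theta = 2.0 * math.pi * col / width
            u = int(round(u0 + r * math.cos(theta)))
            v = int(round(v0 + r * math.sin(theta)))
            line.append((u, v))
        table.append(line)
    return table


def apply_table(donut, table):
    """Nearest-neighbour lookup; out-of-frame coordinates become 0."""
    h, w = len(donut), len(donut[0])
    return [[donut[v][u] if 0 <= v < h and 0 <= u < w else 0
             for (u, v) in line] for line in table]


# Placeholder radial mapping (assumption): r grows linearly from 0 at
# phi_max to r_max = 5 px at phi_min.
table = build_table(4, 2, 10, 10, 0.0, 1.0, lambda phi: 5.0 * (1.0 - phi))
donut = [[v * 100 + u for u in range(20)] for v in range(20)]
panorama = apply_table(donut, table)
```

Because the table depends only on the camera geometry, the per-frame cost reduces to one lookup per panorama pixel.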
[0056]
The above deformation process is executed by the CPU 301 in the server 300. At this time, the HDD 303 stores in advance a predetermined program for performing the deformation process.
[0057]
1.2.4 Encoding unit
The encoding unit 34 of FIG. 10 encodes each of at least one item of partial video data acquired by the second imaging unit 32 of FIG. 10 and the panoramic video data output by the deformation unit 33 into a format suitable for video distribution. Here, the operation when RealVideo provided by RealNetworks is used as the format suitable for video distribution will be described. The acquired video data is encoded into RealVideo format using the RealProducer encoding program provided by the same company. As long as acquisition of video data continues, RealProducer keeps encoding the video and audio and continuously transmits the encoded data to the distribution unit 35. The above encoding process is executed by the CPU 301 in the server 300. For this purpose, RealProducer is installed on the HDD 303 in advance.
[0058]
1.2.5 Distribution unit
The distribution unit 35 distributes the wide-angle video data encoded by the encoding unit 34 and at least one item of partial video data via the Internet in response to a request from the video display terminal held by the client. Here, an example of the operation when a program called RealServer provided by RealNetworks is used for video distribution will be described.
[0059]
The client notifies the server 300, via the Internet, of the partial video data it requests. The partial video data to be requested is selected by operating the video selection button 603 in FIG. 2. Communication via the Internet is performed over a TCP (Transmission Control Protocol) connection, and communication between the video display terminal and the server 300 is performed via HTTP (HyperText Transfer Protocol) or RTSP.
[0060]
Upon receiving a request from the client, the server 300 transmits wide-angle video data and predetermined partial video data via the Internet according to the request. At this time, communication via the Internet is performed through UDP (User Datagram Protocol) connection, and transmission of video data is performed through RDT (Real Data Transport).
[0061]
Through the above operation, the video data distributed to the video display terminal is displayed by the video display program (for example, RealPlayer provided by RealNetworks) held by the client.
[0062]
2. Embodiment 2
Further, a curved mirror having a curvature in one direction can be used for the wide-angle camera 200.
[0063]
2.1 Configuration
The configuration of the second embodiment is shown in FIG. 4 as in the first embodiment.
[0064]
Hereinafter, the configuration of the wide-angle camera 200 in the present embodiment will be described. Note that the configurations of the server 300 and the camera array 400 are also the same as those described in the first embodiment, and a description thereof will be omitted.
[0065]
2.1.1 Wide angle camera
FIG. 14 is a diagram showing the configuration of the wide-angle camera 200 in the present embodiment. As shown in FIG. 14, the wide-angle camera 200 includes a camera 219 having a normal angle of view and a mirror 211 having a curvature in one direction; although it cannot shoot in all directions, it can shoot a wide-range scene. FIG. 15 is a diagram showing the wide-angle image projected onto the camera 219 when the mirror 211 is used; a scene behind the camera 219 can be captured. As shown in FIG. 15, the captured image is compressed in the horizontal direction, with the horizontal angle of the incident light proportional to the horizontal coordinate of the image position. The arrangement can also be refined to reduce the reflection of the camera 219 itself in the image.
[0066]
As described above, wide-angle images can be taken with an inexpensive and simple configuration by combining a normal camera and a mirror.
[0067]
2.2 Operation
A block diagram according to function in the present embodiment is shown in FIG. 10 as in the first embodiment. Hereinafter, the operation of each unit shown in FIG. 10 will be described in detail. The operations of the first imaging unit 31, the second imaging unit 32, and the distribution unit 35 are the same as those in the first embodiment described above, and thus the description thereof is omitted.
[0068]
2.2.1 Deformation unit
In Embodiment 2 of the present invention, a panoramic image can be obtained simply by stretching the video data acquired by the wide-angle camera 200 in the horizontal direction. As in the case of the hyperboloidal mirror, it suffices to create a coordinate conversion table as shown in FIG. 13 and store in it the coordinates (u, v) of the untransformed image corresponding to each point of the panoramic image.
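For a purely horizontal stretch, the coordinate conversion table reduces to an integer division per column. The following is a minimal sketch assuming an integer stretch factor and nearest-neighbour sampling; the function name `build_stretch_table` is hypothetical and the patent does not specify the interpolation method.

```python
def build_stretch_table(src_width, height, scale):
    """Return table[y][x] = (u, v) of the source pixel for panorama pixel (x, y).

    Assumptions: scale is a positive integer and the nearest source
    column to the left is sampled (no interpolation).
    """
    dst_width = src_width * scale
    return [[(x * src_width // dst_width, y) for x in range(dst_width)]
            for y in range(height)]


# Example: a 4-pixel-wide, 2-row image stretched to 12 pixels wide.
stretch_table = build_stretch_table(4, 2, 3)
```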
[0069]
In addition, when this wide-angle camera 200 is used, the panoramic video can be produced on the client side that displays the video, without providing the deformation unit 33 in the video distribution system 100. Suppose that the wide-angle camera 200 acquires an image with a horizontal shooting range of 180 degrees, a vertical shooting range of 60 degrees, and a size of 352 × 240 pixels. In this case, a panoramic image can be obtained by stretching the horizontal length to three times its original value, that is, to 1056 pixels. Suppose also that the machine name of the server 300 is “vidserv”, that the wide-angle video data distributed from the video distribution system is named “movie.rm” (in the RealVideo data format described above), and that the protocol used for communication between the video display terminal and the server 300 is RTSP (Real Time Streaming Protocol). The enlargement process can then be described using SMIL (Synchronized Multimedia Integrated Language), recommended by the W3C (World Wide Web Consortium), as shown in FIG. 16. As shown in FIG. 16, when the size of the display area specified in the <region> tag differs from the image size of the associated video data “movie.rm”, specifying “fill” for the “fit” attribute causes the video data to be enlarged or reduced to match the size of the display area. That is, a panoramic video can be displayed by designating a display area with the desired enlargement ratio for the video data and designating the attribute value as described above. This deformation process is executed at the same time as the video is displayed on the client's video display terminal. Because the server 300 does not need to execute the deformation process, wide-angle video data can be distributed at a low processing cost.
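As an illustration of the SMIL description mentioned above, the sketch below generates a small SMIL document from the values given in the text (server “vidserv”, clip “movie.rm”, 352 × 240 source, threefold horizontal stretch). FIG. 16 is not reproduced here, so the exact markup of the patent is unknown; the region id “panorama”, the URL layout, and the helper name `build_smil` are assumptions.

```python
import xml.etree.ElementTree as ET


def build_smil(server, clip, src_width, src_height, h_scale):
    """Return SMIL text whose region stretches the clip horizontally.

    Assumption: the clip is addressed as rtsp://<server>/<clip>.
    """
    width = src_width * h_scale  # e.g. 352 * 3 = 1056
    smil = ET.Element("smil")
    layout = ET.SubElement(ET.SubElement(smil, "head"), "layout")
    ET.SubElement(layout, "root-layout",
                  width=str(width), height=str(src_height))
    # fit="fill" scales the video to the region size, ignoring aspect ratio.
    ET.SubElement(layout, "region", id="panorama", left="0", top="0",
                  width=str(width), height=str(src_height), fit="fill")
    body = ET.SubElement(smil, "body")
    ET.SubElement(body, "video",
                  src=f"rtsp://{server}/{clip}", region="panorama")
    return ET.tostring(smil, encoding="unicode")


smil_text = build_smil("vidserv", "movie.rm", 352, 240, 3)
```

The player receiving such a file would do the stretching itself, which is the point of the passage: the server only streams the unmodified 352 × 240 data.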
[0070]
2.2.2 Encoding unit
The operation of the encoding unit 34 is the same as in the first embodiment: at least one item of partial video data acquired by the second imaging unit 32 and the panoramic video data output by the deformation unit 33 are each encoded into a format suitable for video distribution.
[0071]
When the deforming unit 33 is not present, the encoding unit 34 encodes the wide-angle video data acquired by the first imaging unit 31 instead of the wide-angle video data deformed by the deforming unit 33.
[0072]
3. Embodiment 3
The third embodiment of the present invention relates to a video distribution system that specifies the correspondence between the wide-angle video data acquired by the wide-angle camera 200 and each partial video data acquired by the camera array 400. Examples of the “correspondence” here include the following.
・The positional relationship between the wide-angle camera 200 and each camera 401 constituting the camera array 400
・The positional relationship between the wide-angle video data and each partial video data
When the above correspondence is unknown, there is no guarantee that the desired partial video data will be delivered from the server 300 even if the client operates the video selection button 603 in FIG. 2. One conceivable countermeasure is to install the camera array 400 so that the partial video data to be distributed switches in counterclockwise order each time the left arrow button of the video selection button 603 is pressed. However, this makes the installation work of the video distribution system 100 troublesome, because the switching order of the partial videos must correspond to the arrangement order of the cameras.
[0073]
FIG. 17 is a diagram showing an example of a display screen that uses the above correspondence on the video display terminal held by the client. In the figure, a bar 604 is placed below the wide-angle image 602, and the shooting range corresponding to the currently displayed partial image 601 is indicated by a black bar 605. The shooting ranges of the partial videos other than the currently displayed partial image 601 are indicated by gray bars 606. When the client moves the cursor 607 by operating a mouse (not shown) and clicks on the gray bar 606 indicating a given partial video, selection information for the partial video requested for distribution is transmitted to the server 300, and the partial image 601 from the camera array 400 sent via the server 300 is switched to the corresponding partial video. Because the above correspondence is specified in this way, the installation work of the video distribution system 100 becomes easy. In addition, the client can select a desired video more easily and can better understand the photographed scene from the distributed video.
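The selection by clicking on the bar 604 could be resolved on the terminal side roughly as follows. This is a hedged sketch, not the patent's implementation: it assumes the bar spans the full 360 degrees of the wide-angle image from left to right and that the correspondence is available as a list of azimuth ranges per camera. The function name `select_partial_video` and the range format are hypothetical.

```python
def select_partial_video(click_x, bar_width, ranges):
    """Return the camera whose shooting range contains the clicked position.

    ranges: list of (camera_id, start_deg, end_deg), 0 <= start < end <= 360.
    Returns None if the click falls outside every range.
    """
    azimuth = 360.0 * click_x / bar_width
    for camera_id, start, end in ranges:
        if start <= azimuth < end:
            return camera_id
    return None


# Example correspondence (assumed values) for a 400-pixel-wide bar.
ranges = [("401-1", 0, 60), ("401-2", 90, 150), ("401-3", 180, 240)]
selected = select_partial_video(120, 400, ranges)
```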
[0074]
Embodiment 3 of the present invention relates to a video distribution system for realizing such an operation.
[0075]
3.1 Configuration
The configuration of the third embodiment of the present invention is the same as that shown in the figures for the first embodiment.
[0076]
3.2 Operation
FIG. 18 is a diagram showing the video distribution system according to the third embodiment of the present invention in functional blocks. A specifying unit 36 is added to the block diagram of the first embodiment shown in FIG. 10. The operation of each unit shown in FIG. 18 will be specifically described below. Note that the operations of the first imaging unit 31, the second imaging unit 32, the deforming unit 33, and the encoding unit 34 are the same as those in the first embodiment, and a description thereof is omitted.
[0077]
3.2.1 Specifying unit
The specifying unit 36 performs an operation of specifying the correspondence between the wide-angle video data acquired by the wide-angle camera 200 and each partial video data acquired by the camera array 400. This operation will be described below.
[0078]
(1) A method of using identifiers assigned to the cameras 401 constituting the camera array 400
FIG. 19 is a diagram illustrating an operation example of the specifying unit 36. As shown in FIG. 19A, an identifier 403 is attached to each camera 401 constituting the camera array 400, and each camera 401 is placed at a position where it can be captured by the wide-angle camera 200. In this state, the wide-angle camera 200 acquires video data in which the identifiers 403 appear. The positional relationship between the wide-angle camera 200 and each camera 401 can be specified by detecting the image coordinates at which each identifier 403 appears in this video data. Examples of the identifier 403 include:
・ a sticker bearing Arabic numerals,
・ a barcode,
・ a color code,
・ a 2D barcode.
The operation of reading these identifiers from video data is already well known in the field of pattern recognition.
(2) A method of using video data acquired by the wide-angle camera 200 and the camera array 400
FIG. 20 is a diagram illustrating another operation example of the specifying unit 36. In this operation example, a portion having a high degree of similarity between wide-angle video data and each partial video data is detected.
[0079]
Here, the operation when template matching is used as the means for detecting a portion having a high degree of similarity is described. First, as shown in FIG. 20(a), a template 608 of size (2DX + 1) × (2DY + 1) is generated from each partial video acquired by the camera array 400. Next, as shown in FIG. 20(b), the template 608 is moved over the wide-angle video 602, and the normalized cross-correlation value S between the template 608 and the point (m, n) in the wide-angle video 602 is calculated by the following equation.
[0080]
[Formula 6]

Here, the meaning of each symbol in the equation (6) is as follows.
・ I₁(x, y): density at point (x, y) on the template
・ I₂(x, y): density at point (x, y) on the wide-angle video
Based on the above calculation, a point (m, n) in the wide-angle video 602 that maximizes the normalized cross-correlation value S is obtained, and the camera 401 corresponding to the position of the point may be specified. The positional relationship between the wide-angle camera 200 and each camera 401 can be specified by performing the above operation on all partial videos.
[0081]
Although the similarity of the videos has been described as being obtained from the cross-correlation of density, this is merely an example. The similarity of the videos may instead be obtained from other characteristics, such as the color space or the contours of the videos.
[0082]
(3) Manual identification method
FIG. 21 is a diagram showing a display screen of the server 300 in the third embodiment. This display screen appears after the video distribution system 100 is started and immediately before video distribution begins. The user first switches the displayed partial video 601 by operating the video selection button 603. A message 609 prompting the user to manually input the positional relationship between the currently displayed partial video 601 and the wide-angle video 602 is then presented on the display screen. The user inputs the positional relationship manually by operating a mouse (not shown) to move the cursor 607 and clicking the corresponding point on the wide-angle video 602. When the manual input is completed, a cross-shaped pointer 610 is added at the position in the wide-angle video 602 corresponding to the partial video 601. The positional relationship between the wide-angle video 602 and each partial video 601 can be specified by executing the above operation for all the partial videos.
[0083]
The manual method (3) is particularly effective when the arrangement positions of the wide-angle camera 200 and the camera array 400 do not change from the start to the end of video distribution. On the other hand, methods (1) and (2) remain effective even if the arrangement position of a camera 401 is changed partway through.
[0084]
The above processing is executed by the CPU 301 in the server 300. The HDD 303 stores in advance a predetermined program for performing the specifying processing.
[0085]
3.2.3 Distribution unit
The distribution unit 35 distributes the wide-angle video data and at least one partial video data encoded by the encoding unit 34 via the Internet in response to a request from the video display terminal held by the client. At this time, it is preferable to deliver to the client not only the video data but also the correspondence specified by the specifying unit 36, because the display screen shown in FIG. 17 can then be presented on the video display terminal. The video distribution operation is the same as that in the first embodiment.
[0086]
4. Embodiment 4
The fourth embodiment of the present invention relates to a video distribution system that automatically selects each partial video data acquired by the camera array 400.
[0087]
In the first to third embodiments, the client selects a partial video for which distribution is requested. However, it is troublesome to manually select a partial video each time.
[0088]
FIG. 22 is a diagram showing a display screen of the video display terminal held by the client on which the partial video 601 to be displayed is selected automatically. As shown in the figure, when the check box 611 labeled “AUTO” is checked, the terminal switches to a mode in which the partial video 601 is automatically selected and distributed. In this mode, the server 300 automatically selects a partial video showing an important scene, such as the speaker, and distributes it to the client together with the wide-angle video. Accordingly, the client can understand the photographed scene more deeply from the distributed video without troublesome operations.
[0089]
The fourth embodiment relates to a video distribution system for realizing such an operation.
[0090]
4.1 Configuration
FIG. 23 shows the configuration of a video distribution system according to Embodiment 4 of the present invention. A wide-angle camera 200, a camera array 400, and a microphone array 500 are connected to the server 300, and wide-angle video data, a plurality of partial video data, and a plurality of audio data are acquired. Video data acquired by the server 300 is distributed via the Internet and displayed on a client PC connected to the Internet. Further, the audio data acquired by the server 300 is used for selecting a video, which will be described later, and is distributed via the Internet as necessary, and is reproduced on a client PC connected to the Internet.
[0091]
Next, the configuration of each part will be described. Note that the configurations of the wide-angle camera 200, the camera array 400, and the server 300 are the same as those in the first embodiment, and a description thereof will be omitted.
[0092]
4.1.1 Microphone array
The microphone array 500 includes at least two microphones 501-1 and 501-2. Various types of microphone, such as piezoelectric and capacitive (so-called condenser) microphones, can be used. As with the cameras 401, the microphones 501-1 and 501-2 may be placed separately or may be fixed to one another. FIG. 24 is a diagram for explaining the configuration of the wide-angle camera 200 and the microphone array 500 in the fourth embodiment. As shown there, the wide-angle camera 200 and the microphone array 500 may be integrated into one housing. In FIG. 24, the image sensor 214 of the camera unit 201 constituting the wide-angle camera 200 and the microphones 501-1 and 501-2 constituting the microphone array 500 are arranged on the pedestal 202.
[0093]
Audio signals acquired by the microphones 501-1 and 501-2 are digitized inside the microphones and then sent to the server 300. Like the camera array 400, the microphone array 500 is connected via the external I/F 307 of the server 300, specifically via a high-speed serial interface such as USB 2.0, so that the partial video and the audio can be acquired in synchronization.
[0094]
4.2 Operation
FIG. 25 is a diagram illustrating functional blocks of the video distribution system according to the fourth embodiment. In addition to the block diagram of the third embodiment shown in FIG. 18, an audio acquisition unit 37, a sound source detection unit 38, and a video selection unit 39 are added. In the following, the operation of each part shown in FIG. 25 will be specifically described. Note that the operations of the first imaging unit 31, the second imaging unit 32, the deforming unit 33, and the specifying unit 36 are the same as those in the above-described embodiment, and thus description thereof is omitted.
[0095]
4.2.1 Audio acquisition unit
The audio acquisition unit 37 outputs the digital audio data acquired by the microphone array 500 described in 4.1.1 above.
[0096]
4.2.2 Sound source detection unit
The sound source detection unit 38 detects the position or direction of the speaker based on the audio data acquired by the audio acquisition unit 37. An example of the operation will be described below.
[0097]
(1) Method based on difference in arrival time of audio input to microphone array 500
This method is effective when a plurality of microphones 501 are fixed at known positions in a housing. FIG. 26 is a diagram for explaining the operation principle of the sound source detection unit 38 according to the fourth embodiment of the present invention. As shown in FIG. 26, two microphones 501-1 and 501-2 (referred to as microphone 1 and microphone 2 for convenience) are arranged a distance l apart, and sound arrives from the direction θ. The relationship between the audio data s₁(t) output by microphone 1 and the audio data s₂(t) output by microphone 2 is
[0098]
[Expression 7]

where v is the speed of sound. That is, the audio data of microphone 1 leads that of microphone 2 by only the following time:
[0099]
[Outside 1]

A procedure for specifying the direction of the speaker's voice using this principle is described below.
[0100]
First, the arrival time difference between the audio data of microphone 1 and microphone 2 is detected. This arrival time difference is obtained, for example, from the cross-correlation value between the audio data s₁(t) of microphone 1 and the audio data s₂(t + dt) of microphone 2. Here, the cross-correlation value C(t, dt) is calculated by the following equation.
[0101]
[Equation 8]

Here, N is a positive integer indicating the size of the correlation window, and equation (8) indicates that the product-sum operation is performed using N samples before time t. At this time, dt that maximizes C (t, dt) is the arrival time difference.
[0102]
Next, the angle θ formed by the voice and the base line of the microphone is calculated using the microphone interval l, the arrival time difference dt, and the sound velocity v.
[0103]
[Equation 9]

Here, the value range of θ is 0 ° or more and 180 ° or less.
[0104]
Note that the above procedure alone detects the direction only within a 180° range in front of the microphones 501-1 and 501-2, and does not uniquely specify the sound source direction. That is, the angle θ output by the sound source detection unit 38 is actually the angle formed by the arrival direction of the voice and the base line between the two microphones. The actual direction of the voice may lie anywhere on the lateral surface of a cone of apex angle θ whose apex is the midpoint between the two microphones.
[0105]
In order to solve this problem, correction is performed using another microphone pair that is not parallel to the pair composed of microphone 1 and microphone 2. FIG. 28 is an explanatory diagram showing how four microphones 501-1, 501-2, 501-3, and 501-4 are divided into two pairs to detect the sound source direction. As shown in FIG. 28, each pair consists of a given microphone 501-1 or 501-3 (microphone 1 or microphone 3) and the microphone 501-2 or 501-4 (microphone 2 or microphone 4, respectively) located farthest from it.
[0106]
By pairing the two microphones that are farthest apart, the difference in arrival time of the voice is maximized, and the accuracy of direction detection is improved. Here, the microphone array 500 includes four microphones 501-1, 501-2, 501-3, and 501-4, but the direction of the sound source can also be detected with high accuracy using three microphones. FIG. 29 is an explanatory diagram showing how the microphone pairs are used when the microphone array 500 is configured with three microphones 501-1, 501-2, and 501-3. As shown in the figure, by arranging the microphones in an equilateral triangle, the direction of the sound source can be detected with high accuracy regardless of which microphone pair is employed. In the example shown in FIG. 29, the first pair and the second pair suffice to detect the sound source in all directions, but the third pair may be used complementarily.
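The following is one way to illustrate how a second, non-parallel pair removes the ambiguity. It assumes, purely for illustration, that the two base lines are perpendicular and that the sound source lies in the plane of the microphones. The actual pair geometry of FIG. 28 and FIG. 29 may differ, and `azimuth_deg` is a hypothetical helper.

```python
import math

def azimuth_deg(theta_a_deg, theta_b_deg):
    """Combine the angles measured against two perpendicular base lines.

    theta_a_deg: angle (0 to 180) between the arrival direction and base line A (x axis).
    theta_b_deg: angle (0 to 180) between the arrival direction and base line B (y axis).
    Returns an unambiguous azimuth in the range 0 to 360 degrees.
    """
    x = math.cos(math.radians(theta_a_deg))  # direction component along base line A
    y = math.cos(math.radians(theta_b_deg))  # direction component along base line B
    return math.degrees(math.atan2(y, x)) % 360.0

# Pair A alone reports 60 degrees for sources at azimuth 60 and at azimuth 300.
front = azimuth_deg(60.0, 30.0)    # pair B reports 30 degrees  -> source at about 60
back = azimuth_deg(60.0, 150.0)    # pair B reports 150 degrees -> source at about 300
```

The second pair supplies the sign that a single pair cannot, which is why the two cases above separate.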
[0107]
(2) Directional microphone array method
It is also possible to detect the direction of the speaker by using a directional microphone that can input only a limited range of speech. FIG. 30 is an explanatory diagram for explaining the relationship between the microphone array 500 and the sound source direction in the fourth embodiment. The microphone array 500 includes four microphones 501-1, 501-2, 501-3, and 501-4 having directivity, and determines the sound source direction based on the sound intensity. For convenience, four microphones 501-1, 501-2, 501-3, and 501-4 are designated as microphones 1 to 4.
[0108]
Assume that the voice intensity is 20 for microphone 1, 30 for microphone 2, 20 for microphone 3, and 5 for microphone 4. In this case, it is first determined that the sound source lies in the direction of microphone 2. Next, the intensities of microphone 1 and microphone 3 are compared; both have the same value of 20. Therefore, the sound source direction is finally determined to be the direction of microphone 2 (the direction indicated as θ = 45° in the figure).
FIG. 31 is a diagram for explaining another example of the operation of the sound source detection unit 38 in the fourth embodiment. Assume that the voice intensity is 15 for microphone 1, 30 for microphone 2, 25 for microphone 3, and 5 for microphone 4. In this case, it is first determined that the sound source lies in the direction of microphone 2. Next, the intensities of microphone 1 and microphone 3 are compared; since the intensity of microphone 3 is greater than that of microphone 1, the sound source direction is shifted slightly from microphone 2 toward microphone 3 (the direction indicated as θ = 30° in the figure). The amount of this shift may be determined in advance according to the characteristics of the directional microphones.
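The two worked examples above can be reproduced with a short sketch. The microphone directions (135°, 45°, −45°, −135°) and the fixed 15° shift are assumptions inferred only from the stated results θ = 45° and θ = 30°. The real values depend on the microphone layout and directivity.

```python
# Assumed facing directions of microphones 1 to 4, in degrees (hypothetical layout).
MIC_DIRECTIONS_DEG = [135.0, 45.0, -45.0, -135.0]
# Assumed shift toward the stronger neighbour, fixed in advance for the microphone type.
SHIFT_DEG = 15.0

def sound_direction(intensities):
    """Estimate the source direction from one intensity value per directional microphone."""
    n = len(intensities)
    k = max(range(n), key=lambda i: intensities[i])   # loudest microphone
    theta = MIC_DIRECTIONS_DEG[k]
    left, right = (k - 1) % n, (k + 1) % n            # its two neighbours
    if intensities[left] == intensities[right]:
        return theta                                  # no correction needed
    stronger = left if intensities[left] > intensities[right] else right
    # Signed shortest rotation from the loudest microphone toward the stronger neighbour.
    delta = (MIC_DIRECTIONS_DEG[stronger] - theta + 180.0) % 360.0 - 180.0
    return theta + (SHIFT_DEG if delta > 0 else -SHIFT_DEG)

first_example = sound_direction([20, 30, 20, 5])
second_example = sound_direction([15, 30, 25, 5])
```

A shift that grows with the intensity difference between the two neighbours would be a natural refinement, but the description only requires that the shift be determined in advance.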
[0109]
The function of the sound source detection unit 38 described above is executed by the CPU 301 in the server 300. At this time, the HDD 303 stores a predetermined program for realizing the function in advance.
[0110]
4.2.3 Video selection unit
The video selection unit 39 automatically selects the partial video to be distributed to the client by using the correspondence specified by the specifying unit 36 and the position or direction of the speaker detected by the sound source detection unit 38.
[0111]
FIG. 32 is a diagram illustrating an example of the operation of the video selection unit 39 according to the fourth embodiment, showing from above a state in which six participants 2, A to F, sit around the table 1 and hold a conference. A wide-angle camera 200 and a microphone array 500 are installed on the table 1, and one camera 401 (not shown) is installed for each participant. Assume that the direction of the sound source detected by the sound source detection unit 38 is as indicated by an arrow 381 in the figure. In this case, based on the direction of the sound source and the correspondence between the wide-angle camera 200 and each camera 401 specified by the specifying unit 36, the video selection unit 39 selects the camera 401 arranged closest to the direction of the sound source. That is, in the figure, the camera 401 that is shooting the participant E is selected.
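A minimal sketch of this selection follows. It assumes that the correspondence from the specifying unit 36 is available as one azimuth per camera, expressed in the same angular frame as the detected sound source direction. The camera layout and the function names are hypothetical.

```python
def angular_distance(a_deg, b_deg):
    """Smallest absolute angle between two directions, in degrees (0 to 180)."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)

def select_camera(source_deg, camera_azimuths):
    """Return the id of the camera whose azimuth is closest to the sound source direction.

    camera_azimuths: dict mapping a camera id to its azimuth in degrees.
    """
    return min(camera_azimuths,
               key=lambda cam: angular_distance(source_deg, camera_azimuths[cam]))

# Hypothetical layout: six cameras, one per participant A to F, every 60 degrees.
cameras = {"A": 0.0, "B": 60.0, "C": 120.0, "D": 180.0, "E": 240.0, "F": 300.0}
selected = select_camera(250.0, cameras)
```

Measuring the distance on the circle rather than as a plain difference keeps the choice correct across the 0°/360° boundary.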
[0112]
The function of the video selection unit 39 described above can be realized by the CPU 301 in the server 300. At this time, the HDD 303 stores a predetermined program for realizing the function in advance.
[0113]
4.2.4 Encoding unit
The encoding unit 34 encodes the partial video data selected by the video selection unit 39 and the panoramic video data output by the deforming unit 33 into a format suitable for video distribution. The encoding operation is the same as that in the first embodiment.
[0114]
If the deforming unit 33 does not exist, the encoding unit 34 encodes the wide-angle video data acquired by the first imaging unit 31 instead of the wide-angle video data deformed by the deforming unit 33.
[0115]
4.2.5 Distribution unit
The distribution unit 35 distributes the wide-angle video data and partial video data encoded by the encoding unit 34 via the Internet. At this time, it is preferable to deliver to the client not only the video data but also the correspondence specified by the specifying unit 36 and the shooting range of the partial video selected by the video selection unit 39, because the corresponding display screen can then be presented on the video display terminal. The video distribution operation is the same as that in the first embodiment.
[0116]
5. Embodiment 5
The fifth embodiment of the present invention relates to a video distribution system that automatically selects among the partial video data acquired by the camera array 400, as in the fourth embodiment described above. In this embodiment, each camera 401 constituting the camera array 400 and each microphone 501 constituting the microphone array 500 are arranged in one-to-one correspondence. Here, “one-to-one correspondence” means that, for each camera 401, there is exactly one microphone 501 arranged at an approximately coincident position or direction.
[0117]
5.1 Configuration
The configuration of video distribution system 100 in the present embodiment is shown in FIG. 23 as in the fourth embodiment.
[0118]
Next, the configuration of each part in the above diagram will be described. Note that the configurations of the wide-angle camera 200 and the server 300 are the same as those in the first embodiment described above, and thus the description thereof is omitted.
[0119]
5.1.1 Camera array and microphone array
FIG. 33 is a diagram illustrating the appearance of the camera 401 and the microphone 501 according to the fifth embodiment of the present invention. As shown in the figure, the camera 401 and the microphone 501 have a structure integrated with a common housing 502. The microphone 501 has directivity and can input only a limited range of sound. One integrated camera 401 and microphone 501 is installed per participant.
[0120]
5.2 Operation
FIG. 34 is a functional block diagram of the video distribution system according to Embodiment 5 of the present invention. Compared with the block diagram of Embodiment 4, the sound source detection unit 38 is deleted, and the connection from the specifying unit 36 to the video selection unit 39 is also deleted. The operations of the first imaging unit 31, the second imaging unit 32, the deforming unit 33, the encoding unit 34, the distribution unit 35, the specifying unit 36, and the audio acquisition unit 37 are the same as those in the above-described fourth embodiment.
[0121]
5.2.1 Video selection unit
By using the integrated camera 401 and microphone 501 described above, the correspondence between each camera 401 and microphone 501 is known. Therefore, the video selection unit 39 may select a partial video acquired by the camera 401 corresponding to the microphone 501 from which the largest signal amplitude is obtained.
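As a sketch under simple assumptions, this selection reduces to picking the camera whose paired microphone shows the largest peak amplitude in the current audio frame. The data layout (one list of samples per camera id) and the function name are hypothetical.

```python
def select_by_amplitude(audio_frames):
    """Return the id of the camera whose paired microphone has the largest peak amplitude.

    audio_frames: dict mapping a camera id to the list of samples captured by the
    microphone 501 integrated with that camera during the current frame.
    """
    def peak(samples):
        return max((abs(s) for s in samples), default=0.0)
    return max(audio_frames, key=lambda cam: peak(audio_frames[cam]))

frames = {
    "cam1": [0.01, -0.02, 0.015],
    "cam2": [0.30, -0.45, 0.20],   # the speaker sits in front of this unit
    "cam3": [0.05, -0.04, 0.06],
}
selected = select_by_amplitude(frames)
```

In practice some smoothing over time would likely be needed so that the selected video does not switch on every short noise burst. That choice is outside the description and is not shown here.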
[0122]
6. Embodiment 6
The present embodiment relates to a video distribution system that automatically selects each partial video data acquired by the camera array 400, as in the fourth or fifth embodiment.
[0123]
6.1 Configuration
As in Embodiments 1 to 3 described above, the configuration is shown in FIGS. 4 to 9.
[0124]
6.2 Operation
FIG. 35 is a functional block diagram of the video distribution system according to the sixth embodiment of the present invention. A video selection unit 39 and a motion detection unit 40 are added to the block diagram of the third embodiment. The operation of each unit shown in FIG. 35 is specifically described below. The operations of the first imaging unit 31, the second imaging unit 32, the deforming unit 33, the encoding unit 34, the distribution unit 35, and the specifying unit 36 are the same as those in the above-described embodiments, and a description thereof is omitted.
[0125]
6.2.1 Motion detection unit
The motion detection unit 40 detects the motion of the subject in the wide-angle video data and outputs the motion feature amount at each part in the video. Here, the “motion feature amount” refers to the magnitude of the movement of the subject.
[0126]
Motion detection in a moving image can be realized by a known technique such as a method of obtaining a difference between frames of the previous time and the current time, a method using an optical flow, and the like. With these technologies, it is possible to detect the position of the subject and the magnitude of the movement in the wide-angle video data. According to this operation, when the present invention is used as a remote monitoring system, a partial video from a camera that captures a moving subject is distributed, which is preferable.
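The inter-frame difference method mentioned above can be sketched as follows. Frames are assumed to be grey-level images stored as lists of rows, and the threshold of 10 grey levels is an arbitrary illustrative value. The function names are hypothetical.

```python
def motion_map(prev_frame, curr_frame):
    """Absolute grey-level difference per pixel between the previous and current frame."""
    return [[abs(c - p) for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev_frame, curr_frame)]

def motion_feature(prev_frame, curr_frame, threshold=10):
    """Motion magnitude: the number of pixels that changed by more than the threshold."""
    return sum(1 for row in motion_map(prev_frame, curr_frame)
               for d in row if d > threshold)

prev = [[10, 10, 10],
        [10, 10, 10]]
curr = [[10, 60, 10],
        [10, 55, 12]]
feature = motion_feature(prev, curr)   # two pixels changed strongly
```

Counting changed pixels is only one possible definition of the motion magnitude. Summing the differences, or using an optical flow, would fit the description equally well.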
[0127]
Further, when the video distribution system according to the sixth embodiment is used as a remote conference system, it is preferable to detect the position or direction of the speaker automatically by detecting the movement of the participants' lips. Detection of lip movement can be realized by well-known techniques such as that in the literature (M. Kass, A. Witkin and D. Terzopoulos: “SNAKES: Active Contour Models”, ICCV, pp. 259-268 (1987)). Further, when the microphones 501 can be used as in the fourth and fifth embodiments, the detection accuracy of the speaker can be improved by combining the detection of lip movement with the result of extracting utterance sections from the audio data. For example, Japanese Patent Application Laid-Open No. 6-43897, filed by the present applicant, discloses a system for recognizing a conversation using voice features extracted from audio data and dynamic visual features of a face extracted from video data. This operation makes it possible to detect the position or direction of the speaker more stably even when the audio data contains a large amount of noise other than speech.
[0128]
The function of the motion detection unit 40 described above may be implemented in the wide-angle camera 200 or may be realized by the CPU 301 in the server 300. In the latter case, the HDD 303 stores a predetermined program for realizing the function in advance.
[0129]
6.2.2 Video selection unit
The video selection unit 39 according to the sixth embodiment automatically selects the partial video to be delivered to the client by using the correspondence specified by the specifying unit 36 and the feature amount of the subject motion detected by the motion detection unit 40. The video selection unit 39 first specifies, based on the feature amount of the motion of the subject, the image position at which the largest motion is detected in the wide-angle video data. Next, based on the specified image position and the correspondence between the wide-angle camera 200 and each camera 401 specified by the specifying unit 36, the camera 401 arranged closest to that position is selected by the same procedure as described in the fourth embodiment. As a result, the partial video capturing the subject in which the largest movement is detected can be selected automatically.
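The two steps above might be sketched as follows. The sketch assumes a panoramic wide-angle frame whose columns map linearly onto azimuths from 0° to 360°, and a correspondence given as one azimuth per camera. Both are illustrative assumptions, as are the function names.

```python
def column_motion(prev_frame, curr_frame):
    """Sum of absolute inter-frame differences in each image column."""
    width = len(curr_frame[0])
    return [sum(abs(c[x] - p[x]) for p, c in zip(prev_frame, curr_frame))
            for x in range(width)]

def select_camera_by_motion(prev_frame, curr_frame, camera_azimuths):
    """Pick the camera closest to the image position with the largest motion."""
    motion = column_motion(prev_frame, curr_frame)
    x = max(range(len(motion)), key=lambda i: motion[i])
    azimuth = 360.0 * (x + 0.5) / len(motion)  # assumed linear column-to-azimuth mapping

    def distance(a, b):
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    return min(camera_azimuths, key=lambda cam: distance(azimuth, camera_azimuths[cam]))

prev = [[0] * 8 for _ in range(2)]
curr = [[0, 0, 0, 0, 0, 40, 0, 0],
        [0, 0, 0, 0, 0, 30, 5, 0]]
cameras = {"A": 0.0, "B": 90.0, "C": 180.0, "D": 270.0}
selected = select_camera_by_motion(prev, curr, cameras)
```

Here the largest motion falls in column 5, which this mapping places at 247.5°, so the camera at 270° is the nearest one.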
[0130]
The function of the video selection unit 39 described above is executed by the CPU 301 in the server 300. At this time, the HDD 303 stores a predetermined program for realizing the function in advance.
[0131]
7. Embodiment 7
In Embodiment 6 described above, the motion of the subject is detected in the wide-angle video, but the motion of the subject may be detected in each partial video data acquired by the camera array 400.
[0132]
7.1 Configuration
The configuration of the seventh embodiment of the present invention is shown in FIGS. 4 to 9 as in the first to third embodiments.
[0133]
7.2 Operation
FIG. 36 is a diagram showing functional blocks of the video distribution system according to Embodiment 7 of the present invention. In the following, the operation of each part shown in FIG. 36 will be specifically described. The operations of the first image capturing unit 31, the second image capturing unit 32, the deforming unit 33, the encoding unit 34, and the distributing unit 35 are the same as those in the above-described embodiment, and thus description thereof is omitted.
[0134]
7.2.1 Motion detection unit
The motion detection unit 40 according to the seventh embodiment detects the motion of the subject in each partial video data and outputs the motion feature amount for each partial video data. Here, as in the sixth embodiment, the “motion feature amount” indicates the magnitude of the movement of the subject. The movement of the subject in each partial video data is also detected by the well-known techniques described in the sixth embodiment.
[0135]
In addition, when the video distribution system according to the present embodiment is used as a remote conference system, it is preferable to detect the position or direction of the speaker automatically by detecting the movement of the participants' lips in the partial videos. This operation can also be realized by the well-known technique described in the sixth embodiment. In the present embodiment, since the face of each participant is captured at a large size by the camera 401, the movement of the participants' lips can be detected more stably than in the above-described sixth embodiment.
[0136]
The function of the motion detection unit 40 described above may be implemented in the camera 401 or may be realized by the CPU 301 in the server 300. In the latter case, the HDD 303 stores a predetermined program for realizing the function in advance.
[0137]
7.2.2 Video selection unit
The video selection unit 39 in the present embodiment automatically selects a partial video to be distributed to the client based on the motion of the subject in the partial video detected by the motion detection unit 40. Specifically, the partial video in which the largest motion is detected is identified from the feature amount of the subject motion in each partial video, and this is automatically selected as the partial video to be distributed to the client. Here, since this embodiment does not necessarily require the specifying unit 36, it is possible to select an appropriate partial video with a simpler configuration and processing than in the above-described sixth embodiment.
[0138]
The function of the video selection unit 39 described above is executed by the CPU 301 in the server 300. At this time, the HDD 303 stores a predetermined program for realizing the function in advance.
[0139]
7.3 Other
In the above-described sixth embodiment or the present embodiment, it is preferable that each camera 401 constituting the camera array 400 has a shooting area that partially overlaps those of other cameras. FIG. 37(a) is a diagram illustrating a screen of the video display terminal when the cameras have no common shooting area. As shown in FIG. 37, when participant A is standing and moving, the video selection unit 39 automatically selects the partial video whose shooting area is closest to participant A (indicated by the black bar 605 in FIG. 37). However, when participant A moves to a place that none of the cameras 401 can shoot, a partial video in which no important subject is captured is selected. Thus, there arises a problem that a partial video that continuously tracks a moving subject cannot be distributed.
[0140]
This problem can be solved by arranging the cameras so that their shooting areas include common regions, as shown in FIG. 37(b). In the figure, the bars drawn with diagonal lines indicate ranges photographed by two or more cameras 401 in overlap. When the camera array 400 is configured by fixing each camera 401 to the housing 402 as shown in FIG. 8, each camera 401 may be fixed so that the shooting ranges partially overlap one another.
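Whether a given arrangement leaves a place that no camera can shoot can be checked with a short sketch. It assumes that each shooting range is described by a start and end azimuth in degrees around the full circle. The representation and the name `has_blind_spot` are hypothetical.

```python
def has_blind_spot(ranges):
    """Return True if some azimuth is not covered by any camera's shooting range.

    ranges: list of (start_deg, end_deg) with 0 <= start_deg < 360 and start_deg < end_deg.
    end_deg may exceed 360 for a range that wraps past the 0-degree direction.
    """
    ordered = sorted(ranges)
    first_start = ordered[0][0]
    reach = ordered[0][1]              # farthest azimuth covered so far
    for start, end in ordered[1:]:
        if start > reach:
            return True                # gap between two neighbouring ranges
        reach = max(reach, end)
    return reach < first_start + 360.0  # the circle must close on itself

# Six cameras placed every 60 degrees (hypothetical values).
overlapping = [(60.0 * k, 60.0 * k + 70.0) for k in range(6)]   # 70-degree ranges
separated = [(60.0 * k, 60.0 * k + 50.0) for k in range(6)]     # 50-degree ranges
```

With 70° ranges each camera shares 10° with its neighbour and the check reports no blind spot, whereas 50° ranges leave 10° uncovered between neighbours.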
[0141]
8. Embodiment 8
Note that the video distribution system 100 according to the present invention can realize its functions by a PC. In this case, the functions can be realized by storing software that realizes the above-described units in a hard disk and appropriately executing a processing program.
[0142]
9. Embodiment 9
Further, the program can be stored in a recording medium such as a CD-ROM. As shown in FIG. 38, the function can be realized by mounting a CD-ROM 308 storing the program on a PC and appropriately executing the program. Needless to say, the recording medium for storing the program is not limited to the CD-ROM 308, and may be another medium such as a DVD-ROM.
[0143]
Each of the above embodiments is merely an example of the present invention, and the scope of rights of the present invention should not be limited or narrowed to the above embodiments. For example, in each embodiment, it has been described that video data is distributed through the Internet, but another communication line such as satellite communication or terrestrial communication may be used.
[0144]
Moreover, although the wide-angle camera 200, the camera array 400, and the microphone array 500 have been described using the configuration example in which they are connected to the USB hub, these connection forms are not limited to the above description. For example, another interface such as a PCI bus, IEEE 1394, or Bluetooth may be used.
[0145]
Further, as the mirror 211 used in the wide-angle camera 200, a hyperboloidal mirror and a curved mirror having a curvature in one direction are mentioned in the embodiments, but other forms such as a parabolic mirror or a conical mirror may be used.
[0146]
In the description of the first imaging unit 31, it has been described that the video data digitized by the wide-angle camera 200 is output. However, the wide-angle camera 200 may output an analog video signal. In this case, digital video data can be output by combining the wide-angle camera 200 and a video capture board that digitizes an analog video signal. That is, an operation similar to that of the first imaging unit 31 described in the above embodiment can be realized.
[0147]
In the description of the second imaging unit 32, it has been described that the partial video data digitized by each camera 401 constituting the camera array 400 is output. However, each of these cameras 401 may output an analog video signal. In this case, by combining these cameras 401 with a video capture board that digitizes multi-channel analog video signals, digital partial video data can be output. That is, an operation similar to that of the second imaging unit 32 described in the above embodiment can be realized.
[0148]
Further, in the description of the voice acquisition unit 37, it has been described that each microphone 501 constituting the microphone array 500 outputs digitized audio data. However, each of the microphones 501 may instead output an analog audio signal. In this case, digital audio data can be output by combining these microphones 501 with an audio capture board that digitizes multi-channel analog audio signals. That is, an operation similar to that of the voice acquisition unit 37 described in the above embodiment can be realized.
[0149]
Further, although the encoding unit 34 and the distribution unit 35 have been described as being mounted on the same server 300, an encoding PC may be installed separately from the server 300. In this case, the encoded data is transferred from the encoding PC to the server 300 via the telecommunication line.
[0150]
In the encoding unit 34, it has been described that the video data is converted into the RealVideo format using RealProducer provided by RealNetworks, but the configuration of the encoding unit is not limited to this. For example, the video data may be converted into the Windows (registered trademark) Media Video format using a program called Windows (registered trademark) Media Encoder provided by Microsoft. The same applies to the distribution unit 35, and another program such as Windows (registered trademark) Media Service may be used instead of RealServer.
[0151]
Further, although the client PC has been described as an example of the video distribution target of the distribution unit 35, a terminal such as a PDA (Personal Digital Assistant) or a mobile phone may be used. At this time, if the Windows (registered trademark) Media Player for PDA provided by Microsoft Corporation is used, the above-mentioned Windows (registered trademark) Media Video format video data can be reproduced on the PDA.
[0152]
In the description of the operation of the motion detection unit 40, it has been described that the "feature amount" indicates the magnitude of the movement of the subject, but a different quantity may be used, for example, the shape of the movement locus of the subject.
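As an illustration of a feature amount that indicates the magnitude of movement, the following is a minimal sketch based on simple frame differencing. The description does not specify how the motion detection unit 40 computes the feature amount, so the method, the function names `motion_magnitude` and `most_active_video`, and the `threshold` parameter are all assumptions introduced here for illustration.

```python
from typing import Sequence

# A grayscale frame: rows of pixel values in the range 0-255 (assumed format).
Frame = Sequence[Sequence[int]]


def motion_magnitude(prev: Frame, curr: Frame, threshold: int = 16) -> float:
    """Return the fraction of pixels whose brightness changed by more than
    `threshold` between two frames (a hypothetical feature amount)."""
    if len(prev) != len(curr) or any(len(a) != len(b) for a, b in zip(prev, curr)):
        raise ValueError("frames must have the same dimensions")
    total = sum(len(row) for row in curr)
    if total == 0:
        return 0.0
    changed = sum(
        1
        for row_prev, row_curr in zip(prev, curr)
        for p, c in zip(row_prev, row_curr)
        if abs(c - p) > threshold
    )
    return changed / total


def most_active_video(prev_frames: Sequence[Frame], curr_frames: Sequence[Frame],
                      threshold: int = 16) -> int:
    """Return the index of the video with the largest motion magnitude."""
    scores = [motion_magnitude(p, c, threshold)
              for p, c in zip(prev_frames, curr_frames)]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    still = [[0, 0], [0, 0]]
    moved = [[0, 100], [0, 0]]
    print(motion_magnitude(still, moved))
    print(most_active_video([still, still], [still, moved]))
```

A trajectory-based feature amount, as mentioned above, would instead track the subject's position across several frames; that variant is not shown here.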
[0153]
[Effects of the invention]
According to the present invention, by providing the specifying unit 36 for specifying the correspondence between the wide-angle video acquired by the first imaging unit 31 and each partial video acquired by the second imaging unit 32, the viewer can more easily select a high-resolution video of a desired scene.
[0155]
Furthermore, according to the present invention, by providing the video selection unit 39 for selecting a predetermined video according to the position or direction of the sound source output by the sound source detection unit 38, it is possible to acquire and distribute a video of a desired scene with high resolution without forcing the user to perform a troublesome operation.
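The selection of a video from the sound source direction can be sketched as follows. This is only an illustration under stated assumptions: the microphones are assumed to sit at known azimuth angles, the direction is estimated by an intensity-weighted vector sum in which each intensity is taken relative to the strongest microphone, and the camera whose assumed center direction is nearest is chosen. The functions `estimate_direction`, `angular_distance`, and `select_camera`, and all angle values, are hypothetical and are not taken from the embodiments.

```python
import math
from typing import Optional, Sequence


def estimate_direction(intensities: Sequence[float],
                       mic_angles_deg: Sequence[float]) -> Optional[float]:
    """Estimate the source azimuth in degrees from per-microphone intensities.

    Each intensity is divided by the peak intensity, so the weights express
    each microphone's level relative to the strongest one (an assumption).
    Returns None for silence or when no dominant direction exists.
    """
    if not intensities or len(intensities) != len(mic_angles_deg):
        raise ValueError("need exactly one angle per microphone")
    peak = max(intensities)
    if peak <= 0:
        return None  # silence
    x = sum((v / peak) * math.cos(math.radians(a))
            for v, a in zip(intensities, mic_angles_deg))
    y = sum((v / peak) * math.sin(math.radians(a))
            for v, a in zip(intensities, mic_angles_deg))
    if math.hypot(x, y) < 1e-9:
        return None  # contributions cancel out
    return math.degrees(math.atan2(y, x)) % 360.0


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def select_camera(direction_deg: float, camera_centers_deg: Sequence[float]) -> int:
    """Return the index of the camera whose center direction is nearest."""
    return min(range(len(camera_centers_deg)),
               key=lambda i: angular_distance(direction_deg, camera_centers_deg[i]))


if __name__ == "__main__":
    mic_angles = [0.0, 90.0, 180.0, 270.0]        # hypothetical layout
    camera_centers = [45.0, 135.0, 225.0, 315.0]  # hypothetical layout
    direction = estimate_direction([0.2, 1.0, 0.3, 0.1], mic_angles)
    if direction is not None:
        print(select_camera(direction, camera_centers))
```

Because no camera is physically moved, switching the selected video in this way involves no mechanical delay.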
[0156]
  Furthermore, by providing the deforming unit 33 that deforms the wide-angle video, it is possible to display the acquired wide-range video in a form that can be more easily observed by the viewer.
[0157]
Furthermore, because each video acquired by the second imaging unit 32 shares part of its area with at least one other video, a video of a desired scene can be acquired and displayed at high resolution without omission.
[Brief description of the drawings]
FIG. 1 is a diagram showing a usage example of a video distribution system according to the present invention.
FIG. 2 is a diagram showing an example of a display screen of the video display terminal according to the present invention.
FIG. 3 is a diagram showing another example of use of the video distribution system according to the present invention.
FIG. 4 is a diagram showing a configuration of a video distribution system according to Embodiment 1.
FIG. 5 is a diagram showing a configuration of a wide-angle camera 200 according to Embodiment 1.
FIG. 6 is a diagram showing a structure of a wide-angle camera 200 according to Embodiment 1.
FIG. 7 is a diagram illustrating an image captured by the wide-angle camera 200 illustrated in FIG. 6.
FIG. 8 is a diagram showing an example of a camera array 400 according to Embodiment 1.
FIG. 9 is a diagram showing a configuration of a server 300 according to the first embodiment.
FIG. 10 is a block diagram showing an operation according to the first embodiment.
FIG. 11 is a diagram for explaining the operation of a deforming unit 33 in the first embodiment.
FIG. 12 is a diagram for explaining the principle of the deforming unit 33.
FIG. 13 is a diagram for explaining a coordinate conversion table used in the deforming unit 33.
FIG. 14 is a diagram illustrating a configuration of a wide-angle camera 200 according to Embodiment 2.
FIG. 15 is a diagram showing an image captured by the wide-angle camera 200 shown in FIG. 14.
FIG. 16 is a diagram illustrating an example in which the operation of the deforming unit 33 in the second embodiment is realized simultaneously with video display.
FIG. 17 is a diagram showing an example of a display screen of a video display terminal when the video distribution system according to Embodiment 3 is used.
FIG. 18 is a block diagram showing an operation according to the third embodiment.
FIG. 19 is a diagram illustrating an example of the operation of the specifying unit 36 according to the third embodiment.
FIG. 20 is a diagram showing an example of the operation of the specifying unit 36 according to the third embodiment.
FIG. 21 is a diagram showing an example of a display screen of the server 300 according to the third embodiment.
FIG. 22 is a diagram showing an example of a display screen of a video display terminal when the video distribution system according to Embodiment 4 is used.
FIG. 23 is a diagram showing a configuration of a video distribution system according to Embodiment 4.
FIG. 24 is a diagram illustrating the configuration of a wide-angle camera 200 and a microphone array 500 according to Embodiment 4.
FIG. 25 is a block diagram showing an operation according to the fourth embodiment.
FIG. 26 is a diagram for explaining the operation principle of the sound source detection unit in the fourth embodiment.
FIG. 27 is a diagram illustrating a problem of the sound source detection unit in the fourth embodiment.
FIG. 28 is a diagram illustrating an arrangement example of microphones 501 in Embodiment 4.
FIG. 29 is a diagram illustrating another arrangement example of microphones 501 according to Embodiment 4.
FIG. 30 is a diagram illustrating the operation of the sound source detection unit 38 in the fourth embodiment.
FIG. 31 is a diagram for explaining the operation of the sound source detection unit 38 according to Embodiment 4.
FIG. 32 is a diagram for explaining the operation of the video selection unit 39 in the fourth embodiment.
FIG. 33 is a diagram illustrating a configuration of a camera 401 according to Embodiment 5.
FIG. 34 is a block diagram showing an operation according to the fifth embodiment.
FIG. 35 is a block diagram showing an operation according to the sixth embodiment.
FIG. 36 is a block diagram showing an operation according to the seventh embodiment.
FIG. 37 is a diagram showing a problem when the video distribution systems according to Embodiments 6 and 7 are used.
FIG. 38 is a diagram illustrating a configuration example according to Embodiment 9.
[Explanation of symbols]
1 Table
2 Participant
3 Cabinet
31 First imaging unit
32 Second imaging unit
33 Deforming unit
34 Encoding unit
35 Distribution unit
36 Specifying unit
37 Voice acquisition unit
38 Sound source detection unit
39 Video selection unit
40 Motion detection unit
200 Wide-angle camera
211 Mirror
212 Lens
213 Aperture
214 Image sensor
215 Drive unit
216 Pre-processing circuit
217 Motor drive unit
300 Server
310 Bus
320 USB hub
330 Internet
350 Client PC
400 Camera array
401-1 to 401-4 Cameras
500 Microphone array
501 Microphone
600 Display window

Claims

A video distribution system comprising:
first imaging means for acquiring a wide-angle video;
second imaging means for synchronously acquiring videos in which a plurality of different predetermined areas are captured;
specifying means for specifying the correspondence relationship between the wide-angle video and each video acquired by the second imaging means; and
distribution means for distributing at least one of the videos acquired by the second imaging means, the wide-angle video, and the correspondence relationship specified by the specifying means,
wherein the second imaging means is composed of a plurality of cameras, each of which is assigned an identifier,
the first imaging means includes the identifiers attached to the plurality of cameras in its imaging range, and
the specifying means specifies the correspondence relationship based on the imaged positions of the identifiers included in the wide-angle video.

The video distribution system according to claim 1, wherein the specifying means specifies the correspondence relationship based on a similarity between the wide-angle video and each video acquired by the second imaging means.

A video distribution system comprising:
first imaging means for acquiring a wide-angle video;
second imaging means for synchronously acquiring videos in which a plurality of different predetermined areas are captured;
video selection means for selecting a predetermined video from the plurality of videos acquired by the second imaging means;
distribution means for distributing the predetermined video selected by the video selection means and the wide-angle video;
a plurality of voice acquisition means for acquiring voice intensity; and
sound source detection means for detecting the position or direction of a sound source based on the voice acquisition means having the maximum voice intensity among the plurality of voice intensities acquired by the plurality of voice acquisition means, and on the relative intensity differences between that voice acquisition means and the remaining voice acquisition means,
wherein the video selection means selects the predetermined video based on the position or direction of the sound source output by the sound source detection means.

The video distribution system according to claim 3, further comprising motion detection means for detecting the motion of a subject in the wide-angle video or in the plurality of videos acquired by the second imaging means,
wherein the video selection means selects the predetermined video based on the motion of the subject output by the motion detection means.

A program that causes a computer to execute the processing of each means of the video distribution system according to any one of claims 1 to 4.

A computer-readable recording medium on which is recorded a program that causes a computer to execute the processing of each means of the video distribution system according to any one of claims 1 to 4.