JP3572849B2

JP3572849B2 - Sound source position measuring device and camera photographing control device

Info

Publication number: JP3572849B2
Application number: JP3029297A
Authority: JP
Inventors: 昌秀孫
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1997-02-14
Filing date: 1997-02-14
Publication date: 2004-10-06
Anticipated expiration: 2017-02-14
Also published as: JPH10227849A

Description

【０００１】
【発明の属する技術分野】
本発明は音源からの音情報によって音源位置を計算する音源位置計測装置、及び音源位置の情報に基づいてカメラの撮影位置を制御するカメラ撮影制御装置に関し、特に同一音源からの音が複数のマイクに到達するときの時間差を利用して音源位置を計算する音源位置計測装置、及びその計算結果に基づいてカメラの撮影位置を制御するカメラ撮影制御装置に関する。
【０００２】
【従来の技術】
会議の映像記録や遠隔会議システムにおいては、会議の場全体を撮影したり、個々の発言者をズームアップして捉えることが重要である。これによって、発言者の表情や会議の雰囲気をよりよく捉え、会議の臨場感を高めることができるからである。しかも、このような撮影は自動的に行われることが望まれている。
【０００３】
そこで、発言者を撮影できる会議システムとして特開平４−１２２１８４号公報や特開平４−２９７１９６号公報のような発明がなされている。前者は、個々の音源すなわち各発言者それぞれにマイクを割り当て、そのマイクからの信号によって発言者の検出を行うもので、発言者の撮影は１台のカメラによって行われる。一方後者の発明は、各話者ごとにカメラとマイクを割り当て、それぞれの話者の発言を各話者の声紋によって判別するという方法を採っている。また、特開平６−２１７３０４号公報では、話者ごとに固有の無線信号を発信する発信機を各マイクに取り付け、この無線信号を受信する受信機によって計測された受信信号をもとに発信機の座標位置を求める方法を採っている。
【０００４】
However, these systems require a microphone for every participant, which makes installation cumbersome, and the camera can shoot only at preset locations, so they lack flexibility.
【０００５】
それに対し、特開平７−１４０５２７号公報に見られるカメラ撮影制御装置では、同一音源からの音を２本のマイクが捉えたときに、マイク間の音の位相差を利用して音源の方向を知るという方法をとることによって、少ないマイクで多くの音源方向にカメラを向けることができるように工夫している。
【０００６】
また、同一音源からの音が複数のマイクに到達するときの時間差を利用して、音源の方向を知ること、すなわち音源定位が可能であることは広く知られている。この場合、２本のマイクでは、それらを結ぶ線の中点に垂直な面を対称面とした対称性が残るため、前後の判定ができなくなる。このため、特開平７−１４０５２７号公報に開示された発明では、カメラの脇にマイクを取り付け、カメラ前方の音のみを捉えるようにしている。
【０００７】
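Furthermore, if the utterance time of each speaker can be acquired while the video of a conference is being recorded, this is convenient when the video is edited later. Conventionally, therefore, utterance time information has also been acquired by detecting voice input to the microphone assigned to each speaker.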
【０００８】
【発明が解決しようとする課題】
しかし、話者をカメラで撮影する場合には、従来の方法によって求められるのはあくまで音源の方向であって、音源位置を求めるには至っていないという問題点があった。すなわち、求められる音源の方向はマイクの位置が基準となっていることから、マイクとカメラが一体でなくてはならない。その結果、マイクとカメラとを分離することができず、各話者からの発声を捉えやすい位置にマイクを配置することができないという制約があった。しかも、カメラで撮影する場合には、カメラから音源までの位置によってズームの度合いが変わるため、音源からの距離が分からないと、ズーム量を調節することはできない。
【０００９】
また、発話時刻の情報を取得する場合に、話者ごとにマイクを割り当てるのでは、マイクの設置やケーブルの取り回し等のために煩雑な作業が必要となる。従って、もっと簡単な構成で、マイクの捉えた音声と話者とを対応づけられることが望まれている。
【００１０】
本発明はこのような点に鑑みてなされたものであり、同一音源からの音が複数のマイクに到達するときの時間差を利用して、音源位置を求めることができる音源位置計測装置を提供することを目的とする。
【００１１】
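Another object of the present invention is to provide a camera photographing control device that can point the camera toward a sound source even when the camera and the microphones are placed at different positions.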
【００１３】
【課題を解決するための手段】
本発明では上記課題を解決するために、音情報によって音源位置を計算する音源位置計測装置において、正三角形の３つの頂点に設置されたマイクと、１つの音源の発した音が前記マイクのそれぞれで捉えられた時刻を音検出時刻として検出する音信号検出手段と、前記マイクごとの音検出時刻に基づいて、前記マイクの２個ずつを組合わせることにより得られる３組のマイクペアごとの音の到達時間差を測定する時間差測定手段と、前記各マイクペアを結ぶ線から±６０度の範囲を前記各マイクペアの音源計測範囲とし、前記各マイクペアが捉えた音の到達時間差の符号によって音源の存在可能範囲を限定し、それぞれの前記マイクの位置と前記マイクペアごとの音の到達時間差とから、音源位置を計算する音源位置計算手段と、を有することを特徴とする音源位置計測装置が提供される。
【００１４】
このような音源位置計測装置によれば、ある音源から音が発せられると、各マイクにより、その音源からの距離に応じた時刻にその音が捉えられる。次いで、マイクが音を捉えた時刻が音信号検出手段で検出され、音検出時刻とされる。すると、その音検出時刻に基づいて、時間差測定手段により、マイクペアごとの音の到達時間差が測定される。そして、音源位置計算手段により、各マイクペアが捉えた音の到達時間差の符号によって音源の存在可能範囲が限定され、マイクペアごとの音の到達時間差に基づいて音源位置が計算される。
【００１５】
また、回転台の上に設置されたカメラの向きを制御しながら撮影を行うカメラ撮影装置において、正三角形の３つの頂点に設置されたマイクと、１つの音源の発した音が前記マイクのそれぞれで捉えられた時刻を音検出時刻として検出する音信号検出手段と、前記マイクごとの音検出時刻に基づいて、前記マイクの２個ずつを組合わせることにより得られる３組のマイクペアごとの音の到達時間差を測定する時間差測定手段と、前記各マイクペアを結ぶ線から±６０度の範囲を前記各マイクペアの音源計測範囲とし、前記各マイクペアが捉えた音の到達時間差の符号によって音源の存在可能範囲を限定し、それぞれの前記マイクの位置、前記マイクペアごとの音の到達時間差及び前記カメラの位置の情報に基づき、前記カメラに対する音源の相対的な位置を計算する音源位置計算手段と、前記音源位置計算手段が算出した音源の位置情報に基づき、音源の方向へカメラの向きを制御するカメラ制御手段と、を有することを特徴とするカメラ撮影制御装置が提供される。
【００１６】
このようなカメラ撮影装置によれば、ある音源から音が発せられると、各マイクが、その音源からの距離に応じた時刻にその音を捉える。マイクが音を捉えた時刻が音信号検出手段により音検出時刻とされ、その音検出時刻に基づいて、時間差測定手段によりマイクペアごとの音の到達時間差が測定され、音源位置計算手段により、各マイクペアが捉えた音の到達時間差の符号によって音源の存在可能範囲が限定され、マイクペアごとの音の到達時間差に基づいて音源位置が計算される。そして、カメラ制御手段により、音源の方向へカメラの向きが制御される。これにより、常に話者に対してカメラを向けておくことができる。
【００２１】
【発明の実施の形態】
以下、本発明の実施の形態を図面を参照して説明する。
図１は、本発明のカメラ撮影制御装置の原理構成図である。まず、所定の形状の三角形の３つの頂点にマイク１〜３が配置されている。音信号検出手段４は、１つの音源の発した音がマイク１〜３のそれぞれで捉えられた時刻を音検出時刻として検出するとともに、マイク１〜３の捉えた音声信号を映像・音声記録再生手段１２に送る。時間差測定手段５は、マイクごとの音検出時刻に基づいて、マイク１〜３の２個ずつを組合わせることにより得られる３組のマイクペアごとの音の到達時間差を測定する。音源位置計算手段６は、それぞれのマイク１〜３の位置とマイクペアごとの音の到達時間差とから、音源位置を計算する。計算された音源位置の情報は、カメラ制御手段７と発話情報記録手段１１とに渡される。
【００２２】
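The camera control means 7 controls the camera driving means 8, which rotates the turntable on which the camera body 9 is mounted, and the electric lens 10, which adjusts the zoom amount of the camera body 9. Based on the sound source position information, the camera control means 7 controls the direction and zoom amount of the camera so that the camera captures the sound source at a predetermined size. The video signal captured by the camera body 9 is sent to the video/audio recording/reproducing means 12.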
【００２３】
発話情報記録手段１１は、マイクが音を捉えた際の音源位置情報と、発話時刻とを記録する。映像・音声記録再生手段１２は、カメラ本体９から送られた映像情報を記録再生する。再生指示手段１３は、発話情報記録手段１１に格納された情報を用いて、映像・音声記録再生手段１２に対して再生指示を与える。
【００２４】
このようなカメラ撮影制御装置によれば、ある場所で発言をした者がいると、その声は３本のマイク１〜３で捉えられる。すると、音信号検出手段４により、各マイク１〜３が音を捉えた時刻が検出され、音検出時刻とされる。この際、話者のいる位置から各マイク１〜３までの距離の違いにより、各マイク１〜３が音を捉える時刻には時間差が生じている。そこで、時間差測定手段５によりマイク１〜３の２個ずつを組合わせることにより３組のマイクペアが作られ、そのマイクぺアごとの音の到達時間差が測定される。そして、各マイクペアの音の到達時間差を用いて、音源位置計算手段６によって音源位置が計算される。この際、カメラ本体９の位置を基準とした座標系における音源位置が算出される。
【００２５】
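Once the sound source position has been obtained, the camera control means 7 calculates the direction and distance of the sound source as seen from the camera body 9. It then controls the camera driving means 8 to point the camera body 9 toward the sound source, and controls the electric lens 10 to adjust the zoom amount.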
【００２６】
一方、音信号検出手段４が検出した音声情報と、カメラ本体９が撮影した映像情報とが映像・音声記録再生手段１２に記録されるとともに、音源ごとの発話時刻が発話情報記録手段１１に格納される。そして、この装置のオペレータは、発話情報記録手段１１に格納された情報により任意の話者の発話時刻を知ることができ、特定の話者の発言のみを映像・音声記録再生手段１２から抽出して再生することもできる。
【００２７】
このようにして、３本のマイクのみで音源位置を測定できるため、マイクとカメラとを離しておいてもカメラを音源に向けることができるとともに、カメラから音源までの距離に応じてズーム量を調整することもできる。さらに、音源位置と発話時刻とを対応づけて記録しておくことができるため、各話者の発話時刻と映像・音声記録再生手段１２に記録された情報とを対応づけることにより、映像のインデックスを作成することができる。
【００２８】
ところで、本発明のカメラ撮影制御装置は、テレビ会議システムや重要な会議のビデオ撮影として有効に利用できる。そこで、本発明のカメラ撮影制御装置を用いて会議の内容を撮影する場合を例にとり実施の形態を説明する。
【００２９】
図２は、第１の実施の形態に係るカメラ撮影制御装置の配置図である。この例では、司会者２１と向き合って、複数の参加者２２ａ〜２２ｋがコの字形のテーブル２３に着いている。会議室の中央には、正三角形の頂点に配置された３本のマイク１〜３が置かれている。これらのマイク１〜３は、カメラ撮影制御装置の装置本体１００に接続されている。装置本体１００には、カメラ２００が接続されている。
【００３０】
発言者の声をマイク１〜３が捉えると、装置本体１００によって音源位置が計算され、カメラが発言者の方向に向けられるとともに、カメラから発言者までの距離に応じてズーム・インやズーム・アウトが行われる。
【００３１】
なお、３本のマイク１〜３の作る正三角形の辺の長さ、すなわちマイク間の距離は任意に設定可能であるが、ここでは会議の場で設置しやすく、常温での音速値３４０ｍ／ｓから計算しやすい値である６８ｃｍとしている。
【００３２】
図３は、第１の実施の形態に係るカメラ撮影制御装置の内部構成図である。音声入力手段である正三角形の頂点に配置された３本のマイク１〜３は音信号検出器１１０に接続されている。音信号検出器１１０は、マイク１〜３の捉えた音を音声信号として出力するとともに、各マイク１〜３が音を捉えた時刻に応じた信号を出力する。その信号は、時間差測定器１２０に入力される。時間差測定器１２０は、各マイク１〜３が音を捉えた時刻の時間差を計算して、時間差データを出力する。その時間差データは音源位置計算器１３０に入力される。音源位置計算器１３０は、時間差データから音源位置を計算して出力する。
【００３３】
基準信号発生器１０１は、同一音源からの音を各マイクが捉えた時刻の時間差を計測するため基準クロックを発生し、その基準クロックを時間差測定器１２０と音源位置計算器１３０とに供給する。ここでは周期０．５μｓの方形波を生成している。この周期によって音の時間差計測の分解能が決まる。この実施の形態ではマイク間の距離が６８ｃｍであるため、マイク間の最大の時間差は６８ｃｍを音速（３４０ｍ／ｓ）で割った値、すなわち２ｍｓとなる。０．５μｓという値はこの４０００分の１の値であるため、会議における話者位置計算には十分な分解能である。
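The resolution figures in this paragraph can be checked with a few lines of arithmetic. The sketch below is illustrative only; the constant and function names are assumptions introduced here and are not part of the described device.

```python
# Illustrative check of the timing resolution; all names are hypothetical.
SOUND_SPEED_M_S = 340.0    # speed of sound at room temperature, as in the text
MIC_SPACING_M = 0.68       # side length of the microphone triangle
CLOCK_PERIOD_S = 0.5e-6    # period of the reference clock


def max_time_difference(spacing_m=MIC_SPACING_M, speed_m_s=SOUND_SPEED_M_S):
    """Largest possible arrival-time difference between two microphones."""
    return spacing_m / speed_m_s


def clock_ticks_per_max_difference(clock_period_s=CLOCK_PERIOD_S):
    """Number of clock periods that fit into the largest time difference."""
    return round(max_time_difference() / clock_period_s)
```

With the values from the text, the largest time difference is 2 ms and it spans 4000 clock periods.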
【００３４】
カメラ制御装置１４０は、音源位置計算器１３０から音源位置データを受け取り、カメラ本体２０３の方向およびズーム量を制御する。カメラ本体２０３は回転台２０１の上に固定されており、回転台２０１を回転させることによりカメラ本体２０３の方向を変えることができる。また、カメラ本体２０３には、撮影範囲を変える電動レンズ２０２を備えており、この電動レンズ２０２の調整によりズーム量が調整できる。カメラ本体２０３の撮影した画像は、映像出力１０３として出力される。
【００３５】
このような構成のカメラ撮影制御装置によれば、図２の司会者２１若しくは参加者２２ａ〜２２ｋが発言すると、その音声は３つのマイク１〜３で捉えられ音信号検出器１１０に入力される。以下に、音信号検出器１１０の詳細を説明する。
【００３６】
図４は、音信号検出器１１０の内部構成を示す図である。音信号検出器１１０は、基本的に、アンプ１１１ａ〜１１１ｃ、信号・電力変換器１１２ａ〜１１２ｃ、レベル判定器１１３ａ〜１１３ｃ、及び微分器１１４ａ〜１１４ｃの４つの要素を１組とした、３つの組から構成されている。
【００３７】
各マイク１〜３からの信号１０３ａ〜１０３ｃは、信号を増幅するアンプ１１１ａ〜１１１ｃに入力される。アンプ１１１ａ〜１１１ｃの出力は、それぞれに対応して設けられた信号・電力変換器１１２ａ〜１１２ｃに入力されるとともに、ミキサー１１５にも入力される。ミキサー１１５は、アンプによって増幅された音声信号をステレオ信号に変換し、音声出力１０２として出力する。
【００３８】
各信号・電力変換器１１２ａ〜１１２ｃは、アンプ１１１ａ〜１１１ｃの出力信号から音信号のパワーを求め、パワー信号をそれぞれに対応して設けられたレベル判定器１１３ａ〜１１３ｃに入力する。各レベル判定器１１３ａ〜１１３ｃは、パワー信号を特定のしきい値によって２値化したのちＴＴＬ（Transistor-Transistor-Logic ）レベルの信号を、それぞれに対応して設けられた微分器１１４ａ〜１１４ｃに入力する。各微分器１１４ａ〜１１４ｃは、入力された信号の出力の立ち上がりを検出し、立ち上がりを示すトリガ信号１０４ａ〜１０４ｃを出力する。
【００３９】
なお、微分器１１４ａ〜１１４ｃの出力の前には、ダイオードの働きにより、正の微分結果のみを出力するようにしてある。また、アンプ１１１ａ〜１１１ｃの増幅度とレベル判定器１１３ａ〜１１３ｃのしきい値は、各組ごとにボリュームによって任意に設定することができる。
【００４０】
このような音信号検出器１１０における信号の変化を、マイク１からの信号１０３ａを例にとり以下に説明する。
図５は、音信号検出器１１０における信号の変化を示す図である。なお、この図では全て横軸が時間であり、縦軸が電圧である。
【００４１】
（Ａ）はマイク１から入力された信号の波形を示す図である。マイク１からは比較的弱い波形の信号１０３ａが入力されている。この信号１０３ａがアンプ１１１ａで増幅される。
【００４２】
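(B) shows the waveform of the signal after amplification by the amplifier 111a. Because of the amplification, its amplitude is larger than that of the waveform in (A). This signal is converted into a power signal by the signal/power converter 112a.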
【００４３】
(C) shows the waveform of the power signal. The power signal takes only positive values. The level determiner 113a judges the level of this signal against the threshold 116.
【００４４】
（Ｄ）は、レベル判定後の信号の波形を示す図である。レベル判定により２値化された信号が得られる。この信号が微分器１１４ａを通過することにより、トリガ状の信号となる。
【００４５】
（Ｅ）は、微分器１１４ａから出力される信号の波形を示す図である。微分器１１４ａから出力されるトリガ信号状の波形は、微分器１１４ａに入力された信号の立ち上がり時を示している。
【００４６】
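In this way, a trigger-like waveform indicating the moment at which microphone 1 captured the sound is obtained from that sound. The same processing is applied to the sounds captured by the other two microphones 2 and 3, and the corresponding trigger signals 104b and 104c, together with the trigger signal 104a, form the output of the sound signal detector 110.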
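The chain of amplifier, signal/power converter, level determiner, and differentiator can be imitated in software. The sketch below is a simplified digital analogue and not the analog circuit itself: it assumes that the power signal is the squared amplified sample and that the trigger is the first sample at which the binarized signal goes from low to high. The function name and parameters are hypothetical.

```python
def detect_onset(samples, gain, threshold):
    """Return the index of the first rising edge of the binarized power
    signal, or None if the threshold is never crossed."""
    previous_high = False
    for index, sample in enumerate(samples):
        power = (gain * sample) ** 2      # amplifier, then power conversion
        high = power >= threshold         # level judgment (binarization)
        if high and not previous_high:    # rising edge, as the differentiator marks
            return index
        previous_high = high
    return None
```

For example, `detect_onset([0.0, 0.01, -0.02, 0.3, -0.4, 0.1], gain=10, threshold=1.0)` reports index 3, the first sample whose amplified power reaches the threshold.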
【００４７】
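The three trigger signals 104a to 104c output from the sound signal detector 110 are input to the time difference measuring device 120 together with the clock signal from the reference signal generator 101.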
【００４８】
図６は、時間差測定器１２０の内部構成を示す図である。時間差測定器１２０は、３つのトリガ信号変換器１２１ａ〜１２１ｃと、３つの遅延器１２２ａ〜１２２ｃとから構成される。トリガ信号変換器１２１ａ〜１２１ｃは、入力されたトリガ信号１０４ａ〜１０４ｃを基準信号発生器１０１が生成する方形波のＨ（ハイ）レベルと同じ時間幅の方形波に変換する。また、各遅延器１２２ａ〜１２２ｃは、２つの方形波の時間差を測定する。
【００４９】
このような構成の時間差測定器１２０によれば、入力されたトリガ信号１０４ａ〜１０４ｃは、トリガ信号変換器１２１ａ〜１２１ｃにより、基準信号である方形波のＨレベルと同じ時間幅の方形波に変換される。
【００５０】
図７は、トリガ信号から得られる方形波を示す図である。（Ａ）はトリガ信号を表しており、（Ｂ）は方形波を表している。すなわち、ある時刻から１６０．８μｓ後にトリガ信号１０４ａがあったとすると、そのトリガ信号１０４ａはつぎの基準クロックがＨレベルとなっている１６１．０μｓから１６１．２５μｓの間がＨレベルであるような方形波１２０ａに変換される。
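The quantization shown in FIG. 7 can be sketched as follows. The sketch assumes a 50% duty clock whose H-level windows start at multiples of 0.5 µs, which matches the 161.0 µs to 161.25 µs example, and it assumes that a trigger falling exactly on a clock edge is assigned to that edge, which the text does not specify. Times are integer nanoseconds to avoid rounding problems; all names are hypothetical.

```python
CLOCK_PERIOD_NS = 500   # 0.5 microsecond reference clock
HIGH_WIDTH_NS = 250     # assumed 50% duty cycle


def quantize_trigger(trigger_ns):
    """Map a trigger time to the next clock H-level window.

    Returns (tick_index, window_start_ns, window_end_ns).
    """
    tick = -(-trigger_ns // CLOCK_PERIOD_NS)   # ceiling division on integers
    start = tick * CLOCK_PERIOD_NS
    return tick, start, start + HIGH_WIDTH_NS
```

A trigger at 160.8 µs maps to tick 322, that is, the window from 161.0 µs to 161.25 µs.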
【００５１】
基準クロックの周期に量子化された信号は、次に２つずつの信号をペアにして、遅延器１２２ａ〜１２２ｃに入力される。すなわち、マイク１とマイク２とによる信号のペア、マイク２とマイク３とによる信号のペア、そしてマイク３とマイク１とによる信号のペアの３つの組が遅延器１２２ａ〜１２２ｃに入力される。
【００５２】
図８は、遅延器１２２ａの内部構成を示す図である。遅延器１２２ａは、実際にディレイ回路によって構成することも可能であるが、この例では信号がクロックの周期に合わせて量子化されていることを利用して、順序判定器１２２ａａ、カウンタ１２２ａｂおよびデータ変換器１２２ａｃによって構成している。
【００５３】
順序判定器１２２ａａはどちらの入力が先にあったかを判定するもので、マイク１による信号１２３ａ、マイク２による信号１２３ｂ、およびクリア信号１２２ａｄの３つの入力を持つ。そして、順序判定器１２２ａａは、順序を判定するための出力である順序判定ビット１２２ａｅ，１２２ａｆをデータ変換器１２２ａｃに対して出力するとともに、時間差計測のための信号１２２ａｇをカウンタ１２２ａｂへ出力する。
【００５４】
カウンタ１２２ａｂには、順序判定器１２２ａａの出力１２２ａｇ以外に、基準信号発生器１０１からのクロック信号１２２ａｈが入力されている。カウンタ１２２ａｂは、順序判定器１２２ａａの出力した信号１２２ａｇがＨレベルである間だけ、入力されているクロック信号１２２ａｈのＨレベルの回数を計数するものであり、計数結果である計数出力１２２ａｉをデータ変換器１２２ａｃに対して出力する。また、カウンタ１２２ａｂは、順序判定器１２２ａａに対するクリア信号１２２ａｄも出力する。
【００５５】
データ変換器１２２ａｃは、順序判定器１２２ａａの２つの順序判定ビット１２２ａｅ，１２２ａｆと、２バイトの符号無し２進数のカウンタ１２２ａｂの計数出力１２２ａｉを、２バイトの符号付き２進数に変換するものである。そして、変換結果を出力データ１０５ａとするとともにデータ変換終了信号１０６ａを１パルスのＨレベルとして出力する。
【００５６】
このような構成の遅延器１２２ａは、次のように動作する。
２つの信号１２３ａ，１２３ｂが入力されると、順序判定器１２２ａａにより、どちらの信号の入力が早かったかが判定される。信号１２３ａの入力のほうが早くＨレベルになれば順序判定ビット１２２ａｅがＨレベルに保たれ、逆に信号１２３ｂの入力のほうが早くＨレベルになれば順序判定ビット１２２ａｆがＨレベルに保たれる。そして、カウンタ１２２ａｂへ出力される信号１２２ａｇは、２つの信号１２３ａ，１２３ｂ入力のどちらかがＨレベルになった時点から他方の信号がＨレベルになるまでＨレベルが保たれる。これらの順序判定器１２２ａａの状態は、クリア信号１２２ａｄがＨレベルになることによってクリアされる。したがって、順序判定ビット１２２ａｅおよび順序判定ビット１２２ａｆの出力がこの順序で（１，０）ならば信号１２３ａの入力が早く、また（０，１）ならば信号１２３ｂの入力が早いことが判定できる。
【００５７】
カウンタ１２２ａｂは、信号１２２ａｇがＨレベルになると、入力されているクロック信号１２２ａｈのＨレベルの回数の計数を開始する。信号１２２ａｇがＬ（ロー）レベルになると、カウンタ１２２ａｂは計数を停止し、計数した情報を１６ビットの符号なし２進数として計数出力１２２ａｉとする。したがって、クロックの周期０．５μｓにこの計数結果を乗じた値が計測された時間差となるが、ここではカウンタ１２２ａｂの計数した値をそのまま用いる。カウンタ１２２ａｂは、計数出力１２２ａｉが出力されると同時にクリア信号１２２ａｄにも１パルスのＨレベルの信号を送り出す。このクリア信号１２２ａｄによって、順序判定器１２２ａａの状態がクリアされ、次の入力に備える。
【００５８】
２つの順序判定ビット１２２ａｅ，１２２ａｆの出力と２バイトの符号無し２進数のカウンタ１２２ａｂの計数出力１２２ａｉとは、データ変換器１２２ａｃにより２バイトの符号付き２進数に変換され、出力データ１０５ａとなる。データ変換が終了すると、データ変換終了信号１０６ａを１パルスのＨレベルとして出力する。出力データ１０５ａとデータ変換終了信号１０６ａとは、次の音源位置計算器１３０に送られる。なお、３つの遅延器１２２ａ〜１２２ｃのそれぞれのデータ変換器は、マイクペアの順序を（１、２）、（２、３）および（３、１）とし、それぞれ第１列のマイクの音のほうが先であると、符号をプラスにし、逆の場合は符号をマイナスにする。
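The combined behavior of the order determiner, the counter, and the data converter can be summarized in software. The sketch below follows the sign convention stated for the data converters (positive when the first-listed microphone of the pair hears the sound first). The handling of simultaneous arrival and the big-endian byte order are assumptions, and the function names are hypothetical.

```python
def signed_tick_count(first_mic_tick, second_mic_tick):
    """Signed clock-tick difference for one microphone pair."""
    count = abs(second_mic_tick - first_mic_tick)   # unsigned counter output
    if count > 0x7FFF:
        raise ValueError("count does not fit a signed 16-bit word")
    if first_mic_tick < second_mic_tick:    # order bits (1, 0)
        return count
    if first_mic_tick > second_mic_tick:    # order bits (0, 1)
        return -count
    return 0                                # simultaneous arrival (assumed)


def encode_signed_16(value):
    """Pack the result as a 2-byte signed binary number (byte order assumed)."""
    return value.to_bytes(2, byteorder="big", signed=True)
```

Multiplying the returned count by the 0.5 µs clock period gives the time difference in seconds; as in the text, the raw count is what gets passed on.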
【００５９】
ここで、音源位置計算器１３０の機能の詳細を説明する前に、音源位置の計算方法について述べる。音源位置計算の基本は、２つのマイクにおける音源方向の計算にある。
【００６０】
図９は、音源方向の計算のためのマイクの座標系を示す図である。図に示すように、２つのマイク１，２を通る直線をｕ軸とし、２つのマイク１，２の中点を原点としたｕ−ｖ座標系を考える。そして原点から−ａだけ離れた位置（−ａ，０）にマイク１が、またａだけ離れた位置（ａ，０）にマイク２が置かれており、音源Ｐの位置が原点から（ｒ，Θ）の位置にあるとすると、音源Ｐからの音がそれぞれのマイクに到達するときの時間差ｄｔは、
【００６１】
【数１】
\[ dt = \frac{\sqrt{r^{2} + 2ra\cos\Theta + a^{2}} - \sqrt{r^{2} - 2ra\cos\Theta + a^{2}}}{V_{0}} \qquad (1) \]
となる。ただし、Ｖ_０は音速の値（常温で３４０ｍ／ｓ）である。これをｒをパラメータとしてプロットしてみる。
【００６２】
図１０は、マイク間の時間差と音源方向との関係式を示すグラフである。ただし、ここではａ＝１ｍで計算している。このように、ｄｔに対するｒの効果はΘに比べて十分小さく、ｒの値を変化させたとしても角度誤差は５度程度であることと、いずれのｒをとってもΘが約２５度より小さくなる領域ではそれまでの領域に比べてｄｔに対する角度感度が高くなっていることが分かる。このことから、マイクペアへの音の到達時間差を知ることによって、その音の音源の方向、すなわちｕ−ｖ座標の原点と音源とを結ぶ直線を求めることができる。別のマイクペアによってもうひとつの直線を求めれば、それらの直線の交点から音源位置が求められる。
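Equation (1) is easy to evaluate numerically, which makes it possible to see how weakly dt depends on r. The function below is a direct transcription of equation (1) with microphone 1 at (-a, 0) and microphone 2 at (a, 0); the function name and the use of degrees for Θ are choices made for this sketch.

```python
import math


def arrival_time_difference(r, theta_deg, a, v0=340.0):
    """Equation (1): arrival-time difference for a source at polar (r, theta).

    The result is positive when the source is nearer to microphone 2.
    """
    c = math.cos(math.radians(theta_deg))
    d1 = math.sqrt(r * r + 2 * r * a * c + a * a)   # distance to microphone 1
    d2 = math.sqrt(r * r - 2 * r * a * c + a * a)   # distance to microphone 2
    return (d1 - d2) / v0
```

For large r the value approaches the far-field expression 2a·cosΘ/V_0, which does not contain r at all.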
【００６３】
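In this embodiment, the sound source position is calculated by combining microphone pairs, so the calculation treats the range of ±60 degrees from the u-axis as the measurement range of each microphone pair.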
【００６４】
図１１は、１組のマイクペアの計測範囲を示す図である。２本のマイク１，２がｕ軸上に原点を挟んで配置されているとすれば、このマイクペアの計測範囲は、原点から見て、ｕ軸を中心に正の方向に±６０°、負の方向に±６０°である。このように計測範囲を定め、３組のマイクペアのうち、音源位置が計測範囲内にある２つのマイクペアを使って音源位置の計算を行う。３本のマイクは正三角形の頂点に配置されているため、任意の音源位置に対して必ず２つのマイクペアがこの範囲に収まる。したがって、音源位置計算に利用するデータは、３つの時間差計数データから最も時間差の小さいデータを除外したものとなる。
【００６５】
そこで、音源位置の計算には、まず、それぞれのマイクペアにおいて個別に図１１に示したような座標系をとって音源方向を求める。音源方向は、計測された時間差データを角度−時間差情報テーブルと比較することによって求める。
【００６６】
図１２は角度−時間差情報テーブルを示す図である。この例ではマイク間の距離が６８ｃｍであるので、図のテーブルではｒの値を３ｍに設定して、上の式でΘが０度から６０度まで５度ごと２つのマイクの時間差情報を入力してある。テーブルの第１列目は、上から順に６０度から５度ごとに０度までの角度が与えられており、第２列目にはその角度での２つのマイクの時間差データが格納されている。時間差データは、上の式で計算した時間差を基準クロックの周期である０．５μｓで割った値の２バイトの符号付き整数表現にしてある。これによって、時間差測定器１２０で求めた時間差データとの比較が直接行える。時間差データから方向を知るには、このテーブルの第２列目を参照し、時間差データの絶対値がテーブル内の値を超えない最大のセルを見出せば、そのテーブルの第１列から角度データを得ることができる。このとき、図１１のｖ軸の両側で線対称になっているが、時間差データの値がマイナスならばマイク１側に発話位置があり、またプラスならばマイク２側に発話位置があることになる。
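The angle/time-difference information table and its lookup can be sketched as follows. The sketch assumes a = 0.34 m (half of the 68 cm spacing), r = 3 m, and a 0.5 µs clock, as in the text. The lookup rule is one reading of the description: it picks the row with the largest tabulated count that does not exceed the absolute value of the measured count. Function names are hypothetical.

```python
import math


def build_angle_table(a=0.34, r=3.0, v0=340.0, clock_s=0.5e-6):
    """Rows of (angle in degrees, time difference in clock ticks), 60 down to 0."""
    table = []
    for theta in range(60, -1, -5):
        c = math.cos(math.radians(theta))
        d1 = math.sqrt(r * r + 2 * r * a * c + a * a)
        d2 = math.sqrt(r * r - 2 * r * a * c + a * a)
        table.append((theta, round((d1 - d2) / v0 / clock_s)))
    return table


def lookup_angle(table, ticks):
    """Angle of the largest table entry not exceeding |ticks|, or None when
    the value lies outside the +/-60 degree range of this pair."""
    magnitude = abs(ticks)
    candidates = [(count, theta) for theta, count in table if count <= magnitude]
    if not candidates:
        return None
    return max(candidates)[1]
```

The sign of `ticks` is ignored here because it only selects the side of the v-axis on which the source lies.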
【００６７】
また、ここで求められた角度データは、２つのマイクを結ぶ軸（図１１のｕ軸）を対称軸としてその軸から両側の２つの直線が解として成り立つが、この対称性の問題は、時間差データとマイクペアの組み合わせ情報とによって解決することができる。
【００６８】
図１３は、２つの時間差情報から音源位置を求めるための方法を説明する図である。この例では、マイク３，１のマイクペアとマイク２，３のマイクペアにより音源位置を求める。なお、各マイク１〜３を頂点とする正三角形の重心（各頂点と対辺の中点とを結ぶ線分の交点）が原点となるようにｘ−ｙ座標が定義されている。マイク３，１のマイクペアからは２本の直線３１ａ，３１ｂが求められ、マイク２，３のマイクペアからも２本の直線３２ａ，３２ｂが求められる。また、音源Ｐからの音はマイク３に達するよりも速くマイク１およびマイク２に達するため、マイク２とマイク３とのペアからはプラスの角度情報（マイク２側に音源があることを示す）が得られ、またマイク３とマイク１のペアからはマイナスの角度情報（マイク１側に音源があることを示す）が得られる。しかも、図１１に示したように、各マイクペアの計測範囲は制限されている。したがって、時間差情報の組み合わせにより、音源の範囲が限定できる。図１３の例における音源位置は、第１象限のｘ軸からの角度が０〜６０度の範囲に限定される。すなわち、直線３１ｂおよび直線３２ｂの可能性はないことになり、結局直線３１ａおよび直線３２ａの２つが残ることになる。
【００６９】
このように、マイクペアの組み合わせと各マイクペアにおける時間差データの符号によって音源位置の範囲が特定される。その全部の組み合わせを以下に示す。
【００７０】
図１４は、マイクペアの組み合わせと時間差データの符号によって決まる音源の範囲を示す図である。図中での１、２、３はそれぞれマイク１，２，３の位置を示している。また、マイク１，２のマイクペアの時間差データを（１，２）、マイク２，３のマイクペアの時間差データを（２，３）、マイク３，１のマイクペアの時間差データを（３，１）としている。したがって、マイクペアの時間差データの符号と音源の範囲との関係は以下のようになる。なお、以下の説明における角度は、正のｘ軸からの角度（反時計回り）で表す。
【００７１】
（２，３）＞０、（３，１）＜０であれば、音源は正のｘ軸からの角度が０〜６０度の範囲内に限定される。
（１，２）＞０、（３，１）＜０であれば、音源は正のｘ軸からの角度が６０〜１２０度の範囲内に限定される。
【００７２】
（１，２）＞０、（２，３）＜０であれば、音源は正のｘ軸からの角度が１２０〜１８０度の範囲内に限定される。
（２，３）＜０、（３，１）＞０であれば、音源は正のｘ軸からの角度が１８０〜２４０度の範囲内に限定される。
【００７３】
（１，２）＜０、（３，１）＞０であれば、音源は正のｘ軸からの角度が２４０〜３００度の範囲内に限定される。
（１，２）＜０、（２，３）＞０であれば、音源は正のｘ軸からの角度が３００〜３６０度の範囲内に限定される。
【００７４】
なお、この図に示したマイクペアの時間差データの符号と音源の範囲との関係は、音源位置領域テーブルとして音源位置計算器１３０内に予め保持しておく。以上のようにして特定された２つの直線を図１１のｕ−ｖ座標系から図１３のようなｘ−ｙ座標系に座標変換すれば、それらの交点から音源位置Ｐ（ｘ，ｙ）が求められる。図１３の例では、マイク３，１のペアの場合は点（−ａ／２，ａ／２√３）を通り、ｘ軸と「６０度＋上記の方法で求めたｕ−ｖ座標系での角度」の角度をなす直線３１ａを、またマイク２，３のペアの場合は点（０，−ａ／２√３）を通りｘ軸と「上記の方法で求めたｕ−ｖ座標系での角度」の角度をなす直線３２ａを得て、それら直線の交点を求めればよい。
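The sound source position area table can be pictured as a small lookup structure. The sketch below encodes the six sign combinations listed above and follows the rule of discarding the pair with the smallest time difference. It uses the sign convention of the data converters (positive when the first-listed microphone hears the sound first); the pair keys "12", "23", "31" and the function name are hypothetical.

```python
# (signs of the two pairs that are used) -> sector in degrees from the +x axis
SECTOR_RULES = [
    ({"23": +1, "31": -1}, (0, 60)),
    ({"12": +1, "31": -1}, (60, 120)),
    ({"12": +1, "23": -1}, (120, 180)),
    ({"23": -1, "31": +1}, (180, 240)),
    ({"12": -1, "31": +1}, (240, 300)),
    ({"12": -1, "23": +1}, (300, 360)),
]


def source_sector(d12, d23, d31):
    """Return the 60-degree sector implied by three signed time differences."""
    data = {"12": d12, "23": d23, "31": d31}
    dropped = min(data, key=lambda pair: abs(data[pair]))   # least useful pair
    signs = {pair: (1 if value > 0 else -1)
             for pair, value in data.items() if pair != dropped}
    for rule, sector in SECTOR_RULES:
        if rule == signs:
            return sector
    return None   # combination not in the table, e.g. a noisy measurement
```

Once the sector is known, only one of the two candidate lines of each pair can pass through it, which removes the mirror ambiguity before the intersection is computed.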
【００７５】
位置の計算は、時間差測定器１２０で得られた３つの符号付き２進数情報を用いて、音源位置計算器１３０で行われる。音源位置計算器１３０はコンピュータ上に構築されている。
【００７６】
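FIG. 15 shows the internal configuration of the sound source position calculator. As shown in the figure, the sound source position calculator 130 is built around a CPU (Central Processing Unit) 133.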
【００７７】
３つのポート１３１ａ〜１３１ｃは、時間差測定器１２０で計測されたデータを取り込みＣＰＵ１３３に供給する。ＡＮＤゲート１３２は、データをＣＰＵ１３３に取り込むタイミングを与える。ＣＰＵ１３３は、各ポート１３１ａ〜１３１ｃからのデータを蓄えるためのレジスタ１３３ａ〜１３３ｃを有している。また、ＣＰＵ１３３は、バス１３４を介して各種周辺機器との間のデータ転送を行う。
【００７８】
バス１３４には、入力装置１３５、出力ポート１３６、メモリ１３７、およびハードディスク１３８が接続されている。入力装置１３５は、カメラとマイクとの位置関係を入力するためのものである。出力ポート１３６は、計算結果をカメラ制御装置１４０へ出力するための通信ポートである。メモリ１３７は、ＣＰＵ１３３が実行するプログラムを一時的に保持するためのものである。ハードディスク１３８は、プログラム、角度−時間差情報テーブル、音源位置領域テーブルおよび座標変換テーブルを保持している。なお、音源位置計算器１３０で行われる音源位置の計算処理は、ハードディスク１３８に格納されたプログラムをメモリ１３７にロードし、そのプログラムをＣＰＵ１３３が実行することにより実施される処理である。
【００７９】
このような音源位置計算器１３０に入力される時間差測定器１２０からの３つのデータ出力１０５ａ〜１０５ｃは、まず３つのポート１３１ａ〜１３１ｃに伝えられ、時間差測定器１２０からの出力１０６ａ〜１０６ｃがすべてＨレベルになってＡＮＤゲート１３２の出力がＨレベルになった時点で、ＣＰＵ１３３の中の３つのレジスタ１３３ａ〜１３３ｃに取り込まれる。３つのレジスタ１３３ａ〜１３３ｃに取り込まれた３つの時間差データの内、最も時間差の少ないデータを除いた２つの時間差データが、あらかじめメモリ上に確保されている変数領域ｔ１およびｔ２に書き込まれる。そして、この２つの時間差データに基づいて、前述の音源位置の計算が行われる。これにより、図１３のｘ−ｙ座標系での音源位置が求められる。その後、カメラ位置を原点にとったＸ−Ｙ座標系へ座標変換し、Ｘ−Ｙ座標系での音源位置Ｐ（Ｘ，Ｙ）を求める。
図１６は、マイクの座標系とカメラの座標系の関係を示す図である。マイク１〜３の中心（三角形の重心）がｘ−ｙ座標の原点であり、カメラ２００（図２に示す）の回転中心が、Ｘ−Ｙ座標の原点である。
【００８０】
実際には、マイク２およびマイク３を結ぶ線と平行なｘ軸と回転台２０１の０度の方向が平行となるようにマイク１〜３とカメラ２００とが設置される。したがってこの際の座標変換は平行移動のみである。平行移動量のデータは、装置を設置した時点で測定し、入力装置１３５によって音源位置計算器１３０に入力する。音源位置計算器１３０は、このデータを用いて座標変換を行う。座標変換は平行移動だけでなく、両方の座標系がある角度をなしていた場合でも、この回転による座標変換を行うだけでよい。ただし、その際にはｘ−ｙ座標系とＸ−Ｙ座標系のなす角度を測定しておく必要がある。このようにして求められた音源位置Ｐ（Ｘ，Ｙ）は、２バイトの浮動小数点データとして出力ポート１３６から送り出され、カメラ制御装置１４０に入力される。
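The conversion from the microphone x-y system to the camera X-Y system is a translation, optionally combined with a rotation. The sketch below assumes that the offset is given as the position of the microphone-triangle centroid expressed in camera coordinates and that the rotation is the angle of the x-axis measured counterclockwise from the X-axis; both parameter names are hypothetical.

```python
import math


def mic_to_camera(x, y, mic_origin_in_camera, rotation_deg=0.0):
    """Convert a point from the microphone frame to the camera frame."""
    origin_x, origin_y = mic_origin_in_camera
    phi = math.radians(rotation_deg)
    cam_x = origin_x + x * math.cos(phi) - y * math.sin(phi)
    cam_y = origin_y + x * math.sin(phi) + y * math.cos(phi)
    return cam_x, cam_y
```

With `rotation_deg=0.0` this reduces to the pure translation used when the two axes are installed parallel to each other.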
【００８１】
図１７は、カメラ制御装置１４０の内部構成を示す図である。この図に示すように、カメラ制御装置１４０は、角度計算部１４１、ズーム量計算部１４２、データ出力部１４３、キャリブレーション部１４４、ボリューム１４５、及びキャリブレーション設定ボタン１４６から構成される。
【００８２】
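The angle calculation unit 141 receives the sound source position data P(X, Y) and a signal 204 that reports whether the turntable 201 is operating. When the signal 204 is at L level, the angle calculation unit 141 converts the sound source position data P(X, Y) into polar coordinate data (R, Θ), where R is the distance from the camera to the sound source position and Θ is the angle between the X-axis and the line connecting the sound source position P to the origin of the X-Y coordinate system. The value of Θ is then converted into the data format of the rotary encoder and sent to the data output unit 143: the integer part of Θ divided by 1.4 is converted into an 8-bit unsigned binary number. The value 1.4 is approximately 360 degrees divided by 256, the number of values that 8 bits can represent. The distance R calculated by the angle calculation unit 141 is sent to the zoom amount calculation unit 142 and the calibration unit 144. Because the zoom amount of the camera must change as the distance between the camera and the person changes, the calibration unit 144 is provided to set two pieces of reference information, a reference distance and a reference zoom amount, in the zoom amount calculation unit 142. A volume control 145, with which the zoom amount can be set manually, and a calibration setting button 146 are connected to the calibration unit 144. The calibration unit 144 uses the distance data output by the angle calculation unit 141, and the zoom amount at that moment is set with the volume control 145. Turning the volume control 145 drives the electric lens of the camera 200 and changes the zoom amount. When the calibration setting button 146 is pressed once the volume control 145 has been adjusted to a suitable zoom amount, the distance data Ro and the zoom amount data Zo at that moment are output to the zoom amount calculation unit 142 as the reference information. The zoom amount calculation unit 142 holds these values in memory and uses them in subsequent zoom calculations. When the subject is a person, as in this device, the size of the subject can be regarded as fixed, so default values for the two pieces of reference information may be set in advance.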
【００８３】
さて、角度計算部１４１で計算された音源までの距離Ｒはズーム量計算部１４２に送られ、ズーム量が計算される。ズーム量の計算は、距離Ｒを基準距離Ｒｏで割り、その値に基準ズーム量Ｚｏを乗じることによって行う。計算された角度データとズーム量は、データ出力部１４３を経てそれぞれ回転台２０１および電動レンズ２０２に送られ、カメラ撮影装置が駆動される。このとき、回転台２０１の動作が終了しない間に次の撮影位置情報が出力されないよう、回転台２０１の動作を知らせる信号２０４が回転台２０１からカメラ制御装置１４０に送り出される。この信号はＨレベルならば回転台２０１が動作中であることを示し、その間はカメラ制御装置１４０における計算は行われない。具体的には、この信号は角度計算部１４１に送られ、この信号がＨレベルの間は音源位置計算器１３０から送られてくるデータを内部に取り込まない。
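The calculations performed by the angle calculation unit and the zoom amount calculation unit reduce to a polar conversion, a scaling to an 8-bit code, and a proportional zoom rule. The sketch below assumes the exact step 360/256 instead of the rounded 1.4 quoted in the text, and the function names are hypothetical.

```python
import math

DEGREES_PER_CODE = 360.0 / 256.0   # 1.40625; the text rounds this to 1.4


def to_polar(cam_x, cam_y):
    """Distance R and angle Theta (0 to 360 degrees) of the source from the camera."""
    distance = math.hypot(cam_x, cam_y)
    angle = math.degrees(math.atan2(cam_y, cam_x)) % 360.0
    return distance, angle


def encoder_code(theta_deg):
    """8-bit unsigned code sent to the turntable."""
    return int(theta_deg / DEGREES_PER_CODE) % 256


def zoom_amount(distance, reference_distance, reference_zoom):
    """Zoom = (R / Ro) * Zo, using the calibrated reference pair (Ro, Zo)."""
    return distance / reference_distance * reference_zoom
```

A source twice as far away as the calibration distance therefore receives twice the reference zoom amount.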
【００８４】
このようにしてカメラ撮影が行われた会議は、映像出力１０３および音声出力１０２を通して記録したり遠隔地に電送することができる。この際、各会議場においては参加者全員のためにマイクを用意する必要がないとともに、３本のマイクとカメラとが同じ場所にある必要もない。したがって、マイクは全員の会話を捉えやすい位置に配置し、カメラは全員の表情を捉えやすい位置に配置できる。
【００８５】
なお、上記の説明では、２本のマイクが捉えた音の到達時間差と音源方向との関係が、音源までの距離に依存しないことを利用して音源位置の計算を行っているが、３つのマイクが捉えた音の時間差から幾何学的に音源位置を計算することもできる。以下にその計算方法について説明する。
【００８６】
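First, for simplicity, assume that the three microphones 1 to 3 are installed at the three points (0, a), (0, 0), and (a, 0) of the x-y coordinate system. Let t1, t2, and t3 be the times at which the sound emitted from the sound source P(x, y) reaches the microphones 1 to 3, respectively. If the distances between the sound source P and the microphones are l1, l2, and l3, then
【００８７】
【数２】
\[ l_{1} = \sqrt{x^{2} + (y-a)^{2}}, \quad l_{2} = \sqrt{x^{2} + y^{2}}, \quad l_{3} = \sqrt{(x-a)^{2} + y^{2}} \qquad (2) \]
Let the time differences between the microphones be dt1 = t1 - t2, dt2 = t2 - t3, and dt3 = t3 - t1, and let vs be the speed of sound. The arrival time differences between the microphones are then given by the distances from the sound source to the microphones as follows.
【００８８】
【数３】
\[ dt_{1} = \frac{l_{1} - l_{2}}{v_{s}}, \quad dt_{2} = \frac{l_{2} - l_{3}}{v_{s}}, \quad dt_{3} = \frac{l_{3} - l_{1}}{v_{s}} \qquad (3) \]
From equations (2) and (3),
【００８９】
【数４】
\[ v_{s}\,dt_{1} = \sqrt{x^{2} + (y-a)^{2}} - \sqrt{x^{2} + y^{2}}, \quad v_{s}\,dt_{2} = \sqrt{x^{2} + y^{2}} - \sqrt{(x-a)^{2} + y^{2}}, \quad v_{s}\,dt_{3} = \sqrt{(x-a)^{2} + y^{2}} - \sqrt{x^{2} + (y-a)^{2}} \qquad (4) \]
and solving these simultaneous equations gives the sound source position P(x, y). This way of obtaining the sound source position is based on the same principle as triangulation and is widely known. Whether to use this calculation or the method of the first embodiment is decided according to the performance of the sound source position calculator and the required resolution of the time difference measurement. For example, when high resolution is required, the method described in the first embodiment, which can be computed quickly with simple calculations, is used; when low resolution is acceptable, the geometric calculation can be used, which dispenses with advance preparation such as the angle/time-difference information table.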
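One simple way to solve the simultaneous equations numerically is a brute-force search over candidate positions. This is only an illustrative stand-in and not the solution method of the original text: it assumes microphones at (0, a), (0, 0), and (a, 0) as stated above, searches a square region of assumed size in the first quadrant, and uses hypothetical function and parameter names.

```python
import math


def arrival_differences(x, y, a, vs=340.0):
    """(dt1, dt2, dt3) for a source at (x, y): t1 - t2, t2 - t3, t3 - t1."""
    l1 = math.hypot(x, y - a)    # distance to microphone 1 at (0, a)
    l2 = math.hypot(x, y)        # distance to microphone 2 at (0, 0)
    l3 = math.hypot(x - a, y)    # distance to microphone 3 at (a, 0)
    return (l1 - l2) / vs, (l2 - l3) / vs, (l3 - l1) / vs


def locate_by_grid(dt1, dt2, dt3, a, vs=340.0, extent_m=5.0, steps_per_m=20):
    """Grid point whose predicted time differences best match the measured ones."""
    best, best_error = None, math.inf
    n = int(extent_m * steps_per_m)
    for i in range(n + 1):
        for j in range(n + 1):
            x, y = i / steps_per_m, j / steps_per_m
            e1, e2, e3 = arrival_differences(x, y, a, vs)
            error = (e1 - dt1) ** 2 + (e2 - dt2) ** 2 + (e3 - dt3) ** 2
            if error < best_error:
                best, best_error = (x, y), error
    return best
```

The grid spacing bounds the achievable accuracy, and the range of the source is poorly constrained when it is far from the microphones compared with their spacing, so a practical implementation would refine the result or use an iterative solver.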
【００９０】
また、上記の第１の実施の形態では、マイクの床からの高さとカメラの床からの高さが同じ高さであるようなシステムであったが、カメラの位置を高くすることも可能である。この場合、マイクの床からの高さからカメラの床からの高さを引いた値をＺ軸方向へのオフセット量ｄとして、カメラ制御装置内の計算において鉛直方向のカメラの向きを計算すればよい。ただし、上の座標系においては、鉛直下向きをＺ軸の正の方向としている。すなわち、音源位置計算器１３０で求めてカメラ位置のオフセットを加えた音源位置Ｐ（Ｘ，Ｙ）をＰ（Ｘ，Ｙ，ｄ）とし、床と垂直な方向の角度Φを
【００９１】
【数５】
\[ \frac{d}{\sqrt{X^{2} + Y^{2}}} = \tan\Phi \qquad (5) \]
によって求めればよい。Ｚ軸方向にも回転可能な回転機構を実施例１の回転台に付加すれば、Ｚ軸方向にオフセットがあった場合でも正しく話者を捉えることが可能となる。この回転機構は、第１の実施の形態と全く同様の構成で実現することができる。
【００９２】
さらに、第１の実施の形態においては２次元平面内での音源位置計算を行っていたが、マイクを正三角錐の４つの頂点に配置することによって、３次元空間内の音源位置を知ることが可能である。正三角錐では４つの正三角形があり、それぞれの三角形で実施例１のような計算を行って各三角形の平面上での音源方向を示す三角形面に垂直な面を割り出すことができる。各正三角形ごとに音源方向を示す平面を見出せば、あとは各平面の交点を求めることで３次元空間内の音源位置を知ることができる。
【００９３】
ところで、上記の第１の実施の形態では算出された音源位置情報をカメラの制御に利用する場合のみを説明したが、音源位置情報を映像信号のインデックス情報として利用することもできる。そのようなインデックス機能を有する音源位置記録装置を第２の実施の形態として以下に説明する。
【００９４】
図１８は、音源位置記録装置の内部構成図である。この音源位置記録装置は、３本のマイク１〜３、音信号検出器１１０、時間差測定器１２０、音源位置計算器１３０及び基準信号発生器１０１の構成は、図３に示した第１の実施の形態の構成と同じであるため、同一の番号を付して説明を省略する。また、この図には示していないが、音源位置計算器１３０には図３に示したものと同様のカメラ制御装置１４０が接続されており、そのカメラ制御装置１４０はカメラ本体２０３が設置された回転台２０１と電動レンズ２０２とを制御している。
【００９５】
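In this embodiment, the output of the sound source position calculator 130 is input to a computer 151 as well as to the camera control device 140. An input device 152, a display device 153, a storage device 154, and a control device 155 are connected to the computer 151. A recording/reproducing device 156 is further connected to the control device 155.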
【００９６】
コンピュータ１５１は、音源位置テーブルと時刻テーブルとを有しており、音源位置計算器１３０が出力した音源位置情報を音源位置テーブルに、またその情報を受け取った時刻を時刻テーブルに登録する。なお、コンピュータ１５１が実行する処理内容、及び音源位置テーブルと時刻テーブルとの詳細は後述する。
【００９７】
入力装置１５２は、音源位置計算終了の合図やコメントデータなどの入力を行うためのものである。表示装置１５３は、音源位置データ、時刻データおよびコメントデータを表示するためのものである。記憶装置１５４は、音源位置データと時刻データおよびコメントデータを記憶するためのものであり、ハードディスク装置等を使用する。
【００９８】
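The control device 155 controls the recording/reproducing device 156. The recording/reproducing device 156 holds the video recorded by the first embodiment at the same time as the sound source positions were recorded. This second embodiment uses video that is digitally recorded on a hard disk. Unlike ordinary tape recording, a hard disk allows fast random access, so video can be searched in a short time. The user displays the sound source position data, time data, and comment data already saved in the storage device 154 on the display device 153 and composes a playback instruction there. For example, playing back only the chairperson's remarks in sequence makes it efficient to draw up a list of the agenda items of the conference. The control device 155 instructs the reproducing device to fast-forward or play based on the playback time list created by the computer. Once playback starts, it continues until a fast-forward instruction is received from the input device.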
【００９９】
図１９は、音源位置テーブルの例を示す図である。音源位置テーブル４１は２列３６行のマトリックスであり、第１列目および第２列目にはそれぞれｘ座標値及びｙ座標値が格納される。このマトリックスには初期値として「ｎｕｌ」が入力されている。なお、３６行という行数は、角度分解能が５度程度であることと、会場における参加者の数として高々この程度を想定していることによる。
【０１００】
そして、マトリックスに格納されていない新しい音源位置データが音源位置計算器１３０から送られてくると、その座標値を次の行に追加して格納していく。このとき、すでに登録済みのデータと比較して角度誤差が５度以内で、かつ距離の誤差がｘ方向、ｙ方向ともに±２５ｃｍ以内であれば同じ位置とみなし、追加処理は行わない。その理由は、話者は着座姿勢を保っているとしても常に固定しているわけではなく、椅子を動かしたり頭をゆすったりすることが考えられるためである。
【０１０１】
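FIG. 20 shows an example of the time table. In its initial state the time table 42 is a matrix of 36 columns and 1024 rows, and the index of each column corresponds to the index of a row in the sound source position table 41. Every cell other than those in the first row is initialized to "nul", while the cells of the first row are initialized to 1. The first row is used to store pointer information. In this example, three time entries have already been recorded for sound source P1, so the first-row cell that holds the pointer for P1 is set to 4. When the next time entry for sound source P1 arrives, this pointer shows directly at which address the entry should be written. After the time entry has been written, the pointer value is incremented.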
【０１０２】
時刻の表現形式は、ここでは計測開始時からの経過時間を０．１秒単位で計り、その値を２バイトの符号無し２進数によって表現している。したがって、１．３秒後の場合の時刻表現は「１３」となる。もちろん、計測開始時からの経過時間が分かればよいので、別の表現をとっても構わない。この時、受け取った音源位置がすでに音源位置テーブル４１に登録されているものであれば、その音源位置を示す行インデックスを求め、時刻テーブル４２の対応する列インデックスのすでに記録されている時刻情報の後に新しい時刻情報が追加される。受け取った音源位置が音源位置テーブル４１に未登録であれば、この音源位置を新たに音源位置テーブル４１に時刻とともに登録する。
【０１０３】
音源位置テーブル４１は２×３６の大きさの２次元マトリックスであり、コンピュータのメモリ上に確保される。また時刻テーブル４２は記録開始時は３６×１０２４の大きさの２次元マトリックスがコンピュータのメモリ上に確保され、同一位置の時刻情報が満杯になると自動的にメモリアロケーション機能によって新たに１０２４行が追加される。
【０１０４】
図２１は、音源位置記録の処理手順を示すフローチャートである。これはコンピュータ１５１によって行われる処理である。以下、処理手順をステップ番号に沿って説明する。
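[S1] Allocate memory areas for the sound source position table T_1 and the time table T_2.
[S2] Allocate the position storage variable P and the time storage variable x.
[S3] Initialize the sound source position table T_1 and the time table T_2.
[S4] Start counting the time t.
[S5] Determine whether there is output from the sound source position calculator. If there is, proceed to step S6; if not, repeat this step.
[S6] Store the current time t in the time storage variable x.
[S7] Store the sound source position (x, y) in the position storage variable P.
[S8] Determine whether the position in P is already registered in the sound source position table T_1. If it is registered, proceed to step S10; if not, proceed to step S9.
[S9] Store the value of P as a new entry in the sound source position table T_1, then proceed to step S10.
[S10] Find the row index k of P in the sound source position table T_1.
[S11] Read the pointer value a stored in the first row of column k of the time table T_2, and store the value of x at row a, column k of the time table T_2.
[S12] Increment the pointer value a stored in the first row of column k of the time table T_2.
[S13] Determine whether the sound source position calculation has finished. If it has, proceed to step S14; if not, return to step S5.
[S14] Save the data of the sound source position table T_1 and the time table T_2 to the storage device.
[S15] Add comments as necessary and end the process.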
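The core of this procedure (steps S6 to S12) can be sketched in a few lines. The sketch is a simplification under stated assumptions: it uses growable Python lists in place of the fixed-size matrices and pointer row, measures the angle from the origin of the coordinate system in which positions are reported, and applies the tolerances of 5 degrees and ±25 cm given above. The class and function names are hypothetical.

```python
import math


def same_position(p, q, max_angle_deg=5.0, max_offset_m=0.25):
    """True when two positions are close enough to count as one speaker."""
    angle_p = math.degrees(math.atan2(p[1], p[0]))
    angle_q = math.degrees(math.atan2(q[1], q[0]))
    diff = abs(angle_p - angle_q) % 360.0
    diff = min(diff, 360.0 - diff)
    return (diff <= max_angle_deg
            and abs(p[0] - q[0]) <= max_offset_m
            and abs(p[1] - q[1]) <= max_offset_m)


class SourceLog:
    """Position table (T_1) plus one list of utterance times per source (T_2)."""

    def __init__(self):
        self.positions = []
        self.times = []

    def record(self, position, elapsed_s):
        """Register one detection and return the index of its source."""
        stamp = round(elapsed_s * 10)          # elapsed time in 0.1 s units
        for k, known in enumerate(self.positions):
            if same_position(known, position):
                break
        else:                                  # not yet registered (step S9)
            self.positions.append(position)
            self.times.append([])
            k = len(self.positions) - 1
        self.times[k].append(stamp)            # steps S11 and S12
        return k
```

Because each source owns a list of time stamps, extracting all utterances of one speaker for playback amounts to reading `times[k]`.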
【０１０５】
このようにして、音源位置と発話時刻の情報を映像のインデックスとして利用することができる。
【０１０６】
【発明の効果】
以上説明したように本発明の音源位置計測装置では、３本のマイクを所定の三角形の頂点に配置し、同一音源からの音が３本のマイクに到達する時間差を測定するようにしたため、各マイクの位置とマイクペアごとの音の到達時間差とから、音源の方向のみならず音源位置を算出することができる。
【０１０７】
また、本発明のカメラ撮影制御装置では、各マイクの位置、マイクペアごとの音の到達時間差及びカメラの位置から、カメラに対する音源の相対的な位置を計算するようにしたため、カメラとマイクとが離れた位置にあっても、カメラが音源の方向を向くように制御することができる。
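[Brief description of the drawings]
[FIG. 1] A diagram showing the principle configuration of the camera photographing control device of the present invention.
[FIG. 2] A layout diagram of the camera photographing control device according to the first embodiment.
[FIG. 3] An internal configuration diagram of the camera photographing control device according to the first embodiment.
[FIG. 4] A diagram showing the internal configuration of the sound signal detector.
[FIG. 5] A diagram showing how the signal changes in the sound signal detector. (A) shows the waveform of the signal input from the microphone, (B) shows the waveform after amplification by the amplifier, (C) shows the waveform of the power signal, (D) shows the waveform after level judgment, and (E) shows the waveform output from the differentiator.
[FIG. 6] A diagram showing the internal configuration of the time difference measuring device.
[FIG. 7] A diagram showing the square wave obtained from a trigger signal. (A) represents the trigger signal and (B) represents the square wave.
[FIG. 8] A diagram showing the internal configuration of the delay unit.
[FIG. 9] A diagram showing the microphone coordinate system used to calculate the sound source direction.
[FIG. 10] A graph showing the relation between the time difference between microphones and the sound source direction.
[FIG. 11] A diagram showing the measurement range of one microphone pair.
[FIG. 12] A diagram showing the angle/time-difference information table.
[FIG. 13] A diagram explaining the method of obtaining the sound source position from two pieces of time difference information.
[FIG. 14] A diagram showing the sound source ranges determined by the combination of microphone pairs and the signs of the time difference data.
[FIG. 15] A diagram showing the internal configuration of the sound source position calculator.
[FIG. 16] A diagram showing the relation between the microphone coordinate system and the camera coordinate system.
[FIG. 17] A diagram showing the internal configuration of the camera control device.
[FIG. 18] An internal configuration diagram of the sound source position recording device.
[FIG. 19] A diagram showing an example of the sound source position table.
[FIG. 20] A diagram showing an example of the time table.
[FIG. 21] A flowchart showing the procedure for recording sound source positions.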
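[Explanation of symbols]
1 to 3 Microphones
4 Sound signal detecting means
5 Time difference measuring means
6 Sound source position calculating means
7 Camera control means
8 Camera driving means
9 Camera body
10 Electric lens
11 Utterance information recording means
12 Video/audio recording/reproducing means
13 Reproduction instructing means
[0001]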
[Technical field of the invention]
The present invention relates to a sound source position measuring device that calculates a sound source position from sound information emitted by a sound source, and to a camera photographing control device that controls the photographing position of a camera based on the sound source position information. More particularly, it relates to a sound source position measuring device that calculates the sound source position using the differences in arrival time of sound from the same source at a plurality of microphones, and to a camera photographing control device that controls the photographing position of the camera based on the result of that calculation.
[0002]
[Prior art]
In video recording of conferences and in remote conference systems, it is important to photograph the entire conference venue and to zoom in on individual speakers. Doing so better conveys the speakers' expressions and the atmosphere of the conference and heightens the sense of presence. It is also desirable for such photographing to be performed automatically.
[0003]
Conference systems capable of photographing the speaker have therefore been proposed, for example in Japanese Patent Application Laid-Open Nos. 4-122184 and 4-297196. In the former, a microphone is assigned to each sound source, that is, to each speaker, the speaker is detected from the signal of that microphone, and the speaker is photographed by a single camera. The latter assigns a camera and a microphone to each speaker and identifies each speaker's utterances by the speaker's voiceprint. In Japanese Patent Application Laid-Open No. 6-217304, a transmitter that emits a radio signal unique to each speaker is attached to each microphone, and the coordinate position of the transmitter is obtained from the signal measured by a receiver that receives this radio signal.
[0004]
However, these systems require a microphone for every participant, which makes installation cumbersome, and the camera can shoot only at preset locations, so they lack flexibility.
[0005]
In contrast, the camera photographing control device disclosed in Japanese Patent Application Laid-Open No. 7-140527 determines the direction of a sound source from the phase difference of the sound between two microphones that capture sound from the same source, so that the camera can be pointed in many sound source directions with only a few microphones.
[0006]
It is also widely known that the direction of a sound source can be determined, that is, sound source localization can be performed, by using the difference in arrival time of sound from the same source at a plurality of microphones. With two microphones, however, a symmetry remains about the plane perpendicular to the line connecting them at its midpoint, so front and rear cannot be distinguished. For this reason, in the invention disclosed in Japanese Patent Application Laid-Open No. 7-140527, the microphones are attached beside the camera so that only sound from in front of the camera is captured.
[0007]
In addition, if the information of the utterance time of each speaker can be acquired when the video of the conference is recorded, it is convenient for editing the video later. Therefore, conventionally, the information of the utterance time has been obtained by detecting a voice input to a microphone assigned to each speaker.
[0008]
[Problems to be solved by the invention]
However, when a speaker is to be photographed by a camera, the conventional methods yield only the direction of the sound source and not its position. Because the obtained direction is referenced to the position of the microphones, the microphones and the camera must be integrated. As a result, the microphones cannot be separated from the camera and placed where the speech of each speaker is easy to capture. Moreover, the appropriate degree of zoom depends on the distance from the camera to the sound source, so the zoom amount cannot be adjusted unless that distance is known.
[0009]
In addition, when utterance time information is acquired, allocating a microphone to each speaker requires cumbersome work for installation of the microphone, routing of cables, and the like. Therefore, it is desired that the voice captured by the microphone and the speaker can be associated with a simpler configuration.
[0010]
The present invention has been made in view of these points, and its object is to provide a sound source position measuring device capable of determining a sound source position by using the differences in arrival time of sound from the same source at a plurality of microphones.
[0011]
Another object of the present invention is to provide a camera photographing control device that can point a camera toward a sound source even when a camera and a microphone are arranged at different positions.
[0013]
[Means for Solving the Problems]
To solve the above problems, the present invention provides a sound source position measuring device that calculates a sound source position from sound information, comprising: microphones installed at the three vertices of an equilateral triangle; sound signal detecting means for detecting, as a sound detection time, the time at which a sound emitted from one sound source is captured by each of the microphones; time difference measuring means for measuring, based on the sound detection time of each microphone, the arrival time difference of the sound for each of the three microphone pairs obtained by combining the microphones two at a time; and sound source position calculating means that takes the range of ±60 degrees from the line connecting each microphone pair as the sound source measurement range of that pair, limits the possible range of the sound source by the signs of the arrival time differences of the sound captured by the microphone pairs, and calculates the sound source position from the positions of the microphones and the arrival time difference of the sound for each microphone pair.
[0014]
With such a sound source position measuring device, when a sound is emitted from a sound source, each microphone captures it at a time that depends on its distance from the source. The sound signal detecting means detects the time at which each microphone captured the sound and takes it as the sound detection time. Based on the sound detection times, the time difference measuring means measures the arrival time difference of the sound for each microphone pair. The sound source position calculating means then limits the possible range of the sound source by the signs of the arrival time differences captured by the microphone pairs and calculates the sound source position from the arrival time difference for each pair.
[0015]
The present invention also provides a camera photographing control device that performs photographing while controlling the direction of a camera installed on a turntable, comprising: microphones installed at the three vertices of an equilateral triangle; sound signal detecting means for detecting, as a sound detection time, the time at which a sound emitted from one sound source is captured by each of the microphones; time difference measuring means for measuring, based on the sound detection time of each microphone, the arrival time difference of the sound for each of the three microphone pairs obtained by combining the microphones two at a time; sound source position calculating means that takes the range of ±60 degrees from the line connecting each microphone pair as the sound source measurement range of that pair, limits the possible range of the sound source by the signs of the arrival time differences of the sound captured by the microphone pairs, and calculates the position of the sound source relative to the camera from the positions of the microphones, the arrival time difference of the sound for each microphone pair, and information on the position of the camera; and camera control means for directing the camera toward the sound source based on the sound source position information calculated by the sound source position calculating means.
[0016]
With such a camera photographing control device, when a sound is emitted from a sound source, each microphone captures it at a time that depends on its distance from the source. The sound signal detecting means takes the time at which each microphone captured the sound as the sound detection time, and the time difference measuring means measures the arrival time difference of the sound for each microphone pair based on the sound detection times. The sound source position calculating means limits the possible range of the sound source by the signs of the arrival time differences captured by the microphone pairs and calculates the sound source position from the arrival time difference for each pair. The camera control means then directs the camera toward the sound source. As a result, the camera can always be kept pointed at the speaker.
[0021]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
FIG. 1 is a principle configuration diagram of a camera photographing control device of the present invention. First, microphones 1 to 3 are arranged at three vertices of a triangle having a predetermined shape. The sound signal detection means 4 detects, as a sound detection time, the time at which the sound emitted from one sound source is captured by each of the microphones 1 to 3, and sends the audio signal captured by the microphones 1 to 3 to the video / audio recording / reproducing means 12. The time difference measuring means 5 measures the arrival time difference of the sound for each of the three microphone pairs obtained by combining two of the microphones 1 to 3, based on the sound detection time for each microphone. The sound source position calculation means 6 calculates a sound source position from the positions of the microphones 1 to 3 and the arrival time difference of sound for each microphone pair. Information on the calculated sound source position is passed to the camera control means 7 and the utterance information recording means 11.
[0022]
The camera control means 7 controls a camera driving means 8 for rotating a turntable on which the camera body 9 is mounted, and an electric lens 10 for adjusting a zoom amount of the camera body 9. Then, based on the sound source position information, the camera control means 7 controls the direction and the zoom amount of the camera so that the camera captures the sound source at a predetermined size. The video signal captured by the camera body 9 is sent to video / audio recording / reproducing means 12.
[0023]
The utterance information recording means 11 records sound source position information when the microphone captures sound and the utterance time. The video / audio recording / reproducing means 12 records and reproduces the video information sent from the camera body 9. The reproduction instructing means 13 gives a reproduction instruction to the video / audio recording / reproducing means 12 using the information stored in the speech information recording means 11.
[0024]
According to such a camera photographing control device, when there is a person speaking at a certain place, the voice is captured by the three microphones 1 to 3. Then, the time when each of the microphones 1 to 3 captures the sound is detected by the sound signal detecting means 4, and is set as the sound detection time. At this time, due to the difference in the distance from the position where the speaker is present to each of the microphones 1 to 3, there is a time difference in the time at which each of the microphones 1 to 3 captures sound. Therefore, three microphone pairs are created by combining two microphones 1 to 3 by the time difference measuring means 5, and the arrival time difference of the sound for each microphone pair is measured. Then, the sound source position is calculated by the sound source position calculating means 6 using the arrival time difference of the sound of each microphone pair. At this time, the sound source position in the coordinate system based on the position of the camera body 9 is calculated.
[0025]
When the position of the sound source is determined, the direction and distance of the sound source viewed from the camera body 9 are calculated by the camera control means 7, and the camera body 9 is directed in the direction of the sound source by controlling the camera driving means 8. By controlling the lens 10, the zoom amount is adjusted.
[0026]
On the other hand, the audio information detected by the sound signal detecting means 4 and the video information captured by the camera body 9 are recorded in the video / audio recording / reproducing means 12, and the utterance time for each sound source is stored in the utterance information recording means 11. The operator of this apparatus can then learn the utterance time of an arbitrary speaker from the information stored in the utterance information recording means 11, and can also extract and play back only the utterances of a specific speaker from the video / audio recording / reproducing means 12.
[0027]
In this way, since the sound source position can be measured with only three microphones, the camera can be pointed at the sound source even when the microphones are separated from the camera, and the zoom amount can also be adjusted according to the distance from the camera to the sound source. Furthermore, since the sound source position and the utterance time can be recorded in association with each other, an index of the video can be created by associating the utterance time of each speaker with the information recorded in the video / audio recording / reproducing means 12.
[0028]
By the way, the camera photographing control device of the present invention can be effectively used as a videoconferencing system or video photographing of important meetings. Therefore, an embodiment will be described by taking as an example a case where the contents of a meeting are photographed using the camera photographing control device of the present invention.
[0029]
FIG. 2 is a layout diagram of the camera photographing control device according to the first embodiment. In this example, a plurality of participants 22a to 22k are seated at the U-shaped table 23, facing the moderator 21. In the center of the conference room, three microphones 1 to 3 arranged at the vertices of an equilateral triangle are placed. These microphones 1 to 3 are connected to the device main body 100 of the camera photographing control device. The camera 200 is also connected to the device main body 100.
[0030]
When the microphones 1 to 3 capture the voice of the speaker, the sound source position is calculated by the device main body 100, the camera is turned in the direction of the speaker, and zoom-in or zoom-out is performed according to the distance from the camera to the speaker.
[0031]
The length of the sides of the equilateral triangle formed by the three microphones 1 to 3, that is, the distance between the microphones, can be set arbitrarily. Here, however, it is set to 68 cm, a value that makes the microphones easy to install in a meeting place and that is easy to calculate with given the speed of sound at room temperature, 340 m/s.
[0032]
FIG. 3 is an internal configuration diagram of the camera photographing control device according to the first embodiment. Three microphones 1 to 3 arranged at the vertices of an equilateral triangle, which are voice input means, are connected to the sound signal detector 110. The sound signal detector 110 outputs the sound captured by the microphones 1 to 3 as an audio signal, and outputs a signal corresponding to the time when each of the microphones 1 to 3 captures the sound. The signal is input to the time difference measuring device 120. The time difference measuring device 120 calculates a time difference between times when the microphones 1 to 3 capture the sound, and outputs time difference data. The time difference data is input to the sound source position calculator 130. The sound source position calculator 130 calculates and outputs a sound source position from the time difference data.
[0033]
The reference signal generator 101 generates a reference clock for measuring the time difference between the times when the sounds from the same sound source are captured by each microphone, and supplies the reference clock to the time difference measuring device 120 and the sound source position calculator 130. Here, a square wave having a period of 0.5 μs is generated. The resolution of the time difference measurement of the sound is determined by this cycle. In this embodiment, since the distance between the microphones is 68 cm, the maximum time difference between the microphones is a value obtained by dividing 68 cm by the speed of sound (340 m / s), that is, 2 ms. Since the value of 0.5 μs is 1/4000 of this value, the resolution is sufficient for speaker position calculation in a conference.
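The arithmetic in this paragraph can be checked with a short sketch. The constant and function names below are illustrative assumptions, not part of the described device; the values are those given in the text.

```python
# Values taken from the text; names are hypothetical.
SPEED_OF_SOUND_M_S = 340.0   # speed of sound at room temperature
MIC_SPACING_M = 0.68         # distance between two microphones
CLOCK_PERIOD_S = 0.5e-6      # period of the reference clock


def max_time_difference_s(spacing_m=MIC_SPACING_M, v=SPEED_OF_SOUND_M_S):
    """Largest possible arrival time difference for one microphone pair."""
    return spacing_m / v


def max_clock_count(spacing_m=MIC_SPACING_M, v=SPEED_OF_SOUND_M_S,
                    period_s=CLOCK_PERIOD_S):
    """Number of reference clock periods spanned by the largest time difference."""
    return round(max_time_difference_s(spacing_m, v) / period_s)


print(max_time_difference_s())  # about 0.002 s, i.e. 2 ms
print(max_clock_count())        # 4000 clock periods
```

The count of 4000 also fits comfortably in the 2-byte signed value used later for the time difference data.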
[0034]
The camera control device 140 receives the sound source position data from the sound source position calculator 130 and controls the direction of the camera body 203 and the zoom amount. The camera body 203 is fixed on the turntable 201, and the direction of the camera body 203 can be changed by rotating the turntable 201. Further, the camera body 203 includes an electric lens 202 for changing a shooting range, and the zoom amount can be adjusted by adjusting the electric lens 202. The image captured by the camera body 203 is output as a video output 103.
[0035]
According to the camera photographing control device having such a configuration, when the moderator 21 or the participants 22a to 22k in FIG. 2 speak, the sound is captured by the three microphones 1 to 3 and input to the sound signal detector 110. Hereinafter, the sound signal detector 110 will be described in detail.
[0036]
FIG. 4 is a diagram showing the internal configuration of the sound signal detector 110. The sound signal detector 110 basically consists of three sets, each set comprising four elements: one of the amplifiers 111a to 111c, one of the signal / power converters 112a to 112c, one of the level determiners 113a to 113c, and one of the differentiators 114a to 114c.
[0037]
The signals 103a to 103c from the microphones 1 to 3 are input to amplifiers 111a to 111c that amplify the signals. Outputs of the amplifiers 111a to 111c are input to signal / power converters 112a to 112c provided correspondingly, and also input to a mixer 115. The mixer 115 converts the audio signal amplified by the amplifier into a stereo signal, and outputs the stereo signal as the audio output 102.
[0038]
Each of the signal / power converters 112a to 112c obtains the power of the sound signal from the output signal of the corresponding amplifier 111a to 111c, and inputs the power signal to the corresponding level determiner 113a to 113c. Each of the level determiners 113a to 113c binarizes the power signal with a specific threshold value, and then inputs it as a TTL (Transistor-Transistor Logic) level signal to the corresponding differentiator 114a to 114c. Each of the differentiators 114a to 114c detects the rise of the input signal and outputs a trigger signal 104a to 104c indicating that rise.
[0039]
Diodes placed ahead of the outputs of the differentiators 114a to 114c ensure that only positive differential results are output. Further, the amplification degree of the amplifiers 111a to 111c and the threshold values of the level determiners 113a to 113c can be set arbitrarily for each set by means of a volume (variable resistor).
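The chain of amplifier, signal / power converter, level determiner and differentiator can be sketched in software as follows. This is only an illustration on sampled data: the described device is an analog and TTL circuit, and the use of the squared amplitude as the "power" as well as the function name `detect_onsets` are assumptions.

```python
def detect_onsets(samples, gain, threshold):
    """Return the sample indices at which the binarized power signal rises.

    samples   : microphone samples (any iterable of floats)
    gain      : amplification degree of the amplifier
    threshold : threshold of the level determiner, applied to the power
    """
    onsets = []
    previous_level = False
    for index, sample in enumerate(samples):
        amplified = gain * sample            # amplifier
        power = amplified * amplified        # signal / power converter (assumed: squared amplitude)
        level = power >= threshold           # level determiner (binarization)
        if level and not previous_level:     # differentiator plus diode: rising edges only
            onsets.append(index)
        previous_level = level
    return onsets


signal = [0.0, 0.01, 0.5, 0.6, 0.0, 0.0, 0.7]
print(detect_onsets(signal, gain=2.0, threshold=0.25))  # rising edges at indices 2 and 6
```

Falling edges produce no trigger, which mirrors the role of the diodes that pass only positive differential results.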
[0040]
The change of the signal in the sound signal detector 110 will be described below by taking the signal 103a from the microphone 1 as an example.
FIG. 5 is a diagram illustrating a change in a signal in the sound signal detector 110. In this figure, the horizontal axis represents time, and the vertical axis represents voltage.
[0041]
(A) is a diagram illustrating the waveform of the signal input from the microphone 1. A signal 103a having a relatively weak waveform is input from the microphone 1. This signal 103a is amplified by the amplifier 111a.
[0042]
(B) is a diagram showing the waveform of the signal amplified by the amplifier 111a. As a result of amplification by the amplifier 111a, the waveform has a larger amplitude than the waveform shown in (A). This signal is converted into a power signal by the signal / power converter 112a.
[0043]
(C) is a diagram showing a waveform of a power signal. The power signal is a signal having only a positive value. The level of this signal is determined by the level discriminator 113a based on the threshold value 116.
[0044]
(D) is a diagram showing the waveform of the signal after the level determination. A binary signal is obtained by the level determination. When this signal passes through the differentiator 114a, it becomes a trigger-like signal.
[0045]
(E) is a diagram showing a waveform of a signal output from the differentiator 114a. The waveform of the trigger signal output from the differentiator 114a indicates the rising time of the signal input to the differentiator 114a.
[0046]
In this way, from the sound captured by the microphone 1, a trigger-like waveform indicating the time when the sound was captured is obtained. This process is similarly performed on the sounds captured by the other two microphones 2 and 3, and the trigger signals 104b and 104c corresponding to the respective microphones 2 and 3 become output signals of the sound signal detector 110.
[0047]
The three trigger signals 104a to 104c output from the sound signal detector 110 are input to the time difference measuring device 120 together with the clock signal from the reference signal generator 101.
[0048]
FIG. 6 is a diagram showing the internal configuration of the time difference measuring device 120. The time difference measuring device 120 includes three trigger signal converters 121a to 121c and three delay units 122a to 122c. The trigger signal converters 121a to 121c convert the input trigger signals 104a to 104c into square waves having the same time width as the H (high) level of the square wave generated by the reference signal generator 101. Each of the delay units 122a to 122c measures the time difference between two such square waves.
[0049]
According to the time difference measuring device 120 having such a configuration, the input trigger signals 104a to 104c are converted by the trigger signal converters 121a to 121c into square waves having the same time width as the H level of the square wave that serves as the reference signal.
[0050]
FIG. 7 is a diagram illustrating a square wave obtained from the trigger signal. (A) shows a trigger signal, and (B) shows a square wave. That is, if the trigger signal 104a occurs 160.8 μs after a certain time, it is converted into a square wave 120a that is at the H level from 161.0 μs to 161.25 μs, which is when the reference clock is next at the H level.
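The quantization of a trigger to the reference clock can be illustrated with the numbers from this paragraph. The sketch assumes that the clock H levels start at integer multiples of the 0.5 μs period and last 0.25 μs, which is consistent with the 161.0 μs to 161.25 μs example; a trigger that falls exactly on a clock edge is assumed to map to that edge. Times are kept in integer nanoseconds to avoid rounding issues, and the names are hypothetical.

```python
CLOCK_PERIOD_NS = 500   # 0.5 us reference clock period
H_WIDTH_NS = 250        # assumed H-level width (half of the period)


def quantize_trigger(trigger_ns):
    """Return (start_ns, end_ns) of the square wave produced for one trigger.

    The square wave coincides with the next H level of the reference clock.
    """
    periods = -(-trigger_ns // CLOCK_PERIOD_NS)   # ceiling division
    start_ns = periods * CLOCK_PERIOD_NS
    return start_ns, start_ns + H_WIDTH_NS


print(quantize_trigger(160_800))  # (161000, 161250), i.e. 161.0 us to 161.25 us
```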
[0051]
The signal quantized in the cycle of the reference clock is then input to the delay units 122a to 122c by pairing two signals. That is, three pairs of a signal pair of the microphone 1 and the microphone 2, a signal pair of the microphone 2 and the microphone 3, and a signal pair of the microphone 3 and the microphone 1 are input to the delay units 122a to 122c.
[0052]
FIG. 8 is a diagram showing an internal configuration of the delay unit 122a. The delay unit 122a could be implemented as an actual delay circuit, but in this example it is constituted by an order determiner 122aa, a counter 122ab, and a data converter 122ac.
[0053]
The order determiner 122aa determines which input comes first, and has three inputs: a signal 123a from the microphone 1, a signal 123b from the microphone 2, and a clear signal 122ad. Then, the order determiner 122aa outputs the order determination bits 122ae and 122af, which are outputs for determining the order, to the data converter 122ac, and outputs a signal 122ag for time difference measurement to the counter 122ab.
[0054]
The clock signal 122ah from the reference signal generator 101 is input to the counter 122ab in addition to the output 122ag of the order determiner 122aa. The counter 122ab counts the number of H levels of the input clock signal 122ah only while the signal 122ag output from the order determiner 122aa is at the H level, and outputs the count output 122ai, which is the counting result, to the data converter 122ac. The counter 122ab also outputs a clear signal 122ad to the order determiner 122aa.
[0055]
The data converter 122ac converts the two order determination bits 122ae and 122af of the order determiner 122aa and the count output 122ai of the 2-byte unsigned binary counter 122ab into a 2-byte signed binary number. The conversion result is output as the output data 105a, and the data conversion end signal 106a is output as one pulse of the H level.
[0056]
The delay unit 122a having such a configuration operates as follows.
When the two signals 123a and 123b are input, the order determiner 122aa determines which signal is input earlier. If the signal 123a goes to the H level earlier, the order determination bit 122ae is kept at the H level. Conversely, if the signal 123b goes to the H level earlier, the order determination bit 122af is kept at the H level. The signal 122ag output to the counter 122ab is kept at the H level from the time when either of the two signals 123a and 123b becomes the H level until the other signal becomes the H level. These states of the order determiner 122aa are cleared when the clear signal 122ad goes to the H level. Therefore, if the outputs of the order determination bit 122ae and the order determination bit 122af are (1, 0) in this order, it can be determined that the signal 123a was input earlier, and if they are (0, 1), it can be determined that the signal 123b was input earlier.
[0057]
When the signal 122ag goes to the H level, the counter 122ab starts counting the number of times the input clock signal 122ah goes to the H level. When the signal 122ag goes to the L (low) level, the counter 122ab stops counting and outputs the counted value as a 16-bit unsigned binary number on the count output 122ai. The value obtained by multiplying the clock cycle of 0.5 μs by this count is therefore the measured time difference. Here, the value counted by the counter 122ab is used as it is. The counter 122ab sends a one-pulse H-level signal on the clear signal 122ad at the same time as the count output 122ai is output. The clear signal 122ad clears the state of the order determiner 122aa and prepares it for the next input.
[0058]
The outputs of the two order determination bits 122ae and 122af and the count output 122ai of the 2-byte unsigned binary counter 122ab are converted into a 2-byte signed binary number by the data converter 122ac to become the output data 105a. When the data conversion is completed, the data conversion end signal 106a is output as one pulse of the H level. The output data 105a and the data conversion end signal 106a are sent to the sound source position calculator 130 at the next stage. The data converters of the three delay units 122a to 122c take the microphone pairs in the order (1, 2), (2, 3), and (3, 1); if the sound reaches the first-listed microphone of a pair earlier than the other, the sign is made positive, and in the opposite case the sign is made negative.
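The combined behaviour of the order determiner, counter and data converter can be summarized by the following sketch. It works on the already quantized rise times of the two square waves and treats the count simply as the number of whole clock periods between them, which is a simplification of the gated counter; the function name and the range check are assumptions.

```python
def signed_count(first_mic_ns, second_mic_ns, clock_period_ns=500):
    """Signed clock count for one microphone pair.

    first_mic_ns, second_mic_ns : quantized rise times for the first-listed
                                  and second-listed microphone of the pair
    Returns a positive value when the first-listed microphone captured the
    sound earlier, a negative value in the opposite case.
    """
    count = abs(second_mic_ns - first_mic_ns) // clock_period_ns   # counter 122ab
    if count > 0x7FFF:
        raise ValueError("count does not fit in a 2-byte signed value")
    first_is_earlier = first_mic_ns <= second_mic_ns               # order determiner 122aa
    return count if first_is_earlier else -count                   # data converter 122ac


print(signed_count(161_000, 171_000))   # 20: first-listed microphone is earlier
print(signed_count(171_000, 161_000))   # -20: second-listed microphone is earlier
```

Multiplying the magnitude by the 0.5 μs clock period gives the measured time difference, here 10 μs.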
[0059]
Before describing the details of the function of the sound source position calculator 130, a method of calculating the sound source position will be described. The basis of the sound source position calculation is to calculate the sound source direction in the two microphones.
[0060]
FIG. 9 is a diagram showing a microphone coordinate system for calculating a sound source direction. As shown in the figure, consider a uv coordinate system in which the straight line passing through the two microphones 1 and 2 is the u-axis and the midpoint of the two microphones 1 and 2 is the origin. When the microphone 1 is placed at the position (-a, 0), the microphone 2 is placed at the position (a, 0), and the position of the sound source P is (r, Θ) in polar coordinates, the time difference dt between the arrivals of the sound from the sound source P at the two microphones is
[0061]
(Equation 1)
$$dt = \frac{\sqrt{r^{2} + 2ra\cos\Theta + a^{2}} - \sqrt{r^{2} - 2ra\cos\Theta + a^{2}}}{V_{0}} \qquad (1)$$
where $V_{0}$ is the speed of sound (340 m/s at room temperature). This relation is plotted using r as a parameter.
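Equation (1) can be evaluated directly, as in the following minimal sketch. The function name is an assumption; the symbols r, Θ, a and V₀ follow the definitions above, with the microphones at (-a, 0) and (a, 0).

```python
import math


def arrival_time_difference(r, theta_deg, a, v0=340.0):
    """Evaluate equation (1): arrival time difference dt in seconds.

    r         : distance of the sound source from the origin [m]
    theta_deg : angle of the sound source from the u-axis [degrees]
    a         : distance of each microphone from the origin [m]
    v0        : speed of sound [m/s]
    """
    theta = math.radians(theta_deg)
    to_mic_at_minus_a = math.sqrt(r * r + 2 * r * a * math.cos(theta) + a * a)
    to_mic_at_plus_a = math.sqrt(r * r - 2 * r * a * math.cos(theta) + a * a)
    return (to_mic_at_minus_a - to_mic_at_plus_a) / v0


# With a = 0.34 m (microphones 68 cm apart) and r = 3 m:
print(arrival_time_difference(3.0, 0.0, 0.34))    # about 0.002 s on the u-axis
print(arrival_time_difference(3.0, 90.0, 0.34))   # about 0 s on the v-axis
```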
[0062]
FIG. 10 is a graph showing the relation between the time difference between the microphones and the sound source direction. Here, the calculation is performed with a = 1 m. As the graph shows, the effect of r on dt is sufficiently smaller than that of Θ, and the angle error is only about 5 degrees even if the value of r is changed. It can also be seen that, regardless of r, the angle sensitivity to dt is higher in the region where Θ is smaller than about 25 degrees than in the other regions. From this, if the arrival time difference of the sound at a microphone pair is known, the direction of the sound source, that is, a straight line connecting the origin of the uv coordinates and the sound source, can be determined. If another straight line is obtained from another microphone pair, the sound source position is obtained from the intersection of those straight lines.
[0063]
In the present embodiment, the sound source position is calculated by combining microphone pairs, and for this purpose the range of ±60 degrees from the axis connecting the two microphones of a pair is taken as the measurement range of that microphone pair.
[0064]
FIG. 11 is a diagram illustrating a measurement range of one microphone pair. Assuming that the two microphones 1 and 2 are arranged on the u-axis with the origin interposed between them, the measurement range of this microphone pair, as viewed from the origin, is ±60° about the u-axis in the positive direction and ±60° about the u-axis in the negative direction. With the measurement range determined in this way, the sound source position is calculated using the two microphone pairs, among the three, for which the sound source lies within the measurement range. Since the three microphones are arranged at the vertices of an equilateral triangle, two microphone pairs always satisfy this condition for an arbitrary sound source position. Therefore, the data used for the sound source position calculation is obtained by excluding the data with the smallest time difference from the three time difference count data.
[0065]
Therefore, in the calculation of the sound source position, first, the sound source direction is obtained by individually using the coordinate system shown in FIG. 11 for each microphone pair. The sound source direction is obtained by comparing the measured time difference data with the angle-time difference information table.
[0066]
FIG. 12 is a diagram showing an angle-time difference information table. In this example, since the distance between the microphones is 68 cm, the table in the figure is calculated from the above equation with the value of r set to 3 m and Θ ranging from 0 degrees to 60 degrees. The first column of the table lists, from the top, angles between 0 degrees and 60 degrees in 5-degree steps, and the second column stores the time difference data of the two microphones at each angle. The time difference data is a 2-byte signed integer representation of the value obtained by dividing the time difference calculated by the above equation by the reference clock cycle of 0.5 μs. It can therefore be compared directly with the time difference data obtained by the time difference measuring device 120. To find the direction from the time difference data, the second column of this table is referred to: once the largest cell for which the absolute value of the time difference data does not exceed the value in the table is found, the angle data can be obtained from the first column of the table. The result is line-symmetric about the v-axis in FIG. 11, but if the value of the time difference data is negative the utterance position is on the microphone 1 side, and if it is positive the utterance position is on the microphone 2 side.
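A table of this kind can be generated and searched as in the sketch below. The values a = 0.34 m (half of the 68 cm spacing), r = 3 m, the 5-degree step and the 0.5 μs clock come from the text; the function names, the rounding to the nearest count, and the reading of the lookup rule as "the largest tabulated angle whose table value is not exceeded by the absolute time difference" are assumptions.

```python
import math


def build_angle_time_table(a=0.34, r=3.0, v0=340.0, period_s=0.5e-6, step_deg=5):
    """Return a list of (angle_deg, clock_count) rows for 0 to 60 degrees."""
    table = []
    for angle in range(0, 61, step_deg):
        theta = math.radians(angle)
        dt = (math.sqrt(r * r + 2 * r * a * math.cos(theta) + a * a)
              - math.sqrt(r * r - 2 * r * a * math.cos(theta) + a * a)) / v0
        table.append((angle, round(dt / period_s)))
    return table


def lookup_angle(count, table):
    """Angle (unsigned) for a measured signed count.

    Counts larger than every table entry are clamped to the smallest angle.
    """
    magnitude = abs(count)
    candidates = [angle for angle, value in table if magnitude <= value]
    return max(candidates) if candidates else table[0][0]


table = build_angle_time_table()
print(table[0])                    # (0, 4000): 2 ms expressed in 0.5 us counts
print(lookup_angle(-4000, table))  # 0; the sign only selects the side of the v-axis
```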
[0067]
In addition, the angle data obtained here is symmetric about the axis connecting the two microphones (the u-axis in FIG. 11), so two straight lines, one on each side of that axis, are obtained as solutions. This ambiguity can be resolved using the time difference data and the combination information of the microphone pairs.
[0068]
FIG. 13 is a diagram illustrating a method for obtaining a sound source position from two pieces of time difference information. In this example, the sound source position is obtained from the microphone pair of the microphones 3 and 1 and the microphone pair of the microphones 2 and 3. The xy coordinates are defined such that the center of gravity of the equilateral triangle having the microphones 1 to 3 as vertices (the intersection of the line segments connecting each vertex and the midpoint of the opposite side) becomes the origin. Two straight lines 31a and 31b are obtained from the microphone pair of the microphones 3 and 1, and two straight lines 32a and 32b are likewise obtained from the microphone pair of the microphones 2 and 3. Further, since the sound from the sound source P reaches the microphones 1 and 2 earlier than it reaches the microphone 3, positive angle information (indicating that the sound source is on the microphone 2 side) is obtained from the pair of the microphone 2 and the microphone 3, and negative angle information (indicating that the sound source is on the microphone 1 side) is obtained from the pair of the microphone 3 and the microphone 1. Moreover, as shown in FIG. 11, the measurement range of each microphone pair is limited. Therefore, the range of the sound source can be limited by the combination of the time difference information. The sound source position in the example of FIG. 13 is limited to the range, in the first quadrant, in which the angle from the x-axis is 0 to 60 degrees. That is, the straight lines 31b and 32b are ruled out, and in the end the two straight lines 31a and 32a remain.
[0069]
As described above, the range of the sound source position is specified by the combination of the microphone pairs and the sign of the time difference data in each microphone pair. All combinations are shown below.
[0070]
FIG. 14 is a diagram illustrating the range of the sound source determined by the combination of the microphone pairs and the signs of the time difference data. 1, 2 and 3 in the figure indicate the positions of the microphones 1, 2 and 3, respectively. Further, the time difference data of the microphone pair of the microphones 1 and 2 is denoted (1, 2), that of the microphone pair of the microphones 2 and 3 is denoted (2, 3), and that of the microphone pair of the microphones 3 and 1 is denoted (3, 1). The relationship between the signs of the time difference data of the microphone pairs and the range of the sound source is then as follows. Note that angles in the following description are measured counterclockwise from the positive x-axis.
[0071]
If (2,3)> 0 and (3,1) <0, the sound source is limited to an angle from the positive x-axis within the range of 0 to 60 degrees.
If (1,2)> 0 and (3,1) <0, the sound source is limited to an angle from the positive x-axis within the range of 60 to 120 degrees.
[0072]
If (1,2)> 0 and (2,3) <0, the sound source is limited to an angle of 120 to 180 degrees from the positive x-axis.
If (2,3) <0 and (3,1)> 0, the angle of the sound source from the positive x-axis is limited to a range of 180 to 240 degrees.
[0073]
If (1,2) <0, (3,1)> 0, the sound source is limited to an angle of 240 to 300 degrees from the positive x-axis.
If (1,2) <0, (2,3)> 0, the sound source is limited to an angle from the positive x-axis within a range of 300 to 360 degrees.
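The six cases above amount to a small lookup table keyed by the two retained microphone pairs and the signs of their time difference data. The following sketch encodes that table; the data layout, the names, and the choice to drop the pair with the smallest absolute count (the text says "smallest time difference") are assumptions made for illustration.

```python
# Keys: the two retained pairs with the sign of their time difference data.
# Values: angular range of the sound source in degrees from the positive x-axis.
SOURCE_AREA_TABLE = {
    frozenset({("23", 1), ("31", -1)}): (0, 60),
    frozenset({("12", 1), ("31", -1)}): (60, 120),
    frozenset({("12", 1), ("23", -1)}): (120, 180),
    frozenset({("23", -1), ("31", 1)}): (180, 240),
    frozenset({("12", -1), ("31", 1)}): (240, 300),
    frozenset({("12", -1), ("23", 1)}): (300, 360),
}


def source_sector(counts):
    """Return the angular range for signed counts keyed by "12", "23", "31".

    Returns None when the sign combination is not one of the six listed cases.
    """
    excluded = min(counts, key=lambda pair: abs(counts[pair]))
    key = frozenset((pair, 1 if value > 0 else -1)
                    for pair, value in counts.items() if pair != excluded)
    return SOURCE_AREA_TABLE.get(key)


print(source_sector({"12": -500, "23": 3000, "31": -2500}))   # (0, 60)
print(source_sector({"12": 3000, "23": -200, "31": -2800}))   # (60, 120)
```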
[0074]
Note that the relationship between the signs of the time difference data of the microphone pairs and the range of the sound source shown in this figure is held in advance in the sound source position calculator 130 as a sound source position area table. If the two straight lines specified as described above are coordinate-transformed from the uv coordinate system of FIG. 11 to the xy coordinate system of FIG. 13, the sound source position P (x, y) is obtained as the intersection of the two straight lines. In the example of FIG. 13, it suffices to obtain, for the pair of microphones 3 and 1, the straight line 31a that passes through the point (−a / 2, a / 2√3) and forms an angle of "60 degrees + the angle in the uv coordinate system obtained by the above method" with the x-axis, and, for the pair of microphones 2 and 3, the straight line 32a that passes through the point (0, −a / 2√3) and forms an angle of "the angle in the uv coordinate system obtained by the above method" with the x-axis, and then to obtain the intersection of these straight lines.
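The final step, intersecting two straight lines that are each given by a point and an angle from the x-axis, can be sketched as follows. The function is generic: it does not reproduce the specific points or angle offsets of FIG. 13, and its name and the handling of parallel lines are assumptions.

```python
import math


def intersect_lines(point1, angle1_deg, point2, angle2_deg):
    """Intersection of two lines, each given by a point and an angle from the x-axis.

    Returns (x, y), or None when the lines are (nearly) parallel.
    """
    d1 = (math.cos(math.radians(angle1_deg)), math.sin(math.radians(angle1_deg)))
    d2 = (math.cos(math.radians(angle2_deg)), math.sin(math.radians(angle2_deg)))
    det = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(det) < 1e-12:
        return None
    dx = point2[0] - point1[0]
    dy = point2[1] - point1[1]
    # Solve point1 + s * d1 = point2 + t * d2 for s.
    s = (dx * d2[1] - dy * d2[0]) / det
    return (point1[0] + s * d1[0], point1[1] + s * d1[1])


# A line through the origin at 45 degrees and a vertical line through (2, 0):
print(intersect_lines((0.0, 0.0), 45.0, (2.0, 0.0), 90.0))  # approximately (2.0, 2.0)
```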
[0075]
The calculation of the position is performed by the sound source position calculator 130 using the three signed binary information obtained by the time difference measuring device 120. The sound source position calculator 130 is constructed on a computer.
[0076]
FIG. 15 is a diagram showing an internal configuration of the sound source position calculator. As shown in the figure, the sound source position calculator 130 is configured mainly around a CPU (Central Processing Unit) 133.
[0077]
The three ports 131a to 131c take in the data measured by the time difference measuring device 120 and supply the data to the CPU 133. The AND gate 132 gives a timing at which data is taken into the CPU 133. The CPU 133 has registers 133a to 133c for storing data from the ports 131a to 131c. Further, the CPU 133 performs data transfer with various peripheral devices via the bus 134.
[0078]
The input device 135, the output port 136, the memory 137, and the hard disk 138 are connected to the bus 134. The input device 135 is for inputting the positional relationship between the camera and the microphone. The output port 136 is a communication port for outputting a calculation result to the camera control device 140. The memory 137 is for temporarily holding a program executed by the CPU 133. The hard disk 138 holds a program, an angle-time difference information table, a sound source position area table, and a coordinate conversion table. Note that the sound source position calculation process performed by the sound source position calculator 130 is a process executed by loading a program stored in the hard disk 138 into the memory 137 and executing the program by the CPU 133.
[0079]
The three data outputs 105a to 105c from the time difference measuring device 120 that are input to the sound source position calculator 130 are first transmitted to the three ports 131a to 131c. When all of the data conversion end signals 106a to 106c from the time difference measuring device 120 are at the H level, so that the output of the AND gate 132 becomes the H level, the data are taken into the three registers 133a to 133c in the CPU 133. Of the three time difference data taken into the three registers 133a to 133c, the two that remain after excluding the data with the smallest time difference are written to the variable areas t1 and t2 secured in advance in the memory. The above-described calculation of the sound source position is then performed based on these two time difference data. The sound source position in the xy coordinate system of FIG. 13 is thereby obtained. Thereafter, coordinate conversion to an XY coordinate system with the camera position as the origin is performed, and the sound source position P (X, Y) in the XY coordinate system is obtained.
FIG. 16 is a diagram illustrating the relationship between the coordinate system of the microphone and the coordinate system of the camera. The centers of the microphones 1 to 3 (the centers of gravity of the triangles) are the origin of the xy coordinates, and the rotation center of the camera 200 (shown in FIG. 2) is the origin of the XY coordinates.
[0080]
In practice, the microphones 1 to 3 and the camera 200 are installed so that the x-axis, which is parallel to the line connecting the microphones 2 and 3, is parallel to the 0-degree direction of the turntable 201. The coordinate conversion in this case is therefore a translation only. The translation amount is measured when the device is installed, and is input to the sound source position calculator 130 through the input device 135. The sound source position calculator 130 performs the coordinate conversion using this data. The coordinate conversion is not limited to translation; even when the two coordinate systems form a certain angle with each other, it is only necessary to include the corresponding rotation in the coordinate conversion. In that case, however, the angle between the xy coordinate system and the XY coordinate system must be measured. The sound source position P (X, Y) obtained in this manner is sent out from the output port 136 as 2-byte floating point data, and is input to the camera control device 140.
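The conversion from the microphone xy coordinate system to the camera XY coordinate system can be sketched as a rotation followed by a translation. The parameter names are hypothetical: `mic_origin_in_camera` stands for the measured position of the microphone centroid in camera coordinates, and `rotation_deg` for the measured angle of the x-axis relative to the X-axis, which is zero in the installation described above.

```python
import math


def mic_to_camera(x, y, mic_origin_in_camera, rotation_deg=0.0):
    """Convert a point from microphone xy coordinates to camera XY coordinates."""
    theta = math.radians(rotation_deg)
    origin_x, origin_y = mic_origin_in_camera
    cam_x = origin_x + x * math.cos(theta) - y * math.sin(theta)
    cam_y = origin_y + x * math.sin(theta) + y * math.cos(theta)
    return cam_x, cam_y


# Translation only, with the microphone centroid 3 m along X and 1 m below the camera origin:
print(mic_to_camera(1.0, 2.0, (3.0, -1.0)))   # (4.0, 1.0)
```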
[0081]
FIG. 17 is a diagram showing the internal configuration of the camera control device 140. As shown in this figure, the camera control device 140 includes an angle calculation unit 141, a zoom amount calculation unit 142, a data output unit 143, a calibration unit 144, a volume 145, and a calibration setting button 146.
[0082]
The angle calculation unit 141 receives the sound source position data P (X, Y) and a signal 204 indicating the operation of the turntable 201. When the signal 204 is at the L level, the angle calculation unit 141 converts the sound source position data P (X, Y) into polar coordinate data (R, Θ). R is the distance from the camera to the sound source position, and Θ is the angle that the line connecting the sound source position P and the origin of the XY coordinate system forms with the X axis. The value of Θ is further converted into the data format of the rotary encoder and sent to the data output unit 143. Here, the integer value obtained by dividing the angle Θ by 1.4 is converted into an 8-bit unsigned binary number. The value 1.4 is approximately 360 degrees divided by 256, the number of values that 8-bit data can represent. The distance R to the sound source calculated by the angle calculation unit 141 is sent to the zoom amount calculation unit 142 and the calibration unit 144. Because the zoom amount of the camera must change when the distance between the camera and the person changes, the calibration unit 144 is provided to set two pieces of reference information, a reference distance and a reference zoom amount, in the zoom amount calculation unit 142. A volume 145 for manually setting the zoom amount and a calibration setting button 146 are connected to the calibration unit 144. The calibration unit 144 uses the distance data output by the angle calculation unit 141, and the zoom amount at that distance is set with the volume 145. When the volume 145 is turned, the motorized lens of the camera 200 is driven and the zoom amount changes. When the user presses the calibration setting button 146 once the zoom has been properly adjusted with the volume 145, the distance data Ro and the zoom data Zo at that time are output to the zoom amount calculation unit 142 as reference information.
Upon receiving these data, the zoom amount calculation unit 142 stores the values in a memory and uses them in subsequent calculations of the zoom amount. However, when the photographing target is a person, as in the present apparatus, the size of the target can be regarded as fixed, so default values of the two pieces of reference information may be set in advance.
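The polar conversion and the encoder quantization performed by the angle calculation unit 141 might look like the following sketch. The function names are hypothetical. The step size is taken as exactly 360 / 256 rather than the rounded 1.4, and the result is wrapped into 0 to 255; both are assumptions made here so that the value always fits in 8 bits.

```python
import math

ENCODER_STEPS = 256                     # number of values in 8-bit data
DEG_PER_STEP = 360.0 / ENCODER_STEPS    # 1.40625, the "1.4" in the text

def to_polar(X, Y):
    """Return (R, theta_deg) for P(X, Y); theta is measured from the X axis."""
    R = math.hypot(X, Y)
    theta = math.degrees(math.atan2(Y, X)) % 360.0   # map into [0, 360)
    return R, theta

def to_encoder(theta_deg):
    """Quantize an angle to an 8-bit unsigned value (wrap-around is assumed)."""
    return int(theta_deg / DEG_PER_STEP) % ENCODER_STEPS
```

For example, an angle of 90 degrees corresponds to one quarter of the encoder range.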
[0083]
The distance R to the sound source calculated by the angle calculation unit 141 is sent to the zoom amount calculation unit 142, and the zoom amount is calculated. The calculation of the zoom amount is performed by dividing the distance R by the reference distance Ro and multiplying the value by the reference zoom amount Zo. The calculated angle data and zoom amount are sent to the turntable 201 and the motorized lens 202 via the data output unit 143, respectively, and the camera photographing device is driven. At this time, a signal 204 indicating the operation of the turntable 201 is sent from the turntable 201 to the camera control device 140 so that the next imaging position information is not output before the operation of the turntable 201 is completed. If this signal is at the H level, it indicates that the turntable 201 is operating, and during that time, the calculation by the camera control device 140 is not performed. Specifically, this signal is sent to the angle calculator 141, and while this signal is at the H level, the data sent from the sound source position calculator 130 is not taken in.
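As a minimal sketch of the zoom calculation and of the gating by the signal 204 described above, assuming hypothetical function names and a boolean flag standing in for the H/L level of the signal:

```python
def zoom_amount(R, Ro, Zo):
    """Zoom amount from the text: divide R by Ro, then multiply by Zo."""
    if Ro <= 0:
        raise ValueError("reference distance Ro must be positive")
    return (R / Ro) * Zo

def camera_command(R, encoder_value, turntable_busy, Ro, Zo):
    """Return (encoder_value, zoom), or None while signal 204 is at H level."""
    if turntable_busy:      # signal 204 at H level: new data is not taken in
        return None
    return encoder_value, zoom_amount(R, Ro, Zo)
```

Returning None while the turntable is moving is one simple way to model "the data is not taken in"; a real controller could instead queue or overwrite the pending position.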
[0084]
The conference in which the camera shooting is performed can be recorded through the video output 103 and the audio output 102 or transmitted to a remote place. At this time, it is not necessary to prepare microphones for all the participants in each conference hall, and it is not necessary that the three microphones and the camera are in the same place. Therefore, the microphone can be placed at a position where it is easy to catch the conversation of all, and the camera can be placed at a position where it is easy to catch the facial expressions of all.
[0085]
In the above description, the sound source position is calculated using the fact that the relationship between the arrival time difference of the sounds captured by two microphones and the sound source direction does not depend on the distance to the sound source. The sound source position can also be calculated geometrically from the time differences between the sounds captured by the microphones. This calculation method is described below.
[0086]
First, for the sake of simplicity, it is assumed that the three microphones 1 to 3 are installed at the three positions (0, a), (0, 0), and (a, 0) in the xy coordinate system. The times at which the sound emitted from the sound source P (x, y) reaches the microphones 1 to 3 are defined as t1, t2, and t3, respectively. Letting the distances between the sound source P and the microphones 1 to 3 be L1, L2, and L3, respectively,
[0087]
(Equation 2)
L1 = (x^2 + (y - a)^2)^(1/2)
L2 = (x^2 + y^2)^(1/2)
L3 = ((x - a)^2 + y^2)^(1/2) (2)
holds. Letting the time differences between the microphones be dt1 = t1 - t2, dt2 = t2 - t3, and dt3 = t3 - t1, and letting the speed of sound be vs, the arrival time differences of the sound between the microphones depend on the distances from the sound source to the microphones and are given as follows.
[0088]
(Equation 3)
dt1 = (L1 - L2) / vs
dt2 = (L2 - L3) / vs
dt3 = (L3 - L1) / vs (3)
From equations (2) and (3),
[0089]
(Equation 4)
vs * dt1 = (x^2 + (y - a)^2)^(1/2) - (x^2 + y^2)^(1/2)
vs * dt2 = (x^2 + y^2)^(1/2) - ((x - a)^2 + y^2)^(1/2)
vs * dt3 = ((x - a)^2 + y^2)^(1/2) - (x^2 + (y - a)^2)^(1/2) (4)
is obtained. By solving these simultaneous equations, the sound source position P (x, y) can be obtained. Such a method of obtaining the sound source position is similar to the principle of triangulation and is widely known. Whether to use this calculation method or the method used in the first embodiment is determined based on the performance of the sound source position calculator and the required resolution of the time difference measurement. For example, when a high resolution is required, the method described in the first embodiment, which can calculate at high speed with a simple calculation, is used; when the resolution may be low, a method such as an angle-time difference information table can be used.
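The text does not say how the simultaneous equations are solved. One possible approach, shown here only as a sketch, is Newton's method applied to the two equations for dt1 and dt2 with the microphones at (0, a), (0, 0), and (a, 0). The function name, the default sound speed of 340 m/s, the initial guess, and the iteration limit are all assumptions; the result depends on the initial guess, since two hyperbolas can intersect at more than one point.

```python
import math

def solve_position(dt1, dt2, a, vs=340.0, guess=(1.0, 1.0), iters=50):
    """Newton's method for the two range-difference equations.

    Microphones 1, 2, 3 sit at (0, a), (0, 0), (a, 0).
    dt1 = t1 - t2 and dt2 = t2 - t3 are the measured time differences.
    """
    x, y = guess
    for _ in range(iters):
        L1 = math.hypot(x, y - a)
        L2 = math.hypot(x, y)
        L3 = math.hypot(x - a, y)
        # Residuals of (L1 - L2) = vs*dt1 and (L2 - L3) = vs*dt2
        f1 = (L1 - L2) - vs * dt1
        f2 = (L2 - L3) - vs * dt2
        # Jacobian entries: gradients of the two distance differences
        j11 = x / L1 - x / L2
        j12 = (y - a) / L1 - y / L2
        j21 = x / L2 - (x - a) / L3
        j22 = y / L2 - y / L3
        det = j11 * j22 - j12 * j21
        if abs(det) < 1e-12:
            raise ValueError("singular Jacobian; try another initial guess")
        # Solve J * (dx, dy) = -(f1, f2) by Cramer's rule
        dx = (-f1 * j22 + f2 * j12) / det
        dy = (-f2 * j11 + f1 * j21) / det
        x, y = x + dx, y + dy
        if math.hypot(dx, dy) < 1e-12:
            break
    return x, y
```

The third equation (for dt3) is redundant because dt1 + dt2 + dt3 = 0, so two equations suffice for the two unknowns x and y.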
[0090]
Further, in the first embodiment, the system is such that the height of the microphones from the floor is the same as the height of the camera from the floor. However, the camera can also be installed at a higher position. In this case, the value obtained by subtracting the height of the camera from the floor from the height of the microphones from the floor is set as the offset amount d in the Z-axis direction, and the vertical direction of the camera may be calculated in the camera control device. In the above coordinate system, the vertically downward direction is the positive direction of the Z axis. That is, the sound source position P (X, Y) obtained by the sound source position calculator 130 with the camera position offset added is defined as P (X, Y, d), and the angle Φ in the direction perpendicular to the floor can be obtained from
[0091]
(Equation 5)
tan Φ = d / (X^2 + Y^2)^(1/2) (5)
If a rotation mechanism that can also rotate in the Z-axis direction is added to the turntable of the first embodiment, the speaker can be captured correctly even when there is an offset in the Z-axis direction. This rotation mechanism can be realized with the same configuration as in the first embodiment.
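Equation (5) can be evaluated directly, as in the short sketch below. The function name is hypothetical, and using atan2 (so that the sign of the result follows the sign of d) is a choice made here rather than something stated in the text.

```python
import math

def tilt_angle_deg(X, Y, d):
    """Angle Phi from Equation (5): tan(Phi) = d / sqrt(X^2 + Y^2)."""
    return math.degrees(math.atan2(d, math.hypot(X, Y)))
```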
[0092]
Further, in the first embodiment, the sound source position is calculated in a two-dimensional plane. However, by arranging microphones at the four vertices of a regular triangular pyramid, the sound source position in three-dimensional space can also be determined. A regular triangular pyramid has four equilateral triangular faces, and the calculation of the first embodiment can be performed on each face to determine a plane, perpendicular to that face, that indicates the sound source direction on the face. Once such a plane has been found for each equilateral triangle, the position of the sound source in three-dimensional space can be determined by finding the intersection of the planes.
[0093]
By the way, in the above-described first embodiment, only the case where the calculated sound source position information is used for controlling the camera has been described. However, the sound source position information can be used as index information of a video signal. A sound source position recording apparatus having such an index function will be described below as a second embodiment.
[0094]
FIG. 18 is an internal configuration diagram of the sound source position recording device. In this sound source position recording device, the three microphones 1 to 3, the sound signal detector 110, the time difference measuring device 120, the sound source position calculator 130, and the reference signal generator 101 have the same configuration as in the first embodiment, so the same reference numerals are assigned and their description is omitted. Although not shown in this figure, a camera control device 140 similar to that shown in FIG. 3 is connected to the sound source position calculator 130, and the camera control device 140 controls the turntable 201, on which the camera body 203 is installed, and the electric lens 202.
[0095]
In this embodiment, the output of the sound source position calculator 130 is input to the computer 151 as well as to the camera control device 140. The input device 152, the display device 153, the storage device 154, and the control device 155 are connected to the computer 151. The recording / reproducing device 156 is further connected to the control device 155.
[0096]
The computer 151 has a sound source position table and a time table. The sound source position information output by the sound source position calculator 130 is registered in the sound source position table, and the time when the information is received is registered in the time table. The details of the processing executed by the computer 151 and the details of the sound source position table and the time table will be described later.
[0097]
The input device 152 is used to input a signal indicating completion of sound source position calculation, comment data, and the like. The display device 153 is for displaying sound source position data, time data, and comment data. The storage device 154 stores sound source position data, time data, and comment data, and uses a hard disk device or the like.
[0098]
The control device 155 controls the recording / reproducing device 156. The recording / reproducing device 156 stores the video recorded by the first embodiment at the same time as the sound source position is recorded. In the second embodiment, video digitally recorded on a hard disk is used. Unlike ordinary tape recording, a hard disk allows fast random access, so a video search operation can be completed in a short time. The user displays the sound source position data, the time data, and the comment data already stored in the storage device 154 on the display device 153 and creates a reproduction instruction there. For example, by playing back only the moderator's remarks, the task of creating a meeting agenda list can be performed efficiently. The control device 155 instructs the recording / reproducing device to fast-forward or play back based on the playback time list created by the computer. Once reproduction is started, it continues until a fast-forward instruction is given from the input device.
[0099]
FIG. 19 is a diagram illustrating an example of the sound source position table. The sound source position table 41 is a matrix of 2 columns and 36 rows, with the x coordinate value stored in the first column and the y coordinate value in the second column. "Null" is entered in this matrix as the initial value. The number of rows, 36, is based on the assumption that the angular resolution is about 5 degrees and that the number of participants in the venue is at most about this many.
[0100]
When new sound source position data not yet stored in the matrix is sent from the sound source position calculator 130, its coordinate values are stored in the next row. At this time, if the angle error with respect to already registered data is within 5 degrees and the distance error is within ±25 cm in both the x and y directions, the data is regarded as the same position and no row is added. The reason is that a speaker does not stay perfectly still even when seated and may move the chair or turn the head.
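The registration rule above can be sketched as follows. The names are hypothetical, the table is modeled as a Python list instead of a fixed matrix pre-filled with "null", coordinates are taken in meters, and the angle is measured about the coordinate origin; the text does not state the reference point for the angle, so that is an assumption.

```python
import math

MAX_ROWS = 36          # table size from the text
ANGLE_TOL_DEG = 5.0    # angle tolerance from the text
DIST_TOL_M = 0.25      # +/-25 cm in both x and y

def same_position(p, q):
    """True if p and q are regarded as the same speaker position."""
    ang_p = math.degrees(math.atan2(p[1], p[0]))
    ang_q = math.degrees(math.atan2(q[1], q[0]))
    diff = abs(ang_p - ang_q) % 360.0
    diff = min(diff, 360.0 - diff)          # shortest angular distance
    return (diff <= ANGLE_TOL_DEG
            and abs(p[0] - q[0]) <= DIST_TOL_M
            and abs(p[1] - q[1]) <= DIST_TOL_M)

def register(table, p):
    """Return the row index of p, appending a new row if it is unknown."""
    for k, q in enumerate(table):
        if same_position(p, q):
            return k
    if len(table) >= MAX_ROWS:
        raise OverflowError("sound source position table is full")
    table.append(p)
    return len(table) - 1
```

A position is compared against the first registered coordinates of each row, so small movements of a seated speaker map to the same row index.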
[0101]
FIG. 20 is a diagram illustrating an example of the time table. In the initial state, the time table 42 is a matrix of 1024 rows and 36 columns, and the index of each column corresponds to the index of each row of the sound source position table 41. In this matrix, "null" is entered as the initial value in rows other than the first row, while "1" is stored in the first row as the initial value. The first row is used to store pointer information. In this example, three pieces of time data of the sound source P1 have already been recorded, so 4 is set in the first row as the pointer of P1. When the next time data arrives from the sound source P1, this pointer makes it easy to know the address at which the time data should be written. When the entry of the time data is completed, the value of the pointer is incremented.
[0102]
The time is expressed as the elapsed time from the start of the measurement, measured in units of 0.1 second and represented as a 2-byte unsigned binary number. The time 1.3 seconds after the start is therefore expressed as "13". Of course, since it is sufficient to know the elapsed time from the start of the measurement, another expression may be used. If the received sound source position is already registered in the sound source position table 41, the row index indicating that sound source position is obtained, and the new time information is added after the time information already recorded in the column of the time table 42 with the corresponding index. If the received sound source position is not registered in the sound source position table 41, the sound source position is newly registered in the sound source position table 41 and the time is recorded along with it.
[0103]
The sound source position table 41 is a two-dimensional matrix of size 2 × 36 secured in the memory of the computer. For the time table 42, a two-dimensional matrix of 36 × 1024 is secured in the memory of the computer at the start of recording, and when the time information for the same position becomes full, 1024 new rows are automatically added by the memory allocation function.
[0104]
FIG. 21 is a flowchart showing the processing procedure of sound source position recording. This is a process performed by the computer 151. Hereinafter, the processing procedure will be described along the step numbers.
[S1] The sound source position table T₁ and the time table T₂ are secured in memory.
[S2] A position storage variable P and a time storage variable x are secured.
[S3] The sound source position table T₁ and the time table T₂ are initialized.
[S4] Counting of the time t is started.
[S5] It is determined whether there is an output from the sound source position calculator. If there is an output, the process proceeds to step S6, and if there is no output, this step is repeated.
[S6] The current time t is input to the time storage variable x.
[S7] The sound source position (x, y) is input to the position storage variable P.
[S8] It is determined whether the position storage variable P is already registered in the sound source position table T₁. If registered, the process proceeds to step S10, and if not registered, the process proceeds to step S9.
[S9] The value of the position storage variable P is newly stored in the sound source position table T₁, and the process proceeds to step S10.
[S10] The row index k of the position storage variable P in the sound source position table T₁ is obtained.
[S11] The pointer value a stored in the first row of column k of the time table T₂ is read, and the value of x is stored in row a, column k of the time table T₂.
[S12] The pointer value a stored in the first row of column k of the time table T₂ is incremented.
[S13] It is determined whether the calculation of the sound source position is completed. If it is completed, the process proceeds to step S14, and if not, the process returns to step S5.
[S14] The data of the sound source position table T₁ and the time table T₂ are saved in the storage device.
[S15] A comment is added if necessary, and the process ends.
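Steps S1 to S14 can be summarized in the following sketch. It is not the patented implementation: the function name, the event iterable standing in for the output of the sound source position calculator, and the caller-supplied same_position predicate are all hypothetical. Row 0 of the time table plays the role of the pointer row, so a pointer value of 1 refers to the first data row. Saving to the storage device (S14) and adding comments (S15) are left to the caller.

```python
def record(events, same_position, rows=1024, cols=36):
    """Build the two tables from (t_seconds, x, y) events.

    events        : iterable standing in for the calculator output (S5).
    same_position : function (p, q) -> bool deciding table matches (S8).
    """
    positions = []                                   # S1/S3: table T1
    times = [[None] * cols for _ in range(rows)]     # S1/S3: table T2
    times[0] = [1] * cols                            # pointer row, initial 1
    for t, x, y in events:                           # S5
        stamp = int(round(t * 10))                   # S6: units of 0.1 s
        p = (x, y)                                   # S7
        k = next((i for i, q in enumerate(positions)
                  if same_position(p, q)), None)     # S8
        if k is None:                                # S9
            if len(positions) >= cols:
                raise OverflowError("sound source position table is full")
            positions.append(p)
            k = len(positions) - 1                   # S10
        a = times[0][k]                              # S11: read the pointer
        if a >= len(times):                          # column full: add rows
            times.extend([None] * cols for _ in range(rows))
        times[a][k] = stamp                          # S11: store the time
        times[0][k] = a + 1                          # S12
    return positions, times                          # handed to S14
```

Passing a tolerance-based predicate instead of exact equality gives the "same position" behavior described for the sound source position table.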
[0105]
In this way, the information on the sound source position and the utterance time can be used as the index of the video.
[0106]
[Effect of the invention]
As described above, in the sound source position measuring device of the present invention, three microphones are arranged at the vertices of a predetermined triangle, and the time difference between the sounds from the same sound source reaching the three microphones is measured. From the position of the microphone and the arrival time difference of the sound for each microphone pair, not only the direction of the sound source but also the position of the sound source can be calculated.
[0107]
Further, in the camera photographing control device of the present invention, the relative position of the sound source with respect to the camera is calculated from the position of each microphone, the arrival time difference of sound for each microphone pair, and the position of the camera. Therefore, the camera can be controlled so that it faces the sound source even when the camera is installed at a position different from that of the microphones.
[Brief description of the drawings]
FIG. 1 is a principle configuration diagram of a camera photographing control device of the present invention.
FIG. 2 is a layout diagram of the camera photographing control device according to the first embodiment.
FIG. 3 is an internal configuration diagram of the camera photographing control device according to the first embodiment.
FIG. 4 is a diagram showing an internal configuration of a sound signal detection device.
FIG. 5 is a diagram illustrating a change in a signal in a sound signal detector. (A) is a diagram showing a waveform of a signal input from a microphone, (B) is a diagram showing a waveform of a signal amplified by an amplifier, and (C) is a diagram showing a waveform of a power signal. (D) is a diagram showing the waveform of the signal after the level determination, and (E) is a diagram showing the waveform of the signal output from the differentiator.
FIG. 6 is a diagram showing an internal configuration of a time difference measuring device.
FIG. 7 is a diagram illustrating a square wave obtained from a trigger signal. (A) shows a trigger signal, and (B) shows a square wave.
FIG. 8 is a diagram showing an internal configuration of a delay unit.
FIG. 9 is a diagram showing a coordinate system of a microphone for calculating a sound source direction.
FIG. 10 is a graph showing a relational expression between a time difference between microphones and a sound source direction.
FIG. 11 is a diagram showing a measurement range of one microphone pair.
FIG. 12 is a diagram showing an angle-time difference information table.
FIG. 13 is a diagram illustrating a method for determining a sound source position from two pieces of time difference information.
FIG. 14 is a diagram illustrating a range of a sound source determined by a combination of microphone pairs and a sign of time difference data.
FIG. 15 is a diagram showing an internal configuration of a sound source position measuring device.
FIG. 16 is a diagram illustrating a relationship between a microphone coordinate system and a camera coordinate system.
FIG. 17 is a diagram showing an internal configuration of a camera control device.
FIG. 18 is an internal configuration diagram of a sound source position recording device.
FIG. 19 is a diagram illustrating an example of a sound source position table.
FIG. 20 is a diagram illustrating an example of a time table.
FIG. 21 is a flowchart showing a processing procedure of sound source position recording.
[Explanation of symbols]
1 to 3 Microphones
4 Sound signal detection means
5 Time difference measuring means
6 Sound source position calculation means
7 Camera control means
8 Camera driving means
9 Camera body
10 Electric lens
11 Utterance information recording means
12 Video and audio recording and playback means
13 Playback instruction means

Claims

In a sound source position measuring device that calculates a sound source position based on sound information,
Microphones installed at the three vertices of an equilateral triangle,
Sound signal detection means for detecting, as sound detection time, the time at which the sound emitted by one sound source is captured by each of the microphones;
A time difference measuring unit configured to measure a sound arrival time difference of each of three microphone pairs obtained by combining two of the microphones based on a sound detection time of each of the microphones;
Sound source position calculation means for calculating a sound source position from the arrival time difference of sound of each of the microphone pairs, with the range of ±60 degrees from the line connecting each microphone pair defined as the sound source measurement range of that microphone pair,
A sound source position measuring device comprising:

In a camera photographing device that performs photographing while controlling the direction of a camera installed on a turntable,
  Microphones installed at three vertices of an equilateral triangle,
  Sound signal detection means for detecting, as sound detection time, the time at which the sound emitted by one sound source is captured by each of the microphones;
  A time difference measuring means for measuring a difference in arrival time of sound for each of three microphone pairs obtained by combining two of the microphones based on the sound detection time for each microphone;
  Sound source position calculation means for defining the range of ±60 degrees from the line connecting each microphone pair as the sound source measurement range of that microphone pair, limiting the possible range of the sound source by the sign of the arrival time difference of the sound captured by each microphone pair, and calculating relative position information of the sound source with respect to the camera based on the position of each microphone, the arrival time difference of sound for each microphone pair, and position information of the camera,
  Camera control means for controlling the direction of the camera in the direction of the sound source, based on the position information of the sound source calculated by the sound source position calculation means,
  A camera photographing control device comprising:

In a camera photographing device that performs photographing while controlling the direction of a camera installed on a turntable,
  Microphones installed at three vertices of a triangle of a predetermined shape,
  Sound signal detection means for detecting, as sound detection time, the time at which the sound emitted by one sound source is captured by each of the microphones;
  A time difference measuring means for measuring a difference in arrival time of sound for each of three microphone pairs obtained by combining two of the microphones based on the sound detection time for each microphone;
  Sound source position calculating means for calculating relative position information of a sound source with respect to the camera, based on information on the position of each microphone, the arrival time difference of sound for each microphone pair and the position of the camera,
  Camera control means for controlling the direction of the camera in the direction of the sound source, based on the position information of the sound source calculated by the sound source position calculation means,
  A video / audio recording / reproducing unit for recording and reproducing the sound captured by the microphone and the video captured by the camera,
  Based on the calculation result of the sound source position calculation means, as an index of the video recorded by the video and audio recording and reproduction means, utterance information recording means for recording the utterance time for each sound source position,
  A camera photographing control device comprising: