JPH06351015A

JPH06351015A - Image pickup system for video conference system

Info

Publication number: JPH06351015A
Application number: JP5138537A
Authority: JP
Inventors: Shinzou Matsui; 紳造松井
Original assignee: Olympus Optical Co Ltd
Current assignee: Olympus Corp
Priority date: 1993-06-10
Filing date: 1993-06-10
Publication date: 1994-12-22

Abstract

PURPOSE: To automatically change the image pickup direction, angle of view, and sound collecting direction in real time in response to the movement of a speaker without fixing the speaker's position, and to suppress cost increases by detecting the speaker's direction with at least three microphones. CONSTITUTION: A camera 1 and a highly directional sound collecting microphone 2 are mounted on a stand 4 via a direction control device 3. A speaker direction detection section 5, which detects the direction of the speaker, is also fixed to the stand 4. Information obtained from the speaker direction detection section 5 is supplied to a speaker position analysis section 6, which determines the direction of the speaker. Based on the information obtained by the speaker position analysis section 6, a system control section 7 controls a camera control section 8 to move the camera 1 and the highly directional sound collecting microphone 2, and controls the movement of the direction control device 3 to set the direction of the camera 1 and the highly directional sound collecting microphone 2.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an image pickup system for a television conference system that can automatically set the image pickup area by detecting the position of a speaker.

[0002]

[Prior Art] In recent years, conference systems using television (hereinafter abbreviated as TV) have become widespread. Known examples include a TV conference system that provides as many cameras as there are speakers and switches among the signals from those cameras, and a TV conference system that uses one camera for capturing the full view and one camera dedicated to capturing a person (see Japanese Patent Laid-Open No. 4-323990). In the latter system in particular, the person-dedicated camera can change its image pickup direction by means of a swivel base or the like, and was moved by a remote controller (hereinafter abbreviated as a remote) or the like so that it pointed toward the speaker.

[0003]

[Problems to Be Solved by the Invention] However, a system that provides cameras corresponding to the number of speakers needs one camera per person, so in a conference with many participants the number of speakers is large and the cost becomes quite high. Furthermore, when a speaker moves while talking, the speaker may move outside the camera's angle of view.

[0004] In a TV conference system using a remote controller, fewer cameras are needed, but when a speaker moves while talking, the camera has to be made to follow the moving speaker each time, and this operation is cumbersome. In addition, when another speaker starts talking while the first speaker is still speaking, the speaker cannot be identified accurately, and it is difficult to obtain appropriate video and audio information in real time.

[0005] The present invention has been made in view of the above problems. Its object is to provide an image pickup system for a television conference system that can automatically change the image pickup direction, angle of view, and sound collecting direction in real time in response to the movement of a speaker, without fixing the speaker's position, while suppressing any increase in cost.

[0006]

[Means for Solving the Problems] That is, the present invention is an image pickup system for a television conference system that holds a conference or the like by transmitting information such as video information and audio information between a plurality of different places, characterized by comprising: image pickup means for picking up an image of at least one speaker; voice information detecting means for detecting voice information uttered by the speaker to be imaged by the image pickup means; direction information detecting means for detecting, based on the voice information detected by the voice information detecting means, information on the direction in which the voice was generated; and control means for controlling the image pickup state of the image pickup means based on the direction information detected by the direction information detecting means.

[0007]

[Operation] The present invention is an image pickup system for a television conference system that holds a conference or the like by transmitting information such as video information and audio information between a plurality of different places. In this system, the image pickup means picks up an image of at least one speaker, and the voice information uttered by the speaker to be imaged by the image pickup means is detected by the voice information detecting means. Based on the voice information detected by the voice information detecting means, the direction information detecting means detects information on the direction in which the voice was generated. The control means then controls the image pickup state of the image pickup means based on the direction information detected by the direction information detecting means.

[0008]

[Embodiments] Embodiments of the present invention will be described below with reference to the drawings. FIG. 1 is a schematic diagram showing the configuration of one embodiment of the image pickup system for a television conference system according to the present invention.

[0009] In FIG. 1, a camera 1 has a zoom function and an autofocus (AF) function. A highly directional sound collecting microphone (hereinafter, microphone is abbreviated as mic) 2 is attached to the camera 1. The camera 1 is mounted on a stand 4 via a direction control device 3, and its image pickup direction and sound collecting direction are controlled in the vertical direction (arrow A in the figure) and the horizontal direction (arrow B in the figure). A speaker direction detection section 5 for detecting the direction of the speaker is attached to the stand 4.

[0010] The information obtained from the speaker direction detection section 5 is supplied to a speaker position analysis section 6, where various analyses are performed. As described later, the speaker position analysis section 6 determines the direction from the camera 1 to the speaker. Based on the information obtained by the speaker position analysis section 6, a system control section 7 controls a camera control section 8 to move the camera 1 and the highly directional sound collecting mic 2, and controls the movement of the direction control device 3 to set the direction of the camera 1 and the highly directional sound collecting mic 2. The camera control section 8 controls operations of the camera 1 such as the zoom function and the AF function.

[0011] The image pickup system is configured in this manner. A plurality of speakers 10₁, 10₂, 10₃, 10₄, ..., 10ₙ are positioned at suitable intervals from one another, for example in the vicinity of a table 9.

[0012] Speaker direction detection will now be described with reference to FIGS. 2 and 3. First, referring to FIG. 2, detection of the speaker's direction using mics 11a and 11b is described. Suppose a voice is uttered by a speaker 10a; the voice is picked up by the two mics 11a and 11b. The angle of the speaker 10a (incident angle θ) is obtained by detecting the difference between the times at which the voice arrives at the mics 11a and 11b.
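
As an illustration only, the relationship between the arrival-time difference and the incident angle can be sketched as follows. The sketch assumes far-field (plane-wave) propagation and a speed of sound of 343 m/s, neither of which is stated in the specification; the function name and sign convention are hypothetical. The angle is measured from the line connecting the two mics, pointing from the later-reached mic toward the earlier-reached one.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value, not given in the source


def incident_angle(delay_s, mic_spacing_m, c=SPEED_OF_SOUND):
    """Return the incident angle (radians) relative to the mic axis.

    delay_s is the arrival time at the second mic minus the arrival time
    at the first mic. Far-field propagation is assumed, so the path
    difference equals mic_spacing_m * cos(theta).
    """
    cos_theta = c * delay_s / mic_spacing_m
    # Measurement noise can push the ratio slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)
```

With zero delay the sound arrives broadside (90 degrees to the mic axis); a delay equal to the spacing divided by the speed of sound corresponds to a source on the axis itself.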

[0013] With only the two mics 11a and 11b, however, every sound arriving from anywhere on the side surface 12 of the cone shown hatched in FIG. 2 is recognized as having the same incident angle θ with respect to the line connecting the mics 11a and 11b. Consequently, the system cannot handle three-dimensional movement of the speaker (movement in both the vertical and horizontal directions).

[0014] For this reason, as shown in FIG. 3, a mic 11c is provided in addition to the mics 11a and 11b so that the direction of the speaker can be identified. That is, when a voice is uttered by the speaker 10a, a cone having a base 12a can be considered with respect to the line connecting the mics 11a and 11b, as described above. At the same time, a cone having a base 12b can be considered with respect to the line connecting the mics 11a and 11c. The point at which the side surfaces of these two cones, having the bases 12a and 12b, intersect is the position of the speaker 10a who uttered the voice. In this way, the position of the speaker, that is, the direction, can be identified.
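
The intersection of the two cones can be sketched numerically. The following is a minimal sketch, not the patented implementation: it assumes the two mic axes are perpendicular, that the common mic sits at the origin, and that the speaker is in front of the array (positive z). All names are hypothetical.

```python
import math


def direction_from_cones(theta_ab, theta_ac):
    """Unit vector toward the speaker from two cone half-angles (radians).

    theta_ab: incident angle relative to the x axis (first mic pair).
    theta_ac: incident angle relative to the y axis (second mic pair).
    The axes are assumed perpendicular and the speaker is assumed to lie
    on the positive-z side of the array.
    """
    ux = math.cos(theta_ab)
    uy = math.cos(theta_ac)
    # Noisy angles may be slightly inconsistent; clamp before the root.
    uz = math.sqrt(max(0.0, 1.0 - ux * ux - uy * uy))
    norm = math.sqrt(ux * ux + uy * uy + uz * uz)
    return ux / norm, uy / norm, uz / norm


def pan_tilt(direction):
    """Convert a unit vector into pan (left/right) and tilt (up/down) angles."""
    ux, uy, uz = direction
    pan = math.atan2(ux, uz)
    tilt = math.atan2(uy, math.hypot(ux, uz))
    return pan, tilt
```

Each cone constrains one direction cosine, so two perpendicular pairs fix the direction up to the front/back ambiguity that the assumption about the speaker's side resolves.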

[0015] FIG. 4 shows a specific configuration example of the speaker direction detection section 5, in which (a) is a front view seen from the speaker side and (b) is a perspective view. In the figure, mics 13b and 13c are arranged as shown at predetermined distances L_ab and L_ac from a reference mic 13a. That is, the mic 13b is placed at a position that is not on a line connecting any speaker and the mic 13a. The mic 13c is placed in a direction from the mic 13a perpendicular to the line connecting the mics 13a and 13b, at a position that is likewise not on a line connecting any speaker and the mic 13a. The predetermined distances L_ab and L_ac are determined by the audio frequency and are, for example, 5 to 10 cm.
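
The specification states only that the spacing is determined by the audio frequency. One common criterion, offered here as an assumption rather than as the inventor's reasoning, is that the spacing should not exceed half a wavelength, so that the inter-mic phase difference stays within ±π and the delay is unambiguous. The helper name and the 343 m/s speed of sound are hypothetical.

```python
def max_unambiguous_frequency(mic_spacing_m, c=343.0):
    """Highest frequency (Hz) whose half wavelength still spans the spacing.

    Above this frequency the phase difference between the two mics can
    exceed pi, so the delay derived from the phase becomes ambiguous.
    """
    return c / (2.0 * mic_spacing_m)
```

Under this assumption, a 5 cm spacing is unambiguous up to about 3.4 kHz and a 10 cm spacing up to about 1.7 kHz, which covers much of the energy in speech.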

[0016] A sound absorbing material 14 is attached to the mics 13a, 13b, and 13c. The sound absorbing material 14 absorbs sound reflected from the walls of the room in which the system is installed, the sound of the air conditioner, sound arriving from the side opposite the speakers' voices, and sound from areas into which the speakers do not move, thereby reducing detection errors.

[0017] Next, the detailed configurations of the speaker direction detection section 5 and the speaker position analysis section 6 will be described with reference to the block diagram of FIG. 5. The speaker direction detection section 5 comprises the mics 13a, 13b, and 13c described above and amplifiers 15a, 15b, and 15c, which amplify the audio signals picked up by the respective mics 13a, 13b, and 13c and output them to the speaker position analysis section 6.

[0018] The speaker position analysis section 6 comprises bandpass filters (BPF) 16a, 16b, and 16c, which extract specific frequencies from the outputs of the amplifiers 15a, 15b, and 15c; A/D converters 17a, 17b, and 17c, which A/D-convert the outputs extracted by the bandpass filters 16a, 16b, and 16c; speaker direction analysis sections 18a and 18b; and a speaker direction calculation section 19. The speaker direction analysis sections 18a and 18b determine the direction of the speaker by the principle described above, based respectively on the audio signals from the mics 13a and 13b and from the mics 13a and 13c, together with the speed of sound and the inter-mic distances (L_ab, L_ac). From the analysis results of the speaker direction analysis sections 18a and 18b, the speaker direction calculation section 19 then finds the intersection of the two cones as shown in FIG. 4. The direction of the speaker is thereby determined.

[0019] FIG. 6 is a detailed block diagram of the speaker direction analysis section 18a. The configuration of the speaker direction analysis section 18b differs only in that its input signals come from the mics 13b and 13c; it is otherwise the same as that of the speaker direction analysis section 18a, so its description is omitted here.

[0020] In the speaker direction analysis section 18a, the reference audio signal input from the mic 13a is fast-Fourier-transformed by a fast Fourier transform (FFT) calculation section 20a. Similarly, the audio signal input from the mic 13b is transformed by an FFT calculation section 20b. A cross spectrum calculation section 21 then calculates the cross spectrum from the results of the FFT calculation sections 20a and 20b. This is the processing used to detect the phase at each frequency from the data of the two audio signals.

[0021] A phase calculation section 22 then calculates the phase from the complex value at each frequency. An incident angle calculation section 23 obtains the incident angle from the frequency value, the calculated phase value, the distance between the two mics 13a and 13b, and the speed of sound. From the incident angles thus obtained, an average value calculation section 24 obtains the average value θ_ave of the incident angles.
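
The chain from FFT to averaged incident angle can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the circuit of FIG. 6: a naive DFT stands in for the FFT sections, the `band_hz` limits stand in for the bandpass filters and are arbitrary, the speed of sound is taken as 343 m/s, and every function and parameter name is hypothetical.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform (stand-in for an FFT)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def average_incident_angle(sig_a, sig_b, sample_rate_hz, mic_spacing_m,
                           band_hz=(300.0, 1500.0), c=343.0):
    """Average incident angle (radians) over the bins inside band_hz.

    sig_a is the reference mic signal and sig_b the second mic signal.
    The angle is measured from the axis pointing from mic b toward mic a.
    """
    n = len(sig_a)
    spec_a, spec_b = dft(sig_a), dft(sig_b)
    angles = []
    for k in range(1, n // 2):
        freq = k * sample_rate_hz / n
        if not band_hz[0] <= freq <= band_hz[1]:
            continue
        cross = spec_a[k] * spec_b[k].conjugate()   # cross spectrum at this bin
        phase = cmath.phase(cross)                  # phase difference (radians)
        delay = phase / (2.0 * math.pi * freq)      # delay of b relative to a
        cos_theta = max(-1.0, min(1.0, c * delay / mic_spacing_m))
        angles.append(math.acos(cos_theta))         # per-frequency incident angle
    if not angles:
        raise ValueError("no frequency bins inside band_hz")
    return sum(angles) / len(angles)
```

Note that this plain average treats every bin in the band equally, including bins that carry little signal energy.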

[0022] The average values of the incident angles are input from the respective speaker direction analysis sections 18a and 18b, and the speaker direction calculation section 19 determines the direction of the speaker. The system control section 7 then controls the direction control device 3 so that the camera 1 and the highly directional sound collecting mic 2 are driven toward the direction of the speaker determined here.

[0023] Thus, according to the first embodiment, the delay time at each frequency is calculated from the phase component of the cross spectrum, an incident angle is calculated for each frequency, and these incident angles are averaged, so the incident angle can be obtained accurately.

[0024] In addition, using the sound absorbing material makes it possible to remove sound reflected from the walls of the conference room and unwanted sound such as that of an air conditioner, and thus to reduce erroneous recognition of the voice.

[0025] Next, a second embodiment of the present invention will be described. In the embodiments described below, identical components are given the same reference numerals, and their description is omitted to avoid duplication.

[0026] In the first embodiment described above, the direction of the speaker is identified using one speaker direction detection section. The second embodiment uses two speaker direction detection sections to identify both the direction and the position of the speaker.

[0027] Referring to FIGS. 7 and 8, a speaker direction detection section 25 serving as second speaker direction detecting means is installed at a predetermined distance from the speaker direction detection section 5, which serves as first speaker direction detecting means. The speaker direction detection section 25 is used in combination with the direction obtained by the speaker direction detection section 5 to detect the position of the speaker or the distance to the speaker. The detection result of the speaker direction detection section 25 is supplied, together with the detection result of the speaker direction detection section 5, to a speaker position analysis unit 26.

[0028] As shown in FIG. 9, the speaker direction detection section 25 has mics 13d and 13e arranged at a predetermined distance L_de from each other. The sound absorbing material 14 is attached to these mics 13d and 13e.

[0029] FIG. 10 is a block diagram illustrating the detailed configuration of the speaker direction detection sections 5 and 25 and the speaker position analysis unit 26 of the system according to the second embodiment. The speaker direction detection section 5 and the speaker position analysis section 6 have already been described, so their description is omitted.

[0030] The speaker direction detection section 25 comprises the mics 13d and 13e described above and amplifiers 15d and 15e, which amplify the audio signals picked up by the respective mics 13d and 13e and output them to a speaker position analysis section 27.

[0031] The speaker position analysis section 27 comprises bandpass filters 16d and 16e, which extract specific frequencies from the outputs of the amplifiers 15d and 15e; A/D converters 17d and 17e, which A/D-convert the outputs extracted by the bandpass filters 16d and 16e; and a speaker direction analysis section 18c. The speaker direction analysis section 18c determines the direction of the speaker by the principle described above, based on the audio signals from the mics 13d and 13e, the speed of sound, and the inter-mic distance (L_de).

[0032] The analysis result of the speaker direction analysis section 18c and the calculation result of the speaker direction calculation section 19 of the speaker position analysis section 6 are input to a speaker position calculation section 28. From the outputs of the speaker position analysis section 6 and the speaker position analysis section 27, the speaker position calculation section 28 calculates the direction of the speaker and the distance between the speaker and the camera 1, and outputs the result to the system control section 7. In accordance with the direction of the speaker and the distance between the speaker and the camera 1 thus obtained, the system control section 7 controls the direction control device 3 to set the direction toward the speaker. At the same time, the zoom function and AF function of the camera 1 are controlled via the camera control section 8, the angle of view of the camera is determined, and the image is captured.
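
The specification does not give the formula used by the speaker position calculation section 28. As a hedged sketch, the distance can be obtained by triangulation from two bearings taken at the ends of a known baseline. The version below is simplified to the horizontal plane and assumes that the first detector (and the camera) sit at the origin and the second detector lies on the x axis; the function name and coordinate convention are hypothetical.

```python
import math


def triangulate(alpha, beta, baseline_m):
    """Locate the speaker in the horizontal plane from two bearings.

    alpha: bearing (radians) seen from the first detector at (0, 0).
    beta:  bearing (radians) seen from the second detector at (baseline_m, 0).
    Both bearings are measured counter-clockwise from the baseline direction.
    Returns (x, y, distance_from_first_detector).
    """
    denom = math.sin(beta - alpha)
    if abs(denom) < 1e-9:
        raise ValueError("bearings are parallel; position is undetermined")
    # Law of sines in the triangle formed by the two detectors and the speaker.
    distance = baseline_m * math.sin(beta) / denom
    return distance * math.cos(alpha), distance * math.sin(alpha), distance
```

A distance obtained this way could then drive the zoom setting; how the zoom is mapped to distance is not stated in the source.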

[0033] By providing two speaker direction detection sections in this way, both the direction and the position of the speaker can be detected, so the image pickup area of the camera can be set automatically. For example, in FIG. 7, suppose that the speaker 10₁ is speaking first; the system control section 7 operates the camera 1 at an angle of view that captures only the speaker 10₁. If another speaker (for example, the speaker 10₄) then begins to speak and the system control section 7 judges that panning by the direction control device 3 cannot keep up, it controls the camera control section 8 to switch the angle of view from the speaker 10₁ to a full view in which all the speakers are captured. The camera 1 is then operated at an angle of view that captures only the specific speaker 10₄. When no speaker is speaking, the image may be captured at an angle of view that covers the full view.
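
The framing behavior described above can be summarized as a small decision function. This is an illustrative sketch only: the specification does not say how the system control section 7 judges that panning "cannot keep up", so the pan speed and time limit used here are assumed parameters, and all names are hypothetical.

```python
def choose_framing(current_pan_deg, target_pan_deg,
                   pan_speed_deg_per_s=60.0, max_pan_time_s=0.5):
    """Decide how the camera should frame the scene.

    target_pan_deg is None when no speaker is detected.
    Returns "wide" (full view), "wide_then_close_up" (show the full view
    while panning, then zoom in on the new speaker), or "close_up".
    """
    if target_pan_deg is None:
        return "wide"
    pan_time_s = abs(target_pan_deg - current_pan_deg) / pan_speed_deg_per_s
    if pan_time_s > max_pan_time_s:
        # Panning would lag behind the new speaker: fall back to the full view first.
        return "wide_then_close_up"
    return "close_up"
```

For example, a jump from one end of the table to the other would typically select the intermediate full view, whereas a small movement of the same speaker would keep the close-up.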

[0034] Next, a third embodiment of the present invention will be described. FIG. 11 is a block diagram showing another example of the speaker direction analysis section. Here it is shown as a speaker direction analysis section 18d corresponding to that of FIG. 6; the same applies to the speaker direction analysis sections 18a to 18c, so their description is omitted here.

[0035] In the speaker direction analysis section 18d, the audio signals input from the mics 13a and 13b are FFT-processed by the FFT calculation sections 20a and 20b. The cross spectrum calculation section 21 then calculates the cross spectrum from the results of the FFT calculation sections 20a and 20b, which is the processing used to detect the phase at each frequency from the data of the two audio signals.

[0036] The phase calculation section 22 then calculates the phase from the complex value at each frequency. A power calculation section 29 calculates the power spectrum of the cross spectrum. The incident angle calculation section 23 obtains the incident angle from the calculated phase value, the frequency value, the distance between the two mics 13a and 13b, and the speed of sound. A frequency component selection section 30 then selects frequency components by comparing each power value of the power spectrum with a predetermined threshold, in order to exclude frequency components with low voice power, which cause erroneous detection. The average value calculation section 24 obtains the average value θ_ave of the incident angles over the frequency components thus selected.
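
The power-based selection can be sketched as follows. This is a minimal illustration, assuming the per-frequency cross spectrum values and incident angles have already been computed; the threshold value is application-dependent and is not given in the source, and the function names are hypothetical.

```python
def select_by_power(cross_spectrum, threshold):
    """Return indices of frequency bins whose cross-spectrum power meets the threshold.

    The power at each bin is taken as the squared magnitude of the
    complex cross-spectrum value.
    """
    return [k for k, value in enumerate(cross_spectrum)
            if abs(value) ** 2 >= threshold]


def mean_angle_of_selected(angles, selected):
    """Average the incident angles over the selected bins only."""
    if not selected:
        raise ValueError("no frequency component passed the power threshold")
    return sum(angles[k] for k in selected) / len(selected)
```

Bins dominated by circuit or ambient noise have small cross-spectrum power and unreliable phase, so leaving them out of the average keeps them from biasing the result.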

[0037] At frequency components where the level of the voice component is low, circuit noise and external noise cause phase errors. According to the third embodiment, therefore, the power spectrum of the cross spectrum is calculated, the result is compared with a threshold, and frequency components at or below a certain signal level are removed before the incident angles at the individual frequencies are used. The incident angle can thus be obtained accurately.

[0038] FIG. 12 is a block diagram showing still another example of the speaker direction analysis section. Here it is shown as a speaker direction analysis section 18e corresponding to that of FIG. 6; the same applies to the speaker direction analysis sections 18a to 18c, so their description is omitted here.

[0039] In the speaker direction analysis section 18e, the cross spectrum calculation section 21 calculates the cross spectrum from the results of the FFT calculation sections 20a and 20b, which is the processing used to detect the phase at each frequency from the data of the two audio signals. The phase calculation section 22 then calculates the phase from the complex value at each frequency. The power calculation section 29 calculates the power spectrum of the cross spectrum. The incident angle calculation section 23 obtains the incident angle from the calculated phase value, the frequency value, the distance between the two mics 13a and 13b, and the speed of sound. The frequency component selection section 30, serving as first frequency component selecting means, then selects frequency components by comparing each power value with a predetermined threshold.

[0040] A statistical calculation section 31 obtains the mean and variance of the incident angles of the frequency components selected by the frequency component selection section 30, using a histogram or the like. Based on this statistical calculation, a frequency component selection section 32, serving as second frequency component selecting means, selects the frequency components that fall within a certain spread (for example, 2σ) of the mean. The average value calculation section 24 then obtains the average value θ_ave of the incident angles over those frequency components.
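
The statistical selection step can be sketched with the standard library. This is an illustration under assumptions: the population standard deviation is used for σ, the 2σ window follows the example in the text, and the function name is hypothetical. The histogram-based calculation mentioned in the specification is replaced here by a direct mean and deviation.

```python
import statistics


def robust_mean_angle(angles, k_sigma=2.0):
    """Average the incident angles after discarding outliers.

    Angles farther than k_sigma standard deviations from the mean are
    dropped, and the mean of the remaining angles is returned.
    """
    mean = statistics.fmean(angles)
    sigma = statistics.pstdev(angles)
    kept = [a for a in angles if abs(a - mean) <= k_sigma * sigma]
    return statistics.fmean(kept) if kept else mean
```

A frequency bin dominated by another sound source, such as an air conditioner, yields an angle far from the speaker's and is removed before the final average is taken.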

[0041] Thus, according to the fourth embodiment, the incident angles at the individual frequencies are statistically processed and the frequency components that deviate greatly from the mean are removed. This reduces erroneous detection caused by sounds from sources other than the speaker, such as an air conditioner, so the incident angle can be obtained accurately.

[0042]

[Effects of the Invention] As described above, the present invention can provide an image pickup system for a television conference system that can automatically change the image pickup direction, angle of view, and sound collecting direction in real time in response to the movement of a speaker, without fixing the speaker's position, while suppressing any increase in cost.

[Brief description of drawings]

FIG. 1 is a schematic diagram showing the configuration of one embodiment of the image pickup system for a television conference system according to the present invention.

FIG. 2 is a diagram illustrating the principle of detecting the speaker's direction when the mics 11a and 11b are used.

FIG. 3 is a diagram illustrating detection of the speaker's direction when the mics 11a, 11b, and 11c are used.

FIG. 4 shows a specific configuration example of the speaker direction detection section 5, in which (a) is a front view seen from the speaker side and (b) is a perspective view.

FIG. 5 is a block diagram showing details of the speaker direction detection section 5 and the speaker position analysis section 6.

FIG. 6 is a block diagram showing details of the speaker direction analysis section 18a of FIG. 5.

FIG. 7 is a schematic diagram showing the configuration of a second embodiment of the image pickup system for a television conference system according to the present invention.

FIG. 8 is a diagram showing the arrangement of the speaker direction detection section 5 and the speaker direction detection section 25 of FIG. 7.

FIG. 9 is a diagram showing a specific configuration example of the speaker direction detection section 25.

FIG. 10 is a block diagram showing details of the speaker direction detection sections 5 and 25 and the speaker position analysis unit 26 of the system according to the second embodiment of the present invention.

FIG. 11 is a block diagram showing another example of the speaker direction analysis section, according to a third embodiment of the present invention.

FIG. 12 is a block diagram showing still another example of the speaker direction analysis section, according to a fourth embodiment of the present invention.

[Explanation of symbols]

1: camera; 2: highly directional sound collecting microphone; 3: direction control device; 4: stand; 5: speaker direction detection section; 6: speaker position analysis section; 7: system control section; 8: camera control section; 9: table; 10₁, 10₂, 10₃, 10₄, ..., 10ₙ, 10a, 10b, 10c: speakers; 11a to 11c, 13a to 13e: mics; 14: sound absorbing material.

Claims

[Claims]

1. An image pickup system for a television conference system that holds a conference or the like by transmitting information such as video information and audio information between a plurality of different places, the image pickup system comprising: image pickup means for picking up an image of at least one speaker; voice information detecting means for detecting voice information uttered by the speaker to be imaged by the image pickup means; direction information detecting means for detecting, based on the voice information detected by the voice information detecting means, information on the direction in which the voice was generated; and control means for controlling the image pickup state of the image pickup means based on the direction information detected by the direction information detecting means.