JPH0785581B2

JPH0785581B2 - Camera control device for speaker shooting of video conference system

Info

Publication number: JPH0785581B2
Application number: JP62318006A
Authority: JP
Inventors: 裕明名取; 隆策今井; 雄治吉田; 敏弘本間; 均佐藤; 均石黒; 庸市芦田; 政数山口
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1987-12-15
Filing date: 1987-12-15
Publication date: 1995-09-13
Anticipated expiration: 2010-09-13
Also published as: JPH01158880A

Description

DETAILED DESCRIPTION OF THE INVENTION [Table of Contents] Overview; Industrial Field of Application; Prior Art; Problems to Be Solved by the Invention; Means for Solving the Problems; Operation; Embodiment; Effects of the Invention. [Overview] The present invention relates to a control device that automatically detects which attendee of a video conference is currently speaking and drives the attendee camera so that this attendee is captured on video. Its object is to provide a device that can always reliably detect the current speaker automatically. To that end, the input time of the microphone provided for each attendee is held over approximately the past 4 seconds, and the attendee whose microphone shows a cumulative input of at least a set time of approximately 2 seconds within that period is detected as the current speaker.

[Industrial Field of Application] The present invention relates to a control device that automatically identifies the attendee who is currently speaking in a video conference and drives the attendee camera so that the speaker is captured on video.

In a video conference, a second camera is provided in addition to the camera that captures a full view of the conference room.

The attendee who is currently speaking is selected as the subject of this second camera, and that selection used to be made by a particular attendee (for example, the moderator).

However, doing so places a heavy burden on that attendee and easily leads to mistakes.

Devices of this kind are therefore used in video conference systems.

[Prior Art] In a device of this kind, a microphone is provided for each attendee, and the attendee whose microphone shows confirmed input is identified as the current speaker.

A single attendee camera is then driven so that this speaker is captured on video; as a result, each new speaker is captured in turn whenever the speaker changes.

Conventionally, as shown in FIG. 7(A), the attendee at a microphone is identified as the speaker once microphone input lasting 2 seconds or more is confirmed, and microphone input shorter than 2 seconds is treated as noise.

Furthermore, when two or more attendees are speaking at the same time, as shown in FIG. 7(B) and (C), their microphone inputs are regarded as absent.

[Problems to Be Solved by the Invention] Conventionally, however, an attendee whose speech produces intermittent microphone input, as shown in FIG. 7(D), is not identified as the speaker. Consequently, if an attendee pauses, for example to gauge the reactions of the other attendees, that attendee is not identified as the speaker despite actually speaking.

In addition, when the noise shown in FIG. 7(F) occurs during the speech shown in FIG. 7(E), the two are treated as the simultaneous input of FIG. 7(B) and (C). The speech at that moment is therefore ignored, and the attendee is not identified as the speaker.
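By way of illustration only, the conventional rule and the two failure cases described above can be sketched in Python. The 10 Hz sampling rate, the ON/OFF sample representation, and the name `conventional_speaker` are assumptions made for this sketch; the source states only the 2-second continuity requirement and the rule that simultaneous inputs are regarded as absent.

```python
SAMPLE_HZ = 10            # assumed sampling rate; not given in the source
MIN_RUN = 2 * SAMPLE_HZ   # 2 seconds of uninterrupted input

def conventional_speaker(history):
    """history: dict mic_number -> list of 0/1 samples, oldest first.

    Returns the microphone number judged to be the speaker at the latest
    sample, or None. Samples at which two or more microphones are ON are
    treated as if no microphone had input, as in FIG. 7(B) and (C).
    """
    length = len(next(iter(history.values())))
    runs = {mic: 0 for mic in history}
    for t in range(length):
        active = [mic for mic, samples in history.items() if samples[t]]
        for mic in history:
            # A run continues only while this microphone is the sole input.
            runs[mic] = runs[mic] + 1 if active == [mic] else 0
    for mic, run in runs.items():
        if run >= MIN_RUN:
            return mic
    return None
```

Under these assumptions, 2 seconds of uninterrupted input on one microphone is detected, whereas speech with a half-second pause, or speech overlapped by a short burst of noise on another microphone, yields no speaker at all.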

Japanese Unexamined Patent Publication No. 61-66487 discloses a video signal switching method in which the amplitude variation of each audio signal is smoothed before level comparison, the levels are compared at every fixed period T1, and only a maximum level at or above a fixed value is detected. The number of times each audio signal channel has such a maximum level is counted over every fixed period T2, the audio signal channel whose count is the largest and at or above a fixed value is detected, and the corresponding video signal is selected and switched in.
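The counting scheme of that publication, as summarized above, might be sketched roughly as follows. This is a reading of the summary only, not the publication's actual implementation; the function name `select_channel`, the dictionary representation of smoothed levels, and the two threshold parameters are hypothetical.

```python
from collections import Counter

def select_channel(levels_per_t1, level_threshold, count_threshold):
    """levels_per_t1: list of dicts {channel: smoothed_level}, one dict per
    T1 comparison, together covering one T2 period.

    Returns the channel to switch to, or None if no channel qualifies.
    """
    counts = Counter()
    for levels in levels_per_t1:
        # Level comparison at each T1: find the loudest channel.
        channel, level = max(levels.items(), key=lambda kv: kv[1])
        if level >= level_threshold:
            counts[channel] += 1
    if not counts:
        return None
    # At the end of T2: the channel with the largest count wins,
    # provided the count itself reaches the required value.
    channel, count = counts.most_common(1)[0]
    return channel if count >= count_threshold else None
```

Because each T1 comparison is decided by level alone, a loud but brief contribution can outvote a quieter, sustained one.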

With such a switching method, however, when A and B begin talking at the same time, one of them normally takes the lead in the conversation while the other merely gives brief responses of acknowledgement. Because the speaker is selected by the sound pressure level at the start of the conversation, the method can end up selecting the party who is merely giving those brief responses.

The present invention has been made in view of the above conventional problems, and its object is to provide a high-performance device that identifies the speaker with high accuracy and is unaffected by noise.

[Means for Solving the Problems] To achieve the above object, the device according to the present invention is configured as shown in FIG. 1.

In FIG. 1, the microphone input judging means 10 judges the presence or absence of input at each of the microphones provided for the attendees of the video conference.

The judgment result holding means 12 holds the judgment result for each microphone input over the period from the present back to about 4 seconds earlier.

The time totaling means 14 obtains the total time of each microphone input during that period from the judgment results held by the means 12.

The speaker identifying means 16 identifies, as the current speaker, the attendee at the microphone whose total time obtained by the means 14 exceeds the set time of about 2 seconds.

The camera driving means 20 then drives the camera so that the attendee identified as the current speaker by the means 16 falls within the imaging range.

[Operation] In the present invention, each microphone input is held over about the past 4 seconds, and the attendee at a microphone for which a cumulative input of about 2 seconds or more is confirmed within that period is identified as the speaker.
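The rule stated in this section can be expressed compactly. The following is a minimal sketch, assuming each microphone's history is available as a list of ON/OFF samples taken at an assumed 10 Hz; the function name `is_current_speaker` is hypothetical. This section says "2 seconds or more", so the comparison here is `>=`.

```python
SAMPLE_HZ = 10  # assumed sampling rate; the source gives only the durations

def is_current_speaker(samples, window_s=4, threshold_s=2):
    """samples: 0/1 input flags for one microphone, oldest first.

    Only the most recent window_s seconds are considered, and the microphone
    qualifies when its cumulative ON time in that window reaches threshold_s.
    """
    window = samples[-window_s * SAMPLE_HZ:]
    return sum(window) >= threshold_s * SAMPLE_HZ
```

For example, two 1-second bursts separated by a 1-second pause qualify, while half a second of noise does not, and speech that ended more than 4 seconds ago no longer counts.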

[Embodiment] A preferred embodiment of the device according to the present invention is described below with reference to the drawings.

FIG. 2 shows an example of a system to which the present invention is applied. Microphones 22-1, 22-2, 22-3, and 22-4 are provided for attendees 21-1, 21-2, 21-3, and 21-4 of the video conference, respectively.

The audio signals of the microphones 22-1, 22-2, 22-3, and 22-4 are supplied to a microphone mixer 23, which in turn supplies the audio signal to an audio transmission device 24.

The audio signals from the microphone mixer 23 are also supplied to a speaker detection circuit 25, which identifies the speaker from the input audio signals.

The output signal of the speaker detection circuit 25, which carries this result, is supplied to a swivel base controller 26, and the swivel base controller 26 controls a motorized swivel base 27.

The camera 28 is thereby pointed at whichever of the attendees 21-1, 21-2, 21-3, or 21-4 the speaker detection circuit 25 has identified as the speaker, and a video signal with the speaker framed in the picture is supplied from the camera 28 to a video transmission device 29.

FIG. 3 shows the speaker detection circuit 25. The audio signals of the microphones 22-1, 22-2, 22-3, and 22-4 are input to a sampling circuit 30 via the microphone mixer 23.

In the sampling circuit 30, a digital signal that is ON when the audio signal is at or above a predetermined level and OFF when it is below that level is obtained for each of the microphones 22-1, 22-2, 22-3, and 22-4. The values of these digital signals are sampled and loaded into an input buffer 31.
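The level comparison performed by the sampling circuit 30 amounts to a per-microphone threshold test. A minimal sketch follows, with the function name `sample_flags`, the dictionary representation, and the level units all assumed for illustration; the source does not state the predetermined level or the sampling rate.

```python
def sample_flags(levels, threshold):
    """levels: dict mic_number -> instantaneous audio level (arbitrary units).

    Returns dict mic_number -> 1 (ON) when the level is at or above the
    threshold, otherwise 0 (OFF). One such dict corresponds to one sample
    loaded into the input buffer.
    """
    return {mic: 1 if level >= threshold else 0 for mic, level in levels.items()}
```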

In the input buffer 31, flags are set and reset according to the digital signal values for the microphones 22-1, 22-2, 22-3, and 22-4. The contents of the flags in the input buffer 31 are distributed to accumulation buffers 32-1, 32-2, 32-3, and 32-4, which are provided for the microphones 22-1, 22-2, 22-3, and 22-4, respectively.

The number of bits in each of the accumulation buffers 32-1, 32-2, 32-3, and 32-4 equals the number of samples taken in 4 seconds, and the contents of the oldest bit are overwritten with the newest value.

As a result, a bit indicating that the audio signal was at or above the predetermined level is set in place of the oldest bit, and the number of set bits in the accumulation buffers 32-1, 32-2, 32-3, and 32-4 gives the total times MIC1on, MIC2on, MIC3on, and MIC4on during which audio input was confirmed at the microphones 22-1, 22-2, 22-3, and 22-4.
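An accumulation buffer of this kind behaves like a fixed-length ring buffer of bits. The sketch below is one possible software analogue, not the circuit itself; the class name `AccumulationBuffer` and the 10 Hz sampling rate are assumptions, so a 4-second buffer holds 40 bits here.

```python
class AccumulationBuffer:
    """Fixed-length bit buffer for one microphone (cf. buffers 32-1 to 32-4)."""

    def __init__(self, seconds=4, sample_hz=10):  # sample_hz is assumed
        self.sample_hz = sample_hz
        self.bits = [0] * (seconds * sample_hz)
        self.oldest = 0  # index of the oldest bit, which is overwritten next

    def store(self, flag):
        """Overwrite the oldest bit with the newest ON/OFF flag."""
        self.bits[self.oldest] = 1 if flag else 0
        self.oldest = (self.oldest + 1) % len(self.bits)

    def on_seconds(self):
        """Cumulative ON time in the window, i.e. MIC(n)on, in seconds."""
        return sum(self.bits) / self.sample_hz
```

Because old bits are overwritten, the total can never exceed the window length, and input older than 4 seconds drops out of the count automatically.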

In the processing circuit 33, as shown in FIG. 4, these total times are first read in one at a time (step 40) and compared with a preset detection time of 2 seconds (step 41).

If the total microphone input time MIC(n)on just read exceeds the detection time (YES in step 41), the corresponding number among those assigned to the microphones 22-1, 22-2, 22-3, and 22-4 is stored together with this total time MIC(n)on (step 42).

Once all the total times MIC(n)on have been compared with the detection time in step 41 (YES in step 43), it is determined whether the number of microphone numbers stored because their total time MIC(n)on exceeded the detection time is zero, one, or more than one (step 44).

If no microphone number has been stored, none of the attendees 21-1, 21-2, 21-3, and 21-4 is speaking, and it is determined that there is no speaker (step 45).

If exactly one microphone number has been stored, the attendee 21-1, 21-2, 21-3, or 21-4 at the microphone 22-1, 22-2, 22-3, or 22-4 indicated by that number is identified as the speaker (step 46), and a signal carrying this microphone number is output to the swivel base controller 26 (step 47).

As a result, the motorized swivel base 27 turns the camera 28 so that the attendee 21-1, 21-2, 21-3, or 21-4 at the microphone 22-1, 22-2, 22-3, or 22-4 indicated by that microphone number fits within the picture.

If more than one microphone number has a total time MIC(n)on exceeding the detection time, the longest total time MIC(n)on is searched for (step 48), and the attendee 21-1, 21-2, 21-3, or 21-4 at the microphone 22-1, 22-2, 22-3, or 22-4 indicated by the number with the longest total time MIC(n)on is identified as the speaker (step 49).

The microphone number of the speaker given forced priority in this way is then output (step 47), and the camera 28 is pointed at that speaker.
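The decision flow of steps 40 to 49 can be summarized in a short function. This is a sketch under stated assumptions rather than the processing circuit's actual logic: the name `identify_speaker` and the dictionary input are hypothetical, the comparison follows the flowchart wording ("exceeds"), and the behavior when two totals tie for longest, which the source does not address, is simply left to `max`, which keeps the first such microphone.

```python
DETECTION_TIME = 2.0  # seconds; the preset detection time of step 41

def identify_speaker(totals):
    """totals: dict mic_number -> cumulative ON time MIC(n)on in seconds.

    Returns the microphone number to send to the swivel base controller
    (step 47), or None when no speaker is found (step 45).
    """
    # Steps 40-43: read every total and store those over the detection time.
    candidates = {mic: t for mic, t in totals.items() if t > DETECTION_TIME}
    # Step 44: branch on how many microphone numbers were stored.
    if not candidates:
        return None                      # step 45: no speaker
    if len(candidates) == 1:
        return next(iter(candidates))    # step 46: the single candidate
    # Steps 48-49: several candidates, so the longest total takes priority.
    return max(candidates, key=candidates.get)
```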

FIG. 5 illustrates how the speaker is identified in this embodiment. When the microphone input of FIG. 5(A) occurs, the total time MIC(n)on over the period from the present back to 4 seconds earlier changes as shown in FIG. 5(B).

Here, as shown in FIG. 5(A), noise is the first microphone input, but it does not add up to 2 seconds or more within the 4-second time frame, so the camera 28 is never mistakenly pointed toward that microphone.

When speech begins after the noise, as also shown in FIG. 5(A), the source of the speech is recognized as the speaker once the speaking time within the 4-second time frame reaches 2 seconds. The camera 28 is pointed at that speaker immediately, without waiting for the speech to continue for 2 seconds as in the conventional approach.

Moreover, even if the speaker pauses to seek agreement from the others or to gauge their reactions, the speaker is identified from the total time, so the same person continues to be recognized as the speaker.
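The effect of a pause on detection time can be traced with a small simulation. The timeline below is hypothetical and is not taken from FIG. 5; the 10 Hz sampling rate and the name `first_detection_time` are likewise assumptions.

```python
from collections import deque

SAMPLE_HZ = 10  # assumed sampling rate

def first_detection_time(samples, window_s=4, threshold_s=2):
    """Return the time in seconds at which the cumulative ON time inside the
    sliding window first reaches the threshold, or None if it never does."""
    window = deque(maxlen=window_s * SAMPLE_HZ)  # oldest sample drops out
    for i, flag in enumerate(samples):
        window.append(flag)
        if sum(window) >= threshold_s * SAMPLE_HZ:
            return (i + 1) / SAMPLE_HZ
    return None

# Hypothetical utterance: 1.0 s of speech, a 0.5 s pause, then 2.0 s more.
utterance = [1] * 10 + [0] * 5 + [1] * 20
# Hypothetical noise burst: 0.5 s of input followed by silence.
noise = [1] * 5 + [0] * 50
```

With this timeline the utterance is detected at 2.5 s, whereas a rule requiring 2 seconds of uninterrupted input would not fire until 3.5 s; the noise burst is never detected.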

It has been confirmed that noise does not continue for 2 seconds or more and that a single utterance is never interrupted for 2 seconds or more.

Erroneous speaker identification caused by noise can therefore be completely eliminated.

Also, whenever microphone input totaling 2 seconds or more is confirmed within the time frame, speech is certainly taking place, so the speaker who made it can be reliably identified.

Furthermore, because the speaker is identified from the cumulative microphone input time, the speaker can be identified quickly even if the speech is interrupted at the outset.

Thus, according to this embodiment, accurate speaker recognition can be performed quickly while the influence of noise is completely eliminated.

FIG. 6 illustrates the operation when another microphone receives input while a speaker is speaking. While one speaker is speaking as shown in FIG. 6(A), noise enters another microphone as shown in FIG. 6(B), and then, before the first speech has ended, another speech begins. Because the speaker is identified from the cumulative microphone input and the cumulative noise input never exceeds the detection time, FIG. 6(C) and (D) show that, even when noise overlaps a speaker's speech, the erroneous speaker recognition of FIG. 7 described above does not occur.

Even when another speech begins during an ongoing speech, the one with the longer speaking time is given priority and recognized as the speaker, so the switch to the next speaker to be captured takes place smoothly without appearing unnatural to viewers.

Also, when several microphones receive input at a given time, the input time accumulated over a cycle of about 4 seconds is evaluated to identify the current speaker. The device therefore automatically selects the attendee who is leading the conversation rather than simply the one with the louder voice.

For example, when A and B begin talking at the same time, one of them normally takes the lead while the other merely gives brief responses of acknowledgement. Even then, the selection is not based on the sound pressure level at the start of the conversation; instead, the device observes over about 4 seconds which party is taking the lead and selects accordingly.
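The contrast between the two selection criteria can be made concrete with invented numbers. In the sketch below, the level values, the 0.5 threshold, the 10 Hz sampling rate, and both function names are hypothetical, and the 2-second detection time is omitted for brevity so that only the two criteria are compared.

```python
LEVEL_THRESHOLD = 0.5  # assumed ON/OFF level, arbitrary units

def loudest_at_start(levels_a, levels_b):
    """Select by sound pressure level at the start of the conversation."""
    return "A" if levels_a[0] >= levels_b[0] else "B"

def longest_cumulative(levels_a, levels_b):
    """Select by cumulative ON time over the whole window."""
    on_a = sum(level >= LEVEL_THRESHOLD for level in levels_a)
    on_b = sum(level >= LEVEL_THRESHOLD for level in levels_b)
    return "A" if on_a >= on_b else "B"

# Hypothetical 4-second window at an assumed 10 Hz (40 samples each):
# A speaks throughout at a moderate level; B gives one loud, brief response.
levels_a = [0.6] * 40
levels_b = [0.9] * 5 + [0.0] * 35
```

With these values, selection by initial level picks B, while selection by cumulative input time picks A, the party leading the conversation.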

Thus, according to this embodiment, the camera 28 can be pointed smoothly and without error at the speaker even when microphone inputs overlap.

[Effects of the Invention] As described above, according to the present invention, the attendee for whom microphone input totaling 2 seconds is confirmed within the time frame from the present back to about 4 seconds earlier is identified as the speaker. The identification can therefore be made with very high accuracy, and the influence of noise can be completely eliminated.

[Brief description of drawings]

FIG. 1 is a diagram explaining the principle of the invention, FIG. 2 is a diagram of the overall configuration of the embodiment, FIG. 3 is a diagram of the configuration of the speaker detection circuit, FIG. 4 is a flowchart explaining the operation of the processing circuit, FIGS. 5 and 6 are diagrams explaining the operation of the embodiment, and FIG. 7 is a diagram explaining the conventional example.

10: microphone input judging means
12: judgment result holding means
14: time totaling means
16: speaker identifying means
20: camera driving means
21-1, 21-2, 21-3, 21-4: attendees
22-1, 22-2, 22-3, 22-4: microphones
23: microphone mixer
24: audio transmission device
25: speaker detection circuit
26: swivel base controller
27: motorized swivel base
28: camera
29: video transmission device
30: sampling circuit
31: input buffer
32-1, 32-2, 32-3, 32-4: accumulation buffers
33: processing circuit

Continuation of front page
(72) Inventor: Toshihiro Honma (本間敏弘), c/o Fujitsu Limited, 1015 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Hitoshi Sato (佐藤均), c/o Fujitsu Limited, 1015 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Hitoshi Ishiguro (石黒均), c/o Fujitsu Limited, 1015 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Yoichi Ashida (芦田庸市), c/o Fujitsu Limited, 1015 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Masakazu Yamaguchi (山口政数), c/o Fujitsu Limited, 1015 Kamikodanaka, Nakahara-ku, Kawasaki-shi, Kanagawa
(56) References: JP-A-61-87489 (JP, A); JP-A-61-66487 (JP, A)

Claims

[Claims]

1. A camera control device for capturing the speaker in a video conference system, comprising: microphone input judging means (10) for judging the presence or absence of input at each of the microphones provided for the attendees of a video conference; judgment result holding means (12) for holding the judgment result for each microphone input over a period from the present back to about 4 seconds earlier; time totaling means (14) for obtaining the total time of each microphone input during that period from the judgment results held by the judgment result holding means (12); speaker identifying means (16) for identifying, as the current speaker, the attendee at the microphone whose total time obtained by the time totaling means (14) exceeds a set time of about 2 seconds; and camera driving means (20) for driving a camera so that the attendee identified as the current speaker by the speaker identifying means (16) falls within the imaging range.

2. The camera control device for capturing the speaker in a video conference system according to claim 1, further comprising: means for counting the number of microphones whose total time obtained by the time totaling means (14) is equal to or longer than the set time; and means for forcibly selecting, when more than one microphone is counted, the attendee at the microphone with the longest total time as the current speaker.