JP2000333150A - Video conference system

Info

Publication number: JP2000333150A
Application number: JP11139417A
Authority: JP
Inventors: Hiroaki Nagano; 浩明長野
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1999-05-20
Filing date: 1999-05-20
Publication date: 2000-11-30

Abstract

PROBLEM TO BE SOLVED: To facilitate preparation of minutes after a conference by allowing attention to be directed to major talkers and interrupt talkers when many attendees speak simultaneously. SOLUTION: The video conference system is provided with a major talker decision means 2 that, when another talker makes an utterance while the major talker displayed on a base image 51 is speaking, decides whether to change the major talker on the basis of the silent state of the major talker during that talker's utterance; an interrupt talker decision means 3 that decides an interrupt talker to be displayed on a small image 52 on the basis of whether or not the other utterance comes from the major talker displayed on the base image 51; an image synthesis display means 6 that displays the decided major talker on the base image 51 and displays the interrupt talker on the small image 52 on the base image; and a minutes capturing means 7 that stores information on the major talkers and the interrupt talkers together with time information or the like.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a video conference system that makes it easy to capture audio-visual minutes.

[0002]

[Description of the Related Art] In a typical video conference, the displayed image changes little and lacks a sense of presence. Some systems can automatically zoom in on the speaker, but they cannot cope with several people speaking at the same time. In addition, as with an audio recording, transcribing minutes after the conference by listening to the recorded speech makes it very difficult to identify who is speaking, because the listener cannot focus attention on the speaker as people normally do.

[0003]

[Problems to be Solved by the Invention] Because the conventional video conference system is configured as described above, participants cannot focus on the speaker as they would in an ordinary conference, the speaker is difficult to identify, and preparing minutes afterward by listening to the recording is not easy.

[0004] An object of the present invention is therefore to provide a video conference system in which, when several people speak at the same time, attention can be directed, as in an ordinary conference, to the main speaker who is making the principal statements and to an interrupting speaker who is asking a question or the like, and in which preparing minutes after the conference becomes easier.

[0005]

[Means for Solving the Problems] A video conference system according to the present invention comprises: speaker determination means that, when another utterance occurs while the main speaker displayed on the base screen among the video conference participants is speaking, determines whether to change the main speaker based on the main speaker's state of silence during that speaker's statement, and determines the interrupting speaker to be displayed on a small screen based on whether the other utterance comes from the main speaker displayed on the base screen; composite output means that, based on the determination results of the speaker determination means, displays the main speaker on the base screen, displays the interrupting speaker as a composite image on a small screen on the base screen, and outputs the audio of the utterances made by the main speaker and the interrupting speaker; and minutes collection means that stores information on the main speaker and the interrupting speaker determined by the speaker determination means together with time information and the like.

[0006] In the video conference system of the present invention, when another utterance occurs while the main speaker displayed on the base screen among the video conference participants is speaking, whether to change the main speaker is determined based on the main speaker's state of silence during that speaker's statement, and the interrupting speaker to be displayed on the small screen is determined based on whether the other utterance comes from the main speaker displayed on the base screen. Based on these determination results, the main speaker is displayed on the base screen and the interrupting speaker is output, together with audio, as a composite image on a small screen on the base screen. The information on the determined main speaker and interrupting speaker is stored together with time information and the like, so that, as in an ordinary conference, attention can be directed to the main speaker and the interrupting speaker when several people speak at the same time, which makes preparing minutes after the conference easier.

[0007]

[Embodiments of the Invention] An embodiment of the present invention is described below. The video conference system according to this embodiment automatically detects the main speaker in a video conference and also displays the image of an interrupting speaker, such as a questioner, on the screen using a cut-in technique. By recording these determinations, it can automatically capture audio-visual minutes with speaker annotations.

[0008] FIG. 1 is a block diagram showing the configuration of the video conference system of this embodiment. The system includes a camera/microphone 1, main speaker determination means (speaker determination means) 2, interrupting speaker determination means (speaker determination means) 3, base screen means 4, cut-in means 5, image synthesis display means (composite output means) 6, and minutes collection means 7.

[0009] The camera/microphone 1 captures each video conference participant at that participant's location and outputs the result as, for example, a still image signal; it also outputs the participants' speech and other sounds as an electrical audio signal. The main speaker determination means 2 uses the input from the microphone of the camera/microphone 1 and the algorithm shown in FIG. 2 to determine the person who speaks for a long time as the main speaker.

[0010] The interrupting speaker determination means 3 uses the algorithm shown in FIG. 3 to determine a person who speaks for a relatively short time, such as a questioner, as the interrupting speaker.

[0011] The base screen means 4 displays, as the base screen of the video conference image, the main speaker determined by the main speaker determination means 2 from among the participants captured by the camera of the camera/microphone 1.

[0012] The cut-in means 5 displays, as a small screen, the interrupting speaker determined by the interrupting speaker determination means 3 from among the participants captured by the camera of the camera/microphone 1.

[0013] The image synthesis display means 6 displays the base screen and the small screen on the video conference screen as a composite image of the form shown in FIG. 4. In FIG. 4, reference numeral 51 denotes the base screen and reference numeral 52 denotes the small screen.
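The compositing of the small screen 52 onto the base screen 51 could be sketched as below. This is a minimal illustration only: frames are modeled as nested lists of pixel values, the function name `overlay_small_screen` and the `margin` parameter are hypothetical, and placing the small screen in the lower-right corner is an assumption, since the text does not state where on the base screen the small screen appears.

```python
from typing import List

Frame = List[List[int]]  # rows of pixel values (illustrative frame model)


def overlay_small_screen(base: Frame, small: Frame, margin: int = 0) -> Frame:
    """Return a copy of `base` with `small` pasted into its lower-right corner.

    The corner position and the margin are assumptions made for this sketch.
    """
    base_h, base_w = len(base), len(base[0])
    small_h, small_w = len(small), len(small[0])
    if small_h + margin > base_h or small_w + margin > base_w:
        raise ValueError("small screen does not fit on the base screen")

    top = base_h - small_h - margin
    left = base_w - small_w - margin

    # Copy every row so the caller's base frame is left untouched.
    out = [row[:] for row in base]
    for r in range(small_h):
        out[top + r][left:left + small_w] = small[r]
    return out


base_frame = [[0] * 6 for _ in range(4)]   # stands in for base screen 51
small_frame = [[1, 1], [1, 1]]             # stands in for small screen 52
composite = overlay_small_screen(base_frame, small_frame)
```

A real system would composite video frames rather than integer grids, but the copy-then-paste structure would be similar.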

[0014] The minutes collection means 7 stores information on the main speaker determined by the main speaker determination means 2 and on the interrupting speaker determined by the interrupting speaker determination means 3, together with time information and the like. Comparing this stored information with the composite image and audio produced by the image synthesis display means 6 makes it possible to create minutes information with speaker annotations.

[0015] Next, the operation is described. In a video conference, the main speaker tends to speak continuously for a long time, and other participants often interject with questions, heckling, and the like as the occasion arises. Using this characteristic, the main speaker determination means 2 takes the input from the camera/microphone 1 and, by the algorithm shown in FIG. 2, determines the person who has been speaking for a long time as the main speaker, and the base screen display means 4 uses that person's image as the base screen of the video conference image.

[0016] FIG. 2 is a flowchart showing the main speaker determination operation of the main speaker determination means 2. According to this flowchart, it is first determined whether anyone is speaking in the video conference (step S1). If no one is speaking, the current base screen 51 of the video conference is maintained as is (step S5). If someone is speaking, it is determined whether that speaker is the one currently displayed on the base screen 51 (step S2). If so, the current base screen 51 is maintained as is (step S5).

[0017] If it is determined in step S2 that the speaker is not the one currently displayed on the base screen 51, the state of silence of the person currently displayed on the base screen 51 is evaluated, for example by determining whether that person's silent time has exceeded a defined value (step S3). If the silent time has not exceeded the defined value and the speaker displayed on the base screen 51 begins speaking again, the base screen 51 is not changed and the displayed speaker remains on the base screen 51 as the main speaker (step S5).

[0018] If it is determined in step S3 that the silent time of the person currently displayed on the base screen 51 has exceeded the defined value, the base screen 51 is changed: the main speaker displayed on the base screen 51 is switched from the silent person currently displayed there to the person who is now speaking (step S4).
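The decision flow of FIG. 2 (steps S1 to S5) can be sketched in Python as follows. This is an illustrative sketch, not the implementation described in the patent: the function name `decide_main_speaker`, its parameters, and the 3-second default standing in for the "defined value" are assumptions, because the text specifies neither a threshold nor how silence is measured.

```python
from typing import Optional


def decide_main_speaker(
    current_main: Optional[str],
    active_speaker: Optional[str],
    main_silence_sec: float,
    silence_limit_sec: float = 3.0,  # assumed value; the source only says "defined value"
) -> Optional[str]:
    """Return the participant who should be shown on base screen 51."""
    # S1: is anyone speaking? If not, keep the current base screen (S5).
    if active_speaker is None:
        return current_main

    # S2: is the speaker the one already on the base screen? If so, keep it (S5).
    if active_speaker == current_main:
        return current_main

    # S3: has the displayed main speaker been silent longer than the defined value?
    if main_silence_sec > silence_limit_sec:
        # S4: switch the base screen to the person who is speaking now.
        return active_speaker

    # S5: silence is still within the limit, so the base screen is unchanged.
    return current_main
```

For example, `decide_main_speaker("A", "B", main_silence_sec=1.0)` keeps "A" on the base screen, whereas the same call with `main_silence_sec=5.0` switches to "B".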

[0019] In addition, the interrupting speaker determination means 3 uses the algorithm shown in FIG. 3 to determine a person who speaks for a relatively short time, such as a questioner, as the interrupting speaker, and the cut-in means 5 displays that person as the small screen 52.

[0020] FIG. 3 is a flowchart showing the interrupting speaker determination operation of the interrupting speaker determination means 3. According to this flowchart, it is first determined whether anyone is speaking in the video conference (step S11). If no one is speaking, the current small screen 52 of the video conference is maintained as is (step S15). If someone is speaking, it is determined whether that speaker is the one currently displayed on the base screen 51 (step S12). If so, the current base screen 51 of the video conference is maintained as is (step S15).

[0021] If it is determined in step S12 that the speaker is not the person currently displayed on the base screen 51, it is determined whether the person now speaking is displayed on the small screen 52 (step S13). If that speaker is already displayed on the small screen 52, the process proceeds to step S15: the small screen 52 is not changed and the displayed speaker remains on the small screen 52 as the interrupting speaker (step S15).

[0022] If it is determined in step S13 that the current speaker is not displayed on the small screen 52, the current speaker is displayed on the small screen 52 (step S14).
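The flow of FIG. 3 (steps S11 to S15) can be sketched in the same way. Again this is only a sketch under stated assumptions: the function name `decide_interrupting_speaker` and the use of participant identifiers as strings are hypothetical, and the case of several simultaneous interrupters is not modeled because the text does not describe it.

```python
from typing import Optional


def decide_interrupting_speaker(
    current_main: Optional[str],
    current_small: Optional[str],
    active_speaker: Optional[str],
) -> Optional[str]:
    """Return the participant who should be shown on small screen 52."""
    # S11: is anyone speaking? If not, keep the current small screen (S15).
    if active_speaker is None:
        return current_small

    # S12: the speaker is the main speaker on base screen 51, so this is
    # not an interruption and the display is kept as is (S15).
    if active_speaker == current_main:
        return current_small

    # S13: the speaker is already on the small screen, so keep it (S15).
    if active_speaker == current_small:
        return current_small

    # S14: show the new speaker on the small screen.
    return active_speaker
```

Calling `decide_interrupting_speaker("A", None, "B")` returns "B", which corresponds to a questioner being cut in on the small screen while "A" remains on the base screen.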

[0023] In this way, the interrupting speaker determination means 3 determines a person who speaks for a relatively short time, such as a questioner, as the interrupting speaker, and the cut-in means 5 displays that person as the small screen 52.

[0024] The images of the base screen 51 and the small screen 52 are displayed on the video conference screen by the image synthesis display means 6 in the form shown in FIG. 4. The utterances of the main speaker and the interrupting speaker are also output as audio.

[0025] The minutes collection means 7 stores information on the main speaker determined by the main speaker determination means 2 and on the interrupting speaker determined by the interrupting speaker determination means 3, together with time information and the like.

[0026] By comparing this stored information with the images, audio, and other information output by the image synthesis display means 6, minutes information with speaker annotations is created.
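One way to picture the stored speaker information and its later comparison with the recording is the sketch below. It is an assumption-laden illustration: the class name `MinutesLog`, its methods, and the choice of seconds from the start of the recording as the "time information" are all hypothetical, since the text does not define a record format.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# (time in seconds from the start of the recording, main speaker, interrupting speaker)
Entry = Tuple[float, Optional[str], Optional[str]]


@dataclass
class MinutesLog:
    entries: List[Entry] = field(default_factory=list)

    def record(self, t: float, main: Optional[str], interrupt: Optional[str]) -> None:
        """Store the current speakers; only changes are kept.

        Assumes calls arrive in non-decreasing time order.
        """
        if not self.entries or self.entries[-1][1:] != (main, interrupt):
            self.entries.append((t, main, interrupt))

    def speakers_at(self, t: float) -> Tuple[Optional[str], Optional[str]]:
        """Look up who was on the base and small screens at time t of the recording."""
        times = [e[0] for e in self.entries]
        i = bisect_right(times, t) - 1
        if i < 0:
            return (None, None)
        return self.entries[i][1:]


log = MinutesLog()
log.record(0.0, "A", None)   # A starts as main speaker
log.record(5.0, "A", "B")    # B interrupts with a question
log.record(6.0, "A", "B")    # unchanged, so nothing new is stored
```

Someone transcribing the recording could then call `log.speakers_at(7.0)` to annotate the speech heard at the 7-second mark with the speakers shown at that time.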

[0027] Therefore, according to this embodiment, the main speaker and the interrupting speaker can be clearly distinguished by means of the base screen 51 and the small screen 52. The person who is mainly speaking can be identified and watched as the main speaker on the base screen 51, and a person who speaks briefly, for example to ask a question, as the interrupting speaker on the small screen 52. As a result, when several people speak at the same time, attention can be directed, as in an ordinary conference, to the speaker making the principal statements and to the interrupting speaker asking a question or the like, and a video conference system can be provided in which preparing minutes afterward by listening to the recording becomes easier.

[0028] In the embodiment described above, the system includes the minutes collection means 7, which stores information on the main speaker determined by the main speaker determination means 2 and on the interrupting speaker determined by the interrupting speaker determination means 3 together with time information and the like, so that minutes information with speaker annotations can be created by comparison with the composite image and audio produced by the image synthesis display means 6. Alternatively, a speech recognition function may be added to the minutes collection function 7 so that preparing the minutes is semi-automated.

[0029] When there are many participants, the speaker's name may be superimposed on the images of the base screen display means 4 and the cut-in means 5 so that the speaker can be clearly identified.

[0030]

[Effects of the Invention] As described above, when several people speak at the same time, attention can be directed to the speaker making the principal statements and to the interrupting speaker asking a question or the like, just as attention is directed to a speaker in an ordinary conference. Because anyone who speaks in the video conference is shown on either the base screen or the small screen, the speaker's image is reliably displayed and situations in which the speaker cannot be identified are prevented. Furthermore, because information about the speakers is collected automatically, the labor of preparing minutes afterward by listening to the recording is reduced.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a video conference system according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the main speaker determination operation performed by the main speaker determination means in the video conference system according to an embodiment of the present invention.

FIG. 3 is a flowchart showing the interrupting speaker determination operation performed by the interrupting speaker determination means in the video conference system according to an embodiment of the present invention.

FIG. 4 is an explanatory diagram showing the display contents of the video conference screen in the video conference system according to an embodiment of the present invention.

[Explanation of symbols]

1: camera/microphone; 2: main speaker determination means (speaker determination means); 3: interrupting speaker determination means (speaker determination means); 4: base screen means; 5: cut-in means; 6: image synthesis display means (composite output means); 7: minutes collection means.

Claims

1. A video conference system in which a plurality of video conference participants at their respective locations conduct a conference through the participants' images and voices, the system comprising: speaker determination means that, when another utterance occurs while the main speaker displayed on a base screen among the video conference participants is speaking, determines whether to change the main speaker based on the main speaker's state of silence during that speaker's statement, and determines an interrupting speaker to be displayed on a small screen based on whether the other utterance comes from the main speaker displayed on the base screen; composite output means that, based on the determination results of the speaker determination means, displays the main speaker on the base screen, displays the interrupting speaker as a composite image on a small screen on the base screen, and outputs the audio of the utterances made by the main speaker and the interrupting speaker; and minutes collection means that stores information on the main speaker and the interrupting speaker determined by the speaker determination means together with time information and the like.

2. The video conference system according to claim 1, wherein the minutes collection means enables minutes information with speaker annotations to be created by comparing the stored information on the main speaker and the interrupting speaker determined by the speaker determination means with the composite image and audio output from the composite output means.

3. The video conference system according to claim 1, wherein, when another utterance occurs while the main speaker displayed on the base screen among the video conference participants is speaking, the speaker determination means determines whether to change the main speaker based on the time during which the main speaker is regarded as silent during that speaker's statement, and determines the interrupting speaker to be displayed on the small screen based on whether the other utterance comes from the main speaker displayed on the base screen.

4. The video conference system according to claim 3, wherein the speaker determination means comprises: main speaker determination means that, when another utterance occurs while the main speaker displayed on the base screen among the video conference participants is speaking, determines whether to change the main speaker based on the time during which the main speaker is regarded as silent during that speaker's statement; and interrupting speaker determination means that determines the interrupting speaker to be displayed on the small screen based on whether the other utterance comes from the main speaker displayed on the base screen.

5. The video conference system according to claim 4, further comprising: base screen means that displays, as the base screen of the video conference image, the main speaker determined by the main speaker determination means from among the video conference participants; and cut-in means that displays, as the small screen, the interrupting speaker determined by the interrupting speaker determination means.