JPH06232979A

JPH06232979A - Multi-media communication conference system

Info

Publication number: JPH06232979A
Application number: JP1331593A
Authority: JP
Inventors: Hidehiko Watanabe; 秀彦渡辺
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1993-01-29
Filing date: 1993-01-29
Publication date: 1994-08-19
Anticipated expiration: 2015-12-11
Also published as: JP3115722B2

Abstract

PURPOSE: To grant a token with a simple operation so that a conference proceeds smoothly in a multimedia communication conference system. CONSTITUTION: Voices (words) corresponding to multimedia terminals 32 to 33 are registered in advance in a main station terminal 28. When the token request area on the LCD screen of the multimedia terminal 32 is touched, the multimedia terminal 28 is notified that the multimedia terminal 32 has issued a request to speak. The multimedia terminal 28 displays on the screen of an LCD 29 which participating station terminal issued the request, and the operator, after touching a token grant area, speaks the pre-registered name corresponding to the terminal to which the token is to be granted. The spoken name is converted into data representing the features of the voice by the feature extraction unit of a speech recognition unit and is then sent to a similarity calculation unit, which selects the dictionary data with the highest similarity; the token is granted to the terminal given by this recognition result.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a multimedia communication conference system, and more particularly to a multimedia communication conference system concerned with granting a token among a plurality of multimedia communication terminals connected multipoint by communication lines.

[0002]

[Prior Art] A multimedia device is a communication terminal device that can use a plurality of media in combination and can input and output still image information, moving image information, voice information, handwriting information (telewriting information), and the like. By connecting a plurality of such multimedia devices via a communication line such as ISDN, a multimedia communication conference system can be constructed in which people at remote locations can hold a communication conference.

[0003] However, if a plurality of multimedia devices each input still image information, moving image information, voice information, drawing information, or the like at will, the information becomes confused. To keep the communication conference orderly, a token is therefore granted, and only the multimedia device holding the token has the right to speak. In a multimedia device, the token granting operation has generally been performed by manual input on an operation panel.

[0004] Further, for example, in Japanese Patent Laid-Open No. 2-265346, the right to speak is granted automatically in response to a request for the right to speak from a conference participating station.

[0005]

[Problems to Be Solved by the Invention] However, in such a conventional multimedia communication conference system, granting a token by manual input on the operation panel has the problem that, when many communication terminals participate in the conference, it takes time and effort to find the target terminal on the operation panel, and operability deteriorates.

[0006] Further, a system that grants the token automatically has the problem that a speaker cannot be designated even when the chairperson terminal or the like wishes to do so. The present invention has been made in view of the above conventional problems, and an object thereof is to provide a multimedia communication conference system in which a token can be granted to a participating station among the communication conference terminals with only a simple operation, so that the communication conference can proceed smoothly.

[0007]

[Means for Solving the Problems] The invention according to claim 1 is a multimedia communication conference system composed of a plurality of multimedia communication terminals capable of using a plurality of media in combination, in which one of the plurality of multimedia communication terminals serves as a main station terminal and the remaining terminals serve as participating station terminals, and in which, when a participating station terminal makes a token request to the main station terminal and the main station terminal grants the token to that participating station terminal, information sent from the participating station terminal holding the token is displayed on each communication terminal, characterized in that the main station terminal has voice recognition means for recognizing by voice a token request from a participating station terminal, and grants the token based on the voice recognition result of the voice recognition means.

[0008] The invention according to claim 2 is the multimedia communication conference system according to claim 1, characterized in that the voice recognition means of the main station terminal comprises: a feature extraction unit that extracts the features of the voice of the token request and converts them into data indicating the features; and a similarity calculation unit that receives the data indicating the voice features from the feature extraction unit, calculates the similarity between the received data and the data in dictionary data stored in advance, selects the data in the dictionary data having the highest similarity, and outputs it as the voice recognition result.
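As a rough illustration of the similarity calculation unit of claim 2, the following sketch selects the dictionary entry most similar to an input pattern. The source does not state the similarity measure; the fraction of matching cells between equal-sized binary patterns, the pattern layout, and the names `similarity` and `recognize` are all assumptions made for this example.

```python
from typing import Dict, List

Pattern = List[List[int]]  # frames x channels, each cell 0 or 1 (assumed layout)


def similarity(a: Pattern, b: Pattern) -> float:
    """Fraction of cells that agree between two equal-sized binary patterns."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("patterns must have the same shape")
    total = sum(len(row) for row in a)
    if total == 0:
        raise ValueError("patterns must not be empty")
    matches = sum(x == y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return matches / total


def recognize(features: Pattern, dictionary: Dict[str, Pattern]) -> str:
    """Return the dictionary key whose stored pattern is most similar."""
    if not dictionary:
        raise ValueError("dictionary is empty")
    return max(dictionary, key=lambda name: similarity(features, dictionary[name]))
```

A real recognizer would also have to align utterances of different lengths; this sketch assumes the patterns already have the same shape.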

[0009] The invention according to claim 3 is the multimedia communication conference system according to claim 1 or 2, characterized in that the main station terminal has voice registration means for registering a voice corresponding to each participating station terminal, and identifies each participating station terminal based on the voice registered in the voice registration means.

[0010]

[Operation] In the invention according to claim 1, when the main station terminal grants a token to a participating station terminal, the participating station terminal to receive the token can be designated through the voice recognition means. The operation is therefore simpler than a manual token granting operation, and the conference can proceed smoothly.

[0011] In the invention according to claim 2, since the voice recognition means is composed of the feature extraction unit and the similarity calculation unit, the participating station terminal to receive the token is determined accurately, based on the result of extracting the voice features and calculating their similarity. In the invention according to claim 3, since the main station terminal is provided with the voice registration means, a voice corresponding to each participating station terminal is registered in advance, and each participating station terminal is identified accurately based on the registered voice.

[0012]

[Embodiment] The present invention will be described below with reference to the drawings. First, the configuration will be described. FIG. 1 is a block diagram showing the configuration of a multimedia terminal of a multimedia communication conference system according to an embodiment of the present invention.

[0013] In FIG. 1, a CPU (Central Processing Unit) 1 controls the entire system of the multimedia device in accordance with a program written in a ROM (Read Only Memory) 2. A RAM (Random Access Memory) 3 stores the work area and data required for the operation of the CPU 1.

[0014] A DMA (Direct Memory Access) controller 4 controls operations such as the transfer of image information in the DMA transfer mode. A handset 5 is a telephone handset for exchanging voice data. A voice codec (CODEC) 6 decodes a digital voice signal received over the ISDN line into an analog voice signal, and encodes an analog voice signal for transmission as a digital voice signal.

[0015] A microphone 7 is a voice input device for conveying the conversation during a conference. A voice recognition unit 8 recognizes voice information input from the microphone. A printer interface 10 connects a printer 9 to a system bus 27, and printing is performed based on the input print data.

[0016] A scanner interface 12 connects a scanner 11 to the system bus 27 and sends document data read by the scanner 11 to the system bus 27. A document memory 13 stores document data and the like so that it can be written and read. A timer 14 manages the current time. An interrupt (INT) controller 23 performs interrupt control of the CPU 1.

[0017] A communication control unit 16 transmits image data, including telewriting data, in accordance with a predetermined transmission control procedure via an NCU (network control unit) 17 having an S interface to the ISDN line. The communication control unit 16 is configured so that simultaneous communication using at least two ISDN B channels (64 kbit/s) is possible.

[0018] A codec (CODEC) 18 encodes image information to be transmitted by a predetermined method to compress the amount of information, and decodes encoded image information on reception to restore the original image information. A graphic controller 19 controls the graphic data stored in a video RAM (VIDEO RAM) 20.

[0019] A liquid crystal display/touch panel (LCD/TP) controller 21, to which an LCD 22 and a touch panel 23 are connected, provides the user interface and controls, among other things, the drawing data produced when a terminal holding the token uses the touch panel 23 and a touch pen to input drawing information on the document data displayed on the LCD 22. The touch panel 23 is placed on the surface of the LCD (liquid crystal display), and by touching the LCD screen the operator can input handwritten information into the system and select commands displayed on the LCD screen.

[0020] A moving picture codec 24 encodes and decodes video information, and a video (VIDEO) camera 25 and a television monitor 26 are connected to it. The system bus 27 is a set of signal lines that connects the above units and carries data between them. The multimedia terminal of this embodiment is composed of these blocks and has a still image transmission function, a telewriting function, a telephone function, a data communication function, and the like.

[0021] FIG. 2 is a diagram showing a configuration example of a multimedia communication conference system in which a plurality of the multimedia terminals of FIG. 1 are connected via a communication network, FIG. 3 is a diagram showing an example of the LCD screen of a participating station terminal during a conference, and FIG. 4 is a diagram showing an example of the LCD screen of the main station terminal during a conference. As shown in FIG. 2, the conference participants hold the conference while looking at the screen of the LCD 29 of the multimedia terminal 28, and a conference participant holding the token can write handwritten information on the LCD screen.

[0022] The operation of this embodiment when the multimedia terminal 28 is the main station terminal and the multimedia terminal 32 through the n-th multimedia terminal 33 are participating station terminals will now be described with reference to the flowcharts of FIGS. 5 and 6. FIG. 5 is a flowchart showing the operation procedure for voice registration, and FIG. 6 is a flowchart showing the operation procedure for voice recognition.

[0023] First, the main station terminal 28 registers the voices (words) corresponding to the multimedia terminals 32 to 33. As a specific example, to register the multimedia terminal 32 located in the planning department, the operator touches the "Register" menu (not shown) displayed on the screen of the LCD 29 of the main station terminal 28 (step 100), as shown in FIG. 5, and designates the destination terminal, "multimedia terminal 32" (step 101). This information is sent to the CPU 1 via the LCD/TP controller 21.

[0024] Next, the operator touches "Start" (step 102) and speaks a word corresponding to the destination terminal, for example "planning department" (step 103). The voice signal is sent from the microphone 7 to the voice recognition unit 8, where the feature extraction unit extracts its features (step 104) and a dictionary pattern is created (step 105). This data is associated with the terminal being registered and stored in memory as a dictionary (step 106). Words corresponding to the n participating station terminals are registered in the same way. This series of registration operations is controlled by the CPU 1. The registration is completed in advance, before the communication conference.
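The registration procedure of steps 100 to 106 can be sketched roughly as follows. This is only an illustration under stated assumptions: the class `VoiceDictionary`, its methods, and the `extract` callable standing in for the feature extraction unit are hypothetical names that do not appear in the source, and the source does not describe the stored data layout.

```python
from typing import Callable, Dict, List, Sequence

Pattern = List[List[int]]                      # assumed dictionary pattern layout
FeatureExtractor = Callable[[Sequence[float]], Pattern]


class VoiceDictionary:
    """Maps each registered terminal to the dictionary pattern of its spoken name."""

    def __init__(self, extract: FeatureExtractor) -> None:
        self._extract = extract                # stand-in for the feature extraction unit
        self._patterns: Dict[int, Pattern] = {}

    def register(self, terminal_id: int, utterance: Sequence[float]) -> None:
        # Steps 103 to 106: take the spoken word, extract its features,
        # build the dictionary pattern, and store it keyed by the terminal.
        if len(utterance) == 0:
            raise ValueError("utterance is empty")
        self._patterns[terminal_id] = self._extract(utterance)

    def pattern_for(self, terminal_id: int) -> Pattern:
        return self._patterns[terminal_id]

    def terminals(self) -> List[int]:
        # Terminals registered so far, in ascending order.
        return sorted(self._patterns)
```

Registering the same terminal again simply overwrites its earlier pattern, which is a design choice of this sketch rather than something the source specifies.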

[0025] Next, the operation of granting a token during a conference will be described. The n terminals of FIG. 2, namely the multimedia terminals 28 and 32 to 33, are connected via the communication network, and the same screen is displayed to the conference participants on the LCD screens (see FIGS. 3 and 4). This screen shows conference materials or notes taken during the conference. The screen of the participating station terminal in FIG. 3 also displays, at the same time, an area for entering commands to the multimedia terminal, such as a token request from the participating station terminal to the main station terminal.

[0026] When the operator of the multimedia terminal 32, a participating station terminal, requests the token from the multimedia terminal 28, the main station terminal, the operator touches the token request area 35 on the LCD 22 screen shown in FIG. 3. This notifies the multimedia terminal 28 that the multimedia terminal 32 has made a request to speak. The multimedia terminal 28 displays on the LCD 22 screen, as shown in FIG. 4, which participating station terminal the request came from. When other terminals request the token, the data of each requesting terminal is displayed in the same way. At the multimedia terminal 28, the operator touches the token grant area 37 and then speaks the pre-registered name of the terminal, among those that have requested the token, to which the token is to be granted. The voice signal is sent via the microphone 7 to the voice recognition unit 8 and is converted by the feature extraction unit into data indicating the features of the voice. This data is then sent to the similarity calculation unit, which selects the dictionary data with the highest similarity as the recognition result. The recognition result is sent to the CPU 1, and the token is granted to the terminal indicated by the recognition result.
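The request-and-grant sequence described in this paragraph can be sketched as a small state holder on the main station side. The class `MainStation`, its method names, and the `recognize` callable standing in for the voice recognition unit 8 are hypothetical. Rejecting a recognized terminal that has not requested the token is also an assumption, since the source does not say what happens in that case.

```python
from typing import Callable, Optional, Set


class MainStation:
    """Tracks pending token requests and the current token holder."""

    def __init__(self, recognize: Callable[[object], int]) -> None:
        self._recognize = recognize        # stand-in for voice recognition unit 8
        self.pending: Set[int] = set()     # terminals shown as requesting the token
        self.holder: Optional[int] = None

    def on_token_request(self, terminal_id: int) -> None:
        # A participating station touched its token request area.
        self.pending.add(terminal_id)

    def grant_by_voice(self, utterance: object) -> int:
        # The operator touched the token grant area and spoke a registered name.
        terminal_id = self._recognize(utterance)
        if terminal_id not in self.pending:
            # Assumed behaviour: refuse terminals that have not asked for the token.
            raise LookupError(f"terminal {terminal_id} has not requested the token")
        self.pending.discard(terminal_id)
        self.holder = terminal_id
        return terminal_id
```

In the source the choice among requesters is made by the operator of the main station terminal; this sketch only records that choice once the spoken name has been recognized.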

[0027] In this embodiment, a feature extraction LSI is used for the feature extraction unit, and a recognition processing LSI is used for the similarity calculation unit. The feature extraction LSI extracts a power spectrum from the input voice, extracts the peaks along the frequency axis from this power spectrum, binarizes the result into "0" and "1" on that basis, and generates BTSP (Binary Time Spectrum Pattern) data.
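The peak-based binarization can be illustrated with the sketch below, which marks each frequency channel of a frame as "1" if it is a local peak and "0" otherwise. The exact peak rule used by the feature extraction LSI is not given in the source; the local-maximum rule and the function names `binarize_frame` and `btsp` are assumptions made for this example.

```python
from typing import List, Sequence


def binarize_frame(power: Sequence[float]) -> List[int]:
    """Mark each channel 1 if it is a local peak along the frequency axis, else 0."""
    n = len(power)
    bits = []
    for i, p in enumerate(power):
        left = power[i - 1] if i > 0 else float("-inf")
        right = power[i + 1] if i < n - 1 else float("-inf")
        # Assumed peak rule: strictly above the left neighbour, not below the right.
        bits.append(1 if p > left and p >= right else 0)
    return bits


def btsp(frames: Sequence[Sequence[float]]) -> List[List[int]]:
    """Binarize every frame, giving a time-by-frequency pattern of 0s and 1s."""
    return [binarize_frame(f) for f in frames]
```

Each inner list corresponds to one time frame of the power spectrum, so the result is a binary pattern over time and frequency.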

[0028] The recognition processing LSI is built from CMOS standard cells, and receives the BTSP data and 15-channel power spectrum data in interrupt processing driven by periodic pulses from the feature extraction LSI. Voice section detection is performed within the interrupt processing driven by the periodic pulses, using time-frequency pattern (TSP: Time Spectrum Pattern) data as the acoustic feature. The BTSP data is then input in accordance with the voice section signal generated there.
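The source does not describe how the voice section is derived from the TSP data. Purely as an illustration of what a voice section signal might look like, the sketch below applies a simple per-frame energy threshold, which is a swapped-in technique and not the method of the recognition processing LSI. The function name `voice_sections` and the `threshold` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def voice_sections(
    frames: Sequence[Sequence[float]], threshold: float
) -> List[Tuple[int, int]]:
    """Return (start, end) frame index pairs, end exclusive, where energy > threshold."""
    sections = []
    start = None
    for i, frame in enumerate(frames):
        # Frame energy is taken as the sum over the frequency channels (assumed).
        active = sum(frame) > threshold
        if active and start is None:
            start = i
        elif not active and start is not None:
            sections.append((start, i))
            start = None
    if start is not None:
        # The utterance was still active when the input ended.
        sections.append((start, len(frames)))
    return sections
```

Only the frames inside a returned section would then be passed on as pattern data for recognition.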

[0029] As described above, according to the multimedia communication conference system of this embodiment, when the main station terminal grants a token to a participating station terminal, the token grant can be designated by voice rather than by manual input, so the operation is simple and the conference can proceed smoothly.

[0030]

[Effects of the Invention] According to the invention of claim 1, when the main station terminal grants a token to a participating station terminal, the participating station terminal to receive the token can be designated through the voice recognition means. The operation is therefore simpler than a manual token granting operation, and the conference can proceed smoothly.

[0031] According to the invention of claim 2, since the voice recognition means is composed of the feature extraction unit and the similarity calculation unit, the participating station terminal to receive the token can be determined accurately, based on the result of extracting the voice features and calculating their similarity. According to the invention of claim 3, since the main station terminal is provided with the voice registration means, a voice corresponding to each participating station terminal is registered in advance, and each participating station terminal can be identified accurately based on the registered voice.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of a multimedia terminal of a multimedia communication conference system according to an embodiment of the present invention.

FIG. 2 is a diagram showing a configuration example of a multimedia communication conference system in which a plurality of the multimedia terminals of FIG. 1 are connected via a communication network.

FIG. 3 is a diagram showing an example of the LCD screen of a participating station terminal during a conference.

FIG. 4 is a diagram showing an example of the LCD screen of the main station terminal during a conference.

FIG. 5 is a flowchart showing the operation procedure for voice registration.

FIG. 6 is a flowchart showing the operation procedure for voice recognition.

[Explanation of symbols]

1 CPU
2 ROM
3 RAM
8 Voice recognition unit
16 Communication control unit
17 NCU
19 Graphic controller
20 Video RAM
21 LCD/TP controller
22 LCD
23 Touch panel
28, 32, 33 Multimedia terminals
31 Communication network

Claims

[Claims]

1. A multimedia communication conference system composed of a plurality of multimedia communication terminals capable of using a plurality of media in combination, one of the plurality of multimedia communication terminals being a main station terminal and the remaining terminals being participating station terminals, in which, when a participating station terminal makes a token request to the main station terminal and the main station terminal grants a token to the participating station terminal, information sent from the participating station terminal to which the token has been granted is displayed on each communication terminal, characterized in that the main station terminal has voice recognition means for recognizing by voice a token request from a participating station terminal, and the token is granted based on the voice recognition result of the voice recognition means.

2. The multimedia communication conference system according to claim 1, characterized in that the voice recognition means of the main station terminal comprises: a feature extraction unit that extracts the features of the voice of the token request and converts them into data indicating the features; and a similarity calculation unit that receives the data indicating the voice features from the feature extraction unit, calculates the similarity between the received data and the data in dictionary data stored in advance, selects the data in the dictionary data having the highest similarity, and outputs it as the voice recognition result.

3. The multimedia communication conference system according to claim 1 or 2, characterized in that the main station terminal has voice registration means for registering a voice corresponding to each participating station terminal, and identifies each participating station terminal based on the voice registered in the voice registration means.