JPH11272663A

JPH11272663A - Device and method for preparing minutes and recording medium

Info

Publication number: JPH11272663A
Application number: JP10076792A
Authority: JP
Inventors: Masao Iwasaki; 正生岩崎; Akifumi Umeda; 昌文梅田; Kazuhiro Takashima; 和宏高島; Toshihiro Morohoshi; 利弘諸星; Tomiyoshi Fukumoto; 富義福元
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1998-03-25
Filing date: 1998-03-25
Publication date: 1999-10-08

Abstract

PROBLEM TO BE SOLVED: To obtain a system that can generate high-precision minutes even when a personal computer conference system using a communication network has a narrow transmission bandwidth, and that makes it easy to judge which conference participant is speaking. SOLUTION: Voice data input from a voice input/output part 3 are stored in a voice storage part 9 and converted by a voice-text conversion part 6 into text data, which are sent to the minutes generation part 7 of the speaker's own or another party's personal computer. Detection of voice input at the voice input/output part 3 triggers photographing of the speaker's face, and the picture is sent to the minutes generation part 7. The minutes generation part 7 uses the image data and text data sent to it to generate the conference minutes, which are recorded in a minutes recording part 8.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a personal computer conference system using information processing equipment such as a personal computer (hereinafter "PC"), and in particular to a PC conference system and a recording medium having a function for automatically creating conference minutes from the content of remarks made during a conference and images of the speakers.

[0002]

[Prior art] Video conferences and videophones normally transmit and receive data using dedicated hardware and use a television monitor for image display. They adopt schemes conforming to ITU-T international standards, and images and audio are compressed before being transmitted and received.

[0003] When a PC conference is held, each conference participant normally connects to a server that controls the participating terminals, as shown in FIG. 14, and the conference proceeds through that server. Patent applications have been filed for creating minutes in video conferences (Japanese Patent Application Laid-Open Nos. Hei 6-22530 and Hei 7-182365), but these merely record and accumulate audio and image data on the server and create the minutes afterwards.

[0004]

[Problems to be solved by the invention] However, when such a PC conference server is used, the minutes are created after the conference ends, which takes time and effort. It is conceivable to use a means for converting all remarks from the participating terminals into text, but the transmitted audio signal is not necessarily of high quality. In general, the audio signals used in such PC conference systems are compressed and encoded with schemes such as ITU-T Recommendations G.711 and G.722.

[0005] Depending on the state of the communication network used between the terminals, converting the audio signal received and decoded at the server into text is extremely difficult, and there is the problem that minutes containing recognition errors are created. It is also difficult to determine immediately who made a given remark during the conference.

[0006]

[Means for solving the problems] To solve these problems, the present invention is characterized by comprising: input means for inputting voice data; voice storage means for assigning an identifier to the voice data input by the input means and storing it; conversion means for converting the voice data stored in the voice storage means into character data; and means for creating minutes by rearranging the character data converted by the conversion means based on the identifier assigned to the voice data from which the character data was converted.

[0007] The invention is also characterized by comprising: image input means for inputting voice data and image data corresponding to the voice data; storage means for assigning an identifier to the voice data input by the input means and to the image data corresponding to the voice data and storing them; conversion means for converting the voice data stored in the storage means into character data; and means for creating minutes by rearranging, based on the identifier, the character data converted by the conversion means and the image data stored in the storage means that corresponds to the voice data from which the character data was converted.

[0008] The invention is also characterized in that an identifier is assigned to input voice data and the voice data is stored, the stored voice data is converted into character data, and minutes are created by rearranging the converted character data based on the identifier assigned to the voice data.

[0009] The invention is also characterized in that an identifier is assigned to input voice data and to image data corresponding to the voice data and both are stored, the stored voice data is converted into character data, and minutes are created by rearranging, based on the identifier, the converted character data and the stored image data that corresponds to the voice data from which the character data was converted.

[0010] The invention is also a recording medium on which a program for exchanging data via a plurality of computers and creating minutes is recorded so as to be executable by a computer, the program assigning an identifier to input voice data and storing the voice data, converting the stored voice data into character data, and creating minutes by rearranging the converted character data based on the identifier assigned to the voice data.

[0011] The invention is also a recording medium on which a program for exchanging data via a plurality of computers and creating minutes is recorded so as to be executable by a computer, the program assigning an identifier to input voice data and to image data corresponding to the voice data and storing both, converting the stored voice data into character data, and creating minutes by rearranging, based on the identifier, the converted character data and the stored image data that corresponds to the voice data from which the character data was converted.

[0012] According to the present invention, voice data spoken during a conference undergoes text conversion on the voice input side, which makes it possible to create highly accurate minutes with few errors.

[0013] Because the minutes are created on the PCs of the people participating in the conference, there is no need to provide a separate PC conference server. In addition, because the speaker's face image is attached to the minutes, a reader of the minutes after the conference can easily tell which participant made each remark.

[0014]

[Embodiments of the invention] Embodiments of the present invention are described below in detail with reference to the drawings. (First embodiment) FIG. 1 shows the configuration of the PC conference system of the present invention. In FIG. 1, the PC 1 is the personal computer body, which is the information processing equipment used in the present invention.

[0015] The image input unit 2 is a digital video camera, digital camera, or the like that photographs the conference attendee using the PC. The voice input/output unit 3 consists of a microphone, a speaker, and the like.

[0016] The data input unit 4 inputs data other than that handled by the image input unit 2 and the voice input/output unit 3. Specifically, it is an input means such as the PC keyboard, a digitizer for drawing input, or another auxiliary device.

[0017] The PC conference processing unit 5 provides video and data conferencing services on the PC and has means for compressing and decompressing images, audio, and data. By exchanging the various data obtained from the image input unit 2, the voice input/output unit 3, and the data input unit 4 with other PC users, it allows users at remote PCs to share the same data in real time while seeing each other's faces.

[0018] For the data transmitted and received, images are encoded with schemes conforming to ITU-T international standards H.261 or H.263, and audio is encoded with schemes conforming to G.711, G.728, or the like.

[0019] The voice-to-text conversion unit 6 recognizes the speaker's voice signal input from the voice input/output unit 3 and converts it into text data. The minutes creation unit 7 records the PC user's face image data input from the image input unit 2, the speaker's text data obtained from the voice-to-text conversion unit 6, and other data, and creates minutes data from them. The minutes creation unit 7 also uses data sent from the other parties in the conference.

[0020] The minutes recording unit 8 records the minutes created by the minutes creation unit 7. The voice storage unit 9 temporarily records voice signal data before it is processed by the voice-to-text conversion unit 6.

[0021] The communication interface (hereinafter "communication IF") 10 is a network interface for connecting to the communication network 11. The communication network is a network line such as ISDN, the telephone network, a LAN, or the Internet. The monitor 12 displays the output screen of the PC conference processing unit 5, the resulting minutes, face images of the other conference participants, and so on. The PC of the other party in the conference has the same configuration.

[0022] In an ordinary PC conference, voice signal data from a speaker flows as shown in the flowchart of FIG. 2. First, in step S21, a participant's speech is input from the voice input/output unit 3. The input data is digital data. Next, in step S22, the voice data obtained in S21 is compressed by the PC conference processing unit 5 using an encoding scheme conforming to an international standard. Then, in step S23, it is transmitted via the communication IF 10 to each participant's PC.

[0023] Next, the flow in which a voice signal is converted into text data and transmitted is shown in the flowchart of FIG. 3. First, in step S31, a participant's speech is input from the voice input/output unit 3. In step S32, it is stored in the voice storage unit 9 in a voice data format like that of FIG. 4.

[0024] The voice data format of FIG. 4 consists of an ID number 41, time data 42 for the moment the voice was input in S31, and a voice digital signal 43. The ID number 41 is an ID that identifies the conference participant and consists of the user ID assigned by the PC conference processing unit 5 and an utterance number.

[0025] The utterance number is a number assigned in the order in which remarks are made during the conference, and no two remarks in the same conference share the same utterance number. For example, if the utterance number of the very first remark is "00001", the next remark is assigned "00002".

[0026] The user ID is the one assigned automatically when the PC conference processing unit 5 is started. Conference participants can also change it as they wish. The time data 42 uses the PC's clock time at the moment the speaker's remark is input from the voice input/output unit 3 in S31.
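The record layout of FIG. 4 described in [0024] to [0026] can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the names `VoiceRecord` and `VoiceStore` are hypothetical, and sharing one in-process counter between participants is an assumption, since the text does not say how utterance numbers are kept unique across PCs.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class VoiceRecord:
    user_id: str        # part of ID number 41
    utterance_no: int   # part of ID number 41, unique within one conference
    timestamp: float    # time data 42: PC clock time when the speech was input
    samples: bytes      # voice digital signal 43


class VoiceStore:
    """Hypothetical stand-in for the voice storage unit 9."""

    def __init__(self, user_id, counter):
        self.user_id = user_id
        self._counter = counter  # assumed to be shared so numbers never repeat
        self.records = []

    def store(self, samples, timestamp):
        record = VoiceRecord(self.user_id, next(self._counter), timestamp, samples)
        self.records.append(record)
        return record


counter = count(1)  # first remark gets 1, shown zero-padded as "00001"
store_a = VoiceStore("A", counter)
store_b = VoiceStore("B", counter)
first = store_a.store(b"\x00\x01", timestamp=10.0)
second = store_b.store(b"\x02\x03", timestamp=12.5)
print(f"{first.utterance_no:05d}", f"{second.utterance_no:05d}")
```

Keeping the user ID and utterance number as separate fields makes the later sorting and image matching steps straightforward.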

[0027] Next, in step S33, the voice data of FIG. 4 recorded in the voice storage unit 9 in S32 is converted into text data by the voice-to-text conversion unit 6. The voice digital signal 43 portion of the voice data of FIG. 4 is converted from digital voice data into text data. In the next step S34, this text data is sent via the communication IF 10 to the minutes creation unit 7 of another conference participant, to the minutes creation unit 7 of the speaker's own PC, or in some cases to both. The destination of the text data in S34 is set by the conference participants at the start of the conference.

[0028] For example, suppose a conference is held among three PCs, A, B, and C. If A is set to create the minutes, A issues a text data transmission request to participants B and C. In step S34, A sends its text data to its own minutes creation unit, and B and C send theirs to A's minutes creation unit. The minutes are ultimately created by the minutes creation unit 7 of A's PC and recorded in the minutes recording unit 8 of A's PC. B and C can refer to A's minutes.

[0029] Next, suppose both A and B are set to create minutes. A issues a text data transmission request to B and C, and B issues one to A and C. Minutes are created by the minutes creation units 7 of both A's and B's PCs, and C can refer to either A's or B's minutes.

[0030] If no participant has set up minutes creation, no text data transmission request reaches A, B, or C. In that case, each minutes creation unit 7 prompts its participant with "Create minutes? Y/N". If every participant answers NO (decline), no text data transmission request signal is sent from any PC and no minutes are created. If a participant answers YES (accept), that participant's PC issues text data transmission requests to the other PCs.
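The request and routing rules of [0028] to [0030] can be sketched as two small functions. This is only an illustration under assumed names (`transmission_requests`, `text_destinations`); the patent does not define any message format or API.

```python
def transmission_requests(participants, minute_takers):
    """Each PC set to create minutes asks every other PC for its text data."""
    return {
        taker: [p for p in participants if p != taker]
        for taker in minute_takers
    }


def text_destinations(participants, minute_takers):
    """Each PC sends its text to every minutes creation unit that wants it.

    A minute taker's own text goes to its own minutes creation unit as well.
    """
    takers = set(minute_takers)
    return {p: set(takers) for p in participants}


participants = ["A", "B", "C"]

# Case of [0028]: only A creates minutes.
print(transmission_requests(participants, ["A"]))
print(text_destinations(participants, ["A"]))

# Case of [0029]: both A and B create minutes.
print(transmission_requests(participants, ["A", "B"]))

# Case of [0030] where everyone answers NO: no requests, no destinations.
print(transmission_requests(participants, []))
```

When the list of minute takers is empty, no request is issued and every destination set is empty, which matches the case where no minutes are created.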

[0031] The processing in which the voice-to-text conversion unit 6 converts the voice data of FIG. 4, recorded in the voice storage unit 9 in step S32, into text data is more efficient when performed while CPU usage is low.

[0032] For example, performing this processing while the speaker is not speaking and no voice signal is being sent to the other participants suppresses degradation of image and sound quality and loss of responsiveness during the PC conference.
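One way to picture the deferred conversion of [0031] and [0032] is a queue that is drained only while the local user is silent. The sketch below is an assumption-laden illustration: `DeferredConverter` and its `tick` method are hypothetical names, and the `convert` callable merely stands in for the voice-to-text conversion unit 6.

```python
from collections import deque


class DeferredConverter:
    """Queue voice records and convert them only while the user is silent."""

    def __init__(self, convert):
        self._convert = convert  # stand-in for the voice-to-text conversion
        self._pending = deque()
        self.results = []

    def enqueue(self, record):
        self._pending.append(record)

    def tick(self, speaking):
        """Convert at most one pending record; do nothing while speaking."""
        if speaking or not self._pending:
            return None
        text = self._convert(self._pending.popleft())
        self.results.append(text)
        return text


# Placeholder conversion: a real system would run speech recognition here.
converter = DeferredConverter(lambda record: record.upper())
converter.enqueue("hello")
converter.enqueue("goodbye")
print(converter.tick(speaking=True))   # speaker is talking, nothing converted
print(converter.tick(speaking=False))  # silent, oldest record is converted
```

Converting one record per call keeps each idle slot short, so the conference audio and video are not starved when the user starts speaking again.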

[0033] In addition, when text data is transmitted to other PCs, monitoring the transmission state of images, audio, and data on the communication network and reallocating the transmission bandwidth suppresses degradation of image and sound quality and loss of responsiveness during the PC conference. For example, if monitoring shows that the participant is not speaking, no voice signal is being transmitted, so the bandwidth for the voice signal is reduced and the bandwidth for text data is increased for transmission.
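The reallocation example in [0033] amounts to moving unused voice bandwidth to the text channel while the participant is silent. The following sketch shows that idea only; the function name, the channel names, and all bandwidth figures are assumptions chosen for illustration and do not come from the patent.

```python
def allocate_bandwidth(video_kbps, audio_kbps, text_kbps, speaking,
                       audio_floor_kbps=0):
    """Return a per-channel allocation with the same total bandwidth.

    While the participant is silent, the voice channel is cut down to
    audio_floor_kbps and the freed bandwidth is given to text data.
    """
    if speaking:
        return {"video": video_kbps, "audio": audio_kbps, "text": text_kbps}
    floor = min(audio_floor_kbps, audio_kbps)
    freed = audio_kbps - floor
    return {"video": video_kbps, "audio": floor, "text": text_kbps + freed}


print(allocate_bandwidth(96, 16, 8, speaking=True))
print(allocate_bandwidth(96, 16, 8, speaking=False, audio_floor_kbps=4))
```

The total stays constant in both cases, so the video channel is unaffected by the shift.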

[0034] Next, the method of creating minutes is described in detail. The data sent to the minutes creation unit 7 in S34 above, that is, the voice data in the FIG. 4 format converted into text, is stored in a state like that shown in FIG. 5.

[0035] As noted above, text data is not necessarily transmitted in real time, so the stored data is not in chronological order. First, the data is sorted chronologically using the time code attached to each remark's text data. The remarks are then in the order in which they were spoken during the conference, and the data is finally reorganized into a format like that of FIG. 6 and stored with the remarks in chronological order.
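The chronological sort of [0035] can be illustrated with a few records that arrive out of order. The dictionary field names and the sample remarks below are hypothetical; only the idea of sorting by the attached time code comes from the text.

```python
from operator import itemgetter

# Hypothetical received records, in arrival order rather than spoken order.
received = [
    {"user_id": "B", "utterance_no": 2, "time": 12.5, "text": "I agree."},
    {"user_id": "C", "utterance_no": 3, "time": 20.0, "text": "Any objections?"},
    {"user_id": "A", "utterance_no": 1, "time": 10.0, "text": "Let us begin."},
]


def sort_by_time_code(records):
    """Return the records in chronological order of their time code."""
    return sorted(records, key=itemgetter("time"))


ordered = sort_by_time_code(received)
for record in ordered:
    print(record["time"], record["user_id"], record["text"])
```

Because `sorted` is stable, records with identical time codes keep their arrival order.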

[0036] When the user asks for the conference minutes to be displayed, the minutes creation unit 7 displays them on the PC monitor 12 in a layout like that of FIG. 7. The display shows the content 72 of each remark made during the conference and the name 71 of the speaker who made it. The speaker's name 71 uses the ID information 41 of the voice data of FIG. 4.
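A text-only approximation of the FIG. 7 layout in [0036] might look like the sketch below. The `render_minutes` function, the record fields, and the mapping from user ID to display name are all assumptions made for illustration; the patent only states that the name 71 is taken from the ID information 41.

```python
def render_minutes(ordered_records, names):
    """Format each remark as 'speaker name: content', one remark per line."""
    lines = []
    for record in ordered_records:
        # Fall back to the raw user ID when no display name is registered.
        name = names.get(record["user_id"], record["user_id"])
        lines.append(f"{name}: {record['text']}")
    return "\n".join(lines)


records = [
    {"user_id": "A", "text": "Let us begin."},
    {"user_id": "B", "text": "I agree."},
]
names = {"A": "Participant A"}  # hypothetical user ID to name mapping
print(render_minutes(records, names))
```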

[0037] As described above, according to the first embodiment, the remarks needed for the conference minutes are converted into text on the speaker's side, so highly accurate minutes can be created even when the state of the communication network is poor.

[0038] (Second embodiment) Next, a second embodiment of the present invention is described in detail. The process of photographing the speaker's face, triggered by the speaker's remark during the conference, and using that data in the minutes is explained with reference to the flowchart of FIG. 8.

[0039] First, in step S81, it is detected that a participant's remark has been input from the voice input/output unit 3. The image input unit 2 continuously photographs the face of the PC user, and the data is displayed on the screens of the terminals participating in the conference.
The image input unit 2 always captures the face of the personal computer user and displays the data on the screen of the terminal participating in the conference.

【００４０】ステップＳ８２ではＳ８１で発言を検出
し、これをトリガにして次のタイミングで、画像入力部
２から入力された発言者の顔画像データに、ＩＤと時刻
データを付加する。In step S82, the speech is detected in S81, and using this as a trigger, the ID and time data are added to the face image data of the speaker input from the image input unit 2 at the next timing.

[0041] FIG. 9 shows the format of face image data to which a user ID has been added. The ID 91 of the face image data is given the same value as the speaker number obtained in S32 above. The image data 92 is the image data of the speaker's face photographed in S82.

[0042] Next, in step S83, the image data obtained in S82 is sent via the communication IF 10 to the minutes creation unit 7 of the PC that issued the text data transmission request signal. Then, in step S84, the minutes are created by associating the voice data stored in the minutes creation unit 7 in S34 with the face image data obtained in S83.
Next, in step S84, the minutes are created by associating the voice data stored in S34 stored in the minutes creation unit 7 with the face image data obtained in S83.

[0043] Next, the method of creating minutes is described. The speaker's face image data in the FIG. 9 format obtained in S82 and the voice data in the FIG. 4 format obtained in S32 are stored in a state like that shown in FIG. 10.

[0044] As in the minutes creation method described earlier, the data is first sorted chronologically using the time code attached to each remark's text data. The remarks are then in the order in which they were spoken during the conference. Next, the utterance numbers attached to the remark data and to the speakers' face data are compared, and entries with the same number are placed together, so that the data is finally reorganized into a state like that of FIG. 11. The speakers' face image data and remarks are then recorded in the minutes recording unit 8 in chronological order.
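The two-stage rearrangement of [0044], a chronological sort followed by matching on utterance number, can be sketched as below. The `build_minutes` function and the field names are hypothetical, and image data is represented by file names purely for readability.

```python
def build_minutes(texts, faces):
    """Sort remarks by time code, then attach the face image that carries
    the same utterance number (None when no image was received)."""
    faces_by_no = {face["utterance_no"]: face["image"] for face in faces}
    ordered = sorted(texts, key=lambda record: record["time"])
    return [
        {**record, "image": faces_by_no.get(record["utterance_no"])}
        for record in ordered
    ]


texts = [
    {"utterance_no": 2, "time": 12.5, "text": "I agree."},
    {"utterance_no": 1, "time": 10.0, "text": "Let us begin."},
]
faces = [
    {"utterance_no": 1, "image": "face_a.jpg"},
    {"utterance_no": 2, "image": "face_b.jpg"},
]
for entry in build_minutes(texts, faces):
    print(entry["time"], entry["image"], entry["text"])
```

Indexing the face images by utterance number first avoids rescanning the image list for every remark.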

[0045] When the user asks for the conference minutes to be displayed, the minutes creation unit 7 displays the data in the minutes recording unit 8 on the PC monitor 12 in a layout like that of FIG. 12. The monitor 12 shows the text 121 of what was said during the conference and an image 122 capturing the speaker's expression at the time of speaking.

[0046] In the description above, detection of a remark is used as a trigger and the next single frame of image data at the next timing is used. However, when ITU-T Recommendation H.263 or MPEG-4 is used to transmit and receive the conference images, the same effect can be obtained by using the I-picture at the next timing after the speaker's remark acts as the trigger.
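The I-picture variant in [0046] can be pictured as scanning the frame sequence for the first intra-coded frame at or after the trigger time. This sketch works on a simplified list of frame descriptors rather than a real H.263 or MPEG-4 bitstream; the function name, the frame fields, and the choice to include a frame exactly at the trigger time are assumptions.

```python
def next_i_picture(frames, trigger_time):
    """Return the first I-picture at or after trigger_time, or None.

    frames is assumed to be in ascending time order.
    """
    for frame in frames:
        if frame["time"] >= trigger_time and frame["type"] == "I":
            return frame
    return None


# Hypothetical frame sequence: one I-picture followed by predicted frames.
frames = [
    {"time": 0.0, "type": "I"},
    {"time": 0.1, "type": "P"},
    {"time": 0.2, "type": "P"},
    {"time": 0.3, "type": "I"},
    {"time": 0.4, "type": "P"},
]
print(next_i_picture(frames, trigger_time=0.15))
```

An I-picture is chosen because it can be decoded on its own, without reference to neighbouring frames.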

[0047] Another possible approach is to obtain a still image in advance and use it repeatedly. Before the actual conference begins, when at least one participant sets up minutes creation (that is, issues a text transmission request to the other participants' PCs), each image input unit 2 photographs its participant's face, adds the speaker ID 91, and sends the image to the PC that issued the text transmission request, where it is used as the face image data for the minutes.

[0048] This image is a still image of better quality than the moving image data transmitted during the PC conference, and image data no longer needs to be exchanged repeatedly. A time may also be added to the face image data format shown in FIG. 9; the resulting format is shown in FIG. 13. Using this time data makes the rearrangement described with FIGS. 10 and 11 more efficient.

[0049] As described above, according to the second embodiment of the present invention, what is said in the conference is automatically turned into minutes, which reduces the work of the person responsible for the minutes. In addition, displaying the speaker's face image next to each remark makes it easy to tell who made the remark even when the minutes are read later.

[0050]

[Effects of the invention] As described above, according to the present invention, performing voice-to-text conversion on the speaker's own PC enables highly accurate text conversion whose output can be used for conference minutes. In addition, attaching the speaker's face image to the minutes makes it easy to identify the speaker when the minutes are consulted later.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of a conference system according to an embodiment of the present invention.

FIG. 2 is a flowchart showing the flow of ordinary voice signal processing in a PC conference according to an embodiment of the present invention.

FIG. 3 is a flowchart showing the flow in which a voice signal is converted into text and transmitted in the first embodiment of the present invention.

FIG. 4 is a schematic diagram of the voice signal format in the first embodiment of the present invention.

FIG. 5 is a schematic diagram of the text data stored in the minutes creation means before the minutes are created in the first embodiment of the present invention.

FIG. 6 is a schematic diagram of the voice data stored in the minutes recording means after the minutes are created in the first embodiment of the present invention.

FIG. 7 is a diagram of the text-converted minutes displayed on the monitor in the first embodiment of the present invention.

FIG. 8 is a flowchart showing the flow of creating minutes from input face image data and text data in the second embodiment of the present invention.

FIG. 9 is a schematic diagram of the format of the photographed face image in the second embodiment of the present invention.

FIG. 10 is a schematic diagram of the image data and text data stored in the minutes creation means before the minutes are created in the second embodiment of the present invention.

FIG. 11 is a schematic diagram of the image data and text data stored in the minutes recording means after the minutes are created in the second embodiment of the present invention.

FIG. 12 is a diagram of minutes consisting of face images and text data displayed on the monitor in the second embodiment of the present invention.

FIG. 13 is a schematic diagram of the format used when a time code is added to the photographed face image in the second embodiment of the present invention.

FIG. 14 is a schematic diagram of a conventional conference system.

[Explanation of symbols]

1 ... PC, 2 ... image input unit, 3 ... voice input/output unit, 4 ... data input unit, 5 ... PC conference processing unit, 6 ... voice-to-text conversion unit, 7 ... minutes creation unit, 8 ... minutes recording unit, 9 ... voice storage unit, 10 ... communication interface, 11 ... communication network, 12 ... monitor

Continuation of front page: (72) Inventor: Toshihiro Morohoshi, Toshiba Corp. Research and Development Center, 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa. (72) Inventor: Tomiyoshi Fukumoto, Toshiba Corp. Research and Development Center, 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa.

Claims

[Claims]

1. A minutes creation device comprising: input means for inputting voice data; voice storage means for assigning an identifier to the voice data input by the input means and storing it; conversion means for converting the voice data stored in the voice storage means into character data; and means for creating minutes by rearranging the character data converted by the conversion means based on the identifier assigned to the voice data from which the character data was converted.

2. A minutes creation device comprising: image input means for inputting voice data and image data corresponding to the voice data; storage means for assigning an identifier to the voice data input by the input means and to the image data corresponding to the voice data and storing them; conversion means for converting the voice data stored in the storage means into character data; and means for creating minutes by rearranging, based on the identifier, the character data converted by the conversion means and the image data stored in the storage means that corresponds to the voice data from which the character data was converted.

3. A minutes creation method comprising: assigning an identifier to input voice data and storing the voice data; converting the stored voice data into character data; and creating minutes by rearranging the converted character data based on the identifier assigned to the voice data.

4. A minutes creation method comprising: assigning an identifier to input voice data and to image data corresponding to the voice data and storing both; converting the stored voice data into character data; and creating minutes by rearranging, based on the identifier, the converted character data and the stored image data that corresponds to the voice data from which the character data was converted.

5. A recording medium on which a program for exchanging data via a plurality of computers and creating minutes is recorded so as to be executable by a computer, the program assigning an identifier to input voice data and storing the voice data, converting the stored voice data into character data, and creating minutes by rearranging the converted character data based on the identifier assigned to the voice data.

6. A recording medium on which a program for exchanging data via a plurality of computers and creating minutes is recorded so as to be executable by a computer, the program assigning an identifier to input voice data and to image data corresponding to the voice data and storing both, converting the stored voice data into character data, and creating minutes by rearranging, based on the identifier, the converted character data and the stored image data that corresponds to the voice data from which the character data was converted.