JP2003177776A

JP2003177776A - Minutes recording system

Info

Publication number: JP2003177776A
Application number: JP2001378729A
Authority: JP
Inventors: Mizuaki Suzuki; 瑞明鈴木
Original assignee: Seiko Instruments Inc
Current assignee: Seiko Instruments Inc
Priority date: 2001-12-12
Filing date: 2001-12-12
Publication date: 2003-06-27

Abstract

PROBLEM TO BE SOLVED: To provide a minutes recording system that identifies each conference participant and automatically generates minutes from the voice data of their speech.

SOLUTION: The minutes recording system comprises computer terminal devices equal in number to the conference participants, a data processing server, a communication network system, and a network server. Each computer terminal device has an A/D converter that converts the analog waveform of the voice signal input from a connected microphone into a digital signal. The digitized voice data of each participant is converted into text data by voice recognition, and the text data is stored in a storage area of the network server together with data indicating the speaker's name and data indicating the time of the speech, thereby generating the minutes.

COPYRIGHT: (C)2003, JPO

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a minutes recording system that uses electronic means to record the statements of a plurality of speakers in a conference and automatically generates minutes.

[0002]

[Description of the Related Art] Voice recognition devices and programs that convert the speech of a single specific speaker into characters and sentences have long been available for purposes such as dictation. Such general voice recognition systems, however, cannot identify the speaker from the audio of a conference with multiple participants, put each statement in a form that shows which speaker made it, and record the statements of the multiple speakers as minutes in a single text. For example, the technique disclosed in Japanese Patent Laid-Open No. 2000-276195 is intended to record the statements of a plurality of speakers, but because each speaker's statements are recorded separately in a plurality of storage devices without being associated with one another, it can only generate text that differs from the usual format of minutes.

[0003]

[Problems to be Solved by the Invention] Records and minutes of conferences, meetings, dialogues, interviews, and the like that are prepared by a person normally contain the statements of all of the speakers as one continuous text in chronological order. It is therefore desirable that minutes generated automatically by electronic equipment also be recorded and stored in a form in which each statement is associated with the speaker's name or a symbol indicating the speaker.

[0004]

[Means for Solving the Problems] Computer terminals, each equipped with a microphone and a communication network connection function, are prepared in the same number as the conference participants. Statements made during the conference are digitized into voice data, and the voice data is converted into natural-language characters and sentences (text data) by a voice recognition function. Coded data giving the name of the corresponding speaker or a symbol identifying the speaker, and data indicating the time of the statement, are added to the voice data and text data, which are then recorded, stored, and sorted in chronological order. With this configuration, a minutes document in the normally desired format can be obtained.

[0005] In addition, the information needed to identify the individual using each computer terminal that collects voice data is obtained from user information entered in the computer terminal in advance, from the network address of the computer terminal, or from a number unique to the computer terminal.

[0006]

[Embodiments of the Invention] Embodiments of the present invention are described below.

[0007] (First Embodiment) FIG. 1 shows a schematic configuration of the minutes recording system of the present invention. The system comprises a plurality of microphones 101 and computer terminal devices 102, a data processing server 103, a network server 104, and a communication network system 105 that connects them. A single computer may serve as both the data processing server 103 and the network server 104.

[0008] A plurality of computer terminal devices 102 are prepared to match the number of conference participants. Each computer terminal device 102 has a built-in microphone 101 capable of picking up voice, or can be connected to one, and has an A/D converter that converts the analog voice signal input from the microphone 101 into a digital signal. The invention can be configured with the A/D converter built into the computer terminal device 102, or built into the microphone 101 and connected to the computer terminal device 102 by a digital signal. Many recent computers have both an analog audio input terminal and a digital signal input terminal conforming to the USB or IEEE 1394 standard. The connection between the microphone 101 and the computer terminal device 102 may be wired, using a coaxial cable or the like, or wireless, using an analog method such as FM radio or a digital method such as Bluetooth. When the microphone 101 is provided outside the computer terminal device 102, voice can be recorded clearly by using, as shown in FIG. 2, a headset microphone 202 worn on the head of each conference participant 201, a pin microphone 203 or wireless pin microphone 205 attached near the chest of the clothing, or a narrow-directivity microphone 204 aimed toward the head of each conference participant 201.

[0009] Further, the computer terminal device 102 has an interface device that can connect to the communication network system 105, which is based on, for example, the TCP/IP protocol. This interface device may be wired, such as Ethernet (registered trademark), or may use a so-called wireless LAN method such as the IEEE 802.11 series or Bluetooth. Each such network interface device has its own unique address number, and no two are the same.

[0010] In FIG. 1, the computer terminal devices 102 connect to the communication network via a wireless access point 106. The computer terminal device 102 may take any form, such as desktop (stationary), notebook (portable), PDA (electronic notebook), or wristwatch (wrist terminal), but a smaller terminal is clearly more convenient for running a conference. Thanks to recent advances in semiconductor processors (microprocessors), devices are available that are small yet have sufficient processing speed.

[0011] The storage area of each computer terminal device 102 stores in advance the name of its user, or a unique name that identifies the user, for use as the speaker tag 304 shown in the schematic diagram of utterance data in FIG. 3. The name is entered from the keyboard 108 or by voice through the microphone 101. Alternatively, if the conference participants each carry a PDA (electronic notebook, portable electronic device) or a wrist terminal (wristwatch-type electronic device) separate from the minutes recording system, a function that converts character-code data into musical tones and emits them from a speaker can be used: the tones converted from the user's name stored in the portable device are input through a microphone 101 connected to the minutes recording system, which associates the speaker's name with the voice data input from that microphone 101. The computer terminal device 102 can add the speaker tag and a time tag 305, which indicates the time of the statement, to the digitized voice data and send it to the communication network system 105. So that the time indicated by the time tag 305 is accurate, each computer terminal device 102 obtains time data from the server over the network in advance and corrects its internal clock.
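As an illustration only, the tagged voice data described above might be modeled as follows. The field layout, the function names, and the offset-based clock correction are assumptions made for this sketch; the source states only that a speaker tag and a time tag are added and that each terminal corrects its clock from server time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class VoiceDataBlock:
    """One block of digitized speech (hypothetical field layout)."""
    source_address: str       # part of header information 302
    destination_address: str  # part of header information 302
    waveform: bytes           # waveform data 303
    speaker_tag: str          # speaker tag 304
    time_tag: datetime        # time tag 305


def clock_offset(server_time: datetime, local_time: datetime) -> timedelta:
    """Correction to apply to the terminal clock, measured once in advance."""
    return server_time - local_time


def make_block(waveform: bytes, speaker_tag: str, source_address: str,
               destination_address: str, local_time: datetime,
               offset: timedelta) -> VoiceDataBlock:
    """Attach the speaker tag and a corrected time tag to digitized voice data."""
    return VoiceDataBlock(
        source_address=source_address,
        destination_address=destination_address,
        waveform=waveform,
        speaker_tag=speaker_tag,
        time_tag=local_time + offset,
    )
```

Applying the same kind of correction on every terminal is what makes time tags from different terminals comparable when the statements are later sorted.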

[0012] The data processing server 103 converts the waveform data 303 sent from the computer terminal devices 102 into text data by voice recognition technology, and sorts the data into the appropriate order based on the speaker tag 304 and time tag 305 added to the data. In particular, when the computer terminal devices 102 are wrist terminals, electronic notebooks, or other devices whose processors have limited capability, the text conversion by voice recognition, which demands a great deal of processing power, is performed by a high-performance data processing server 103.

[0013] The network server 104 has a mass storage device such as a magnetic disk device 108, and stores and accumulates the waveform data 303 and the text data described above. If the magnetic disk device 109 of the data processing server 103 has sufficient capacity, the data processing server 103 can also perform the function of the network server 104.

[0014] Next, the operation of the embodiment of the present invention is described.

[0015] First, a speaker-address correspondence table is created that associates each speaker's name, or a code that identifies the speaker, with a network address (IP address). The name or code is taken from the terminal user's name information stored in the computer terminal device 102, or is obtained by applying voice recognition technology to audio of the conference participant speaking his or her own name. The speaker-address correspondence table is created by the data processing server 103 and stored in the data processing server 103 or the network server 104. When no speaker tag is added to the data sent from the computer terminal device 102 to the communication network system 105, the speaker-address correspondence table is used to find the speaker tag 304 that corresponds to the network address of the waveform data 303 and text data 312, and to add that tag to the waveform data 303 and text data 312 so that they can be identified.
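The speaker-address correspondence table can be pictured as a simple mapping from network address to speaker. The following is a minimal sketch, assuming a plain in-memory dictionary; the function names and the example addresses are hypothetical and not taken from the source.

```python
from typing import Dict, Iterable, Optional, Tuple


def build_speaker_table(registrations: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Build the table from (network address, speaker name or code) pairs."""
    table: Dict[str, str] = {}
    for address, speaker in registrations:
        # Each terminal address must map to exactly one speaker.
        if address in table and table[address] != speaker:
            raise ValueError(f"address {address} already assigned to {table[address]}")
        table[address] = speaker
    return table


def resolve_speaker_tag(table: Dict[str, str], source_address: str,
                        existing_tag: Optional[str] = None) -> str:
    """Return the tag already on the data, or look it up by source address."""
    if existing_tag is not None:
        return existing_tag
    # Raises KeyError for an address that was never registered.
    return table[source_address]
```

Raising an error for an unregistered address is a design choice of this sketch; the source does not say how such data should be handled.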

[0016] Next, the conference, dialogue, interview, or the like begins. Voice is input from the microphone 101, and A/D conversion starts once the signal level exceeds a certain value. The speaker's voice is input to the microphone 101 and converted into an analog electrical signal, which is then converted into a digital signal by A/D conversion as described above. The voltage amplitude of the analog waveform is converted into digital data expressed as digital values of about 12 to 24 bits; header information 302, such as the source and destination addresses, and a time tag 305 indicating the time at which the voice was recorded are added; and the result is sent to the data processing server 103 via the communication network system 105. By providing each computer terminal device 102 with a switch or push button that causes data to be sent only while it is pressed or, conversely, not to be sent while it is pressed, the speaker can choose whether or not his or her statement is recorded.
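The level-triggered capture described above can be sketched roughly as follows. This is an assumption-laden illustration: samples are taken to be floats in the range -1.0 to 1.0, the threshold and the 16-bit default are arbitrary choices within the stated 12 to 24 bit range, and because the source does not say when conversion stops, the sketch simply keeps every sample from the trigger point onward.

```python
from typing import List, Sequence


def quantize(sample: float, bits: int = 16) -> int:
    """Map a sample in [-1.0, 1.0] to a signed integer code of the given depth."""
    if not 12 <= bits <= 24:
        raise ValueError("bit depth outside the 12 to 24 bit range in the text")
    max_code = 2 ** (bits - 1) - 1
    clipped = max(-1.0, min(1.0, sample))
    return round(clipped * max_code)


def capture(samples: Sequence[float], threshold: float = 0.1,
            bits: int = 16, record_enabled: bool = True) -> List[int]:
    """Start conversion at the first sample whose level exceeds the threshold.

    record_enabled models the switch that lets a speaker keep a statement
    out of the record; when it is False nothing is captured.
    """
    if not record_enabled:
        return []
    for start, sample in enumerate(samples):
        if abs(sample) > threshold:
            return [quantize(s, bits) for s in samples[start:]]
    return []  # the level never exceeded the threshold
```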

[0017] In addition, functions for expressing an intention such as "agreement/approval", "rebuttal/objection", or "question/doubt" are assigned to a plurality of switches or push buttons. For a statement made when one of these buttons is pressed, an intention tag 306 representing that intention is added to the data block, which is then sent to the network system. This function allows the speaker's intention to be recorded clearly in the minutes. These functions may be assigned to the keyboard 107 or mouse of an ordinary computer, or to the push buttons of a PDA.
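One way to picture the button-to-intention assignment is a small lookup table. The key names below are purely hypothetical; the source says only that the functions may be assigned to keyboard keys, a mouse, or PDA buttons.

```python
from enum import Enum
from typing import Optional


class Intention(Enum):
    AGREE = "agree"
    OBJECT = "object"
    QUESTION = "question"


# Hypothetical key assignments, chosen only for this sketch.
BUTTON_MAP = {
    "F1": Intention.AGREE,
    "F2": Intention.OBJECT,
    "F3": Intention.QUESTION,
}


def intention_for_button(pressed: Optional[str]) -> Optional[Intention]:
    """Return the intention tag for the pressed button, or None if there is none."""
    return BUTTON_MAP.get(pressed)
```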

[0018] When the data processing server 103 receives waveform data 303, it generates a speaker tag from the source address in the header information and the speaker-address correspondence table, adds it to the waveform data 303, and saves the result in a storage area such as a disk device. It then converts the waveform data 303 into natural-language text data by voice recognition technology, adds the speaker tag 304, time tag 305, and intention tag 306, and saves the result. The converted text data is immediately transferred over the network system to each computer terminal device 102 and displayed on its screen. By reading it, a speaker whose statement has been converted incorrectly can make a further statement to correct it. The data processing server 103 repeats these operations each time a conference participant speaks, accumulating the statements made during the conference in a storage area such as a disk device. The storage area may instead be the disk device 109 or the like of a network server 104 separate from the data processing server 103.
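The per-statement server steps above can be sketched as a single function. This is not the patent's implementation: the dictionary-based block format and the function name are assumptions, the recognizer is passed in as a placeholder callable because the source names no particular voice recognition engine, and a Python list stands in for the disk storage area.

```python
from typing import Callable, Dict, List


def process_block(block: dict, speaker_table: Dict[str, str],
                  recognize: Callable[[bytes], str],
                  store: List[dict]) -> dict:
    """Tag, convert, and store one received block of waveform data."""
    # Use the tag sent by the terminal if present; otherwise look it up
    # from the source address in the header information.
    speaker = block.get("speaker_tag") or speaker_table[block["source_address"]]
    record = {
        "speaker_tag": speaker,
        "time_tag": block["time_tag"],
        "intention_tag": block.get("intention_tag"),
        "waveform": block["waveform"],
        "text": recognize(block["waveform"]),
    }
    store.append(record)
    # The returned record is what would be sent back to the terminals for display.
    return record
```

Calling this once per received block mirrors the repetition described in the text.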

[0019] In the voice recognition technology that converts voice waveform data into character and sentence data, conversion efficiency can be improved by referring to a per-speaker dictionary file that holds frequently occurring words and words from the speaker's field of expertise. To do so, a pattern file recording each conference participant's pronunciation patterns and a user dictionary file holding words and phrases from the participant's field or in frequent use are stored in advance in the storage area of each computer terminal device 102. The data processing server 103 either refers to these files through the communication network system 105, or the files are transferred and copied to the storage area of the data processing server 103 before the conference begins and referred to there.

[0020] The series of utterance data accumulated in this way is sorted into the order of the statements using the time information in the time tag 305. The speaker's name or a symbol identifying the speaker, and the intention expressed by the speaker or a symbol indicating it, are added to the text of each statement, and the result is output as a single document file, thereby generating the minutes.
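The sorting and output step can be sketched in a few lines. The record keys and the output line format are assumptions chosen for illustration; the source specifies only that statements are ordered by time tag and labeled with the speaker and any expressed intention.

```python
from datetime import datetime
from typing import Iterable


def generate_minutes(records: Iterable[dict]) -> str:
    """Sort utterance records by time tag and render them as one text document."""
    lines = []
    for r in sorted(records, key=lambda r: r["time_tag"]):
        stamp = r["time_tag"].strftime("%H:%M:%S")
        # Show the expressed intention, if any, next to the speaker.
        intention = f" ({r['intention_tag']})" if r.get("intention_tag") else ""
        lines.append(f"[{stamp}] {r['speaker_tag']}{intention}: {r['text']}")
    return "\n".join(lines)


example = [
    {"time_tag": datetime(2001, 12, 12, 10, 0, 5), "speaker_tag": "B",
     "text": "I agree.", "intention_tag": "agree"},
    {"time_tag": datetime(2001, 12, 12, 10, 0, 1), "speaker_tag": "A",
     "text": "Let us begin."},
]
print(generate_minutes(example))
```

Because the records are ordered by time tag rather than by arrival, statements from different terminals interleave in the order they were spoken, provided the terminal clocks have been corrected beforehand.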

[0021] The minutes may also be displayed immediately as text on the screen of a large-screen image display device, such as a plasma display or an image data projector installed in the conference room, by adding the code or name identifying the speaker to the text data sorted in order of time of occurrence.

[0022] (Second Embodiment) The speaker-address correspondence table of the first embodiment is not used. Instead, the computer terminal device 102 generates a speaker tag from the data indicating its user that is stored in its internal storage area, and adds the speaker tag to the data each time it sends a data packet to the communication network system 105.

[0023] (Third Embodiment) In recent computer products, even portable or notebook personal computers are in most cases equipped with high-performance microprocessors, and even in portable electronic devices such as electronic notebooks and PDAs, processing capability can be improved by mounting a digital signal processor specialized for signal processing in addition to the main processor. Such computer terminal devices 102 are therefore fully capable of text conversion using voice recognition technology. An electronic notebook or PDA, however, has a small storage area and may be unable to hold a dictionary file containing the amount of information that voice recognition requires.

[0024] In such a case, a sufficiently large dictionary file becomes available by referring, through the communication network system 105, to a dictionary file stored in the storage area of the data processing server 103 or the network server 104. The waveform data 303 of a statement is A/D-converted in each computer terminal device 102 and converted into text data by voice recognition technology, after which the speaker tag 304, time tag 305, and so on are added. The utterance data, consisting of the text data with the speaker tag 304, time tag 305, and so on added, is sent to the data processing server 103 via the communication network system 105. The data processing server 103 sorts the utterance data based on the time tag 305 and stores it. In this embodiment, the data processing server 103 does not need to convert the waveform data 303 into text data.

[0025] (Fourth Embodiment) If each computer terminal device 102 can be connected to the Internet 112 or a wide area network via the communication network system 105 and a network connection device such as a router 111, the data processing server 113 and the network server 114 can be installed at a location remote from the conference room in which the computer terminal devices 102 are used and the conference is held. With this embodiment, a company, for example, can hold a conference at a branch or sales office remote from the head office where the data processing server 113 is installed, or hold a conference whose participants are each at a different branch or sales office.

[0026] Also, by installing a single computer with especially high processing speed and large storage as the data processing server 113, text conversion by voice recognition can be made fast even when the computer terminal devices 102 have little processing power. Furthermore, this embodiment can be run as a business, for example by renting out the processing service of the data processing server 113 or by offering a service that turns the waveform data 303 of a customer's conference into minutes.

[0027] When providing such a service or business, as in the method described for the first embodiment, the conversion efficiency of the voice recognition technology can be improved by having the data processing server 113 refer to each speaker's pronunciation pattern file and user dictionary over the Internet or wide area network, or by copying them to the storage area of the data processing server 113.

[Brief description of drawings]

FIG. 1 is a schematic configuration diagram of the minutes recording system of the present invention.

FIG. 2 is an explanatory diagram showing the arrangement of microphones in the embodiment of the present invention.

FIG. 3(a) is a schematic diagram showing the structure of voice data.
FIG. 3(b) is a schematic diagram showing the structure of utterance data.

[Explanation of symbols]
101 ... Microphone
102 ... Computer terminal device
103 ... Data processing server
104 ... Network server
105 ... Communication network system
106 ... Wireless access point
108 ... Keyboard
109 ... Disk device
201 ... Conference participant
202 ... Headset microphone
203 ... Pin microphone
204 ... Narrow-directivity microphone
205 ... Wireless pin microphone
301 ... Voice data
302 ... Header
303 ... Waveform data
304 ... Speaker tag
305 ... Time tag
306 ... Intention tag
311 ... Utterance data
312 ... Text data

Continuation of front page: (51) Int.Cl.⁷, Identification Code, FI, Theme Code (Reference): G10L 3/00 521P

Claims

1. A minutes recording system comprising a plurality of notebook-type, electronic-notebook-type, or wrist-type computer terminal devices equal in number to the participants of a conference, a data processing server, a wired or wireless communication network system, and a network server, wherein each computer terminal device has a wired or wireless communication network connection function, is connected to a microphone capable of inputting the voice of a participant's speech during the conference, and has an A/D converter that converts the analog waveform of the input voice signal into a digital signal; the voice data of each conference participant, input from the microphone installed near that participant and digitized by the A/D converter, is converted by voice recognition into text data consisting of character codes representing natural language; data indicating a code or name identifying the corresponding speaker and coded data indicating the time of the speech are added to each of these data; and the voice data and the voice-recognized text data are stored and accumulated in a storage area of the data processing server or a storage area of the network server.

2. The minutes recording system according to claim 1, wherein the speaker is identified from the user name registered in advance in the computer terminal device into which the voice is input, and the code identifying the speaker or the coded name is thereby added to the digitized voice data and the document data, which are stored in the storage area of the data processing server or the storage area of the network server.

3. The minutes recording system according to claim 1, wherein the speaker is identified from the network address unique to the computer terminal device into which the voice is input, and the code identifying the speaker or the coded name is thereby added to the digitized voice data and the document data, which are stored in the storage area of the data processing server or the storage area of the network server.

4. The minutes recording system according to claim 1, wherein a speaker inputs his or her name by voice from the microphone, and data identifying the speaker is generated from that voice data by voice recognition technology.

5. The minutes recording system according to claim 2 or 3, wherein the data processing server sorts the voice data and the document data in order of time, based on the data indicating the time of speech added to the digitized voice data and the document data, and stores and accumulates them in the storage area of the data processing server or the storage area of the network server.

6. The minutes recording system according to claim 4, wherein the coded data or name identifying the speaker is added to the document data sorted in order of time of occurrence, and the result is immediately displayed as text on the screen of each computer terminal device.

7. The minutes recording system according to claim 4, wherein a code or name identifying the speaker is added to the document data sorted in order of time of occurrence, and the result is immediately displayed as text on the screen of a large-screen image display device, such as a plasma display or an image data projector, installed in the conference room.

8. The minutes recording system according to claim 1, wherein the processor of the data processing server is used to convert the voice data into text data by voice recognition technology, and a dictionary file stored in each of the computer terminal devices is referred to in the conversion into text data.

9. The minutes recording system according to claim 1, wherein the processor of each computer terminal device assigned to each conference participant is used to convert the respective voice data into text data by voice recognition technology, the text data is transferred to the network system and then stored and accumulated in the storage area of the network server, and a stored dictionary file is referred to in the conversion.

10. The minutes recording system according to claim 1, wherein each computer terminal device or microphone device assigned to each conference participant is provided with a button or a switch, and utterances made during the period in which a specific button or switch is pressed are not recorded.

11. The minutes recording system according to claim 1, wherein each computer terminal device or microphone device assigned to each conference participant is provided with a button or switch, and only utterances made while a specific button or switch is pressed are recorded.

12. The minutes recording system according to claim 1, wherein each computer terminal device assigned to each conference participant is provided with a plurality of buttons or switches, and information corresponding to the button is added to, and recorded with, the digitized voice data or text data of the speech made while a specific button or switch selected by the speaker is pressed, or of the speech made immediately after the switch is pressed.

13. The minutes recording system according to claim 1, 2, or 3, wherein the communication network system is connectable to a wide area network system or the Internet via a router device or the like, and the voice data of the conference participants' speech collected by the computer terminal devices is sent via the wireless or wired communication network system and the wide area network system or the Internet, converted into natural-language character and sentence data by the voice recognition function of the data processing server, and stored and accumulated in a storage area of the network server.

14. The minutes recording system according to claim 13, wherein the data processing server converts voice data into text data by voice recognition technology while referring, via the wide area network system or the Internet, to a dictionary file stored in each computer terminal device and to a dictionary file stored in the data processing server.