JP2008009693A

JP2008009693A - Transcribing system, its server, and program for server

Info

Publication number: JP2008009693A
Application number: JP2006179177A
Authority: JP
Inventors: Masami Nakamura; 雅巳中村; Hiroatsu Fujii; 博厚藤井; Masao Shinkai; 正男新開
Original assignee: Advanced Media Inc
Current assignee: Advanced Media Inc
Priority date: 2006-06-29
Filing date: 2006-06-29
Publication date: 2008-01-17

Abstract

PROBLEM TO BE SOLVED: To provide a transcribing system capable of reducing the burden on an operator of transcribing operation and the burden on a terminal used by the operator compared to a conventional system.

SOLUTION: The transcribing system 10 comprises a plurality of transcribing terminals 60 engaged in transcribing operation for generating character data based on voice data, and a management server 40 communicating with the plurality of transcribing terminals 60. The management server 40 comprises an original voice data dividing means for dividing original voice data which is voice data used as the source of transcribing operation, into a plurality of transmission voice data which are voice data to be transmitted to the transcribing terminals 60, and a distributing transmission means for distributing and transmitting the plurality of transmission voice data generated by the voice data dividing means, to the plurality of transcribing terminals 60.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a transcription system for generating character data based on voice data.

As a conventional transcription system, a digital stenography system is known in which voice data is sent to each operator by streaming in real time or close to real time, and a plurality of operators produce a single set of transcript data by simultaneously editing arbitrary portions of it (see, for example, Patent Document 1).
Patent Document 1: JP 2004-77966 A

However, in the conventional transcription system, data must be protected by applying write protection line by line across all terminal devices so that multiple operators do not update the same location, and this places a heavy load on the terminal devices. Moreover, no mechanism has been proposed for protecting the audio that corresponds to a protected line of text, so an operator can decide whether a portion needs transcribing only by listening to the audio, which burdens the operator. Furthermore, when multiple operators begin updating the same line at the same moment, protection has not yet been applied, so work may be duplicated, which also burdens the operators.

The present invention has been made to solve these conventional problems. Its object is to provide a transcription system that can reduce, compared with the prior art, both the burden on workers performing transcription work and the load on the terminals used by those workers.

The server of the present invention is a server that communicates with a plurality of worker terminals used by a plurality of workers engaged in transcription work, in which character data is generated based on voice data. The server comprises: original voice data dividing means for dividing original voice data, which is the voice data that is the source of the transcription work, into a plurality of transmission voice data, which are voice data for transmission to the worker terminals; and distribution transmission means for distributing and transmitting the plurality of transmission voice data generated by the original voice data dividing means to the plurality of worker terminals.

With this configuration, the server of the present invention transmits to a worker terminal, out of the plurality of transmission voice data, the transmission voice data that the worker should transcribe, so the worker's task becomes more efficient and the worker's burden is reduced compared with the prior art. Because the server does not require character data to be synchronized between worker terminals, the load on the worker terminals is also reduced. Because the server can have a plurality of workers share the transcription of the original voice data at the same time, the time required to transcribe the entire original voice data can be shortened. In addition, because the server sends the transmission voice data to the worker terminals electronically, the time and cost required to transcribe the entire original voice data can be reduced compared with, for example, a case where the server administrator stores the transmission voice data on a recording medium and mails it to the workers.

The original voice data dividing means of the server of the present invention preferably generates the transmission voice data using silent portions included in the original voice data as delimiters.

With this configuration, compared with a configuration in which the transmission voice data is delivered by streaming, the worker less often needs to perform operations such as replaying or pausing. The time a worker needs to transcribe the transmission voice data is therefore shortened, and so is the time required to transcribe the entire original voice data.
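The silence-delimited division described above can be illustrated with a minimal sketch. The patent does not specify how silence is detected; the amplitude threshold, the minimum run length, and the function name `split_on_silence` are all assumptions made for illustration.

```python
from typing import List, Tuple


def split_on_silence(samples: List[int], threshold: int,
                     min_silence: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of segments separated by silence.

    Assumption: a run of at least `min_silence` consecutive samples whose
    absolute value is <= `threshold` counts as a silent delimiter.
    """
    segments: List[Tuple[int, int]] = []
    seg_start = None  # start index of the segment being built, if any
    quiet_run = 0     # length of the current run of quiet samples
    for i, s in enumerate(samples):
        if abs(s) <= threshold:
            quiet_run += 1
            if seg_start is not None and quiet_run == min_silence:
                # Close the segment at the first sample of the quiet run.
                segments.append((seg_start, i - min_silence + 1))
                seg_start = None
        else:
            if seg_start is None:
                seg_start = i
            quiet_run = 0
    if seg_start is not None:
        # Data ended before a long enough silence was seen.
        segments.append((seg_start, len(samples)))
    return segments
```

Short pauses inside a segment (shorter than `min_silence`) do not split it, which keeps each item of transmission voice data from becoming too small.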

The original voice data dividing means of the server of the present invention preferably generates the transmission voice data using sentence breaks included in the original voice data as delimiters.

With this configuration, compared with a configuration in which each item of transmission voice data is too large, the time a worker needs to transcribe an item of transmission voice data is shortened, so the time required to transcribe the entire original voice data can be shortened. Also, compared with a configuration in which each item of transmission voice data is too small, the worker can easily understand the content of the speech contained in the transmission voice data, which makes the worker's transcription work easier.

The original voice data dividing means of the server of the present invention preferably generates the transmission voice data using speaker changes included in the original voice data as delimiters.

The original voice data dividing means of the server of the present invention preferably generates the transmission voice data using topic changes included in the original voice data as delimiters.

The distribution transmission means of the server of the present invention preferably selects the worker terminal to which transmission voice data is sent according to the speaker of that transmission voice data.

With this configuration, the server of the present invention can have each worker specialize in a specific speaker, which improves the efficiency of the worker's transcription work. Likewise, because each worker terminal can specialize in a specific speaker, when the worker terminal performs speech recognition, recognition accuracy can be improved by adapting to the speaker's voice quality and topics.
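As a rough illustration of speaker-based distribution, the sketch below keeps every segment of a given speaker on the same terminal. The round-robin assignment of newly seen speakers and the name `assign_by_speaker` are assumptions; the patent only states that the destination is selected according to the speaker.

```python
from typing import Dict, List, Tuple


def assign_by_speaker(segments: List[Tuple[int, str]],
                      terminals: List[str]) -> Dict[int, str]:
    """Map segment id -> terminal so that each speaker stays on one terminal.

    `segments` holds (segment_id, speaker_label) pairs. Speaker labels are
    assumed to be available already, e.g. from a speaker-change detector.
    """
    speaker_to_terminal: Dict[str, str] = {}
    assignment: Dict[int, str] = {}
    for seg_id, speaker in segments:
        if speaker not in speaker_to_terminal:
            # Hypothetical policy: hand new speakers out in round-robin order.
            index = len(speaker_to_terminal) % len(terminals)
            speaker_to_terminal[speaker] = terminals[index]
        assignment[seg_id] = speaker_to_terminal[speaker]
    return assignment
```

With more speakers than terminals, several speakers share a terminal, but no speaker is ever spread across terminals.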

The server of the present invention preferably further comprises character data generating means for generating the character data from the voice data by speech recognition, and the distribution transmission means preferably transmits, to a worker terminal, the character data corresponding to the transmission voice data sent to that worker terminal.

With this configuration, the server of the present invention generates approximate character data by speech recognition, so the time and effort a worker needs to transcribe the transmission voice data are reduced compared with a configuration in which the worker transcribes the transmission voice data from scratch. The server can therefore reduce the time and labor cost required to transcribe the entire original voice data.

The distribution transmission means of the server of the present invention preferably transmits transmission voice data to a worker terminal when it receives a request for transmission voice data from that worker terminal.

With this configuration, the server of the present invention can transmit transmission voice data to a worker terminal in step with the progress of that worker's transcription work. Compared with a configuration that transmits transmission voice data to worker terminals regardless of the workers' progress, transcription of the entire original voice data becomes more efficient and the time it requires is shortened.

The server of the present invention preferably further comprises transmission voice data selecting means for selecting the transmission voice data to be transmitted by the distribution transmission means, and the transmission voice data selecting means preferably selects the transmission voice data based on the topic of the transmission voice data and on the worker.

With this configuration, the server of the present invention can transmit to a worker terminal the transmission voice data suited to the worker, so the time the worker needs for transcription is shortened compared with a configuration that transmits transmission voice data regardless of topic. The server can therefore shorten the time required to transcribe the entire original voice data.

The server of the present invention preferably further comprises transmission voice data selecting means for selecting the transmission voice data to be transmitted by the distribution transmission means. When the worker terminal generates the character data from the voice data by speech recognition, the transmission voice data selecting means preferably selects the transmission voice data based on the topic of the transmission voice data and the topic covered by the speech recognition dictionary used for that speech recognition.

With this configuration, the server of the present invention can transmit to a worker terminal the transmission voice data suited to that terminal, so the time the worker needs for transcription is shortened compared with a configuration that transmits transmission voice data regardless of topic. The server can therefore shorten the time required to transcribe the entire original voice data.

The distribution transmission means of the server of the present invention preferably encrypts the transmission voice data before transmitting it to the worker terminal.

With this configuration, even when the original voice data contains secret information such as confidential or personal information, the server of the present invention can communicate with the worker terminals over a public network while keeping that information secret.

The server program of the present invention is a program for operating a server that communicates with a plurality of worker terminals used by a plurality of workers engaged in transcription work, in which character data is generated based on voice data. The program causes the server to function as: original voice data dividing means for dividing original voice data, which is the voice data that is the source of the transcription work, into a plurality of transmission voice data, which are voice data for transmission to the worker terminals; and distribution transmission means for distributing and transmitting the plurality of transmission voice data generated by the original voice data dividing means to the plurality of worker terminals.

With this configuration, the server program of the present invention transmits to a worker terminal, out of the plurality of transmission voice data, the transmission voice data that the worker should transcribe, so the worker's task becomes more efficient and the worker's burden is reduced compared with the prior art. Because the server program does not require character data to be synchronized between worker terminals, the load on the worker terminals is also reduced. Because the server program can have a plurality of workers share the transcription of the original voice data at the same time, the time required to transcribe the entire original voice data can be shortened. In addition, because the server program sends the transmission voice data to the worker terminals electronically, the time and cost required to transcribe the entire original voice data can be reduced compared with a case where the transmission voice data is stored on a recording medium and mailed to the workers.

The transcription system of the present invention comprises a plurality of worker terminals used by a plurality of workers engaged in transcription work, in which character data is generated based on voice data, and a server that communicates with the plurality of worker terminals. The server comprises: original voice data dividing means for dividing original voice data, which is the voice data that is the source of the transcription work, into a plurality of transmission voice data, which are voice data for transmission to the worker terminals; and distribution transmission means for distributing and transmitting the plurality of transmission voice data generated by the original voice data dividing means to the plurality of worker terminals.

With this configuration, the transcription system of the present invention transmits to a worker terminal, out of the plurality of transmission voice data, the transmission voice data that the worker should transcribe, so the worker's task becomes more efficient and the worker's burden is reduced compared with the prior art. Because the transcription system does not require character data to be synchronized between worker terminals, the load on the worker terminals is also reduced. Because the transcription system can have a plurality of workers share the transcription of the original voice data at the same time, the time required to transcribe the entire original voice data can be shortened. In addition, because the transcription system sends the transmission voice data to the worker terminals electronically, the time and cost required to transcribe the entire original voice data can be reduced compared with a case where the transmission voice data is stored on a recording medium and mailed to the workers.

The worker terminal of the transcription system of the present invention preferably comprises voice output means for outputting voice based on the voice data, character data editing means for editing the character data in accordance with the worker's instructions, and character data transmission means for transmitting the character data to the server.

With this configuration, the transcription system of the present invention sends the character data transcribed by the worker from the worker terminal to the server electronically, so the time and cost required to transcribe the entire original voice data can be reduced compared with, for example, a case where the worker stores the character data on a recording medium and mails it to the server administrator.

The distribution transmission means of the transcription system of the present invention preferably transmits delimiter information to the worker terminal together with the transmission voice data, the delimiter information consisting of at least one of the following kinds of boundary included in the original voice data: silent portions, sentence breaks, speaker changes, and topic changes. The worker terminal preferably comprises transmission voice data dividing means for dividing the transmission voice data, based on the delimiter information, into a plurality of output voice data, which are the voice data used by the voice output means to output the voice.

With this configuration, the transcription system of the present invention can output audio in amounts small enough to remain in the worker's short-term memory, so the efficiency of the worker's transcription work is improved compared with a configuration that continuously outputs more audio than the worker's short-term memory can hold. The transcription system can therefore shorten the time required to transcribe the entire original voice data.
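The terminal-side division by delimiter information can be sketched as follows. The patent does not define how delimiter information is encoded; representing it as a list of byte offsets into the transmission voice data, and the name `split_by_delimiters`, are assumptions for illustration.

```python
from typing import List


def split_by_delimiters(audio: bytes, offsets: List[int]) -> List[bytes]:
    """Cut `audio` into output pieces at the given byte offsets.

    Offsets that are out of range, duplicated, or at either end of the data
    are ignored, so the pieces always concatenate back to `audio`.
    """
    cuts = sorted({o for o in offsets if 0 < o < len(audio)})
    bounds = [0] + cuts + [len(audio)]
    return [audio[a:b] for a, b in zip(bounds, bounds[1:])]
```

Because the server supplies the boundaries, the terminal in this sketch needs no recognition step of its own to produce short playback units.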

The character data editing means of the transcription system of the present invention preferably edits the character data for each item of output voice data.

With this configuration, the transcription system of the present invention lets the worker edit the character data at the location that corresponds to the audio currently being output, which improves the efficiency of the worker's transcription work. The transcription system can therefore shorten the time required to transcribe the entire original voice data.

The worker terminal of the transcription system of the present invention preferably comprises character data generating means for generating the character data from the voice data by speech recognition.

With this configuration, the transcription system of the present invention generates approximate character data by speech recognition, so the time and effort a worker needs to transcribe the transmission voice data are reduced compared with a configuration in which the worker transcribes the transmission voice data from scratch. The transcription system can therefore reduce the time and labor cost required to transcribe the entire original voice data.

In the transcription system of the present invention, the distribution transmission means preferably encrypts the transmission voice data before transmitting it to the worker terminal, and the character data transmission means preferably encrypts the character data before transmitting it to the server.

With this configuration, even when the original voice data contains secret information such as confidential or personal information, the transcription system of the present invention allows the server and the worker terminals to communicate over a public network while keeping that information secret.

According to the present invention, it is possible to provide a transcription system that can reduce, compared with the prior art, both the burden on workers performing transcription work and the load on the terminals used by those workers.

Hereinafter, an embodiment of the present invention will be described with reference to the drawings.

First, the configuration of the transcription system according to the present embodiment will be described.

As shown in FIG. 1, the transcription system 10 according to the present embodiment comprises: a recording device 20 that is installed in a place such as a public assembly chamber or a company meeting room and records meetings; a management server 40, serving as the server, that manages the transcription work of generating character data based on the audio recorded by the recording device 20; and a plurality of transcription terminals 60, serving as the plurality of worker terminals, used by a plurality of workers engaged in the transcription work. The recording device 20, the management server 40, and the transcription terminals 60 are connected to a public network 11 such as the Internet.

As shown in FIG. 2, the management server 40 is a computer operated by a server program. The server program causes the management server 40 to function as: communication means 41 for communicating with the recording device 20 (see FIG. 1) and the transcription terminals 60 (see FIG. 1); storage means 42 for storing various data; original voice data dividing means 43 for dividing original voice data 80 (see FIG. 3), which is the voice data that is the source of the transcription work, into a plurality of transmission voice data 81 (see FIG. 3), which are voice data for transmission to the transcription terminals 60; and transmission voice data selecting means 44 for selecting the transmission voice data 81 to be transmitted by the communication means 41.

The communication means 41 distributes and transmits the plurality of transmission voice data 81 generated by the original voice data dividing means 43 to the plurality of transcription terminals 60 via the public network 11, and thus also functions as the distribution transmission means. When the communication means 41 receives a request for transmission voice data 81 from a transcription terminal 60, it transmits transmission voice data 81 to the transcription terminal 60 that made the request.

The storage means 42 stores an order table (see FIG. 4), a table that shows, for each item of transmission voice data 81, the correspondence between its order from the beginning of the original voice data 80 and the transcription terminal 60 that is its destination. The "destination" field of the order table holds the value "未" (not yet sent) for transmission voice data 81 that the management server 40 has not yet transmitted to a transcription terminal 60, the name of the destination transcription terminal 60 for transmission voice data 81 that the management server 40 has transmitted, and the value "済" (done) for transmission voice data 81 whose corresponding character data the management server 40 has received from a transcription terminal 60.
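The order table and the request-driven distribution can be sketched together as a small data structure. This is an illustration under assumptions, not the patented implementation: the class name `OrderTable`, its method names, and the English status strings standing in for "未" and "済" are all hypothetical.

```python
from typing import Dict, Optional

UNSENT = "unsent"  # stands in for the table value "未"
DONE = "done"      # stands in for the table value "済"


class OrderTable:
    """Tracks, per segment order number, where the segment currently is.

    As in the table described above, one field holds either a status or a
    terminal name, so terminal names must not equal UNSENT or DONE.
    """

    def __init__(self, count: int) -> None:
        # Order numbers start at 1, counted from the beginning of the audio.
        self.destination: Dict[int, str] = {
            n: UNSENT for n in range(1, count + 1)
        }

    def next_for(self, terminal: str) -> Optional[int]:
        """Handle a request: give the earliest unsent segment to `terminal`."""
        for n in sorted(self.destination):
            if self.destination[n] == UNSENT:
                self.destination[n] = terminal
                return n
        return None  # nothing left to hand out

    def mark_done(self, n: int) -> None:
        """Record that character data for segment `n` has been received."""
        self.destination[n] = DONE
```

Because a segment is marked with its destination at the moment it is handed out, no two terminals receive the same segment, which is why the terminals need no locking among themselves.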

The original voice data dividing means 43 identifies the sentence breaks in the original voice data 80 by means of a speech recognition function and a natural language processing function, and generates the transmission voice data 81 using those sentence breaks as delimiters. That is, the original voice data dividing means 43 generates the transmission voice data 81 sentence by sentence.

The transmission voice data selecting means 44 selects the transmission voice data 81 based on the topic of the transmission voice data 81 and the topic covered by the speech recognition dictionary used for speech recognition in the transcription terminal 60.
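One way to picture topic-based selection is shown below. The topic labels, the function name `select_segment`, and the fallback to the earliest pending segment when no topic matches are assumptions; the patent states only that selection is based on the two topics.

```python
from typing import List, Optional, Set, Tuple


def select_segment(pending: List[Tuple[int, str]],
                   dictionary_topics: Set[str]) -> Optional[int]:
    """Pick the order number of the segment to send to one terminal.

    `pending` holds (order_number, topic) pairs for unsent segments, and
    `dictionary_topics` is the set of topics covered by the terminal's
    speech recognition dictionary.
    """
    if not pending:
        return None
    ordered = sorted(pending)
    for order, topic in ordered:
        if topic in dictionary_topics:
            return order
    # Hypothetical fallback: no topic matches, so send the earliest segment.
    return ordered[0][0]
```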

As shown in FIG. 5, the transcription terminal 60 is a computer operated by a terminal program. The terminal program causes the transcription terminal 60 to function as: communication means 61 for communicating with the management server 40 (see FIG. 1); storage means 62 for storing various data; transmission voice data dividing means 63 for dividing the transmission voice data 81 (see FIG. 6) into a plurality of output voice data 82 (see FIG. 6), which are voice data for audio output; character data generating means 64 for generating character data by speech recognition; voice output means 65 for outputting audio based on the output voice data 82; and character data editing means 66 for editing the character data in accordance with the worker's instructions.

The communication means 61 transmits the character data to the management server 40, and thus also functions as the character data transmission means.

The transmission voice data dividing means 63 uses a speech recognition function and a natural language processing function to identify phrase (bunsetsu) breaks in the transmission voice data 81, and generates the output voice data 82 using those phrase breaks as delimiters. In other words, the transmission voice data dividing means 63 generates the output voice data 82 in units of phrases.

The character data editing means 66 edits the character data for each output voice data 82.

Next, the operation of the transcription system 10 will be described.

First, the operation of the recording device 20 will be described.

When the administrator of the recording device 20 instructs it to record a conference, the recording device 20 records the conference as the original voice data 80 and, while recording, transmits the original voice data 80 to the management server 40 in real time via the public network 11.

Next, the operation of the management server 40 will be described.

When the management server 40 begins receiving the original voice data 80 from the recording device 20 via the public network 11 through the communication means 41, it begins storing the original voice data 80 in the storage means 42, as shown in FIG. 7 (S101).

Next, the management server 40 begins dividing the original voice data 80 accumulating in the storage means 42 into the transmission voice data 81 by means of the original voice data dividing means 43, as shown in FIG. 3 (S102). Here, the management server 40 generates the transmission voice data 81 using the sentence breaks in the original voice data 80 as delimiters. That is, each transmission voice data 81 is voice data for one sentence.

Each time the management server 40 generates transmission voice data 81, it adds an entry for that transmission voice data 81 to the order table and stores the value "not yet" in the "destination" field.

The management server 40 then begins distributing and transmitting the plurality of transmission voice data 81 to the plurality of transcription terminals 60 via the public network 11 through the communication means 41 (S103). That is, from this point on, whenever the management server 40 receives a request for transmission voice data 81 from a transcription terminal 60 via the public network 11 through the communication means 41, it transmits transmission voice data 81 to the requesting transcription terminal 60 via the public network 11 through the communication means 41. However, the management server 40 does not respond to a request from a transcription terminal 60 whose name is stored in the "destination" field of the order table.

When the management server 40 transmits transmission voice data 81 to a transcription terminal 60, it stores the name of the destination transcription terminal 60 in the "destination" field of the order table for that transmission voice data 81.
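The bookkeeping described for the order table in S102 and S103 can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the class name `OrderTable`, its methods, and the constant `NOT_YET` are hypothetical names, and the rule of handing out the earliest unsent entry is an assumption (the embodiment selects the entry through the transmission voice data selection means 44).

```python
from dataclasses import dataclass, field

NOT_YET = "not yet"  # hypothetical constant for the initial "destination" value


@dataclass
class OrderTable:
    # destinations[i] is the "destination" field of the i-th transmission
    # voice data, in the order it appears in the original voice data.
    destinations: list = field(default_factory=list)

    def add_segment(self) -> int:
        """Register newly generated transmission voice data (S102)."""
        self.destinations.append(NOT_YET)
        return len(self.destinations) - 1

    def handle_request(self, terminal_name: str):
        """Return the index of the segment to send, or None (S103)."""
        # A terminal whose name is already stored still holds a segment,
        # so its request is not answered.
        if terminal_name in self.destinations:
            return None
        # Assumption: hand out the earliest segment that is still unsent.
        for index, destination in enumerate(self.destinations):
            if destination == NOT_YET:
                self.destinations[index] = terminal_name
                return index
        return None
```

Under these assumptions, a second request from a terminal that has not yet returned its character data yields `None`, which mirrors the rule that such requests are ignored.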

The transmission voice data 81 that the management server 40 transmits is selected by the transmission voice data selection means 44. That is, the management server 40 selects the transmission voice data 81 to send to the requesting transcription terminal 60 based on the topic of the transmission voice data 81 and the topic covered by the speech recognition dictionary used by that transcription terminal 60. For example, if the speech recognition dictionary used by the requesting transcription terminal 60 covers medical topics, the management server 40 selects transmission voice data 81 whose topic is medical to send to that transcription terminal 60. The topic of the transmission voice data 81 may be registered in advance based on the agenda of the conference, or the management server 40 may extract it from the original voice data 80 or the transmission voice data 81 by speech recognition. The topic covered by the speech recognition dictionary used by the requesting transcription terminal 60 may be registered in the management server 40 in advance, or the transcription terminal 60 may notify the management server 40 of it together with the request for transmission voice data 81.
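The topic-based selection performed by the transmission voice data selection means 44 might look like the sketch below. The dictionary keys, the function name `select_segment`, and the fallback to the earliest unsent segment when no topic matches are all assumptions made for illustration; the description only states that selection is based on the two topics.

```python
NOT_YET = "not yet"  # hypothetical marker for an unsent segment


def select_segment(segments, dictionary_topic):
    """Pick an unsent segment whose topic matches the terminal's dictionary.

    `segments` is a list of dicts with the hypothetical keys "index",
    "topic" and "destination".
    """
    unsent = [s for s in segments if s["destination"] == NOT_YET]
    for segment in unsent:
        if segment["topic"] == dictionary_topic:
            return segment
    # Assumption: if no topic matches, fall back to the earliest unsent one.
    return unsent[0] if unsent else None
```

For a terminal whose dictionary covers medical topics, this returns the first unsent segment labeled "medical" if one exists.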

When the management server 40 receives character data from a transcription terminal 60 via the public network 11 through the communication means 41, it begins storing the received character data in the storage means 42 while arranging it in order (S104). That is, from this point on, whenever the management server 40 receives character data from a transcription terminal 60 via the public network 11, it looks up in the order table the position within the original voice data 80 of the transmission voice data 81 that the management server 40 itself most recently sent to the transcription terminal 60 from which the character data came, and stores the character data arranged according to that position. The character data received by the management server 40 is thus arranged in the same order as the transmission voice data 81 from which it was produced, and ultimately forms, as a whole, the character data corresponding to the original voice data 80.

When the management server 40 stores character data, it stores the value "done" in the "destination" field of the order table for the transmission voice data 81 that the management server 40 itself most recently sent to the transcription terminal 60 from which the stored character data came.
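The reassembly step S104 can be illustrated with a short sketch. It assumes that the order table is a plain list of "destination" values and that received character data is kept in a dictionary keyed by position; the function names and the constant `DONE` are hypothetical.

```python
DONE = "done"  # hypothetical constant for the final "destination" value


def store_character_data(destinations, texts, terminal_name, character_data):
    """Place received character data at the position of its source segment."""
    # The segment most recently sent to this terminal is the one whose
    # "destination" field still holds the terminal's name.
    index = destinations.index(terminal_name)
    texts[index] = character_data
    destinations[index] = DONE
    return index


def assemble(texts, segment_count):
    """Join the stored character data in the order of the original voice data."""
    return "".join(texts.get(i, "") for i in range(segment_count))
```

Because each piece is stored by its position in the order table, the arrival order of the character data does not affect the assembled result.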

Next, the operation of the transcription terminal 60 will be described.

When the worker instructs the transcription terminal 60 to fetch transmission voice data 81, the transcription terminal 60 requests transmission voice data 81 from the management server 40 via the public network 11 through the communication means 61, as shown in FIG. 8 (S121).

The transcription terminal 60 then repeatedly checks whether transmission voice data 81 has been transmitted from the management server 40 via the public network 11 until it determines that it has (S122).

When the transcription terminal 60 determines in S122 that transmission voice data 81 has been transmitted from the management server 40, it receives the transmission voice data 81 through the communication means 61 (S123) and stores it in the storage means 62 (S124).

Next, the transcription terminal 60 divides the transmission voice data 81 stored in S124 into output voice data 82 in units of phrases by means of the transmission voice data dividing means 63, as shown in FIG. 6 (S125), and performs speech recognition on the transmission voice data 81 stored in S124 by means of the character data generating means 64 to generate character data for each output voice data 82 (S126).

The transcription terminal 60 then determines whether the worker has indicated that editing of all the character data generated in S126 is complete (S127).

If the transcription terminal 60 determines in S127 that the worker has not indicated that editing of all the character data is complete, it determines whether the worker has designated any of the output voice data 82 produced in S125 (S128).

If the transcription terminal 60 determines in S128 that the worker has not designated any output voice data 82, it executes S127 again.

If the transcription terminal 60 determines in S128 that the worker has designated one of the output voice data 82, it outputs audio through the voice output means 65 based on the designated output voice data 82 (S129), edits the character data corresponding to the designated output voice data 82 through the character data editing means 66 in accordance with the worker's instructions (S130), and then executes S127 again. The worker can therefore listen to the audio from an audio output device (not shown), such as a speaker, of the transcription terminal 60 while editing the character data via an input device (not shown), such as a keyboard, and can do so for each output voice data 82, that is, phrase by phrase.

When the transcription terminal 60 determines in S127 that the worker has indicated that editing of all the character data is complete, it transmits the series of character data to the management server 40 via the public network 11 through the communication means 61 (S131).
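The editing loop of S127 to S131 can be summarized in a simplified sketch. It replaces the polling of worker input with a prepared list of actions, and stands in for audio playback by recording which output voice data was played; the function name and the action tuples `("edit", index, text)` and `("done",)` are hypothetical.

```python
def run_editing_session(draft_texts, worker_actions):
    """Simulate S127-S131 for one transmission voice data.

    `draft_texts` holds the speech recognition result for each output
    voice data (S126). `worker_actions` is a sequence of hypothetical
    tuples: ("edit", index, new_text) or ("done",).
    """
    texts = list(draft_texts)
    played = []  # stand-in for audio output by the voice output means
    for action in worker_actions:
        if action[0] == "done":       # S127: editing declared complete
            break
        _, index, new_text = action   # S128: worker designates a unit
        played.append(index)          # S129: output its audio
        texts[index] = new_text       # S130: edit its character data
    return texts, played              # S131: texts go back to the server
```

Units the worker never designates keep their speech recognition draft, which matches the flow in which only designated output voice data is played and edited.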

As described above, the transcription system 10 transmits to each transcription terminal 60 only the transmission voice data 81, out of the plurality of transmission voice data 81, that the worker at that terminal is to transcribe. This makes the worker's task more efficient and reduces the worker's burden compared with the conventional system.

Moreover, because the transcription system 10 does not require character data to be synchronized among the transcription terminals 60, the burden on the transcription terminals 60 is lower than in the conventional system.

Because the transcription system 10 allows a plurality of workers to share the transcription of the original voice data 80 simultaneously, the time required to transcribe the entire original voice data 80 can be shortened.

Because the transcription system 10 sends the transmission voice data 81 to the transcription terminals 60 electronically, the time and cost required to transcribe the entire original voice data 80 are lower than when the transmission voice data 81 is stored on a recording medium and mailed to the workers.

Since the transcription system 10 shortens the time required to transcribe the entire original voice data 80, the minutes of a morning meeting can, for example, be ready in the afternoon.

Because the transcription system 10 sends the transmission voice data 81 to the transcription terminals 60 electronically, it can also provide work opportunities to people who are willing to work but are limited to working from home, such as single mothers.

Because the transcription system 10 sends the character data transcribed by the worker electronically from the transcription terminal 60 to the management server 40, the time and cost required to transcribe the entire original voice data 80 are lower than when, for example, the worker stores the character data on a recording medium and mails it to the administrator of the management server 40.

Because the management server 40 generates the transmission voice data 81 using the sentence breaks in the original voice data 80 as delimiters, the time a worker needs to transcribe each transmission voice data 81 is shorter than in a configuration in which the transmission voice data 81 is too large, and the time required to transcribe the entire original voice data 80 is shortened accordingly. For the same reason, compared with a configuration in which the transmission voice data 81 is too small, the worker can more easily understand the content of the speech contained in the transmission voice data 81, which makes transcribing the transmission voice data 81 easier.

The original voice data dividing means 43 of the management server 40 may instead generate the transmission voice data 81 using delimiters other than the sentence breaks in the original voice data 80. The transmission voice data 81 is preferably small enough that a worker can decide on the spot whether to take on the transcription.

For example, the original voice data dividing means 43 may identify silent portions of the original voice data 80, such as pauses for breath, by treating a state in which the volume stays at or below a predetermined level for at least a predetermined number of seconds as a silent portion, and may generate the transmission voice data 81 using the silent portions as delimiters. In this case, compared with a configuration in which the transmission voice data 81 is delivered by streaming, the worker less often needs to replay or pause the audio, so the time the worker needs to transcribe the transmission voice data 81 is shortened, and so is the time required to transcribe the entire original voice data 80.
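The silence criterion (volume at or below a predetermined level for at least a predetermined number of seconds) can be sketched as follows. The sketch assumes the audio is available as a list of amplitude samples and cuts in the middle of each silent run; the function names, the parameter names, and the choice of cut point are illustrative assumptions, not details given in the description.

```python
def find_silent_runs(samples, sample_rate, max_volume, min_seconds):
    """Return (start, end) index pairs of runs that count as silence."""
    min_length = int(min_seconds * sample_rate)
    runs = []
    run_start = None
    for i, value in enumerate(samples):
        if abs(value) <= max_volume:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_length:
                runs.append((run_start, i))
            run_start = None
    # A quiet run that reaches the end of the data also counts.
    if run_start is not None and len(samples) - run_start >= min_length:
        runs.append((run_start, len(samples)))
    return runs


def split_at_silence(samples, sample_rate, max_volume, min_seconds):
    """Split the samples, cutting in the middle of each silent run."""
    segments = []
    start = 0
    for run_start, run_end in find_silent_runs(
        samples, sample_rate, max_volume, min_seconds
    ):
        cut = (run_start + run_end) // 2  # assumption: cut mid-silence
        if cut > start:
            segments.append(samples[start:cut])
            start = cut
    if start < len(samples):
        segments.append(samples[start:])
    return segments
```

Quiet stretches shorter than `min_seconds` are ignored, so brief dips in volume inside a word do not produce a cut.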

The original voice data dividing means 43 may also identify speaker changes in the original voice data 80 by means of a speaker identification function that uses voiceprints or the like, and generate the transmission voice data 81 using the speaker changes as delimiters. Alternatively, it may identify topic changes in the original voice data 80 by means of the speech recognition function and the natural language processing function, and generate the transmission voice data 81 using the topic changes as delimiters. In these cases, the management server 40 obtains effects similar to those of the configuration that uses the sentence breaks in the original voice data 80 as delimiters.

The original voice data dividing means 43 may also divide the original voice data 80 using at least one of silent portions, sentence breaks, and speaker changes as delimiters, and then recombine the resulting pieces into units of a predetermined duration, for example three minutes, to generate the transmission voice data 81. In this case, the management server 40 can make the time workers need to transcribe each transmission voice data 81 roughly uniform.
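The recombination into units of a predetermined duration can be sketched as a simple greedy grouping. The function name `regroup`, the use of piece durations in seconds, and the rule of closing a group as soon as it reaches the target are assumptions; the description only gives three minutes as an example of the predetermined time.

```python
def regroup(durations, target_seconds=180.0):
    """Group consecutive pieces until each group reaches the target length.

    `durations` lists the length in seconds of each piece produced by the
    first division step; the result lists piece indices per group.
    """
    groups = []
    current = []
    total = 0.0
    for index, duration in enumerate(durations):
        current.append(index)
        total += duration
        if total >= target_seconds:
            groups.append(current)
            current, total = [], 0.0
    if current:  # leftover pieces form a final, shorter group
        groups.append(current)
    return groups
```

Because groups close only at piece boundaries, each transmission voice data still begins and ends at a silent portion, sentence break, or speaker change.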

When the original voice data dividing means 43 generates the transmission voice data 81 using the speaker changes in the original voice data 80 as delimiters, the communication means 41 of the management server 40 may select the transcription terminal 60 to which each transmission voice data 81 is sent according to the speaker of that transmission voice data 81. In this case, the management server 40 can dedicate each transcription terminal 60 to a particular speaker, which can improve the accuracy of speech recognition by the transcription terminal 60 with respect to that speaker's voice quality and topics. The management server 40 can likewise dedicate each worker to a particular speaker, which can improve the efficiency of the worker's transcription.

Because the management server 40 transmits transmission voice data 81 to a transcription terminal 60 when it receives a request for transmission voice data 81 from that terminal, it can send transmission voice data 81 to each transcription terminal 60 in step with the progress of the worker's transcription. Compared with a configuration in which transmission voice data 81 is sent to the transcription terminals 60 regardless of the workers' progress, the management server 40 can therefore make the transcription of the entire original voice data 80 more efficient and shorten the time it takes.

Because the management server 40 selects the transmission voice data 81 based on the topic of the transmission voice data 81 and the topic covered by the speech recognition dictionary that the transcription terminal 60 uses for speech recognition, it can send each transcription terminal 60 transmission voice data 81 suited to that terminal. Compared with a configuration in which transmission voice data 81 is sent to the transcription terminals 60 regardless of topic, the management server 40 can therefore shorten the time the worker needs for transcription, and thus the time required to transcribe the entire original voice data 80.

The transmission voice data selection means 44 of the management server 40 may select the transmission voice data 81 by a method other than one based on the topic of the transmission voice data 81 and the topic covered by the speech recognition dictionary.

For example, the transmission voice data selection means 44 may select the transmission voice data 81 based on the topic of the transmission voice data 81 and the worker at the transcription terminal 60. In this case, the management server 40 can send each transcription terminal 60 transmission voice data 81 suited to its worker, for example by sending transmission voice data 81 on legal topics to the transcription terminal 60 of a worker who is strong in legal topics. Compared with a configuration in which transmission voice data 81 is sent to the transcription terminals 60 regardless of topic, this shortens the time the worker needs for transcription. The management server 40 can therefore shorten the time required to transcribe the entire original voice data 80.

Because the transcription system 10 generates a rough draft of the character data by speech recognition, the time and effort a worker needs to transcribe the transmission voice data 81 are lower than in a configuration in which the worker transcribes the transmission voice data 81 from scratch. The transcription system 10 can therefore reduce the time and labor cost required to transcribe the entire original voice data 80.

Instead of having the transcription terminal 60 generate the character data by speech recognition, the transcription system 10 may have the management server 40 generate character data by speech recognition from the original voice data 80 and transmit the character data corresponding to each transmission voice data 81 to the transcription terminal 60 together with that transmission voice data 81.

Of course, the transcription system 10 may also be configured so that neither the management server 40 nor the transcription terminal 60 generates character data by speech recognition.

Because the transcription terminal 60 divides the transmission voice data 81 into the output voice data 82, it can output audio in amounts small enough to stay in the worker's short-term memory. Compared with a configuration in which the transcription terminal 60 continuously outputs more audio than the worker's short-term memory can hold, the transcription system 10 can therefore improve the efficiency of the worker's transcription and shorten the time required to transcribe the entire original voice data 80.

In the present embodiment the output voice data 82 is in units of phrases, but other units may be used. For example, the transcription terminal 60 may receive from the management server 40, together with the transmission voice data 81, delimiter information indicating at least one of the silent portions, sentence breaks, speaker changes, and topic changes contained in the transmission voice data 81, and may generate the output voice data 82 by using, based on the delimiter information, at least one of those as delimiters. When the transmission voice data 81 has been generated using the speaker changes in the original voice data 80 as delimiters, the transcription terminal 60 may, for example, generate the output voice data 82 using the sentence breaks in the transmission voice data 81 as delimiters.

Of course, the transcription terminal 60 may instead output the audio of the entire transmission voice data 81 continuously without dividing it.

Because the transcription system 10 edits the character data for each output voice data 82, it can have the worker edit the character data at the location corresponding to the audio being output. The transcription system 10 can therefore improve the efficiency of the worker's transcription and shorten the time required to transcribe the entire original voice data 80.

Of course, the transcription terminal 60 need not be able to edit the character data for each output voice data 82.

The recording device 20 may encrypt the original voice data 80 when transmitting it to the management server 40. The management server 40 may encrypt the transmission voice data 81 when transmitting it to the transcription terminal 60. The transcription terminal 60 may encrypt the character data transcribed by the worker when transmitting it to the management server 40. When the transcription system 10 uses encrypted communication among the recording device 20, the management server 40, and the transcription terminal 60, they can communicate via the public network 11 while keeping the content secret, even if the original voice data 80 contains sensitive information such as confidential or personal information.

The original voice data 80 is stored on the management server 40 from the recording device 20 via the public network 11, but it may be stored on the management server 40 by other methods. For example, the administrator of the recording device 20 may store the original voice data 80 on a recording medium and mail it to the administrator of the management server 40, who then stores it on the management server 40.

The transcription system 10 can also be applied to various businesses other than the preparation of minutes. For example, it can be applied to a service in which a user speaks into a mobile phone so that original voice data 80 is created on the management server 40, the original voice data 80 is transcribed by a plurality of transcription terminals 60 sharing the work, the resulting character data is transmitted from the management server 40 to the mobile phone, and the mobile phone uses that character data as the body of an e-mail.

The terminal program and the server program may be distributed on a recording medium or via a network.

FIG. 1: Block diagram showing the configuration of a transcription system according to an embodiment of the present invention.
FIG. 2: Block diagram showing the functions of the management server shown in FIG. 1.
FIG. 3: Diagram showing the original voice data handled in the transcription system shown in FIG. 1, divided into transmission voice data.
FIG. 4: Diagram showing the order table stored in the management server shown in FIG. 2.
FIG. 5: Block diagram showing the functions of the transcription terminal shown in FIG. 1.
FIG. 6: Diagram showing the transmission voice data shown in FIG. 3, divided into output voice data.
FIG. 7: Flowchart showing the operation of the management server shown in FIG. 2.
FIG. 8: Flowchart showing the operation of the transcription terminal shown in FIG. 5.

Explanation of symbols

10 Transcription system
11 Public network
20 Recording device
40 Management server (server)
41 Communication means (distribution transmission means)
42 Storage means
43 Original voice data dividing means
44 Transmission voice data selection means
60 Transcription terminal (worker terminal)
61 Communication means (character data transmission means)
62 Storage means
63 Transmission voice data dividing means
64 Character data generation means
65 Voice output means
66 Character data editing means
80 Original voice data
81 Transmission voice data
82 Output voice data

Claims

A server that communicates with a plurality of worker terminals used by a plurality of workers engaged in transcription work of generating character data based on voice data, the server comprising:
original voice data dividing means for dividing original voice data, which is the voice data serving as the source of the transcription work, into a plurality of pieces of transmission voice data, which are voice data for transmission to the worker terminals; and distribution transmission means for distributing and transmitting the plurality of pieces of transmission voice data generated by the original voice data dividing means to the plurality of worker terminals.

The server according to claim 1, wherein the original voice data dividing means generates the transmission voice data using a silent part included in the original voice data as a delimiter.
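By way of illustration only, division at silent parts might be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the embodiment: the function name `split_at_silence`, the amplitude `threshold`, and the minimum silent run length `min_silence` are all hypothetical, and the voice data is assumed to be available as a list of normalized sample amplitudes.

```python
from typing import List, Tuple


def split_at_silence(samples: List[float],
                     threshold: float = 0.01,
                     min_silence: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) index ranges of the divided pieces.

    A cut is placed at the midpoint of every silent run that is at least
    `min_silence` samples long and lies between two voiced regions.
    All parameter names and defaults are assumptions for illustration.
    """
    cuts: List[int] = []
    run_start = None  # index where the current silent run began, if any
    for i, sample in enumerate(samples):
        if abs(sample) < threshold:
            if run_start is None:
                run_start = i
        else:
            # A voiced sample ends the silent run; cut only if the run was
            # long enough and did not start at the very beginning.
            if run_start is not None and run_start > 0 \
                    and i - run_start >= min_silence:
                cuts.append((run_start + i) // 2)
            run_start = None
    bounds = [0] + cuts + [len(samples)]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]
```

Each returned range would correspond to one piece of transmission voice data; short pauses below `min_silence` are left inside a piece rather than treated as delimiters.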

The server according to claim 1, wherein the original voice data dividing means generates the transmission voice data using a sentence break included in the original voice data as a delimiter.

The server according to claim 1, wherein the original voice data dividing means generates the transmission voice data using a speaker switching part included in the original voice data as a delimiter.

The server according to claim 1, wherein the original voice data dividing means generates the transmission voice data using a topic switching part included in the original voice data as a delimiter.

The server according to claim 3, wherein the distribution transmission means selects the worker terminal to which the transmission voice data is transmitted according to the speaker of the transmission voice data.

The server according to claim 1, further comprising character data generating means for generating the character data based on the voice data by voice recognition,
wherein the distribution transmission means transmits, to the worker terminal, the character data corresponding to the transmission voice data to be transmitted to that worker terminal.

The server according to claim 1, wherein, when a request for the transmission voice data is received from one of the worker terminals, the distribution transmission means transmits the transmission voice data to the worker terminal that made the request.
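Request-driven distribution of this kind might be sketched as follows, purely for illustration. The class name `Distributor`, the method `request`, and the in-memory bookkeeping are hypothetical; the sketch assumes pieces are handed out in their original order and that the server records which terminal received which piece.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Optional


class Distributor:
    """Hands out pieces of transmission voice data on request (illustrative)."""

    def __init__(self, segment_ids: Iterable[int]) -> None:
        # Pieces not yet sent to any worker terminal, in original order.
        self.pending: Deque[int] = deque(segment_ids)
        # Assumed bookkeeping: piece id -> terminal id it was sent to.
        self.assigned: Dict[int, str] = {}

    def request(self, terminal_id: str) -> Optional[int]:
        """Return the next pending piece for the requesting terminal,
        or None when nothing is left to distribute."""
        if not self.pending:
            return None
        segment_id = self.pending.popleft()
        self.assigned[segment_id] = terminal_id
        return segment_id
```

Because a terminal receives a piece only when it asks for one, a faster worker naturally ends up handling more pieces than a slower one.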

The server according to claim 8, further comprising transmission voice data selection means for selecting the transmission voice data to be transmitted by the distribution transmission means,
wherein the transmission voice data selection means selects the transmission voice data based on the topic of the transmission voice data and on the worker.

The server according to claim 8, further comprising transmission voice data selection means for selecting the transmission voice data to be transmitted by the distribution transmission means,
wherein, when the worker terminal generates the character data based on the voice data by voice recognition, the transmission voice data selection means selects the transmission voice data based on the topic of the transmission voice data and the topic targeted by the voice recognition dictionary used in the voice recognition.
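Topic-based selection of this kind might look like the following sketch. It is an illustration under assumptions: the function name `select_segment` is hypothetical, topics are assumed to be plain string labels compared for equality, and falling back to the first pending piece when no topic matches is a design choice of this sketch rather than something stated in the claims.

```python
from typing import List, Optional, Tuple


def select_segment(pending: List[Tuple[int, str]],
                   terminal_topic: str) -> Optional[int]:
    """Pick a piece of transmission voice data for a requesting terminal.

    `pending` holds (piece id, topic label) pairs in original order.
    `terminal_topic` is the topic targeted by the terminal's voice
    recognition dictionary (or the worker's specialty).
    """
    for segment_id, topic in pending:
        if topic == terminal_topic:
            return segment_id  # first pending piece on a matching topic
    # Assumed fallback: no match, so hand out the oldest pending piece.
    return pending[0][0] if pending else None
```

Matching the piece to the dictionary's topic is intended to raise recognition accuracy at the terminal, which in turn reduces the amount of manual correction.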

The server according to claim 1, wherein the distribution transmission means encrypts the transmission voice data and transmits it to the worker terminal.

A server program for operating a server that communicates with a plurality of worker terminals used by a plurality of workers engaged in transcription work of generating character data based on voice data, the server program causing the server to function as:
original voice data dividing means for dividing original voice data, which is the voice data serving as the source of the transcription work, into a plurality of pieces of transmission voice data, which are voice data for transmission to the worker terminals; and distribution transmission means for distributing and transmitting the plurality of pieces of transmission voice data generated by the original voice data dividing means to the plurality of worker terminals.

A transcription system comprising: a plurality of worker terminals used by a plurality of workers engaged in transcription work of generating character data based on voice data; and a server that communicates with the plurality of worker terminals,
wherein the server includes original voice data dividing means for dividing original voice data, which is the voice data serving as the source of the transcription work, into a plurality of pieces of transmission voice data, which are voice data for transmission to the worker terminals, and distribution transmission means for distributing and transmitting the plurality of pieces of transmission voice data generated by the original voice data dividing means to the plurality of worker terminals.

The transcription system according to claim 13, wherein the worker terminal includes voice output means for outputting voice based on the voice data, character data editing means for editing the character data in accordance with an instruction from the worker, and character data transmission means for transmitting the character data to the server.

The transcription system according to claim 14, wherein the distribution transmission means transmits to the worker terminal, together with the transmission voice data, delimiter information on at least one type of delimiter included in the original voice data among a silent part, a sentence break, a speaker switching part, and a topic switching part, and
the worker terminal includes transmission voice data dividing means for dividing the transmission voice data, based on the delimiter information, into a plurality of pieces of output voice data, which are voice data for outputting the voice by the voice output means.
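Terminal-side division based on delimiter information might be sketched as follows. This is illustrative only: the function name `split_by_delimiters` is hypothetical, and the delimiter information is assumed to arrive as a list of byte offsets into the transmission voice data, which the claims do not specify.

```python
from typing import List


def split_by_delimiters(data: bytes, offsets: List[int]) -> List[bytes]:
    """Divide one piece of transmission voice data into output voice data.

    `offsets` is the delimiter information received from the server,
    assumed here to be byte positions at which a delimiter occurs.
    Offsets outside the data, and duplicates, are ignored.
    """
    cuts = sorted({o for o in offsets if 0 < o < len(data)})
    bounds = [0] + cuts + [len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:])]
```

For example, `split_by_delimiters(b"abcdefgh", [3, 5])` yields three pieces that the voice output means could play back one at a time.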

The transcription system according to claim 15, wherein the character data editing means edits the character data for each piece of output voice data.

The transcription system according to claim 14, wherein the worker terminal includes character data generation means for generating the character data based on the voice data by voice recognition.

The transcription system according to claim 14, wherein the distribution transmission means encrypts the transmission voice data and transmits it to the worker terminal, and
the character data transmission means encrypts the character data and transmits it to the server.