JP5184071B2

JP5184071B2 - Transcription text creation support device, transcription text creation support program, and transcription text creation support method

Info

Publication number: JP5184071B2
Application number: JP2007337067A
Authority: JP
Inventors: 奇方趙
Original assignee: NTT Data Corp
Current assignee: NTT Data Corp
Priority date: 2007-12-27
Filing date: 2007-12-27
Publication date: 2013-04-17
Anticipated expiration: 2027-12-27
Also published as: JP2009157774A

Abstract

PROBLEM TO BE SOLVED: To provide a transcription text creation support apparatus and program that significantly reduce the risk of information leakage.
SOLUTION: A voice division device 2 divides the voice data in a voice data storage unit 1 into relatively short voice sections. An identifier assignment device 3 performs speech recognition on each divided voice file and checks the recognition result against a keyword list for confidential information. A transcription control device 4 assigns the transcription of a voice file to highly reliable staff when its recognition result contains a keyword, and to general staff when it contains no confidential-information keyword. A text extraction device 5 extracts the transcribed text from the transcription results and stores it in a text data storage unit 6.
COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a transcription text creation support apparatus and a transcription text creation support program.

Transcription, that is, converting voice data into text data, is needed in many fields. For example, minutes are created from meeting recordings, and recordings of dialogues between users and operators at call centers are converted into text and used for information analysis. Transcription has traditionally relied almost entirely on manual work. With recent advances in computer speech recognition, however, some organizations have introduced automatic text creation by speech recognition.

However, at the current level of speech recognition technology, misrecognitions are still frequent, so highly accurate transcription by speech recognition alone is nearly impossible. When speech recognition is used for transcription, the usual approach is therefore to first convert the voice data into text by computer speech recognition and then check and correct the recognition result manually.
For example, one proposed technique works as follows (see, for example, Patent Document 1). Voice information is recognized and converted into text, and the text is divided into small units and displayed on screen unit by unit. The text is then updated while the speech corresponding to each unit is checked.

Whether or not speech recognition is used, transcribing voice data involves a huge amount of manual work and therefore requires many workers. For contractors that undertake transcription work, however, the volume of work they can accept fluctuates greatly over time. It is therefore not cost-effective to rely on dedicated staff such as full-time employees for all of it. Usually, many available workers are registered in advance and are mobilized only when needed, in proportion to the workload.
Patent Document 1: JP 2005-228178 A

In the prior art described above, transcribing a huge amount of voice data may, for cost reasons, force reliance on a large number of non-dedicated staff. Staff vary in background, ability, and awareness of confidentiality. When the voice data contains confidential or private information, it is therefore desirable to consider the risk of information leaking through staff.

Conventional transcription practice has addressed the risk of information leakage with measures such as having staff sign a pledge of confidentiality or claiming damages when a leak occurs. However, such measures may not prevent leaks by staff with low morals or malicious intent.

Moreover, a diverse and large workforce is involved in transcription, so protecting the confidentiality of information requires handling each staff member differently. How voice data should be handled also depends on whether it contains confidential information. In other words, a system and mechanism that can flexibly accommodate the circumstances of both the staff and the voice data is desirable.

However, the prior art cannot distribute work at a fine granularity according to two factors: the confidential information contained in the voice data, and each staff member's reliability in keeping it confidential.

The present invention was made in view of these circumstances. Its object is to provide a transcription text creation support apparatus and a transcription text creation support program that can greatly reduce the risk of information leakage.

To solve the above problems, the present invention is a transcription text creation support apparatus for transcribing text data from voice data. The apparatus comprises:
reliability storage means for storing, for each transcription worker, a reliability value corresponding to that worker's reliability;
divided voice data generation means for dividing voice data into predetermined sections to generate divided voice data;
conversion means for performing speech recognition on each piece of divided voice data generated by the divided voice data generation means and converting it into a plurality of pieces of primary text data;
keyword storage means for storing keywords related to confidential information;
determination means for determining, for each piece of primary text data converted by the conversion means, whether it contains a keyword stored in the keyword storage means; and
selection means for selecting, for each piece of divided voice data, a transcription worker to whom its transcription should be requested. The selection is based on the determination result of the determination means and refers to the reliability values in the reliability storage means.

In a further aspect of the invention, the apparatus further comprises:
transmission means for transmitting the corresponding divided voice data to the terminal of the transcription worker selected by the selection means;
receiving means for receiving the text data produced by that worker's transcription work; and
storage means for integrating the text data received by the receiving means and storing it as transcription text data for the original voice data.

In a further aspect of the invention, the selection means refers to the reliability values in the reliability storage means. It selects a transcription worker with a high reliability value when the determination means determines that the primary text contains a keyword related to confidential information. It selects a transcription worker with a low reliability value when the primary text contains no such keyword.

To solve the above problems, the present invention is also a transcription text creation support program that supports transcription work for transcribing text data from voice data. The program causes a computer to execute:
a step of dividing voice data into predetermined sections to generate divided voice data;
a step of performing speech recognition on each piece of divided voice data and converting it into a plurality of pieces of primary text data;
a step of determining, for each piece of converted primary text data, whether it contains a keyword related to confidential information; and
a step of selecting a transcription worker with a high reliability value when the primary text is determined to contain such a keyword, and a transcription worker with a low reliability value when it is determined not to.
To solve the above problems, the present invention is also a computer-executed transcription text creation support method for supporting transcription work for transcribing text data from voice data. The method comprises:
a step in which divided voice data generation means divides voice data into predetermined sections to generate divided voice data;
a step in which conversion means performs speech recognition on each piece of divided voice data and converts it into a plurality of pieces of primary text data;
a step in which determination means determines, for each piece of converted primary text data, whether it contains a keyword related to confidential information; and
a step in which selection means selects a transcription worker with a high reliability value when the primary text is determined to contain such a keyword, and a transcription worker with a low reliability value when it is determined not to.

According to this invention, voice data is divided into predetermined sections to generate divided voice data. Speech recognition converts each piece of divided voice data into primary text data, and each piece of primary text data is checked for keywords related to confidential information. Based on the result, a transcription worker is selected for each piece of divided voice data generated by the divided voice data generation means. Because the voice data is divided and the portions containing confidential information are assigned to highly reliable staff, the risk of information leakage can be greatly reduced.

Further, according to the present invention, the corresponding divided voice data is transmitted to the terminal of the selected transcription worker. The text data produced by that worker is received, integrated, and stored as transcription text data for the original voice data. Dividing the voice data and distributing it to different staff further reduces the risk of information leakage. Transcription text data covering the entire original voice data is still obtained in the end.

Further, according to the present invention, the reliability values in the reliability storage means are referenced. A transcription worker with a high reliability value is selected when the determination means determines that the primary text contains a keyword related to confidential information. A transcription worker with a low reliability value is selected when the primary text contains no such keyword. Because the voice data is divided and the portions containing confidential information are assigned to highly reliable staff, the risk of information leakage can be greatly reduced.

An embodiment of the present invention is described below with reference to the drawings.
FIG. 1 is a block diagram showing the configuration of a transcription text creation support apparatus according to an embodiment of the present invention. In the figure, a voice data storage unit 1 stores the voice data to be transcribed. A voice division device 2 reads the voice data from the voice data storage unit 1. It uses the silent sections within the speech to divide the data into relatively short voice sections (for example, sentence units) and saves each voice section as a separate file.

The voice division device (divided voice data generation means) 2 also creates a management table for the audio files of the individual voice sections (hereinafter, divided audio files). It generates a file name for each divided audio file and stores that name in the management table in association with the file. The management table is described in detail later.

Next, the identifier assignment device (conversion means) 3 performs speech recognition on each divided audio file and stores the recognized text in the management table. It then checks each recognition result against a preset keyword list that corresponds to confidential information. Every divided audio file containing at least one keyword is given an identifier indicating that it contains confidential information.

The transcription control device 4 distributes the divided audio files over the network 7 to client terminals 8-1 to 8-n, for example the PCs of the staff to whom transcription work is requested. It has the staff carry out the transcription and then receives and stores the work results from the client terminals 8-1 to 8-n.

The staff are evaluated in advance for their reliability in maintaining confidentiality and classified into highly reliable staff and general staff. When work is allocated to the client terminals (staff) 8-1 to 8-n, divided audio files that carry the confidential-information identifier are assigned only to highly reliable staff. Other divided audio files may be assigned to either highly reliable or general staff.

Next, the text extraction device 5 extracts the text from the work result for each divided audio file received from the client terminals. It concatenates these texts to generate text data corresponding to the original voice data before division. The text data storage unit (storage means) 6 consists of a storage device such as a hard disk and stores this concatenated transcription text data. The network 7 consists of the Internet, a LAN, or the like. The client terminals 8-1 to 8-n are the terminals (PCs, etc.) of the staff who perform the transcription work.
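The concatenation performed by the text extraction device 5 can be sketched as follows. Segments come back from different staff in arbitrary order, so the pieces are put back in "Index" order before joining. The dict keys `index` and `transcription` and the function name are hypothetical. The empty default separator is an assumption that suits Japanese text.

```python
from typing import Any, Iterable, Mapping


def join_transcriptions(records: Iterable[Mapping[str, Any]], separator: str = "") -> str:
    """Concatenate per-segment transcriptions in ascending Index order.

    Hypothetical record shape: {"index": int, "transcription": str}.
    The separator defaults to the empty string (no space), an assumption
    suited to Japanese text; pass a space or newline for other layouts.
    """
    ordered = sorted(records, key=lambda r: r["index"])
    return separator.join(str(r["transcription"]) for r in ordered)


# Usage: segments may arrive out of order from different staff.
segments = [
    {"index": 2, "transcription": "world"},
    {"index": 1, "transcription": "hello "},
]
print(join_transcriptions(segments))  # hello world
```

Sorting by Index, rather than trusting arrival order, is what lets the files be spread across many workers and still be reassembled correctly.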

Next, FIG. 2 is a block diagram showing the configuration of the voice division device 2 according to this embodiment. In the figure, a management table generation unit 2-1 generates a management table that centrally stores and manages information about the work in progress. Silent sections are distinguished from voice sections by the power (magnitude) of the voice data, so a silent section threshold calculation unit 2-2 first calculates a threshold. The very beginning of the voice data usually contains a silent section lasting from several tens to several hundreds of milliseconds, and the threshold is calculated from this leading silent section.
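The patent says only that the threshold is computed from the leading silent section. The sketch below fills that in with one plausible rule, which is an assumption. Frame power is taken as mean-square amplitude, and the threshold is the average power of the leading frames multiplied by a hypothetical `margin`. Samples are assumed to be floats, and `lead_ms` is assumed to lie entirely within the initial silence.

```python
from typing import Sequence


def frame_powers(samples: Sequence[float], frame_len: int) -> list[float]:
    """Mean-square power of consecutive non-overlapping frames.

    A trailing partial frame is dropped.
    """
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def silence_threshold(samples: Sequence[float], sample_rate: int,
                      frame_len: int, lead_ms: int, margin: float = 2.0) -> float:
    """Estimate a silence threshold from the leading silent portion.

    Assumes the first `lead_ms` milliseconds contain only background noise.
    The threshold is the mean power of those frames scaled by `margin`;
    this rule is a hypothetical heuristic, not one specified in the patent.
    """
    lead = samples[: lead_ms * sample_rate // 1000]
    powers = frame_powers(lead, frame_len)
    if not powers:
        raise ValueError("leading segment is shorter than one frame")
    return margin * sum(powers) / len(powers)
```

At 16 kHz a 20 ms frame is 320 samples, so `silence_threshold(samples, 16000, 320, 100)` averages the first five frames. The margin keeps ordinary noise fluctuations from being mistaken for speech. Its value would need tuning on real recordings.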

The voice division unit 2-3 computes the power of the voice data frame by frame, starting from the beginning (start point). When the power stays below the threshold for a certain length of time, for example 300 milliseconds, that section is treated as a silent section. In digital signal processing, a speech waveform is normally processed sequentially in blocks, and each block is called a frame. A frame is often 20 to 40 milliseconds long.
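The silent-section test described above can be sketched as a scan over per-frame power values. A run of below-threshold frames counts as silence only when it lasts at least `min_silence_ms`. The function name and the 20 ms / 300 ms defaults are illustrative. The 300 ms figure comes from the example in the text.

```python
from typing import Optional, Sequence


def find_silent_runs(powers: Sequence[float], threshold: float,
                     frame_ms: int = 20, min_silence_ms: int = 300) -> list[tuple[int, int]]:
    """Return (start, end) frame ranges (end exclusive) whose power stays
    below `threshold` for at least `min_silence_ms` milliseconds."""
    min_frames = -(-min_silence_ms // frame_ms)  # ceiling division
    runs: list[tuple[int, int]] = []
    start: Optional[int] = None
    for i, p in enumerate(powers):
        if p < threshold:
            if start is None:
                start = i  # a quiet stretch begins
        else:
            if start is not None and i - start >= min_frames:
                runs.append((start, i))  # long enough to count as silence
            start = None
    # A quiet stretch that reaches the end of the data
    if start is not None and len(powers) - start >= min_frames:
        runs.append((start, len(powers)))
    return runs


# 20 ms frames: 15 quiet frames = 300 ms (kept), 3 quiet frames = 60 ms (ignored)
powers = [1.0] * 5 + [0.0] * 15 + [1.0] * 5 + [0.0] * 3 + [1.0] * 2
print(find_silent_runs(powers, threshold=0.5))  # [(5, 20)]
```

Short pauses, such as the gap between words, fall below the minimum duration and are ignored. This is why the resulting cuts tend to land at sentence-like boundaries.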

The voice division unit 2-3 cuts out the voice data from the start point to the midpoint of the silent section and saves it as a divided audio file in the divided audio file storage unit 2-5. At the same time, it writes the file name of the divided audio data into the management table and assigns a unique serial Index number. It also initializes three fields: “confidentiality flag”, “staff number”, and “work status flag”. After the first cut, the voice division unit 2-3 repeats the same operation on the remaining voice data. In this way all of the input voice data is divided into relatively short voice sections. The file name and Index of each piece of divided audio data are stored in the management table. The management table storage unit 2-4 stores the management table (described later), which holds various information about the divided audio files. The divided audio file storage unit 2-5 stores the divided audio files, each with a unique file name.
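The cutting and table initialization above can be sketched as follows. The patent cuts repeatedly, each time working on the remaining data. Computing every midpoint from a list of already-detected silent runs gives the same boundaries and is simpler to show. The "sp<Index>.wav" naming and the Index starting at 1 follow the FIG. 8(b) example. The dict keys are hypothetical.

```python
from typing import Sequence


def segment_bounds(silent_runs: Sequence[tuple[int, int]], n_frames: int) -> list[tuple[int, int]]:
    """Cut at the midpoint of each silent run and return (start, end) frame ranges."""
    cuts = [(s + e) // 2 for s, e in silent_runs]
    bounds = [0] + [c for c in cuts if 0 < c < n_frames] + [n_frames]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


def init_records(n_segments: int, document_no: int = 1) -> list[dict]:
    """Create management-table rows with serial Index values and zeroed flags.

    File names follow the "sp<Index>.wav" pattern seen in FIG. 8(b);
    the dict keys are illustrative.
    """
    return [
        {
            "document_no": document_no,
            "index": i,
            "file_name": f"sp{i}.wav",
            "confidentiality_flag": 0,
            "staff_no": 0,
            "work_status_flag": 0,
        }
        for i in range(1, n_segments + 1)
    ]


bounds = segment_bounds([(5, 20), (40, 50)], n_frames=60)
print(bounds)  # [(0, 12), (12, 45), (45, 60)]
records = init_records(len(bounds))
print(records[1]["file_name"])  # sp2.wav
```

Cutting at the midpoint leaves some silence on both sides of every segment. This keeps word onsets and endings from being clipped.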

FIG. 3 is a conceptual diagram showing the data structure of the management table 2-4-1 according to this embodiment. The management table 2-4-1 stores the following fields: “document number”, “Index”, “file name”, “recognition result”, “transcription”, “confidentiality flag”, “staff number”, and “work status flag”.
The “document number” is a unique number assigned to each audio file (before division) that forms a unit of transcription work.
The “Index” is a unique serial number.
The “file name” is the name of the divided audio file.
The “recognition result” is the text produced by speech recognition of each divided audio file.
The “transcription” is the text of the divided audio file created by the staff member who performs the transcription work on a client terminal.

The “confidentiality flag” indicates whether the divided audio file contains confidential information. Its initial value is “0” (no confidential information), and it is rewritten to “1” when the file is determined to contain confidential information. The “staff number” identifies the staff member responsible for transcribing the divided audio file. Its initial value is “0”, and the staff number is written once the responsible staff member is decided. The “work status flag” indicates whether all work on the divided audio file has been completed. Its initial value is “0” (not completed), and it is rewritten to “1” when all work is completed.
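One row of the management table in FIG. 3, with the initial values described above, can be modeled as a small record type. This is only an illustrative sketch. The class and field names are hypothetical English renderings of the table's columns.

```python
from dataclasses import dataclass


@dataclass
class ManagementRecord:
    """One row of the management table 2-4-1 (FIG. 3); field names are illustrative."""
    document_no: int               # unique number of the original (undivided) audio file
    index: int                     # unique serial number of the segment
    file_name: str                 # divided audio file name, e.g. "sp2.wav"
    recognition_result: str = ""   # speech recognition output
    transcription: str = ""        # text returned by the assigned staff member
    confidentiality_flag: int = 0  # 0 = no confidential info, 1 = confidential info found
    staff_no: int = 0              # 0 = not yet assigned, otherwise the assigned staff number
    work_status_flag: int = 0      # 0 = not finished, 1 = all work finished

    def is_assigned(self) -> bool:
        return self.staff_no != 0

    def is_done(self) -> bool:
        return self.work_status_flag == 1
```

Staff numbers are never "0" (see FIG. 7), which is why "0" can double as the "not yet assigned" marker in `staff_no`.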

Next, FIG. 4 is a block diagram showing the configuration of the identifier assignment device 3 according to this embodiment. In the figure, a speech recognition unit 3-1 performs speech recognition on the divided audio files in the divided audio file storage unit 2-5. It uses a speech recognition engine that supports large-vocabulary spontaneous speech. The recognized text of each divided audio file is stored in the “recognition result” field of the corresponding record in the management table 2-4-1 in the management table storage unit 2-4.

The keyword storage unit (keyword storage means) 3-2 stores a keyword table (described later) containing keywords that indicate confidential information. The keyword matching unit (determination means) 3-3 takes the recognition result of the first record in the management table 2-4-1 in the management table storage unit 2-4. It checks this result against the keywords in the keyword storage unit 3-2 one at a time. As soon as a match is found, it ends matching for that divided audio file and rewrites the record's “confidentiality flag” in the management table 2-4-1 from “0” to “1”. If no keyword matches after all keywords have been checked, matching for that divided audio file simply ends. After finishing the first divided audio file, the keyword matching unit 3-3 processes all remaining divided audio files in turn by the same procedure.
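The matching loop above, with one keyword at a time and a stop at the first hit, can be sketched as follows. Plain substring matching is an assumption, because the patent does not say how a "match" is decided. The record keys and function names are hypothetical.

```python
from typing import Iterable


def contains_keyword(text: str, keywords: Iterable[str]) -> bool:
    """Check keywords one at a time and stop at the first hit."""
    for kw in keywords:
        if kw and kw in text:  # plain substring match (assumption)
            return True
    return False


def flag_confidential(records: list[dict], keywords: list[str]) -> None:
    """Set "confidentiality_flag" to 1 for every record whose recognition result hits a keyword."""
    for rec in records:
        if contains_keyword(rec["recognition_result"], keywords):
            rec["confidentiality_flag"] = 1


records = [
    {"index": 1, "recognition_result": "thank you for calling", "confidentiality_flag": 0},
    {"index": 2, "recognition_result": "my address is 1-2-3 Chiyoda", "confidentiality_flag": 0},
]
flag_confidential(records, ["Suzuki", "address", "age"])
print([r["confidentiality_flag"] for r in records])  # [0, 1]
```

Substring matching errs on the side of caution. For example, "age" also matches "message", which only sends an extra file to highly reliable staff. A keyword that speech recognition garbles, on the other hand, is missed entirely. That limitation comes from relying on the recognition result and cannot be removed by the matching rule alone.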

Next, FIG. 5 is a conceptual diagram showing an example of the keyword table 3-2-1 according to this embodiment. The keyword table 3-2-1 consists of an “Index” identifying each keyword and the “keyword” itself. The keyword list is either provided by the user who requests the transcription or created according to that user's instructions. Keywords usually include personal information, confidential business information, and the like. In the example of FIG. 5, personal names, age, gender, addresses, and so on are registered as keywords.

Next, FIG. 6 is a block diagram showing the configuration of the transcription control device 4 according to this embodiment. In the figure, a staff information acquisition unit 4-1 acquires each staff member's work status (busy/standby) and staff number from the client terminals 8-1 to 8-n. It stores the work status in a work staff information table (described later) in a work staff information storage unit 4-2. The staff information acquisition unit 4-1 updates this information at regular intervals, for example every 30 minutes.

FIG. 7 is a conceptual diagram showing an example data structure of the work staff information table 4-2-1 according to this embodiment. The work staff information table (reliability storage means) 4-2-1 has the following fields.
The “staff number” identifies a staff member and is any number other than “0”.
The “name” is the staff member's name.
The “reliability” takes one of two values, “general” or “high”, and is determined in advance from the staff member's background and track record. Two levels are usual, but three or more may be used if necessary.
The “work status” indicates the staff member's current status (busy/standby) and is acquired from the client terminal.
The “contact” field holds the staff member's contact information, for example an e-mail address.

Returning to FIG. 6, a management table information acquisition unit 4-3 selects the next file to send. It looks at the records in the management table 2-4-1 of the management table storage unit 2-4 whose “staff number” is “0” (not yet assigned) and picks the divided audio file with the smallest “Index”. It then extracts the information to be sent to one of the client terminals 8-1 to 8-n. This information comes from the management table 2-4-1 in the management table storage unit 2-4 and from the divided audio file storage unit 2-5. At a minimum, the “Index”, the “file name”, and the divided audio file with that file name are sent. If the “recognition result” is sent as well, the staff member can create the transcription simply by checking and correcting it. Because this makes the staff's work more efficient, the recognition text stored in the “recognition result” field is normally sent too.
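The choice of the next file and the contents of the outgoing message can be sketched as two small functions. The dict keys and function names are hypothetical. How the payload is actually transported (e-mail in the embodiment) is left out.

```python
from typing import Optional


def next_unassigned(records: list[dict]) -> Optional[dict]:
    """Return the unassigned record (staff_no == 0) with the smallest Index, or None."""
    candidates = [r for r in records if r["staff_no"] == 0]
    return min(candidates, key=lambda r: r["index"], default=None)


def build_payload(record: dict, audio: bytes, include_recognition: bool = True) -> dict:
    """Assemble the information sent to a client terminal."""
    payload = {"index": record["index"], "file_name": record["file_name"], "audio": audio}
    if include_recognition:
        # Optional but normally sent, so staff can correct ASR output instead of typing from scratch
        payload["recognition_result"] = record["recognition_result"]
    return payload


records = [
    {"index": 1, "file_name": "sp1.wav", "recognition_result": "good morning", "staff_no": 5},
    {"index": 3, "file_name": "sp3.wav", "recognition_result": "thank you", "staff_no": 0},
    {"index": 2, "file_name": "sp2.wav", "recognition_result": "hello", "staff_no": 0},
]
rec = next_unassigned(records)
print(rec["file_name"])  # sp2.wav
```

Returning `None` when every record has a staff number gives the caller a natural stopping condition for the send loop.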

A transmission unit (selection means, transmission means) 4-4 refers to the “confidentiality flag” of the selected divided audio file in the management table 2-4-1 of the management table storage unit 2-4 and decides which staff member should receive it. It then sends the information extracted by the management table information acquisition unit 4-3 (“Index”, “file name”, divided audio file, and “recognition result”) to one of the client terminals 8-1 to 8-n. Specifically, it refers to the “reliability” and “work status” fields of the work staff information table 4-2-1. When the “confidentiality flag” is “0”, it preferentially selects a staff member whose “reliability” is “general” and whose “work status” is “standby”. It stores that staff member's “staff number” in the “staff number” field of the management table 2-4-1. It then sends the information to the address in that staff member's “contact” field, for example by e-mail.

When the “confidentiality flag” is “1”, only standby staff whose “reliability” is “high” are eligible. The selected staff member's “staff number” is stored in the “staff number” field of the management table 2-4-1, and the information is transmitted. If no standby staff member with “high” reliability is available, transmission is suspended. It resumes as soon as such a staff member becomes available. The management table information acquisition unit 4-3 and the transmission unit 4-4 repeat this processing until every “staff number” entry in the management table 2-4-1 has been filled.
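
The staff-selection rule above can be sketched as a single function. The `Staff` fields and the status strings are hypothetical. The fallback to high-reliability staff for non-confidential files is an assumption about what “preferentially” means; the text does not state it explicitly. A return value of `None` stands for the pause-and-retry case described above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Staff:
    """One row of the work staff information table 4-2-1 (hypothetical fields)."""
    staff_number: int
    reliability: str  # "high" or "general"
    work_status: str  # "standby" or, hypothetically, "working"
    contact: str


def choose_staff(staff: list[Staff], confidentiality_flag: int) -> Optional[Staff]:
    """Pick a standby staff member for one divided file, or None to pause."""
    standby = [s for s in staff if s.work_status == "standby"]
    if confidentiality_flag == 1:
        # Confidential segment: only high-reliability staff are eligible.
        high = [s for s in standby if s.reliability == "high"]
        return high[0] if high else None  # None: suspend and retry later
    # Non-confidential segment: prefer general staff.
    general = [s for s in standby if s.reliability == "general"]
    if general:
        return general[0]
    # Assumed fallback: any standby staff member (i.e. high-reliability).
    return standby[0] if standby else None
```

After a successful choice, the caller would write `chosen.staff_number` into the record's “staff number” field and send the payload to `chosen.contact`.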

The reception unit (receiving means) 4-5 receives the “Index” and the transcription text from whichever client terminal 8-1 to 8-n has finished the transcription work. Using the “Index” as the key, it stores the transcription text in the “transcription” field of the management table 2-4-1. At the same time, it changes the “work status flag” in the management table 2-4-1 from “0” to “1”.
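
The reception step amounts to a keyed update of two fields. In the sketch below, the management table is assumed to be a dict keyed by Index, and the function name `receive_result` is hypothetical.

```python
from __future__ import annotations


def receive_result(table: dict[int, dict], index: int, text: str) -> None:
    """Store a returned transcription under its Index and mark the work done."""
    row = table[index]  # raises KeyError for an unknown Index
    row["transcription"] = text
    row["work_status_flag"] = 1  # 0 -> 1: transcription work completed
```

Keying by Index lets results arrive in any order, which matches staff finishing their segments independently.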

Next, FIGS. 8(a) to 8(d) are conceptual diagrams illustrating the state transitions of the management table 2-4-1 according to the present embodiment. FIG. 8(a) shows the management table 2-4-1 when audio division has finished. At this point, the names of the divided audio files have been stored, and the “confidentiality flag”, “staff number”, and “work status flag” are all reset to “0”. FIG. 8(b) shows the management table 2-4-1 after identifiers have been assigned according to the confidentiality of each audio file. In this example, the confidentiality flag is set to “1” for the divided audio file whose “Index” is “2” and whose “file name” is “sp2.wav”.

FIG. 8(c) shows the management table 2-4-1 after the divided audio files and related information have been transmitted to the client terminals 8-1 to 8-n. By this point, the work staff member who will transcribe each divided audio file has been determined. That staff member's “staff number” is therefore stored for each divided audio file in the management table 2-4-1. FIG. 8(d) shows the management table 2-4-1 after the results have been received from the client terminals 8-1 to 8-n. Each divided audio file now has two further entries. The first is a “work status flag” indicating the progress of the work (“1” means the work is complete). The second is a “transcription” field holding the text of that divided audio file produced by the transcription work.
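
The four table states of FIG. 8 can be walked through with one small function per transition. Rows are assumed to be plain dicts. The staff numbers and transcription texts used in any example run are made up. Only the flagging of Index 2 (“sp2.wav”) mirrors the figure.

```python
from __future__ import annotations


def after_division(file_names: list[str]) -> list[dict]:
    """FIG. 8(a): file names stored; flags and staff numbers reset to 0."""
    return [{"index": i, "file_name": name, "confidentiality_flag": 0,
             "staff_number": 0, "work_status_flag": 0, "transcription": ""}
            for i, name in enumerate(file_names, start=1)]


def after_identifiers(table: list[dict], confidential: set[int]) -> None:
    """FIG. 8(b): confidentiality flag set per divided file."""
    for row in table:
        row["confidentiality_flag"] = 1 if row["index"] in confidential else 0


def after_transmission(table: list[dict], assignment: dict[int, int]) -> None:
    """FIG. 8(c): staff number recorded for every divided file."""
    for row in table:
        row["staff_number"] = assignment[row["index"]]


def after_reception(table: list[dict], results: dict[int, str]) -> None:
    """FIG. 8(d): transcription stored and work status flag set to 1."""
    for row in table:
        row["transcription"] = results[row["index"]]
        row["work_status_flag"] = 1
```

Each function touches only the columns that change in the corresponding panel. That separation is the point of the state-transition view.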

Next, FIG. 9 is a block diagram showing the configuration of the text extraction device 5 according to the present embodiment. The management table information acquisition unit 5-1 reads the “work status flag” values from the management table 2-4-1 in the management table storage unit 2-4 to check the progress of the work. It repeats this check, monitoring the work status, until every “work status flag” in the management table 2-4-1 is “1”. Once all “work status flags” are “1”, it reads the contents of the management table 2-4-1.

The transcription extraction unit 5-2 works on the contents of the management table 2-4-1 read by the management table information acquisition unit 5-1. It extracts the text of the “transcription” field in Index order, starting from the divided audio file whose “Index” is “1”, and concatenates the texts through the last record. This produces text data for the original, undivided audio file, which is stored in the text data storage unit 6.
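
The monitor-then-merge behaviour of units 5-1 and 5-2 can be sketched with a polling loop and a sort by Index. The assumptions are as follows. `read_table` is a hypothetical callable that returns the current rows. The polling interval and optional poll limit are illustrative. Joining with an empty separator is also an assumption; it suits Japanese text, and a space may suit other languages.

```python
from __future__ import annotations

import time
from typing import Callable, Optional


def wait_until_done(read_table: Callable[[], list[dict]],
                    poll_seconds: float = 1.0,
                    max_polls: Optional[int] = None) -> list[dict]:
    """Re-read the table until every work status flag is 1, then return it."""
    polls = 0
    while True:
        table = read_table()
        if all(row["work_status_flag"] == 1 for row in table):
            return table
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError("transcription work still in progress")
        time.sleep(poll_seconds)


def merge_transcriptions(table: list[dict], separator: str = "") -> str:
    """Concatenate transcriptions from Index 1 to the last record."""
    rows = sorted(table, key=lambda row: row["index"])
    return separator.join(row["transcription"] for row in rows)
```

Sorting by Index, rather than trusting storage order, keeps the merged text in speech order even if rows were updated out of order.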

Next, FIG. 10 is a block diagram showing the configuration of the client terminals 8-1 to 8-n according to the present embodiment. The reception unit 9-1 receives the “Index”, the “file name”, the divided audio file, and the “recognition result” from the transcription control device 4. The information display unit 9-2 shows the received information on the terminal screen so that it can be viewed. The audio file playback unit 9-3 plays the received divided audio file through a speaker or headset. The work staff member checks the “recognition result” while listening to the divided audio.

The transcription input unit 9-4 behaves in one of two ways. If the work staff member indicates that the “recognition result” contains no errors, the unit copies the recognition result as-is to serve as the transcription text. If the staff member indicates that the “recognition result” contains errors, the recognition result is corrected with text the staff member enters, and the corrected content becomes the transcription text. This is repeated for every received audio file. Each time the transcription of a divided audio file is finished, the transmission unit 9-5 sends its “Index” and transcription text to the transcription control device 4, for example by e-mail.
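
The accept-or-correct logic of the transcription input unit 9-4 can be sketched as follows. For simplicity, a correction is modelled as the full replacement text typed by the staff member, with `None` meaning “no errors, accept as-is”. That modelling and both function names are assumptions.

```python
from __future__ import annotations

from typing import Optional


def make_transcription(recognition_result: str, correction: Optional[str]) -> str:
    """Accept the recognition result as-is, or use the staff member's correction."""
    return recognition_result if correction is None else correction


def process_all(received: list[tuple[int, str]],
                corrections: dict[int, str]) -> list[tuple[int, str]]:
    """Produce (Index, transcription text) pairs for every received file."""
    results = []
    for index, recognized in received:
        # A missing entry in corrections means the staff confirmed no errors.
        text = make_transcription(recognized, corrections.get(index))
        results.append((index, text))  # what unit 9-5 would send back
    return results
```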

Next, the operation of the above-described embodiment will be described.
FIG. 11 is a flowchart explaining the operation of the transcription text creation support apparatus according to the present embodiment. First, the voice dividing device 2 reads the voice data from the voice data storage unit 1. Using the silent intervals in the speech, it divides the data into relatively short voice sections, for example sentence units (step S1). Next, the identifier assigning device 3 performs speech recognition on each divided audio file (step S2). It then matches the recognition results against the keyword list in the keyword storage unit 3-2 (step S3).
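
Step S1 can be illustrated with a simple silence-based splitter over raw sample values. This is a sketch of one common approach, not the patent's method. Here the threshold is simply a parameter, because this section does not describe how the silent section threshold calculation unit 2-2 computes it. The comparison is per sample, whereas practical systems usually compare short-frame energy.

```python
from __future__ import annotations


def split_on_silence(samples: list[float], threshold: float,
                     min_silence: int) -> list[tuple[int, int]]:
    """Return (start, end) voiced segments, end exclusive.

    A pause splits the signal only if it lasts at least min_silence
    consecutive samples whose absolute value is below threshold.
    """
    segments = []
    start = None     # start index of the current voiced segment
    silent_run = 0   # length of the current run of silent samples
    for i, x in enumerate(samples):
        if abs(x) < threshold:
            silent_run += 1
            if start is not None and silent_run >= min_silence:
                # End the segment at the first sample of this silent run.
                segments.append((start, i - silent_run + 1))
                start = None
        else:
            if start is None:
                start = i
            silent_run = 0
    if start is not None:
        # Trim any short trailing silence from the final segment.
        segments.append((start, len(samples) - silent_run))
    return segments
```

Requiring a minimum silence length keeps short pauses inside a sentence from fragmenting it. A sentence-sized unit is what step S1 aims for.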

Next, the transcription control device 4 determines whether the recognition result of each divided audio file contains a keyword (step S4). If a keyword related to confidential information is included, the file is to be transcribed by a high-reliability staff member. The device transmits the divided audio file to that staff member's client terminal (step S5) and instructs the staff member to carry out the transcription (step S6). If the recognition result contains no keyword related to confidential information, the file is to be transcribed by a general staff member. The device transmits the divided audio file to that staff member's client terminal (step S7) and instructs the staff member to carry out the transcription (step S8). In either case, when the transcription work is finished, the work result sent from the client terminal is received (step S9). The transcription texts are then extracted, merged, and stored in the text data storage unit 6 as the transcription text data for the original, undivided voice data (step S10).
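
The branch at step S4 can be sketched as keyword matching over each recognition result. The matching here is plain substring search, which is a simplification; the keyword table 3-2-1 may be matched differently. The keywords in any example are hypothetical, and the tier labels are illustrative.

```python
from __future__ import annotations


def contains_confidential_keyword(recognized_text: str, keywords: list[str]) -> bool:
    """True if any confidential keyword appears in the recognition result."""
    return any(keyword in recognized_text for keyword in keywords)


def route(recognition_results: dict[int, str], keywords: list[str]) -> dict[int, str]:
    """Map each Index to the staff tier that should transcribe it (S5 vs S7)."""
    return {
        index: "high" if contains_confidential_keyword(text, keywords) else "general"
        for index, text in recognition_results.items()
    }
```

The check runs on recognizer output, so a misrecognized keyword can escape the flag. Keyword lists that include likely misrecognitions are one way a deployment might reduce that risk.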

In the embodiment described above, the voice data is divided, and only the portions containing confidential information are assigned to highly reliable staff for transcription. This greatly reduces the risk of information leakage.

Without division, confidentiality could be maintained only by assigning all of the voice data, including the non-confidential parts, to highly reliable staff such as dedicated employees. Such staff command a higher unit price than general staff, so this approach is more expensive. In the embodiment described above, voice data that contains no confidential information can safely be assigned to general staff with a lower unit price, which reduces the overall cost.
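
The cost argument can be made concrete with a simple model. The assumptions are purely illustrative, since the patent gives no figures:
- there are N divided files, of which N_c contain confidential keywords;
- each file costs p_h when transcribed by high-reliability staff and p_g when transcribed by general staff, with p_h ≥ p_g;
- all files are priced per file and are of equal length.

```latex
\begin{aligned}
C_{\text{all}}   &= N\,p_h \\
C_{\text{split}} &= N_c\,p_h + (N - N_c)\,p_g \\
C_{\text{all}} - C_{\text{split}} &= (N - N_c)\,(p_h - p_g) \;\ge\; 0
\end{aligned}
```

The saving grows with the number of non-confidential segments and with the price gap. It is zero only when every segment is confidential or the two unit prices are equal.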

In the embodiment described above, the functions of the voice dividing device 2, the identifier assigning device 3, the transcription control device 4, the text extraction device 5, and so on may be realized in software. A program implementing those functions is recorded on a computer-readable recording medium, and a computer system reads and executes the program from that medium. The “computer system” here may include an OS and hardware such as peripheral devices. When a WWW system is used, the “computer system” also includes a homepage providing environment (or display environment). The “computer-readable recording medium” means any of the following:
- a flexible disk, magneto-optical disk, or ROM;
- writable nonvolatile memory such as flash memory;
- a portable medium such as a CD-ROM;
- a storage device such as a hard disk built into a computer system.

The “computer-readable recording medium” also includes media that hold a program for a certain period of time. An example is the volatile memory (for example, DRAM (Dynamic Random Access Memory)) inside a computer system acting as a server or client when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line. The program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system, either via a transmission medium or by transmission waves within a transmission medium. Here, the “transmission medium” that carries the program means a medium capable of transmitting information, such as a network (communication network) like the Internet or a communication line such as a telephone line. The program may implement only some of the functions described above. It may also be a so-called difference file (difference program), which implements those functions in combination with a program already recorded in the computer system.

FIG. 1 is a block diagram showing the configuration of a transcription text creation support apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the voice dividing device 2 according to the present embodiment.
FIG. 3 is a conceptual diagram showing the data structure of the management table 2-4-1 according to the present embodiment.
FIG. 4 is a block diagram showing the configuration of the identifier assigning device 3 according to the present embodiment.
FIG. 5 is a conceptual diagram showing an example of the keyword table 3-2-1 according to the present embodiment.
FIG. 6 is a block diagram showing the configuration of the transcription control device 4 according to the present embodiment.
FIG. 7 is a conceptual diagram showing an example data structure of the work staff information table 4-2-1 according to the present embodiment.
FIG. 8 is a set of conceptual diagrams illustrating the state transitions of the management table 2-4-1 according to the present embodiment.
FIG. 9 is a block diagram showing the configuration of the text extraction device 5 according to the present embodiment.
FIG. 10 is a block diagram showing the configuration of the client terminals 8-1 to 8-n according to the present embodiment.
FIG. 11 is a flowchart explaining the operation of the transcription text creation support apparatus according to the present embodiment.

Explanation of symbols

1 Voice data storage unit
2 Voice dividing device
2-1 Management table generation unit
2-2 Silent section threshold calculation unit
2-3 Voice dividing unit
2-4 Management table storage unit
2-4-1 Management table
2-5 Divided audio file storage unit
3 Identifier assigning device
3-1 Speech recognition unit
3-2 Keyword storage unit
3-2-1 Keyword table
3-3 Keyword matching unit
4 Transcription control device
4-1 Staff information acquisition unit
4-2 Work staff information storage unit
4-2-1 Work staff information table
4-3 Management table information acquisition unit
4-4 Transmission unit
4-5 Reception unit
5 Text extraction device
5-1 Management table information acquisition unit
5-2 Transcription extraction unit
6 Text data storage unit
7 Network
8-1 to 8-n Client terminals
9-1 Reception unit
9-2 Information display unit
9-3 Audio file playback unit
9-4 Transcription input unit
9-5 Transmission unit

Claims

A transcription text creation support device that supports the transcription of voice data into text data, comprising:
reliability storage means for storing, for each transcription worker, a reliability value corresponding to that worker's reliability;
divided voice data generating means for dividing the voice data into predetermined sections to generate divided voice data;
conversion means for performing speech recognition on each piece of divided voice data generated by the divided voice data generating means and converting the divided voice data into a plurality of primary text data;
keyword storage means for storing keywords related to confidential information;
determination means for determining, for each primary text data converted by the conversion means, whether it contains a keyword stored in the keyword storage means; and
selection means for selecting, for each piece of divided voice data generated by the divided voice data generating means, the transcription worker to whom the transcription work should be requested, based on the determination result of the determination means and with reference to the reliability values in the reliability storage means.

The transcription text creation support device according to claim 1, further comprising:
transmitting means for transmitting the divided voice data to the terminal of the transcription worker selected by the selection means;
receiving means for receiving text data resulting from the transcription work by the transcription worker; and
storage means for integrating the text data received by the receiving means and storing it as transcription text data for the original voice data.

The transcription text creation support device according to claim 1 or 2, wherein
the selection means, referring to the reliability values in the reliability storage means, selects a transcription worker with a high reliability value when the determination means determines that the primary text contains a keyword related to confidential information, and selects a transcription worker with a low reliability value when the determination means determines that the primary text contains no keyword related to confidential information.

A transcription text creation support program that supports transcription work for transcribing voice data into text data,
the program causing a computer to execute:
a step of dividing the voice data into predetermined sections to generate divided voice data;
a step of performing speech recognition on each piece of divided voice data and converting the divided voice data into a plurality of primary text data;
a step of determining, for each converted primary text data, whether it contains a keyword related to confidential information; and
a step of selecting a transcription worker with a high reliability value when it is determined that the primary text contains a keyword related to confidential information, and selecting a transcription worker with a low reliability value when it is determined that the primary text contains no such keyword.

A transcription text creation support method, executed on a computer, for supporting transcription work for transcribing voice data into text data, the method comprising:
a step in which divided voice data generating means divides the voice data into predetermined sections to generate divided voice data;
a step in which conversion means performs speech recognition on each piece of divided voice data and converts the divided voice data into a plurality of primary text data;
a step of determining, for each converted primary text data, whether it contains a keyword related to confidential information; and
a step in which selection means selects a transcription worker with a high reliability value when it is determined that the primary text contains a keyword related to confidential information, and selects a transcription worker with a low reliability value when it is determined that the primary text contains no such keyword.