JP5533865B2

JP5533865B2 - Editing support system, editing support method, and editing support program

Info

Publication number: JP5533865B2
Application number: JP2011519574A
Authority: JP
Inventors: 清一三木
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2009-06-18
Filing date: 2010-06-17
Publication date: 2014-06-25
Anticipated expiration: 2030-06-17
Also published as: JPWO2010146869A1; WO2010146869A1

Description

The present invention relates to an editing support system, an editing support method, and an editing support program.

In recent years, the use of speech recognition technology has been studied for purposes such as making it easier to prepare minutes in settings with multiple speakers, such as meetings. When minutes or similar documents are prepared using speech recognition, a user may display the text of the speech recognition result while listening to the audio and correct the misrecognized portions.

Patent Document 1 (Japanese Unexamined Patent Application Publication No. 2006-119534) describes a system that includes a mouse subtitle editing apparatus and a keyboard subtitle editing apparatus. The mouse subtitle editing apparatus is operated by the person responsible for the generated subtitles and identifies the portions of the speech recognition result that should be edited. The keyboard subtitle editing apparatus is operated by an operator who uses a keyboard to enter the correct character string corresponding to the audio for the subtitles passed from the mouse subtitle editing apparatus. According to that document, the operator of the keyboard subtitle editing apparatus can therefore be a person with a relatively low skill level and little responsibility, which is expected to reduce labor costs.

Patent Document 1: JP 2006-119534 A

However, with the technique described in Patent Document 1, the responsible person operating the mouse subtitle editing apparatus must identify the portions to be edited across the entire speech recognition result, so processing cannot be done quickly. In addition, the same portion is first identified by the responsible person and then retyped by the operator of the keyboard subtitle editing apparatus, so it is checked by more than one person, which is inefficient.

Meanwhile, there has been no procedure for efficiently preparing the data to be edited in cases where several reasonably skilled workers are available and the editing of a speech recognition result is to be shared among them, or where a specific part of a speech recognition result needs to be edited urgently. As a result, partial editing of a speech recognition result could not be carried out quickly.

An object of the present invention is to provide an editing support system and an editing support method that solve the problem described above, namely that partial editing of a speech recognition result cannot be carried out quickly.

According to the present invention, there is provided an editing support system for speech recognition results, including:
audio data storage means for storing audio data in association with time information;
speech recognition result storage means for storing, in a predetermined format, text data of a speech recognition result of the audio data in association with time information in units of words;
first display processing means for displaying the text data in a predetermined display area and displaying, in the display area, a cursor for selecting the text data;
instruction receiving means for receiving, by means of the cursor, an arbitrary selection range of the text data displayed by the first display processing means, and for receiving an instruction to generate divided data; and
divided data generating means for extracting the text data included in the selection range received by the instruction receiving means from the speech recognition result storage means while maintaining the predetermined format, and generating divided data.

According to the present invention, there is provided an editing support method for speech recognition results, including:
a first display step of reading text data from speech recognition result storage means that stores, in a predetermined format, the text data of a speech recognition result of audio data in association with time information in units of words, displaying the text data in a predetermined display area, and displaying, in the display area, a cursor for selecting the text data;
a step of receiving, by means of the cursor, an arbitrary selection range of the text data displayed in the first display step, and receiving an instruction to generate divided data; and
a step of extracting the text data included in the selection range from the speech recognition result storage means while maintaining the predetermined format, and generating divided data.

According to the present invention, there is provided an editing support program for speech recognition results that causes a computer to function as:
audio data storage means for storing audio data in association with time information;
speech recognition result storage means for storing, in a predetermined format, text data of a speech recognition result of the audio data in association with time information in units of words;
first display processing means for displaying the text data in a predetermined display area and displaying, in the display area, a cursor for selecting the text data;
instruction receiving means for receiving, by means of the cursor, an arbitrary selection range of the text data displayed by the first display processing means, and for receiving an instruction to generate divided data; and
divided data generating means for extracting the text data included in the selection range received by the instruction receiving means from the speech recognition result storage means while maintaining the predetermined format, and generating divided data.

Any combination of the above constituent elements, and any conversion of the expression of the present invention among a method, an apparatus, a system, a recording medium, a computer program, and the like, is also valid as an aspect of the present invention.

According to the present invention, partial editing of a speech recognition result can be carried out quickly.

The above object and other objects, features, and advantages will become more apparent from the preferred embodiments described below and the accompanying drawings.

The drawings, all of which relate to an embodiment of the present invention, are listed below in order.
A block diagram showing an example of the configuration of the editing support system.
A block diagram showing an example of the configuration of the edit management apparatus.
A flowchart showing the processing procedure of the edit management apparatus.
A diagram showing an example of the structure of the text data of a speech recognition result stored in the speech recognition result storage unit.
A diagram showing an example of a screen displayed on the display by the display processing unit of the edit management apparatus.
A diagram showing an example of a screen displayed on the display by the display processing unit of the edit management apparatus.
A diagram showing an example of a screen displayed on the display by the display processing unit of the edit management apparatus.
A diagram showing an example of a screen displayed on the display by the display processing unit of the edit management apparatus.
A diagram showing an example of the management table of the display processing unit of the edit management apparatus.
A diagram showing an example of the structure of divided data generated by the edit management apparatus.
A block diagram showing an example of the configuration of the edit processing apparatus.
A diagram showing an example of a screen displayed on the display by the display processing unit of the edit processing apparatus.
A diagram showing an example of the management table of the display processing unit of the edit processing apparatus.
A diagram showing an example of a screen displayed on the display by the display processing unit of the edit processing apparatus.
A diagram showing an example of a screen displayed on the display by the display processing unit of the edit processing apparatus.
A diagram showing an example of a screen displayed on the display by the display processing unit of the edit processing apparatus.
A diagram showing an example of the management table of the display processing unit of the edit processing apparatus.
A diagram showing an example of the structure of edited data.
A diagram showing an example of a screen displayed on the display by the display processing unit of the edit management apparatus.
A diagram showing an example of a screen displayed on the display by the display processing unit of the edit management apparatus.
A diagram showing an example of a screen displayed on the display by the display processing unit of the edit management apparatus.
A diagram showing an example of a screen displayed on the display by the display processing unit of the edit management apparatus.
A diagram showing another example of the structure of the text data of a speech recognition result stored in the speech recognition result storage unit.
A diagram showing an example of the hardware configuration of an apparatus that constitutes the edit management apparatus or the edit processing apparatus.

Embodiments of the present invention are described below with reference to the drawings. In all the drawings, similar constituent elements are denoted by the same reference numerals, and their description is omitted as appropriate.

FIG. 1 is a block diagram schematically showing the configuration of the editing support system in the present embodiment.
In the present embodiment, the editing support system 300 includes an edit management apparatus 100 and one or more edit processing apparatuses 200. The example shown here has two edit processing apparatuses 200: edit processing apparatus 200 (A) and edit processing apparatus 200 (B).

The edit management apparatus 100 stores the text data of a speech recognition result in a predetermined format and displays the text data editably in a predetermined display area. When the user selects a desired range of the text data, the edit management apparatus 100 extracts the text data in that range while maintaining the original format and generates divided data. The divided data can be a portion of the original text data. The edit management apparatus 100 can also extract the corresponding audio data together with the text data and include it in the divided data. In the present embodiment, the edit management apparatus 100 generates divided data that includes both text data and audio data. In this way, the edit management apparatus 100 can generate multiple pieces of divided data. Each piece of divided data is edited on one of the edit processing apparatuses 200, and the edited pieces are then integrated by the edit management apparatus 100.

As a result, a desired range of the speech recognition result can be selected with a simple operation, and the text data in that range can be extracted while maintaining the original format, so partial editing of the speech recognition result can be carried out quickly. When several workers are available, multiple pieces of divided data can be prepared so that each worker edits one of them, which improves efficiency when several workers correct a speech recognition result.

FIG. 2 is a block diagram showing the configuration of the edit management apparatus 100 in the present embodiment.
The edit management apparatus 100 includes an audio acquisition unit 102, a speech recognition unit 104, a display processing unit 110 (first display processing means), an instruction receiving unit 112 (instruction receiving means), an audio playback unit 114 (audio playback means), a divided data generation unit 116 (divided data generating means), an edit processing unit 118 (edit processing means), a data integration unit 120 (data integration means), an access control unit 122, and a storage unit 130.

The storage unit 130 includes an audio data storage unit 132 (audio data storage means), a speech recognition result storage unit 134 (speech recognition result storage means), a divided data storage unit 136, an edited data storage unit 138, and an integrated data storage unit 140.

The audio acquisition unit 102 acquires the audio data of a speaker's voice input from an audio input unit (not shown) such as a microphone. The audio acquisition unit 102 acquires the audio data in association with time information. The audio data storage unit 132 stores the audio data acquired by the audio acquisition unit 102 in association with the time information.

The speech recognition unit 104 performs speech recognition on the audio data acquired by the audio acquisition unit 102 and converts the speech recognition result into text data. The speech recognition result storage unit 134 stores the text data produced by the speech recognition unit 104 in a predetermined format, in association with time information in units of words. In the present embodiment, the speech recognition result storage unit 134 tracks the text data of the speech recognition result by sentence and by word, and stores it in a format in which time information is associated with each sentence and each word. The time information may include both a start time and an end time, or only a start time.

The display processing unit 110 displays the text data of the speech recognition result editably in a predetermined display area and displays, in that display area, a cursor (caret) for selecting the text data. The function of the display processing unit 110 can be implemented with a text editor. In the present embodiment, the display processing unit 110 can display the text data, at least in units of words, in association with position information relative to the cursor.

The instruction receiving unit 112 receives, by means of the cursor, an arbitrary selection range of the text data displayed by the display processing unit 110, and also receives an instruction to generate divided data.

The audio playback unit 114 reads audio data from the audio data storage unit 132 and plays it back. When a time is specified, the audio playback unit 114 outputs the audio data corresponding to that time. The audio playback unit 114 can also play back the audio data for the corresponding time based on the time information associated with the word selected by the cursor in the text data displayed by the display processing unit 110. The audio output device can be, for example, a speaker.
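
As a rough illustration of time-based playback, the sketch below converts the start time associated with a selected word into a sample offset within the recording. The recording start time, the 16 kHz sample rate, and the function names are assumptions made for this example; the description does not specify how the audio playback unit 114 locates the audio.

```python
from datetime import datetime

TIME_FORMAT = "%H:%M:%S"  # same "HH:MM:SS" notation as the times in the description


def seconds_since(recording_start: str, word_start: str) -> int:
    """Elapsed seconds between the start of the recording and a word's start time."""
    delta = datetime.strptime(word_start, TIME_FORMAT) - datetime.strptime(
        recording_start, TIME_FORMAT
    )
    return int(delta.total_seconds())


def sample_offset(recording_start: str, word_start: str, sample_rate: int = 16000) -> int:
    """Index of the first audio sample to play for the selected word (assumed PCM audio)."""
    return seconds_since(recording_start, word_start) * sample_rate
```

A player built this way would seek to `sample_offset(...)` before starting playback, so that selecting a word jumps to the audio spoken at that point.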

The divided data generation unit 116 extracts the text data included in the selection range received by the instruction receiving unit 112 from the speech recognition result storage unit 134 while maintaining the predetermined format. Here, maintaining the format can mean that, as with the text data of the speech recognition result, the extracted data is tracked by sentence and by word, with time information associated with each sentence and each word. The divided data generation unit 116 also extracts the audio data corresponding to the text data in the selection range from the audio data storage unit 132, keeping it associated with the time information. The divided data generation unit 116 then generates divided data that includes the extracted text data and audio data.
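
The extraction step can be sketched as follows. This is only one possible reading: the `Word` record, the use of seconds from the start of the recording as time information, the list-of-samples audio representation, and the dictionary layout of the divided data are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Word:
    sentence_id: str
    word_id: str
    start: float  # seconds from the start of the recording (assumed unit)
    end: float
    text: str


def make_divided_data(words: Sequence[Word], audio: Sequence[int],
                      sample_rate: int, first: int, last: int) -> dict:
    """Extract words[first..last] unchanged, plus the audio that spans them."""
    selected = list(words[first:last + 1])
    if not selected:
        raise ValueError("selection range contains no words")
    begin = int(selected[0].start * sample_rate)
    finish = int(selected[-1].end * sample_rate)
    return {
        "words": selected,               # same record format as the source data
        "audio": list(audio[begin:finish]),
        "audio_start": selected[0].start,  # keeps the audio tied to its time information
    }
```

Because the selected records are copied as they are, the divided data keeps the sentence and word identifiers and the per-word times of the original.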

The divided data generation unit 116 saves the generated divided data in a predetermined folder in the divided data storage unit 136. The divided data storage unit 136 can hold preset folders, one for each apparatus that is expected to edit divided data. In the present embodiment, for example, folders corresponding to the edit processing apparatus 200 (A), the edit processing apparatus 200 (B), and so on shown in FIG. 1 can be prepared. The divided data generation unit 116 can save the divided data in the folders prepared in this way.
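
A minimal sketch of saving divided data into a per-apparatus folder is shown below. The JSON file format, the folder names, and the `save_divided_data` function are hypothetical; the description only states that a folder is prepared for each apparatus.

```python
import json
from pathlib import Path


def save_divided_data(root: Path, device: str, name: str, words: list) -> Path:
    """Write one piece of divided data into the folder prepared for `device`."""
    folder = root / device                      # e.g. one folder per edit processing apparatus
    folder.mkdir(parents=True, exist_ok=True)   # create the folder if it is not there yet
    path = folder / f"{name}.json"
    path.write_text(json.dumps(words, ensure_ascii=False), encoding="utf-8")
    return path
```

With this layout, the user of each edit processing apparatus only needs to look in the folder assigned to that apparatus to find the pieces to edit.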

In the present embodiment, the text data of the speech recognition result is assumed to be edited on the edit processing apparatuses 200, but editing can also be performed on the edit management apparatus 100 in the same way. The edit processing unit 118 is used to edit the text data of the speech recognition result on the edit management apparatus 100, and can have the same configuration as the corresponding unit in the edit processing apparatus 200. The function of the edit processing unit 118 is described later with reference to the edit processing apparatus 200. The edited data storage unit 138 stores divided data that has been edited (hereinafter referred to as edited data).

The data integration unit 120 integrates the text data of multiple pieces of divided data by arranging them in chronological order based on the time information. The data integration unit 120 stores the integrated data in the integrated data storage unit 140. The present embodiment shows an example in which the edited data storage unit 138 is provided separately from the divided data storage unit 136. In another example, the edited data storage unit 138 can be omitted, and the unedited divided data stored in the divided data storage unit 136 can be overwritten with the edited divided data. Similarly, the present embodiment shows an example in which the integrated data storage unit 140 is provided separately from the speech recognition result storage unit 134. In another example, the integrated data storage unit 140 can be omitted, and the unedited text data of the speech recognition result stored in the speech recognition result storage unit 134 can be overwritten with the edited integrated data.
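
Integration by time order can be illustrated with a short sketch. It assumes each word record is a dictionary with a `start` field in zero-padded "HH:MM:SS" notation, so that string order matches chronological order; the record layout and the function name are assumptions for this example.

```python
def integrate(divided_sets):
    """Merge several pieces of divided data into one list ordered by start time."""
    merged = [word for part in divided_sets for word in part]
    # Zero-padded "HH:MM:SS" strings sort in chronological order.
    # sorted() is stable, so words with equal start times keep their input order.
    return sorted(merged, key=lambda word: word["start"])
```

The pieces can therefore be handed back in any order, for example as each worker finishes, and still end up in the order in which the words were spoken.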

The access control unit 122 controls access from external apparatuses such as the edit processing apparatuses 200. In the present embodiment, the divided data generated by the divided data generation unit 116 is stored in a predetermined folder of the divided data storage unit 136 of the edit management apparatus 100. A user who edits a piece of divided data on an edit processing apparatus 200 accesses the edit management apparatus 100 to obtain the divided data. The access control unit 122 controls such access from other terminals.

Next, the procedure for generating divided data in the present embodiment is described. FIG. 3 is a flowchart showing the procedure by which divided data is generated in the edit management apparatus 100 of the present embodiment.

First, the display processing unit 110 displays the text data of the speech recognition result stored in the speech recognition result storage unit 134 on the display (step S102).

FIG. 4 is a diagram showing an example of the structure of the text data of a speech recognition result stored in the speech recognition result storage unit 134 in the present embodiment.
The speech recognition result storage unit 134 includes a sentence number column, a word number column, a speaker column, a start time column, an end time column, a speech recognition result column, and a character count column.
The speech recognition result column stores the text data of the speech recognition result word by word. The words shown here belong to the sentences identified by "s11" and "s12". Each word also carries identification information that identifies it within its sentence. For example, the identification information "s11" and "w1" together identify the word "昨年、" ("Last year,"). This word was spoken by speaker "2", with a start time of "13:44:09" and an end time of "13:44:10". Its character count is 3.

FIGS. 5 to 8 are diagrams showing a text editor screen 400 displayed on the display by the display processing unit 110.
As shown in FIG. 5, the screen 400 displays a text display area 402, a time display area 404, a time change button 406, an audio playback button 408, a speed change button 410, and so on. The text display area 402 displays the text data of the speech recognition result and a cursor 420.

In this example, the text display area 402 shows nine lines of text data with 25 characters per line. The display processing unit 110 displays the text data stored in the speech recognition result storage unit 134 in the text display area 402, inserting a line break every 25 characters.
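
Breaking the text every 25 characters amounts to fixed-width wrapping by character count, as in the sketch below. The function name is an assumption, and the sketch ignores any additional line breaks the actual display may insert, for example at a change of speaker.

```python
def wrap_fixed(text: str, width: int = 25) -> list:
    """Split text into display lines of at most `width` characters each."""
    if width <= 0:
        raise ValueError("width must be positive")
    # Slicing by character count means a word can be split across two lines.
    return [text[i:i + width] for i in range(0, len(text), width)]
```

Because the break is purely positional, a word that straddles the 25th character is split across two lines, which is why the display needs a table recording which part of each word is on which line.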

The display processing unit 110 includes a management table for tracking the position of each word in the text data displayed on the screen 400. FIG. 9 is a diagram showing the management table of the display processing unit 110.
For each line, the management table of the display processing unit 110 holds the identification information of the character strings (text), sentences (sentences), and words (words) contained in that line. For each sentence and each word, the management table also holds information indicating the start position (start) and the character length (len).

The character string on the second line of the text display area 402 of the screen 400 shown in FIG. 5 is used below as an example. The second line reads "○話者２昨年、Ａ検討委員会から報告書を受領しまし" (roughly "○ Speaker 2: Last year, we received a report from the A Review Committee", cut off partway through the final word). Entry "L2" in FIG. 9 holds the display information for the character string shown on this line. The first five characters, "○話者２", are not part of the speech recognition result but a label indicating the speaker, so "i11", the identification information of the label, is entered in the character string (text) information. The remainder, "昨年、Ａ検討委員会から報告書を受領しまし", corresponds to the words "昨年、" (last year), "Ａ検討委員会" (A Review Committee), "から" (from), "報告書" (report), "を" (object particle), "受領" (receipt), and "しました。" (did). The identification information of these words, "s11_w1", "s11_w2", "s11_w3", "s11_w4", "s11_w5", "s11_w6", and "s11_w7", is therefore entered in the character string (text) information.

For each sentence and each word, the start position within that sentence or word and the character length are also recorded. For example, FIG. 4 shows that the word identified by "s11_w7" is "しました。". Only its first three characters, "しまし", fall on the second line. The start position is therefore 0 and the character length is 3, and the entry reads "s11_w7, start=0, len=3".
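
The per-line entries can be derived mechanically from the word list and the line width, as the sketch below suggests. It treats `start` as the offset within the word of the part shown on that line, which matches the "s11_w7" example. The function name and tuple layout are assumptions, and the speaker label is padded with a space to make it five characters long, which is a guess at the exact label text.

```python
def build_table(items, width=25, first_line=1):
    """Map each display line to (identifier, start_in_word, length) entries.

    items: (identifier, text) pairs in display order.
    """
    table = {}
    line, col = first_line, 0
    for ident, text in items:
        offset = 0
        while offset < len(text):
            if col == width:            # current line is full: move to the next one
                line, col = line + 1, 0
            take = min(len(text) - offset, width - col)
            table.setdefault(line, []).append((ident, offset, take))
            offset += take
            col += take
    return table


# Second line of the example; "○話者２ " includes an assumed padding space.
items = [("i11", "○話者２ "), ("s11_w1", "昨年、"), ("s11_w2", "Ａ検討委員会"),
         ("s11_w3", "から"), ("s11_w4", "報告書"), ("s11_w5", "を"),
         ("s11_w6", "受領"), ("s11_w7", "しました。")]
table = build_table(items, width=25, first_line=2)
```

Under these assumptions the last entry for line 2 is `("s11_w7", 0, 3)`, and the remaining two characters of that word appear on line 3 as `("s11_w7", 3, 2)`.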

As described above, the display processing unit 110 can track the position (line and character position) of each word displayed in the text display area 402. The display processing unit 110 also tracks the position (line and character position) of the cursor 420. From the position of the cursor 420, the display processing unit 110 can therefore determine which word of which sentence is being pointed to.
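
The lookup described above can be illustrated with a minimal Python sketch. The patent does not disclose source code, so the names `Fragment`, `LINE_L2`, and `item_at` are hypothetical. Only the label length (5) and the "s11_w7, start=0, len=3" entry come from the description; the other lengths are counted from the quoted words and are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Fragment:
    """One management-table entry: the part of an item shown on one display line."""
    item_id: str  # label ID such as "i11" or word ID such as "s11_w7"
    start: int    # offset of the first displayed character within the item
    length: int   # number of characters of the item shown on this line ("len")


# Line L2 from the description. Lengths other than the label (5) and
# s11_w7 (3) are assumptions counted from the quoted words.
LINE_L2: List[Fragment] = [
    Fragment("i11", 0, 5),     # speaker label
    Fragment("s11_w1", 0, 3),  # 昨年、
    Fragment("s11_w2", 0, 6),  # Ａ検討委員会
    Fragment("s11_w3", 0, 2),  # から
    Fragment("s11_w4", 0, 3),  # 報告書
    Fragment("s11_w5", 0, 1),  # を
    Fragment("s11_w6", 0, 2),  # 受領
    Fragment("s11_w7", 0, 3),  # しまし (first 3 characters of しました。)
]


def item_at(line: List[Fragment], column: int) -> Optional[Tuple[str, int]]:
    """Return (item_id, offset inside the item) for a 0-based column, else None."""
    pos = 0
    for frag in line:
        if pos <= column < pos + frag.length:
            return frag.item_id, frag.start + (column - pos)
        pos += frag.length
    return None
```

Under these assumptions, a cursor at column 22 of the second line resolves to the first character of "s11_w7", and a column past the end of the line resolves to nothing.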

Returning to FIG. 5, the user can specify an arbitrary selection range of the text data displayed in the text display area 402 by moving the cursor 420 with an operation unit (not shown) such as a mouse. Based on the cursor position information, the display processing unit 110 refers to the management table and identifies the words included in the selection range. The instruction receiving unit 112 acquires information on the words included in the selection range from the display processing unit 110. When the user operates one of the buttons (404 to 410) displayed on the screen 400 with an operation unit (not shown) such as a mouse, the instruction receiving unit 112 receives that instruction.

For example, when the user operates the audio playback button 408, the instruction receiving unit 112 receives the instruction and notifies the audio reproduction unit 114. The audio reproduction unit 114 plays, stops, fast-forwards, or rewinds the audio data according to the user's instruction. Similarly, when the user operates the speed change button 410, the instruction receiving unit 112 receives the instruction and notifies the audio reproduction unit 114. The audio reproduction unit 114 changes the playback speed of the audio data according to the user's instruction.

The time display area 404 displays the time corresponding to the audio data. The user can change the time displayed in the time display area 404 by operating the time change button 406. The cursor 420 and the time displayed in the time display area 404 can be linked, so that the cursor 420 is displayed at the position of the word corresponding to the displayed time.
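
One way to link a displayed time to a word, given only as a sketch, is a binary search over word start times. The function name `word_index_at` and the sample times are hypothetical; the description states only that each word is associated with time information.

```python
import bisect
from typing import List, Tuple


def word_index_at(words: List[Tuple[str, float]], t: float) -> int:
    """Return the index of the word whose start time is the latest one <= t.

    `words` holds (word_id, start_time_in_seconds) pairs sorted by start time.
    Times before the first word map to the first word.
    """
    if not words:
        raise ValueError("no words")
    starts = [start for _, start in words]
    i = bisect.bisect_right(starts, t) - 1
    return max(i, 0)


# Hypothetical start times, not taken from the figures.
WORDS = [("s11_w1", 12.0), ("s11_w2", 12.6), ("s11_w3", 13.9)]
```

The cursor would then be placed on the word at the returned index, using the management table to convert the word into a line and character position.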

Returning to FIG. 3, when the instruction receiving unit 112 receives a range selection and a divided data generation instruction from the user (YES in step S104), the divided data generation unit 116 generates divided data. First, the procedure by which the user selects a range and instructs generation of divided data is described with reference to FIGS. 5 to 8.

The user places the cursor 420 at the start point of the selection range with a mouse or the like (FIG. 5) and then, for example, moves the cursor 420 to the end point of the selection range while holding down the left mouse button. The text data in the selection range 422 between the start point and the end point is then selected and shown, for example, in reverse video (FIG. 6). When the user then performs an operation such as clicking the right mouse button, a box 430 is displayed (FIG. 7). The box 430 lists various work items, including a divided data generation button 432. When the user selects the divided data generation button 432, a save screen 440 is displayed (FIG. 8). The save screen 440 displays a plurality of predetermined folders set in advance, a field for entering a file name, a save button 442, a cancel button 444, and the like. When the user selects one of the folders, enters a file name, and presses the save button 442, the range selection and divided data generation instruction of step S104 in FIG. 3 is issued. The file name may also be assigned automatically, and the user may also create a new folder.

Returning to FIG. 3, the divided data generation unit 116 next determines the words included in the selected range (step S106). The divided data generation unit 116 then determines a start time and an end time based on the determined words (step S108). Next, the divided data generation unit 116 extracts the text data corresponding to the selected range from the speech recognition result storage unit 134 (step S110). It then extracts the audio data for the corresponding time span based on the start time and the end time (step S112). The divided data generation unit 116 generates divided data containing the text data and the audio data of the selected portion (step S114) and saves it in the predetermined folder (step S116).
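
Steps S106 to S114 can be sketched in Python as follows. This is an illustration under stated assumptions, not the disclosed implementation: the `WordRecord` fields mirror the fields named for the speech recognition result, while `make_divided_data`, the dictionary layout, and the representation of audio as a list of samples at a fixed sample rate are all hypothetical. Saving to a folder (step S116) is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WordRecord:
    sentence: int   # sentence number
    word: int       # word number
    speaker: str
    start: float    # start time in seconds
    end: float      # end time in seconds
    text: str       # speech recognition result

    @property
    def chars(self) -> int:
        """Character count of the recognized text."""
        return len(self.text)


def make_divided_data(words: List[WordRecord], first: int, last: int,
                      audio: List[int], sample_rate: int) -> Dict[str, object]:
    """Build divided data for the words with indices first..last (inclusive)."""
    selected = words[first:last + 1]          # S106: words in the selection
    if not selected:
        raise ValueError("empty selection")
    start_time = selected[0].start            # S108: start and end time
    end_time = selected[-1].end
    text_part = list(selected)                # S110: records keep their format
    lo = round(start_time * sample_rate)      # S112: matching audio samples
    hi = round(end_time * sample_rate)
    audio_part = audio[lo:hi]
    return {"text": text_part, "audio": audio_part,   # S114: divided data
            "start": start_time, "end": end_time}
```

Because the text records are copied unchanged, the divided data keeps the same per-word format as the stored speech recognition result.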

FIG. 10 is a diagram showing an example of the text data of divided data stored in the divided data storage unit 136. The text data of the divided data is generated in the same format as the text data of the speech recognition result stored in the speech recognition result storage unit 134. That is, the text data of the divided data includes a sentence number field, a word number field, a speaker field, a start time field, an end time field, a speech recognition result field, and a character count field.

FIG. 11 is a diagram showing the configuration of the editing processing apparatus 200 in the present embodiment.
The editing processing apparatus 200 includes a display processing unit 210 (second display processing means), an instruction receiving unit 212, an audio reproduction unit 214, an editing processing unit 218 (editing processing means), a data acquisition/transmission unit 220, and a storage unit 230. The storage unit 230 includes a divided data storage unit 236 and an edited data storage unit 238.

The data acquisition/transmission unit 220 accesses the divided data storage unit 136 and the edited data storage unit 138 in the storage unit 130 of the editing management apparatus 100 to acquire divided data and to save edited data. The divided data storage unit 236 stores the divided data that the data acquisition/transmission unit 220 has acquired from the divided data storage unit 136. The divided data acquired by the data acquisition/transmission unit 220 has the same structure as that shown in FIG. 10.

The display processing unit 210, the instruction receiving unit 212, and the audio reproduction unit 214 can be configured to have the same functions as the display processing unit 110, the instruction receiving unit 112, and the audio reproduction unit 114 of the editing management apparatus 100, respectively.

The display processing unit 210 displays the text data included in the divided data in a predetermined area in an editable manner, and displays a cursor (caret) for selecting the text data within that display area. The function of the display processing unit 210 can be implemented by a text editor similar to that of the display processing unit 110.

FIG. 12 is a diagram showing a text editor screen 500 displayed on a display by the display processing unit 210. The screen 500 displays a text display area 502, the time display area 404, the time change button 406, the audio playback button 408, the speed change button 410, and the like. The text display area 502 displays the text data of the divided data and a cursor 520. The time display area 404, the time change button 406, the audio playback button 408, and the speed change button 410 have the same functions as described with reference to FIGS. 5 to 8, and their description is omitted here.

FIG. 13 is a diagram showing the management table of the display processing unit 210 in the state shown in FIG. 12.
The display processing unit 210 holds, for each line, identification information of the character strings (text), sentences (sentences), and words (words) included in that line. It also holds information indicating the start position (start) and the character length (len) for each sentence and each word.

The character string on the third line of the text display area 502 of the screen 500 shown in FIG. 12 is described below. The third line displays "ならびにＣ市の学校長やＢ県の市町村教育委員会の綿棒" ("as well as the school principals of City C and the cotton swabs of the municipal boards of education of Prefecture B"). "L3" in FIG. 13 is associated with the display information for the character string on this line. The last two characters of this string, "綿棒" (cotton swab), are identified by "s12_w16" according to FIG. 10. The start position is therefore zero and the character length is 2, and the entry reads "s12_w16, start=0, len=2".

Returning to FIG. 11, the instruction receiving unit 212 accepts, via the cursor, an arbitrary selection range of the text data displayed by the display processing unit 210, and also accepts edits to that text data. The audio reproduction unit 214 reads the audio data included in the divided data from the divided data storage unit 236 and plays the audio. When a time is specified, the audio reproduction unit 214 outputs the audio data corresponding to that time. In the present embodiment, the user of the editing processing apparatus 200 plays the corresponding audio data while viewing the text data displayed by the display processing unit 210, and judges whether the speech recognition result is correct. If the speech recognition result contains an error, the user corrects the corresponding portion.

When the instruction receiving unit 212 accepts an edit to the text data displayed by the display processing unit 210, the editing processing unit 218 rewrites the corresponding word in the text data of the divided data. When a word is deleted, the portion of the text data of the divided data corresponding to that word is rewritten to a null character string. When a new character string is entered for a word, that character string is inserted at the corresponding position in the text data of the divided data.

Next, the procedure for editing the text data displayed in the text display area 502 of the screen 500 is described with reference to FIGS. 14 to 16.
When the user selects "綿棒" (cotton swab) on the third line with the cursor 520 using a mouse or the like (FIG. 14) and types "メンバー" (member), "綿棒" is changed to "メンバー". Similarly, when the user selects "綿棒" on the fifth line with the cursor 520 (FIG. 15) and types "メンバー", "綿棒" is changed to "メンバー" (FIG. 16). When the text data displayed in the text display area 502 is edited, the management table of the display processing unit 210 changes as well.

FIG. 17 is a diagram showing the management table of the display processing unit 210 in the state shown in FIG. 16.
Here, the display information of the third line (L3) is the same as in FIG. 13, but changing "綿棒" to "メンバー" on the third line has changed which words are displayed on the fourth and subsequent lines. For example, in FIG. 13 the first entry of the fourth line (L4) is "s12_w17, start=0, len=1", indicating the particle "を", whereas in FIG. 17 it is "s12_w16, start=2, len=2", indicating the last two characters "バー" of "メンバー".
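
The change in the management table can be reproduced with a small re-layout sketch. It assumes fixed-width lines that wrap at a character boundary, which is consistent with the L3 and L4 entries described above but is not stated as the layout rule. The function `layout` and the three-character line width in the usage note are hypothetical.

```python
from typing import List, Tuple

Entry = Tuple[str, int, int]  # (item_id, start, len) as in the management table


def layout(items: List[Tuple[str, str]], width: int) -> List[List[Entry]]:
    """Wrap (item_id, text) pairs into lines of `width` characters.

    An item that crosses a line end is split into two entries that share the
    same item_id. Items with empty text produce no entry.
    """
    rows: List[List[Entry]] = [[]]
    used = 0
    for item_id, text in items:
        start = 0
        while start < len(text):
            if used == width:          # current line is full: open a new one
                rows.append([])
                used = 0
            take = min(width - used, len(text) - start)
            rows[-1].append((item_id, start, take))
            start += take
            used += take
    return rows


BEFORE = [("s12_w15", "の"), ("s12_w16", "綿棒"), ("s12_w17", "を")]
AFTER = [("s12_w15", "の"), ("s12_w16", "メンバー"), ("s12_w17", "を")]
```

With a width of 3, `layout(BEFORE, 3)` starts its second line with `("s12_w17", 0, 1)`, while `layout(AFTER, 3)` starts it with `("s12_w16", 2, 2)`, mirroring the difference between FIG. 13 and FIG. 17.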

When the user performs an operation such as clicking the right mouse button on the screen 500 shown in FIG. 16, a box 530 is displayed. The box 530 displays a save button 532. When the user selects the save button 532, the edited data is saved in the edited data storage unit 238 as edited data. The file name may be assigned automatically or may be entered by the user.

FIG. 18 is a diagram showing an example of the text data of edited data stored in the edited data storage unit 238. The edited data is generated in the same format as the text data of the divided data. That is, the text data of the edited data includes a sentence number field, a word number field, a speaker field, a start time field, an end time field, a speech recognition result field, and a character count field.

When "綿棒" is changed to "メンバー", the character count increases from 2 to 4, but the time information associated with this word does not change. Therefore, placing the cursor 520 on "メンバー" in the screen 500 shown in FIG. 16 plays the same audio data that was originally associated with "綿棒". If a word is deleted, it is no longer displayed in the text display area 502 of the screen 500, so the audio data for the time span of the deleted word can no longer be played by moving the cursor 520 within the text display area 502. However, the audio data itself is not erased, so it can still be played, for example by continuous playback starting from the words before and after the deleted word.
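
The behavior described above, where editing changes the text and character count but leaves the time information untouched, can be sketched as follows. The record layout, the function names `edit_word` and `playback_span`, and the times are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class WordRecord:
    word_id: str
    start: float  # seconds
    end: float    # seconds
    text: str


def edit_word(words: List[WordRecord], word_id: str, new_text: str) -> List[WordRecord]:
    """Return a copy of `words` with one word's text replaced.

    An empty string stands for a deleted word. Time fields are never changed.
    """
    return [replace(w, text=new_text) if w.word_id == word_id else w for w in words]


def playback_span(words: List[WordRecord], word_id: str) -> Tuple[float, float]:
    """Return the (start, end) time span of the audio tied to a word."""
    for w in words:
        if w.word_id == word_id:
            return w.start, w.end
    raise KeyError(word_id)


# Hypothetical times, not taken from the figures.
ORIGINAL = [WordRecord("s12_w16", 30.2, 30.9, "綿棒"),
            WordRecord("s12_w17", 30.9, 31.0, "を")]
```

Keeping the record for a deleted word, with an empty text, is what allows its audio to remain reachable through continuous playback from the neighboring words.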

When the editing work is finished, the data acquisition/transmission unit 220 saves the edited data in the edited data storage unit 138 of the editing management apparatus 100 in response to a user instruction.

In the present embodiment, the editing management apparatus 100 can also be configured to have a function of registering a predetermined character string in the text data as connecting characters. Here, connecting characters are a common character string that is to be included redundantly in a plurality of pieces of divided data. Registering such connecting characters makes it possible to integrate the divided data using the connecting characters as a key, so that integrated data can be generated simply and accurately.

The procedure for registering connecting characters in the text data displayed in the text display area 402 of the screen 400 is described with reference to FIGS. 19 and 20.
When the user selects "昨年" (last year) on the second line with the cursor 420 using a mouse or the like (422 denotes the selection range) and performs an operation such as clicking the right mouse button, the box 430 is displayed. This procedure is the same as described with reference to FIG. 7. Here, the box 430 displays a connecting character registration button 434 in addition to the divided data generation button 432. When the user selects the connecting character registration button 434, the selected character string is registered as connecting characters.

As shown in FIG. 20, the display processing unit 110 can display the connecting characters so that they are recognizable, for example by highlighting them with a surrounding frame 424. If the user of the editing management apparatus 100 registers connecting characters before generating divided data, the user can then look at the screen 400 and select the range of each piece of divided data using the connecting characters as a boundary. When connecting characters are registered, they can be included in common in a plurality of pieces of divided data. FIG. 21 shows an example in which "昨年" is registered as connecting characters. In this case, first divided data 450 and second divided data 452, each containing the connecting characters, can be generated. After the editing of the first divided data 450 and the second divided data 452 is finished, integrated data can then be generated using the connecting characters "昨年" as a key.
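
Integration keyed on the connecting characters might look like the sketch below. The description does not specify the matching rule, so several points are assumptions: records are matched by word ID rather than by text (so that an edited connecting word still matches), the connecting word is the last record of the first piece, and the copy from the first piece is the one kept. The function `integrate` and the sample records are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[str, float, str]  # (word_id, start_time, text)


def integrate(first: List[Record], second: List[Record]) -> List[Record]:
    """Join two pieces of divided data that share a connecting word.

    The connecting word is assumed to close `first`. Everything in `second`
    up to and including that word is dropped, so the word appears once.
    """
    if not first:
        return list(second)
    key = first[-1][0]
    ids = [rec[0] for rec in second]
    if key not in ids:
        raise ValueError(f"connecting word {key!r} not found in second piece")
    return first + second[ids.index(key) + 1:]


# Hypothetical records for the first and second divided data.
FIRST = [("s10_w9", 11.0, "以上です。"), ("s11_w1", 12.0, "昨年、")]
SECOND = [("s11_w1", 12.0, "昨年、"), ("s11_w2", 12.6, "Ａ検討委員会")]
```

Raising an error when the key is missing makes a mismatched pair of pieces visible instead of silently producing a gap or a duplicate.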

In the present embodiment, the editing management apparatus 100 can also be configured to have a function of attaching an index that marks an arbitrary playback start position at a given position in the text data. By attaching an index at a given position in the displayed text data, the user can start playback from that position.

The procedure for attaching an index to the text data displayed in the text display area 402 of the screen 400 is described with reference to FIG. 22.
When the user moves the cursor 420 to just before "昨年" on the second line with a mouse or the like and performs an operation such as clicking the right mouse button, the box 430 is displayed. This procedure is the same as described with reference to FIG. 19. Here, the box 430 displays an index attachment button 436 in addition to the divided data generation button 432 and the connecting character registration button 434. When the user selects the index attachment button 436, an index is attached at this position.
When the user registers connecting characters or an index, a flag is set on the corresponding word in the speech recognition result storage unit 134, as shown in FIG. 23.
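
As a rough illustration of per-word flags, the sketch below adds two boolean fields to a word record and derives playback start times from the index flag. The field names `connector` and `index`, the helper functions, and the times are hypothetical; the description states only that a flag is attached to the corresponding word.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WordRecord:
    word_id: str
    start: float             # seconds
    end: float               # seconds
    text: str
    connector: bool = False  # word registered as connecting characters
    index: bool = False      # playback index attached just before this word


def set_flag(words: List[WordRecord], word_id: str, *,
             connector: Optional[bool] = None,
             index: Optional[bool] = None) -> None:
    """Set one or both flags on the word with the given ID."""
    for w in words:
        if w.word_id == word_id:
            if connector is not None:
                w.connector = connector
            if index is not None:
                w.index = index
            return
    raise KeyError(word_id)


def index_start_times(words: List[WordRecord]) -> List[float]:
    """Return the times at which playback can start, one per indexed word."""
    return [w.start for w in words if w.index]


# Hypothetical records and times.
WORDS = [WordRecord("s10_w9", 11.0, 12.0, "以上です。"),
         WordRecord("s11_w1", 12.0, 12.6, "昨年、")]
```

Storing the flags on the word records means they travel with the text data when divided data is extracted in the same format.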

As described above, the editing support system 300 of the present embodiment makes it possible, with a simple operation, to select a desired range of the speech recognition result and to extract the text data in that range while preserving its original format. Partial editing of the speech recognition result can therefore be performed quickly. When there are multiple workers, multiple pieces of divided data can be prepared so that each worker edits a different piece, which improves work efficiency when multiple workers correct the speech recognition result.

Note that the components of the editing management apparatus 100 shown in FIG. 2 and of the editing processing apparatus 200 shown in FIG. 11 represent functional blocks, not hardware units. Each component of the editing management apparatus 100 and the editing processing apparatus 200 is implemented by an arbitrary combination of hardware and software, centered on the CPU of an arbitrary computer, a memory, a program loaded into the memory that implements the components shown in the figures, a storage unit such as a hard disk that stores the program, and a network connection interface. Those skilled in the art will understand that there are various modifications of the implementation method and apparatus.

For example, the audio data acquired by the audio acquisition unit 102 described with reference to FIG. 2 and the text data of the speech recognition result produced by the speech recognition unit 104 can be contained in a single file. That is, the text data of the speech recognition result shown in FIG. 4 can be associated with the audio data and organized as one file. The audio data storage unit 132 and the speech recognition result storage unit 134 shown in FIG. 2 are shown as functionally separate units and need not be clearly separated physically.

The editing management apparatus 100 and the editing processing apparatus 200 are each constituted by an apparatus 10 such as a personal computer. FIG. 24 is a block diagram showing the hardware configuration of the apparatus 10 that constitutes the editing management apparatus 100 or the editing processing apparatus 200.
The apparatus 10 includes a CPU 12, a memory 14, an HDD (hard disk) 16, a communication IF (interface) 18, a display 30, an operation unit 32, an audio output device 34, and a bus 40 connecting them.

Embodiments of the present invention have been described above with reference to the drawings, but these are examples of the present invention, and various configurations other than the above can also be adopted.

The above embodiment describes a configuration in which the editing processing apparatus 200 accesses the editing management apparatus 100 to acquire divided data. Alternatively, upon generating divided data, the editing management apparatus 100 may distribute the divided data to an editing processing apparatus 200 as appropriate and request editing.

The above embodiment also describes a configuration in which the divided data contains the portion of the audio data that corresponds to its text data. This reduces the amount of divided data acquired by each editing processing apparatus 200. However, the audio data contained in the divided data may instead correspond to the entire text data of the speech recognition result. Even in this case, the user of the editing processing apparatus 200 can play the corresponding portion of the audio data based on the time information. Furthermore, the divided data may contain no audio data at all. In this case, the user of the editing processing apparatus 200 can access the audio data storage unit 132 of the editing management apparatus 100 and play the corresponding portion of the audio data based on the time information.

This application claims priority based on Japanese Patent Application No. 2009-145529 filed on June 18, 2009, the entire disclosure of which is incorporated herein.
Hereinafter, examples of the reference form will be added.
1. Voice data storage means for storing voice data in association with time information;
Speech recognition result storage means for storing text data of a speech recognition result of the speech data in a predetermined format in association with time information in units of words;
First display processing means for displaying the text data in a predetermined display area and displaying a cursor for selecting the text data in the display area;
Instruction accepting means for accepting an arbitrary selection range of the text data displayed by the first display processing means by the cursor and accepting an instruction for generating divided data;
Division data generation means for extracting the text data included in the selection range received by the instruction reception means while maintaining the predetermined format from the speech recognition result storage means, and generating divided data;
Editing support system for speech recognition results including
2. In the editing support system according to 1,
The divided data generation means extracts the text data and extracts voice data corresponding to the text data from the voice data storage means,
The editing support system, wherein the divided data includes the extracted text data and the voice data.
3. In the editing support system according to 1 or 2,
The first display processing means is an editing support system that displays the text data in association with relative position information with respect to the cursor at least in units of words.
4). In the editing support system according to any one of 1 to 3,
The divided data generation means is an editing support system that stores the divided data in a predetermined folder prepared for each device that performs an editing process on the divided data.
5. In the editing support system according to any one of 1 to 4,
In the text data displayed by the first display processing means, the editing support system further includes an audio reproducing means for reproducing the corresponding audio data based on the time information associated with the word selected by the cursor. .
6. The editing support system according to any one of 1 to 5, wherein
the divided data generation means generates a plurality of pieces of the divided data,
the editing support system further comprising data integration means for arranging the text data of the plurality of pieces of divided data in time order based on the time information and integrating them.
7. The editing support system according to any one of 1 to 6, wherein
the divided data generation means generates a plurality of pieces of the divided data, and
the first display processing means displays, in a recognizable manner, a connected character, that is, a common character string that should be included in each of a plurality of pieces of the divided data.
8. The editing support system according to any one of 1 to 7, further comprising:
data acquisition means for acquiring the divided data;
second display processing means for displaying the text data included in the divided data acquired by the data acquisition means in a predetermined display area, and displaying a cursor for selecting the text data in the display area; and
edit processing means for accepting edits to the text data displayed by the second display processing means and generating edited data.
9. An editing support method for speech recognition results, including:
a first display step of reading text data from speech recognition result storage means, which stores the text data of a speech recognition result of voice data in a predetermined format in association with time information in units of words, displaying the text data in a predetermined display area, and displaying a cursor for selecting the text data in the display area;
a step of accepting, by means of the cursor, an arbitrary selection range of the text data displayed in the first display step, and accepting an instruction to generate divided data; and
a step of extracting the text data included in the selection range from the speech recognition result storage means while maintaining the predetermined format, and generating divided data.
10. An editing support program for speech recognition results that causes a computer to function as:
voice data storage means for storing voice data in association with time information;
speech recognition result storage means for storing text data of a speech recognition result of the voice data in a predetermined format in association with time information in units of words;
first display processing means for displaying the text data in a predetermined display area and displaying a cursor for selecting the text data in the display area;
instruction accepting means for accepting, by means of the cursor, an arbitrary selection range of the text data displayed by the first display processing means, and accepting an instruction to generate divided data; and
divided data generation means for extracting, from the speech recognition result storage means, the text data included in the selection range accepted by the instruction accepting means while maintaining the predetermined format, and generating divided data.
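
As an informal illustration of the division and integration described in the notes above, the following Python sketch models the "predetermined format" as a sequence of word records that each carry start and end times. This modeling, and every name in the block (Word, DividedData, generate_divided_data, voice_span, integrate), is a hypothetical assumption made for illustration only; the notes do not prescribe any particular data layout or implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Word:
    """One recognized word with its time information (assumed: seconds)."""
    text: str
    start: float
    end: float


@dataclass(frozen=True)
class DividedData:
    """A selected piece of the recognition result, still in word-level format."""
    words: Tuple[Word, ...]

    def voice_span(self) -> Tuple[float, float]:
        # Time range of the voice data that corresponds to this piece.
        return self.words[0].start, self.words[-1].end


def generate_divided_data(result: Sequence[Word], first: int, last: int) -> DividedData:
    """Extract the words in the inclusive selection range [first, last]."""
    if not 0 <= first <= last < len(result):
        raise ValueError("selection range is outside the recognition result")
    # The word records are copied unchanged, so the time information is kept.
    return DividedData(words=tuple(result[first:last + 1]))


def integrate(pieces: Sequence[DividedData]) -> List[Word]:
    """Arrange pieces in time order by their first word and concatenate them."""
    ordered = sorted(pieces, key=lambda piece: piece.words[0].start)
    return [word for piece in ordered for word in piece.words]


# Usage: split a four-word result into two pieces, then merge them again.
result = [
    Word("today", 0.0, 0.4),
    Word("we", 0.4, 0.5),
    Word("discuss", 0.5, 1.0),
    Word("budget", 1.0, 1.6),
]
first_piece = generate_divided_data(result, 0, 1)
second_piece = generate_divided_data(result, 2, 3)
merged = integrate([second_piece, first_piece])  # input order does not matter
```

Because each piece keeps the word-level time information, the pieces can be edited separately and still be put back in their original order, and the time range returned by voice_span identifies the voice data that would accompany a piece.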

Claims

An editing support system for speech recognition results, comprising:
voice data storage means for storing voice data in association with time information;
speech recognition result storage means for storing text data of a speech recognition result of the voice data in a predetermined format in association with time information in units of words;
first display processing means for displaying the text data in a predetermined display area and displaying a cursor for selecting the text data in the display area;
instruction accepting means for accepting, by means of the cursor, an arbitrary selection range of the text data displayed by the first display processing means, and accepting an instruction to generate divided data; and
divided data generation means for extracting, from the speech recognition result storage means, the text data included in the selection range accepted by the instruction accepting means while maintaining the predetermined format, and generating divided data.

The editing support system according to claim 1, wherein
the divided data generation means extracts the text data and also extracts, from the voice data storage means, the voice data corresponding to the text data, and
the divided data includes the extracted text data and the extracted voice data.

The editing support system according to claim 1 or 2, wherein
the first display processing means displays the text data in association with relative position information with respect to the cursor, at least in units of words.

The editing support system according to any one of claims 1 to 3, wherein
the divided data generation means stores the divided data in a predetermined folder prepared for each device that performs an editing process on the divided data.

The editing support system according to any one of claims 1 to 4, further comprising
audio reproducing means for reproducing, based on the time information associated with a word selected by the cursor in the text data displayed by the first display processing means, the corresponding voice data.

The editing support system according to any one of claims 1 to 5, wherein
the divided data generation means generates a plurality of pieces of the divided data,
the editing support system further comprising data integration means for arranging the text data of the plurality of pieces of divided data in time order based on the time information and integrating them.

The editing support system according to any one of claims 1 to 6, further comprising:
means for accepting an input for registering an arbitrary character string in the text data as a common character string; and
data integration means for integrating a plurality of pieces of the divided data,
wherein, when the common character string is included in a plurality of pieces of the divided data, the data integration means aligns those pieces of divided data by using the common character string.

The editing support system according to claim 7, wherein
the first display processing means displays the text data so that the registered common character string can be identified, and
the instruction accepting means accepts, by means of the cursor, an arbitrary selection range of the text data in which the common character string is displayed in an identifiable manner by the first display processing means.

The editing support system according to any one of claims 1 to 8, further comprising:
data acquisition means for acquiring the divided data;
second display processing means for displaying the text data included in the divided data acquired by the data acquisition means in a predetermined display area, and displaying a cursor for selecting the text data in the display area; and
edit processing means for accepting edits to the text data displayed by the second display processing means and generating edited data.

An editing support method for speech recognition results, including:
a first display step of reading text data from speech recognition result storage means, which stores the text data of a speech recognition result of voice data in a predetermined format in association with time information in units of words, displaying the text data in a predetermined display area, and displaying a cursor for selecting the text data in the display area;
a step of accepting, by means of the cursor, an arbitrary selection range of the text data displayed in the first display step, and accepting an instruction to generate divided data; and
a step of extracting the text data included in the selection range from the speech recognition result storage means while maintaining the predetermined format, and generating divided data.

An editing support program for speech recognition results that causes a computer to function as:
voice data storage means for storing voice data in association with time information;
speech recognition result storage means for storing text data of a speech recognition result of the voice data in a predetermined format in association with time information in units of words;
first display processing means for displaying the text data in a predetermined display area and displaying a cursor for selecting the text data in the display area;
instruction accepting means for accepting, by means of the cursor, an arbitrary selection range of the text data displayed by the first display processing means, and accepting an instruction to generate divided data; and
divided data generation means for extracting, from the speech recognition result storage means, the text data included in the selection range accepted by the instruction accepting means while maintaining the predetermined format, and generating divided data.