JP2004226910A

JP2004226910A - Device, method, and program for speech recognition error correction

Info

Publication number: JP2004226910A
Application number: JP2003017623A
Authority: JP
Inventors: Takeshi Mishima; 剛三島; Norifumi Oide; 訓史大出; Atsushi Imai; 篤今井; Toru Tsugi; 徹都木
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2003-01-27
Filing date: 2003-01-27
Publication date: 2004-08-12
Anticipated expiration: 2023-01-27
Also published as: JP3986015B2

Abstract

PROBLEM TO BE SOLVED: To make the work of correcting speech recognition errors in text data obtained as a speech recognition result more efficient, without having to consider the balance between the numbers of finders and correctors.
SOLUTION: A speech recognition error correcting device 1 comprises a character data reception part 15, a correction information reception part 17, a correcting terminal operation decision part 19, a character presenting speed variation part 21, a character attribute change part 23, a character attribute information transmission part 25, a display area setting part 27, an unnecessary character insertion part 29, a presentation information transmission part 31, a text data protection part 33, a character string integration part 35, and a screen display information transmission part 37. A speech reproduction part 3 comprises a speech reception part 5, a speech storage part 7, a speech data transmission part 9, and a speech presenting speed variation part 11. The speech recognition error correcting device 1 enables each corrector to find and correct speech recognition errors alone, without dividing the correcting operation between a finder and a corrector.
COPYRIGHT: (C)2004,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、音声認識誤りを修正する音声認識誤り修正装置、音声認識誤り修正方法および音声認識誤り修正プログラムに関するものである。
【０００２】
【従来の技術】
従来、音声認識技術を用いた音声認識誤り修正装置には、音声字幕化装置や文字データ修正装置がある（特許文献１、特許文献２を参照）。
前者は、音声認識結果をリアルタイムで字幕化するために、音声認識誤りを修正する際に、音声認識に供された話者の音声の提示と、音声認識結果であるテキストデータの提示とのタイミングの適正化によって、音声認識誤りの修正作業の効率化を図ったものである。
後者は、音声認識結果をリアルタイムで字幕化するために、テキストデータ出力装置（音声認識装置）から出力されたテキストデータ（音声認識結果）の音声認識誤りを見つけて選択する指摘端末（ポインティング用端末装置）と、この音声認識の誤りを修正して置き換える修正端末（修正用端末装置）とにそれぞれ役割を分担することによって、音声認識結果の誤りの修正作業の効率化を図ったものである。
【０００３】
【特許文献１】
特開２００１−１４２４８２号公報（段落番号００１８〜００４３、第１図）
【特許文献２】
特開２００１−６０１９２号公報（段落番号００２３〜００４３、第１図）
【０００４】
【発明が解決しようとする課題】
しかしながら、従来の音声認識誤り修正装置では、音声認識の誤りを発見する発見オペレータ（以下、発見者という）と、その誤りを修正する修正オペレータ（以下、修正者という）とに作業の役割が分担されて音声認識結果の誤りが修正されるため、発見者と修正者の作業内容が異なることから生ずる次のような弊害があった。
【０００５】
１．発見者と修正者の人数の均衡を考慮しなければならず、人員配置などの人事労務管理の点で不都合が生じること。
２．発見者と修正者に対して各々別々に教育訓練をする必要があること。
３．音声認識誤り修正装置を用いて、音声認識結果の誤りを修正する際に、常に発見者と修正者が対で作業を行う必要があるため、同時に、かつ、同一箇所での作業をせざるを得ず、作業効率が良くないこと。
また、作業場所の制約も生じることなどの問題があった。
【０００６】
そこで、本発明はこのような問題を解決するために、発見者と修正者の人数の均衡を考慮することなく、また、別々に教育訓練をする必要がなく作業場所の制約も最小限にして、効率よく誤り修正作業をすることができる音声認識誤り修正装置、その方法およびそのプログラムを提供することを目的としたものである。
【０００７】
【課題を解決するための手段】
本発明は、前記した目的を達成するため、以下に示す構成とした。
請求項１に記載の発明は、音声認識装置から出力された音声認識の対象となった音声データおよび音声認識結果であるテキストデータを受信し、当該テキストデータに含まれている音声認識誤りを、複数の端末により修正する音声認識誤り修正装置であって、前記音声データおよび前記テキストデータを前記端末に提示する提示手段と、この提示手段によって前記端末に提示されたテキストデータに対して、前記端末により指摘されて表わされる修正文字範囲を示す指摘データおよび前記修正文字範囲のテキストデータを修正した修正テキストデータを受信するデータ受信手段と、このデータ受信手段で受信した前記指摘データに基づいて、前記指摘データを送信した前記端末である指摘データ送信端末による修正が完了するまで前記修正文字範囲を、当該指摘データ送信端末以外の端末による指摘から保護するテキストデータ保護手段と、前記データ受信手段で受信した修正テキストデータに基づいて前記テキストデータを修正し、前記端末それぞれに出力する出力手段と、を備えることを特徴とする。
【０００８】
かかる構成によれば、音声認識装置から出力された音声データおよびテキストデータを受信して複数の修正端末に提示される。そして、これら端末から音声認識誤りを指摘した修正文字範囲を示す指摘データおよび修正した修正テキストデータが受信され、テキストデータ保護手段によって、この指摘データを送信した指摘データ送信端末以外の端末は当該指摘データ送信端末による修正が完了するまでの間、音声認識誤りを指摘することができない。つまり、最も先に音声認識誤りを指摘した指摘データ送信端末によって最優先に当該音声認識誤りの修正作業がなされ、指摘データ送信端末から送信された修正テキストデータが受信され、テキストデータが修正される。そして出力手段により最新の修正情報が逐次各端末に送信されて通知され、修正作業が続行される。
なお、端末を使用する修正者が、誤って正しい音声認識結果であるテキストデータを修正文字範囲として指摘してしまった場合には、例えば、削除キーを選択するなどの方法により、当該修正文字範囲の指定操作結果を操作前の状態に戻すことができる。
【０００９】
請求項２に記載の発明は、請求項１に記載の音声認識誤り修正装置において、前記複数の端末は、ネットワークを介して前記音声認識誤り修正装置と接続されることを特徴とする。
【００１０】
かかる構成によれば、音声認識誤り修正装置は、ネットワークで接続される各端末により操作される。そのため、各端末の設置場所、すなわち、修正作業場の設定に柔軟に対応することができる。
【００１１】
請求項３に記載の発明は、請求項１または請求項２に記載の音声認識誤り修正装置において、前記提示手段は、前記指摘データに含まれる文字について、表示色、表示の大きさ、文字の種類のうち少なくとも一つを含む文字の属性の変更を行うことによって、前記指摘データ送信端末以外の端末にも当該文字属性の変更を提示する指摘文字属性変更機能と、前記指摘データ送信端末によって指摘データを修正中である場合に、修正中の前記指摘データの文字属性を前記指摘文字属性変更機能で変更した文字属性とは異なる文字属性とする修正文字属性変更機能と、を備えることを特徴とする。
【００１２】
かかる構成によれば、指摘文字属性変更機能によって、音声認識結果であるテキストデータのうち音声認識誤り部分を指摘した文字の属性が変更され、当該音声認識誤りを指摘した指摘データ送信端末以外の端末に対しても、この変更された文字属性のテキストデータが提示される。そして、修正文字属性変更機能によって、修正作業をしている文字の属性が音声認識誤りの文字の指摘時に変更された文字属性以外の文字属性に変更され、修正作業中の端末以外の端末に対しても、この変更された文字属性のテキストデータが提示される。
【００１３】
請求項４に記載の発明は、請求項１または請求項２に記載の音声認識誤り修正装置において、前記提示手段は、前記音声認識結果であるテキストデータに係る品詞のうち少なくとも助詞について、表示色、表示の大きさ、文字の種類のうち少なくとも一つを含む文字の属性を変更する文字品詞属性変更機能を備えることを特徴とする。
【００１４】
かかる構成によれば、文字属性変更機能によって、音声認識結果であるテキストデータの品詞のうち少なくとも助詞について前記した文字属性が変更されるため、特に音声認識誤りの発生頻度の高い助詞についての注意の喚起を促すことができる。
【００１５】
請求項５に記載の発明は、請求項１または請求項２に記載の音声認識誤り修正装置において、前記提示手段は、前記端末に提示している前記音声認識結果であるテキストデータに係る文字列の表示領域の表示幅と背景色との少なくとも一方を任意に設定できる表示領域設定機能を備えることを特徴とする。
【００１６】
かかる構成によれば、表示領域設定機能によって、音声認識結果であるテキストデータの表示領域について、端末の使用者からの要望に基づいて、表示幅と背景色との少なくとも一方を設定（変更）すれば、当該テキストデータの表示領域が見易くなり、表示画面の見易さが向上される。
【００１７】
請求項６に記載の発明は、請求項１または請求項２に記載の音声認識誤り修正装置において、前記提示手段は、前記端末に提示している前記音声認識結果であるテキストデータ中に不要文字を挿入する不要文字挿入機能を備えることを特徴とする。
【００１８】
かかる構成によれば、不要文字挿入機能によって、音声認識結果であるテキストデータ中に不要文字が挿入されるため、端末を使用する修正者は、この不要文字を削除する作業が必要となり、端末に提示されるテキストデータの音声認識誤り率が低く、修正すべきテキストデータが少なく単調作業が継続する場合においても、修正作業に対する集中力の維持がなされる。
【００１９】
請求項７に記載の発明は、音声認識装置から出力された音声認識の対象となった音声データおよび音声認識結果であるテキストデータを受信し、当該テキストデータに含まれている音声認識誤りを、複数の端末により修正する音声認識誤り修正方法であって、前記音声データおよび前記テキストデータを前記端末に提示する提示ステップと、この提示ステップによって前記端末に提示されたテキストデータに対して、前記端末により指摘されて表わされる修正文字範囲を示す指摘データおよび前記修正文字範囲のテキストデータを修正した修正テキストデータを受信するデータ受信ステップと、このデータ受信ステップで受信した前記指摘データに基づいて、前記指摘データを送信した前記端末である指摘データ送信端末による修正が完了するまで前記修正文字範囲を、当該指摘データ送信端末以外の端末による指摘から保護するテキストデータ保護ステップと、前記データ受信ステップで受信した修正テキストデータに基づいて前記テキストデータを修正し、前記端末それぞれに出力する出力ステップと、を含むことを特徴とする。
【００２０】
かかる音声認識誤り修正方法によれば、まず、提示ステップで音声認識装置から出力された音声データおよび音声認識結果であるテキストデータが前記した端末に提示される。続いてデータ受信ステップで、前記した提示ステップによって前記端末に提示されたテキストデータのうち音声認識誤りを前記端末の使用者によって指摘した修正文字範囲を示す指摘データおよび修正した修正テキストデータを受信する。そして、テキストデータ保護ステップで、当該指摘データに基づいて、前記した音声認識誤りのテキストデータを指摘データ送信端末以外の端末による指摘から当該指摘データ送信端末による修正作業が完了するまでの間、保護する。そして、データ受信ステップで受信した修正テキストデータに基づいて、音声認識誤りに係るテキストデータが修正される。つまり、最も先に音声認識誤りを指摘した指摘データ送信端末によって最優先に当該音声認識誤りの修正作業がなされ、指摘データ送信端末から送信された修正テキストデータが受信され、テキストデータが修正される。次に出力ステップで、当該修正されたテキストデータが最新の修正情報として逐次各端末に送信し、修正作業が続行される。
【００２１】
請求項８に記載の発明は、音声認識装置から出力された音声認識の対象となった音声データおよび音声認識結果であるテキストデータを受信し、当該テキストデータに含まれている音声認識誤りを、複数の端末により修正する音声認識誤り修正装置を、前記音声データおよび前記テキストデータを前記端末に提示する提示手段、この提示手段によって前記端末に提示されたテキストデータに対して、前記端末により指摘されて表わされる修正文字範囲を示す指摘データおよび前記修正文字範囲のテキストデータを修正した修正テキストデータを受信するデータ受信手段、このデータ受信手段で受信した前記指摘データに基づいて、前記指摘データを送信した前記端末である指摘データ送信端末による修正が完了するまで前記修正文字範囲を、当該指摘データ送信端末以外の端末による指摘から保護するテキストデータ保護手段、前記データ受信手段で受信した修正テキストデータに基づいて前記テキストデータを修正し、前記端末それぞれに出力する出力手段、として機能させることを特徴とする。
【００２２】
かかる音声認識誤り修正プログラムによれば、音声認識誤り修正装置としての機能を生じさせて、このプログラムの処理手順に従って実行させることができるので、音声認識装置から出力された音声データおよびテキストデータを受信して複数の修正端末に提示され、これら端末を使用する使用者が音声認識誤りを指摘した修正文字範囲を示す指摘データおよび修正した修正データを受信し、テキストデータ保護手段が、この指摘データに基づいて、当該指摘データを送信した端末以外の端末について、当該指摘データ送信端末による修正作業が完了するまでの間、音声認識誤りを指摘することを不可能とする。このことによって、最も先に音声認識誤りを指摘した指摘データ送信端末によって最優先に当該音声認識誤りの修正作業がなされる。そして、受信された指摘データ送信端末から送信された修正テキストデータに基づいて、テキストデータが修正されて最新の修正情報として逐次各端末に送信され、修正作業が続行される。
【００２３】
【発明の実施の形態】
以下、本発明の実施の形態について図面を参照しながら説明する。
（音声認識誤り修正システムの概略）
まず、図１を参照しながら、音声認識誤り修正システムの概略について説明する。
図１は、本発明の一実施形態に係る音声認識誤り修正システムの構成を示す概略図である。
【００２４】
本発明の一実施形態に係る音声認識誤り修正システムは、発声者が発声した音声を音声認識装置２によって音声認識した結果であるテキストデータおよび音声認識の対象となった音声データを、ネットワークを介して接続されている各修正端末４１に提示し、これら修正端末４１によって修正された修正テキストデータを統合する音声認識誤り修正装置１と、当該音声認識結果であるテキストデータから修正文字範囲を指摘して正確な文字に修正する修正端末４１により構成されている。
なお、本実施の形態では、修正端末４１が特許請求の範囲に記載の端末に相当する。
【００２５】
また、本実施の形態においては、音声認識誤り修正装置１の修正対象となるテキストデータを出力する装置として音声認識装置２を例とするが、テキストデータを出力する装置であれば、ワードプロセッサ機能や音声認識機能を搭載したパーソナルコンピュータ等であってもよい。
【００２６】
本実施の形態では、当該テキストデータを音声認識誤り修正装置１に接続されている修正端末４１（４１Ａ、４１Ｂ、４１Ｃ）によって修正テキストデータに修正し、当該修正テキストデータを音声認識誤り修正装置１が受信して音声認識誤り修正装置１上の当該テキストデータを、修正して統合した修正統合テキストデータをリアルタイムで出力する（ここまでが音声認識誤り修正システムの動作）。
そして、この修正統合テキストデータを、字幕送出装置４３を利用して、放送通信網（放送波ＥＷ）を介して、テレビジョン４５またはテレビジョン４５としての機能を搭載したパーソナルコンピュータ等へ文字放送として放送する場合を例として説明する。
【００２７】
以下、この実施の形態では、当該音声認識誤り修正装置１から送信されたテキストデータに含まれる音声認識誤りを指摘した指摘データ（詳細は後記）を当該音声認識誤り修正装置１に送信した、この修正端末４１を指摘データ送信端末４１と表記することとする。
【００２８】
（音声認識誤り修正装置の構成）
次に、図１、図２を参照しながら音声認識誤り修正装置１の構成について説明する。
図２は、本発明の一実施形態に係る音声認識誤り修正装置１の構成を示すブロック構成図である。
音声認識誤り修正装置１は、図１に示すように、テキストデータ修正部１３と、音声再生部３を備え、修正端末４１（４１Ａ、４１Ｂ、４１Ｃ）とネットワークを介して接続されている。
【００２９】
テキストデータ修正部１３は、図２に示すように、文字データ受信部１５、修正情報受信部１７、修正端末動作判定部１９、文字提示速度可変部２１、文字属性変更部２３、文字属性情報送信部２５、表示領域設定部２７、不要文字挿入部２９、提示情報送信部３１、テキストデータ保護部３３、文字列統合部３５、画面表示情報送信部３７および字幕出力構成部３９により構成されている。
【００３０】
なお、本実施の形態では、文字データ受信部１５および修正情報受信部１７が特許請求の範囲に記載のデータ受信手段に相当し、文字提示速度可変部２１および文字属性変更部２３が特許請求の範囲に記載の提示手段に相当し、テキストデータ保護部３３が特許請求の範囲に記載のテキストデータ保護手段に相当し、文字属性情報送信部２５および提示情報送信部３１が特許請求の範囲に記載の出力手段に相当し、表示領域設定部２７が特許請求の範囲に記載の表示領域設定機能に相当し、不要文字挿入部２９が特許請求の範囲に記載の不要文字挿入機能に相当するものである。
【００３１】
文字データ受信部１５は、音声認識装置２（図１参照）の音声認識結果であるテキストデータを受信するものである。
修正情報受信部１７は、修正端末４１から修正テキストＩＤ（Ｉｄｅｎｔｉｆｉｃａｔｉｏｎ）、修正端末ＩＤ、制御情報および音声認識装置２の音声認識誤りを修正端末４１によって正確なテキストデータに修正された修正テキストデータを受信するものである。
【００３２】
なお、「修正テキストＩＤ」とは、修正端末４１が修正作業に入る際に音声認識結果であるテキストデータの誤り部分を指摘した修正文字範囲を識別するための識別記号（指摘データ。後記する）のことである。「修正端末ＩＤ」とは、修正端末４１が修正作業に入る際に音声認識結果であるテキストデータの誤り部分である修正文字範囲を指摘した修正端末４１を識別するための識別記号のことである。「制御情報」とは、修正端末４１から修正情報受信部１７によって受信され、修正端末４１へのテキストデータの提示動作について、その再生速度または停止に関する制御命令の基準となる情報と、修正端末４１への音声データの提示動作について、その再生速度または停止に関する制御命令の基準となる情報とのことである。
【００３３】
修正端末動作判定部１９は、修正情報受信部１７から受信した修正端末ＩＤおよび制御情報を文字提示速度可変部２１（詳細は後記する）へ送出する機能と、音声データの修正端末４１への提示タイミングの基準となる文字提示タイミング情報および制御情報を音声提示速度可変部１１（詳細は後記する）へ送出する機能とを有するものである。また、修正端末４１へのテキストデータまたは音声データの提示を一時停止した場合において、その一時停止における遅延分を判定し、テキストデータまたは音声データの修正端末４１への高速提示命令を行うものである。さらに、全ての修正端末４１が修正作業に入ると、修正端末４１へのテキストデータおよび音声データの提示の一時停止命令を文字提示速度可変部２１（後記する）および音声提示速度可変部１１（後記する）へ送信するものである。
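The decision made by the correction terminal operation determination unit 19 in paragraph 【００３３】 can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the function name `presentation_command`, the command strings, and the `backlog_seconds` parameter are assumptions introduced here. It pauses presentation when every terminal is busy correcting, and requests fast presentation while a delay from an earlier pause remains.

```python
def presentation_command(busy_terminals, all_terminals, backlog_seconds):
    """Choose a presentation command for text and speech data.

    busy_terminals:  IDs of terminals currently performing a correction
    all_terminals:   IDs of every connected correction terminal
    backlog_seconds: delay accumulated by earlier pauses (hypothetical measure)
    """
    # Pause when all terminals have entered correction work.
    if all_terminals and set(all_terminals) <= set(busy_terminals):
        return "pause"
    # Otherwise recover any delay caused by a previous pause.
    if backlog_seconds > 0:
        return "fast"
    return "normal"
```

For example, with terminals 41A, 41B and 41C all correcting, the sketch returns `"pause"`; once one of them finishes and a delay remains, it returns `"fast"`.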
【００３４】
なお、「文字提示タイミング情報」とは、修正文字範囲のテキストデータを正しい文字に修正する作業を支援するための音声データを修正端末４１へ提示するタイミングとなる基準情報のことである。この文字提示タイミング情報によって音声提示速度可変部１１（後記する）は、修正端末４１への音声データの提示のタイミングを制御するものである。
【００３５】
文字提示速度可変部２１は、修正端末動作判定部１９から受信した修正端末ＩＤおよび制御情報に基づいて、テキストデータの修正端末４１への提示動作について、その再生速度または停止に関する制御命令を修正端末ＩＤに係る修正端末４１へ送信するものである。また、全ての修正端末４１が修正作業に着手した場合において、修正端末４１へのテキストデータの提示の一時停止命令を修正端末動作判定部１９から受信して、全ての修正端末４１（４１Ａ、４１Ｂ、４１Ｃ）へ当該命令を送信するものである。
【００３６】
文字属性変更部２３は、指摘文字属性変更部２３ａと、修正文字属性変更部２３ｂと、文字品詞属性変更部２３ｃとを備えている。
なお、本実施の形態では、指摘文字属性変更部２３ａが特許請求の範囲に記載の指摘文字属性変更機能に相当し、修正文字属性変更部２３ｂが特許請求の範囲に記載の修正文字属性変更機能に相当し、文字品詞属性変更部２３ｃが特許請求の範囲に記載の文字品詞属性変更機能に相当する。
【００３７】
指摘文字属性変更部２３ａは、修正端末４１によって音声認識誤りを指摘したテキストデータである修正文字範囲を指摘した際に修正端末４１から修正情報受信部１７へ送信する修正テキストＩＤに係る修正文字範囲の文字に関して、修正端末４１から修正情報受信部１７によって受信した修正テキストデータについて、表示色、表示の大きさ、文字の種類のうち少なくとも一つを含む文字属性の変更を行うものである。また、当該指摘データ（修正テキストＩＤ）を送信した修正端末４１（指摘データ送信端末４１）以外の修正端末４１に、当該修正文字範囲の文字属性が変更された修正テキストデータを提示することによって、当該指摘データに係る指摘状況を当該修正端末４１（指摘データ送信端末４１）以外の修正端末からも確認可能となる。
【００３８】
なお、「指摘データ」とは、指摘データ送信端末４１によって指摘した修正文字範囲または当該指摘データ送信端末４１を示すデータをいい、具体的には、前記した修正テキストＩＤまたは修正端末ＩＤを示す。この修正テキストＩＤを送信した指摘データ送信端末４１は、前記した修正端末ＩＤを修正情報受信部１７へ送信する。
また、文字属性の変更には、例えば、字体を斜体、ボールド（ｂｏｌｄ）、影付け、立体文字、袋文字等に変更したり、文字に網掛け、飾り網点等の模様を付けたり、その他、下線、文字囲み等の文字修飾を含むものである。
【００３９】
修正文字属性変更部２３ｂは、修正文字範囲を修正中の文字について、指摘文字属性変更部２３ａで修正文字範囲の文字属性を変更した文字属性以外の文字属性に変更するものである。また、当該修正文字範囲を修正中の端末４１（修正文字範囲のテキストデータを修正中の指摘データ送信端末４１）以外の修正端末４１に当該修正文字範囲の修正中の文字の文字属性が変更された修正テキストデータが提示されることによって、当該指摘データに係る修正状況を修正中の端末４１以外の修正端末４１からも確認可能となる。
なお、文字属性の変更については、指摘文字属性変更部２３ａでの説明と同様なので、その説明を省略する。
【００４０】
文字品詞属性変更部２３ｃは、文字データ受信部１５から送出されたテキストデータ中の文字列の品詞のうち少なくとも助詞について、図４に示すように、文字の表示色、表示の大きさ、文字の種類のうち少なくとも一つを含む文字属性を変更して修正端末４１（指摘データ送信端末４１）から修正情報受信部１７によって受信した修正テキストデータを変更するものである。特に助詞は発現頻度が高く、修正者による音声認識誤りの発見を看過することが非常に多いため、助詞の文字属性を変化させ、修正者に特に助詞の音声認識誤りの発見について注意の喚起を行うものである。助詞の文字表示を大きくする文字属性の変更は、ポインティングデバイスとしてタッチパネルを使用する場合に、助詞の音声認識誤りの発見の精度を高め、また、タッチパネル上で指摘（選択）し易くなるため特に有効である。
【００４１】
なお、図示していない制御部によって、テキストデータ中の文字列の品詞のうち少なくとも助詞について、予め文字属性の変更後に修正端末４１へテキストデータを送信するように初期設定することもできる。
なお、文字属性の変更については、指摘文字属性変更部２３ａでの説明と同様なので、その説明を省略する。
【００４２】
図２に戻って説明を続ける。
文字属性情報送信部２５は、前記した指摘文字属性変更部２３ａ、修正文字属性変更部２３ｂ、文字品詞属性変更部２３ｃによって、変更された文字属性の情報である文字属性情報を各修正端末４１へ送信するものである。
【００４３】
表示領域設定部２７は、表示画面の見易さを向上させるため、図５の（ａ）に示すように、修正端末４１に提示している音声認識結果であるテキストデータの表示領域ＡＲの表示幅と背景色との少なくとも一方を任意に設定可能とするものである。
【００４４】
再び図２に戻って説明を続ける。
不要文字挿入部２９は、修正端末４１に提示している音声認識結果であるテキストデータ中に不要文字を挿入するものである。図５の（ｂ）では、不要文字としてアスタリスクマーク「＊」をテキストデータ中に挿入した例を示している。
音声認識率が高く、音声認識誤りが少ない場合、修正作業は音声認識誤りを発見するための単調な作業となる。その結果、修正者の修正作業に対する集中力の低下、修正作業の精度の低下を招く。そこでそれを防止するために、音声認識結果であるテキストデータに不要文字を挿入し、修正者に当該不要文字の削除作業をしてもらうことによって集中力を持続させるものである。
【００４５】
再び図２に戻って説明を続ける。
提示情報送信部３１は、表示領域設定部２７による表示領域設定情報、不要文字挿入部２９による不要文字挿入情報を修正端末４１へ送信するものである。
テキストデータ保護部３３は、修正ガード端末判定部３３ａ、修正ガードテキスト判定部３３ｂを備えている。
【００４６】
テキストデータ保護部３３は、修正情報受信部１７で受信した指摘データ（修正テキストＩＤ、修正端末ＩＤ）に基づいて、音声認識誤りのテキストデータを、当該指摘データを送信した指摘データ送信端末４１以外の修正端末４１による指摘から当該指摘データ送信端末４１による修正作業が完了するまでの間、保護するものである。
【００４７】
図３を参照しながら、テキストデータ保護部３３によるテキストデータの保護がどのように実現されるかについて、その一例を説明する。図３には、音声認識結果のテキストデータである「当時多発テロ事件」のテキストデータが、修正端末４１Ａ、修正端末４１Ｂおよび修正端末４１Ｃに提示されている。この場合、テキストデータ「当時」の部分（修正文字範囲）が音声認識誤りに相当するが、修正端末４１Ａの修正者が最も早くこの音声認識誤りを発見して指摘したときは、他の修正端末４１の修正者、すなわち、修正端末４１Ｂの修正者および修正端末４１Ｃの修正者が、修正文字範囲「当時」の部分と同一箇所（「当時」）については、指摘することができないようにガードされる。
【００４８】
この際、修正端末４１Ｂの修正者および修正端末４１Ｃの修正者は、同一箇所以外の部分については指摘可能である。また、同一箇所であっても修正端末４１Ａの修正者が当該修正作業を終了した後は、修正端末４１Ｂの修正者および修正端末４１Ｃの修正者は、当該同一箇所であった箇所（「当時」）について指摘し、修正作業を行うことができるものである。
【００４９】
次に、再び図２に戻って、テキストデータ保護部３３の構成要素について説明する。
修正ガード端末判定部３３ａは、修正情報受信部１７から送出された修正端末ＩＤ（修正端末４１が修正作業に入る際に音声認識結果であるテキストデータの誤り部分である修正文字範囲を指摘した修正端末４１を識別するための識別記号）を受信して、修正文字範囲（「当時」）を指摘した修正端末４１（修正端末４１Ａ）を識別し、その識別した情報を修正ガード端末情報として、修正端末４１（修正端末４１Ｂおよび修正端末４１Ｃ）へ送信するものである。
【００５０】
修正ガードテキスト判定部３３ｂは、修正情報受信部１７から送出された修正テキストＩＤ（修正端末４１が修正作業に入る際に音声認識結果であるテキストデータの誤り部分を指摘した修正文字範囲を識別するための識別記号）を受信して、修正端末４１（修正端末４１Ａ）によって指摘された修正文字範囲（「当時」）を識別し、その識別した情報を修正ガードテキスト情報として、修正端末４１（修正端末４１Ｂおよび修正端末４１Ｃ）へ送信するものである。
【００５１】
続いて文字列統合部３５に関して説明する。
文字列統合部３５は、修正情報受信部１７によって受信した各修正端末４１による修正テキストデータを修正テキストＩＤに基づいて、１つの統一された文章に統合するものである。この文字列統合部３５によって統合された修正テキスト統合データは、逐次各修正端末４１にフィードバックされ、常に最新の修正情報（修正統合テキストデータ）が反映される。また、この修正統合テキストデータは、字幕送出装置４３（図１参照）によって、字幕放送用のデータとして利用される。
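The integration performed by the character string integration unit 35 can be illustrated with a minimal sketch. It assumes, purely for illustration, that each correction text ID resolves to a `(start, end)` character offset pair in the recognized text and that the protected ranges do not overlap; the function name `integrate` is hypothetical and not taken from the patent.

```python
def integrate(text, corrections):
    """Merge corrected fragments into one sentence.

    corrections maps (start, end) offsets (an assumed stand-in for the
    correction text ID) to the corrected text for that range.
    """
    result = text
    # Apply from the rightmost range so earlier offsets stay valid
    # even when a replacement changes the length of the text.
    for (start, end), replacement in sorted(corrections.items(), reverse=True):
        result = result[:start] + replacement + result[end:]
    return result
```

Applied to the example used in this embodiment, replacing the range covering 「御ひる」 with 「お昼」 yields the integrated sentence that would be fed back to every terminal.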
【００５２】
画面表示情報送信部３７は、文字列統合部３５によって、修正テキストデータが修正テキストＩＤに基づいて、１つの文章に統合された修正統合テキストデータを各修正端末４１（４１Ａ、４１Ｂ、４１Ｃ）へ送信するものである。
【００５３】
字幕出力構成部３９は、文字列統合部３５によって、修正テキストデータが修正テキストＩＤに基づいて１つの文章に統合された修正統合テキストデータを文字放送として利用するために、修正統合テキストデータを字幕出力データとして、字幕送出装置４３（図１参照）へ送信するものである。
【００５４】
次に、音声再生部３について説明する。
音声再生部３は、音声受信部５、音声蓄積部７、音声データ送信部９、音声提示速度可変部１１から構成されている。
音声再生部３は、音声認識の対象となった音声データを再生するものである。
この音声データが文字提示タイミング情報に同期してネットワークを介して修正端末４１に送信され、修正端末４１に接続されているヘッドフォンで出力された音声と、音声認識結果であるテキストデータとを各修正端末４１（４１Ａ、４１Ｂ、４１Ｃ）の修正者が比較照合することによって、修正者の修正作業が支援される。
【００５５】
音声受信部５は、音声認識の対象となった音声データを音声認識装置２（図１参照）から受信するものである。
音声蓄積部７は、音声受信部５で受信した音声認識の対象となった音声データを各修正端末４１へ提示するために一時的にまたは長期保存用に蓄積するものである。
一般に、一時記憶用には半導体メモリを利用した主記憶装置（メインメモリ）が利用され、長期保存用にはハードディスク、フレキシブルディスク、ＤＡＴ（Ｄｉｇｉｔａｌ Ａｕｄｉｏ Ｔａｐｅ ｒｅｃｏｒｄｅｒ）などの外部記憶装置（補助記憶装置）が利用されている。
【００５６】
音声データ送信部９は、音声蓄積部７から音声データを受信して各修正端末４１に提示するために各修正端末４１へ送信するものである。
【００５７】
音声提示速度可変部１１は、音声認識装置２（図１参照）による音声認識の対象となった音声データの修正端末４１への提示動作について、修正端末動作判定部１９から受信した文字提示タイミング情報および制御情報に基づいて、その再生速度または停止に関する制御命令を修正端末４１へ送信するものである。また、全ての修正端末４１が修正作業に着手した場合において、修正端末４１への音声データの提示の一時停止命令を修正端末動作判定部１９から受信して、全ての修正端末４１（４１Ａ、４１Ｂ、４１Ｃ）へ当該命令を、音声データ送信部９を介して送信するものである。
【００５８】
したがって、音声提示速度可変部１１は、音声認識装置２（図１参照）による音声認識の対象となった音声データを音声認識装置２（図１参照）の音声認識結果であるテキストデータと同時に（テキストデータの修正端末４１（４１Ａ、４１Ｂ、４１Ｃ）への提示タイミングと同期して）各修正端末４１に提示させることもできるし、また、文字提示タイミング情報に同期して修正端末４１への音声データの提示に関して、一時停止させて遅く提示させることや、この一時停止による遅延を回復するための高速提示、繰り返し当該音声データの提示を行うリプレイ動作を行わせることもできるものである。
【００５９】
続いて、図１に戻って修正端末４１について説明する。
修正端末４１（４１Ａ、４１Ｂ、４１Ｃ）は、音声認識誤り修正装置１を使用するための装置であり、ディスプレイ、キーボード、修正文字範囲を指定等するためのタッチパネル、音声データを出力するためのヘッドフォン、音声データの再生について、その高速化または一時停止若しくは再開の切替え信号を図示していない制御部に送出して、音声データの再生動作をコントロールするための足入力インターフェース（フットペダル）等を備え、一般にパーソナルコンピュータが利用されている。
【００６０】
なお、フットペダルは、音声データの再生停止を制御する足踏スイッチとＵＳＢ（Ｕｎｉｖｅｒｓａｌ Ｓｅｒｉａｌ Ｂｕｓ）またはＲＳ２３２Ｃ等のインターフェースを備えており、これらのインターフェースを介してパーソナルコンピュータに接続している。このフットペダルは、足踏スイッチ（ペダル）の「踏込」または「放す」操作によって、スイッチＯＮ／ＯＦＦの切替操作を行うもので、記録テープに録音されている記録テープ内容（音声データ）の再生動作の制御をする際に利用されるものである。
【００６１】
また、音声認識誤り修正装置１は、ＬＡＮ（Ｌｏｃａｌ Ａｒｅａ Ｎｅｔｗｏｒｋ）またはＷＡＮ（Ｗｉｄｅ Ａｒｅａ Ｎｅｔｗｏｒｋ）接続されており、修正端末４１の設置場所に制約されず、任意にシステム設計が行えるものである。よって、遠隔地間での音声認識誤り修正作業も可能である。
【００６２】
また、ネットワークについては、前記したようにＬＡＮ、ＷＡＮなどその形態を問わないが、ネットワークケーブル、無線、赤外線等、その方式も問わない。
しかし、通信パケット漏れなどの安全面や高速処理の観点からネットワークケーブルを使用することが好ましい。
さらにまた、音声認識誤り修正装置１によって、修正者（修正端末４１）を複数設定でき、かつ、修正文字範囲の指摘および修正作業を修正者１人単位で行うことができるものである。
【００６３】
（音声認識誤り修正システムの動作）
次に、図６（適宜図２参照）のシーケンシャルチャートを参照しながら音声認識誤り修正システムの動作の一例について説明する。
まず、発話者の発声により生じた音声を音声認識装置２（図１参照）が音声認識を行い、その音声認識の結果であるテキストデータをテキストデータ修正部１３へ送信し、音声認識の対象となった音声データを音声再生部３へ送信する（Ａ１）。
【００６４】
なお、ここでは、発話者が「こんにちは、お昼のニュースです。」と発声し、この音声について、音声認識装置２が誤った音声認識結果として「こんにちは、御ひるのニュースです」のテキストデータを音声認識誤り修正装置１（テキストデータ修正部１３および音声再生部３）へ送信する場合を例に説明する。
【００６５】
次に、当該音声データを受信した音声認識誤り修正装置１のテキストデータ修正部１３は、当該テキストデータ「こんにちは、御ひるのニュースです」を修正端末４１Ａ、修正端末４１Ｂおよび修正端末４１Ｃへ送信すると同時に、音声再生部３へ文字提示タイミング情報および制御情報を送出する（Ｂ１）。これを受けた音声再生部３は、音声認識の対象となった音声データ「こんにちは、お昼のニュースです」を当該文字提示タイミング情報および制御情報に基づいて（例えば、音声認識の結果であるテキストデータの修正端末４１Ａ、修正端末４１Ｂおよび修正端末４１Ｃへの提示と同時に）、修正端末４１Ａ、修正端末４１Ｂおよび修正端末４１Ｃへ送信する（Ｂ２）。
【００６６】
当該テキストデータ「こんにちは、御ひるのニュースです」および当該音声データ「こんにちは、お昼のニュースです」を受信した各修正端末４１（４１Ａ、４１Ｂ、４１Ｃ）は、修正端末４１の修正者によって、音声認識の誤り部分の文字「御ひる」（修正文字範囲）がタッチパネルをタッチすることにより指摘され、キーボードで正確な文字「お昼」への修正作業が行われる。
【００６７】
このとき、例えば、修正端末４１Ａの修正者が最も早く音声認識誤り部分の文字「御ひる」を指摘した場合（Ｃ１）、指摘データ（修正テキストＩＤ、修正端末ＩＤ）が、修正端末４１Ａからテキストデータ修正部１３へ送信され（Ｃ２）、これを受信したテキストデータ修正部１３は、修正端末４１Ｂおよび修正端末４１Ｃへ修正ガード端末情報および修正ガードテキスト情報を送信する（Ｂ３）。そして、これを受信した修正端末４１Ｂおよび修正端末４１Ｃは、修正端末４１Ａの修正作業中は修正作業を行うことができず、修正端末４１Ａのみが優先的に修正作業が可能となる。
【００６８】
なお、修正端末４１Ａの修正者が修正作業を行う際に、音声再生部３から送出された音声データ「こんにちは、お昼のニュースです」を修正者がヘッドフォンで聴取し、修正端末４１Ａに提示されているテキストデータ「こんにちは、御ひるのニュースです」と比較照合することによって修正精度を高め、修正作業が行われる。
【００６９】
そして、修正端末４１Ａは、修正結果である修正テキストデータ「お昼」をテキストデータ修正部１３へ送信し（Ｃ３）、これを受信したテキストデータ修正部１３は、修正テキストデータ「お昼」を１つの文章に統合した修正統合テキストデータ「こんにちは、お昼のニュースです」を作成し、当該修正統合テキストデータが逐次、修正端末４１Ａ、修正端末４１Ｂおよび修正端末４１Ｃへ送信され最新の修正情報が全ての端末（本実施の形態の一例としては修正端末４１Ａ、修正端末４１Ｂおよび修正端末４１Ｃ）に反映され、修正作業が行われる（Ｂ４）。
【００７０】
次に、修正テキストデータの有無が判断され（Ｂ５）、修正テキストデータがある場合（Ｂ５、ＹＥＳ）は、当該修正テキストデータを修正統合テキストデータに統合して当該修正統合テキストデータが正しいテキストデータとして確定される（Ｂ６）。そして、テキストデータ修正部１３から字幕送出装置４３へ当該修正統合テキストデータ「こんにちは、お昼のニュースです」が送信される（Ｂ７）。
【００７１】
修正テキストデータがない場合（Ｂ５、ＮＯ）は、例えば、各修正端末４１へテキストデータを送信後、１分間を経過しているか否かを判断（Ｂ８）し、経過している場合（Ｂ８、ＹＥＳ）はＢ６へすすみ修正統合テキストデータが確定され、経過していない場合（Ｂ８、ＮＯ）はＢ５へ戻る。
なお、このフローチャートに示していないが、音声認識結果であるテキストデータの提示がある間、音声認識誤り修正システムとして、Ａ１からＢ７までのステップが繰り返され、このテキストデータがなくなった時点で音声認識誤り修正システムの動作が終了する。
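The loop between steps B5 and B8 can be sketched as a simple polling routine. This is an illustrative sketch under stated assumptions: `poll_correction`, the injected `clock` and `sleep` callables, and the polling interval are hypothetical names introduced here, and only the one-minute timeout comes from the example in the text.

```python
import time


def wait_for_finalization(poll_correction, sent_at, timeout=60.0,
                          clock=time.monotonic, sleep=time.sleep,
                          interval=0.1):
    """Return a correction to integrate (B5: YES), or None on timeout (B8: YES).

    poll_correction: callable returning corrected text, or None if none arrived
    sent_at:         clock value when the text was sent to the terminals
    """
    while True:
        correction = poll_correction()        # B5: is there corrected text?
        if correction is not None:
            return correction                 # proceed to B6 with the correction
        if clock() - sent_at >= timeout:      # B8: has the timeout elapsed?
            return None                       # proceed to B6 with the text as is
        sleep(interval)                       # B8: NO, go back to B5
```

Injecting `clock` and `sleep` keeps the sketch testable without waiting a real minute; in either return case the caller would go on to finalize the integrated text (B6) and send it to the caption transmission device (B7).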
【００７２】
また、本実施の形態の一例として、Ｂ７で送信された修正統合テキストデータを受信した字幕送出装置４３から当該修正統合テキストデータをテレビジョン４５へ放送波ＥＷ（図１参照）を介して送信され（Ｄ１）、テレビジョン４５によって、「こんにちは、お昼のニュースです」の字幕付きの文字放送として受信される（Ｅ１）場合の一例をこのフローチャートに図示（図６の左下に示す破線より内側部分）しておく。
【００７３】
なお、Ｂ６における修正統合テキストデータの確定にあたっては、テキストデータ修正部１３が、最も遅く修正テキストデータを送信した修正端末４１から修正テキストデータを受信した後に修正テキストデータを統合し、この修正統合テキストデータを作成した時点をもって確定することもできる。
【００７４】
以上、本発明の実施の形態について説明したが、本発明は、前記した実施の形態に限定されることなく、様々な形態で実施可能である。
また、音声認識誤り修正方法と、このような音声認識誤り修正方法を音声認識誤り修正装置１に実現させる音声認識誤り修正プログラムと、音声認識誤り修正プログラムを記録した記録媒体も本発明の対象とするものである。
【００７５】
【発明の効果】
以上説明したように、本発明によれば以下の効果を奏する。
請求項１、７、８に記載の発明によれば、音声認識誤り修正装置に接続されている端末から音声認識誤りを指摘した修正文字範囲を示す指摘データおよび修正した修正テキストデータを受信する。そして、テキストデータ保護手段によって、この指摘データを送信した指摘データ送信端末以外の端末については、当該指摘データ送信端末による修正が完了するまでの間、音声認識誤りを指摘して修正作業をすることができないようにすることができる。
【００７６】
そのため、音声認識誤り部分である修正文字範囲の発見者と修正者とに修正作業を分担することなく、修正者一人単位で修正作業を行うことができる。そのため、修正作業の効率化を図ることができる。また、人員配置や人員確保など人事労務管理が容易となり、教育講習についても修正者単位でできるため、効率的に講習を行うことができる。そして省力化したシステム設計を行うことができる。
また、音声認識誤り部分の文字を最初に指摘した修正者（端末）に対して、優先的に修正作業を可能とすることにより、修正作業の効率化を実現することができる。
【００７７】
請求項２に記載の発明によれば、音声認識誤り修正装置と端末間をネットワークで接続することにより、作業場所について柔軟に対応することが可能となり、簡素化したシステム設計をすることができる。そのため、効率化した音声認識誤り修正システムの構築が実現できる。
【００７８】
請求項３および請求項４に記載の発明によれば、音声認識誤り部分（修正文字範囲）の文字について、その文字属性を変更して各端末に提示することにより、指摘作業または修正作業状況の明確化を図ることができる。そのため、修正作業を効率化することができる。
【００７９】
請求項５に記載の発明によれば、音声認識結果であるテキストデータの表示領域について、端末の使用者からの要望に基づいて、表示幅と背景色との少なくとも一方を設定することができるので、当該テキストデータの表示領域が見易くなる。そのため、表示画面の見易さを向上させることができる。
【００８０】
請求項６に記載の発明によれば、音声認識結果であるテキストデータ中に不要文字を挿入し、当該不要文字を修正者に削除してもらうことによって、音声認識誤りが少なく単調作業の傾向が強い場合においても、修正者の集中力を持続させることができる。そのため修正作業を高精度に維持することができる。
【図面の簡単な説明】
【図１】本発明の一実施形態に係る音声認識誤り修正システムの構成を示す概略図である。
【図２】本発明の一実施形態に係る音声認識誤り修正装置の構成を示すブロック構成図である。
【図３】本発明の一実施形態に係るテキストデータ修正部の誤りテキストデータ保護部の動作の一例を説明するための図である。
【図４】本発明の一実施形態に係るテキストデータ修正部の文字属性変更部の動作の一例を説明するための図である。
【図５】（ａ）本発明の一実施形態に係るテキストデータ修正部の表示領域設定部の動作の一例を説明するための図である。
（ｂ）本発明の一実施形態に係るテキストデータ修正部の不要文字挿入部の動作の一例を説明するための図である。
【図６】本発明の一実施形態に係る音声認識誤り修正装置の動作の一例を説明するためのシーケンシャルチャートである。
【符号の説明】
１音声認識誤り修正装置
２音声認識装置
３音声再生部
５音声受信部
７音声蓄積部
９音声データ送信部
１１音声提示速度可変部
１３テキストデータ修正部
１５文字データ受信部（データ受信手段）
１７修正情報受信部（データ受信手段）
１９修正端末動作判定部
２１文字提示速度可変部（提示手段）
２３文字属性変更部（提示手段）
２３ａ指摘文字属性変更部（指摘文字属性変更機能）
２３ｂ修正文字属性変更部（修正文字属性変更機能）
２３ｃ文字品詞属性変更部（文字品詞属性変更機能）
２５文字属性情報送信部（出力手段）
２７表示領域設定部（表示領域設定機能）
２９不要文字挿入部（不要文字挿入機能）
３１提示情報送信部（出力手段）
３３テキストデータ保護部（テキストデータ保護手段）
３３ａ修正ガード端末判定部
３３ｂ修正ガードテキスト判定部
３５文字列統合部
３７画面表示情報送信部（出力手段）
３９字幕出力構成部（出力手段）
４１（４１Ａ〜４１Ｃ）修正端末、指摘データ送信端末（端末）
４３字幕送出装置
４５テレビジョン
ＡＲテキストデータの表示領域
ＡＦ修正文字入力枠
[0001]
[Technical field of the invention]
The present invention relates to a speech recognition error correction device for correcting a speech recognition error, a speech recognition error correction method, and a speech recognition error correction program.
[0002]
[Prior art]
Conventionally, speech recognition error correction devices that use speech recognition technology include speech captioning devices and character data correction devices (see Patent Documents 1 and 2).
The former converts speech recognition results into captions in real time. When speech recognition errors are corrected, it improves the efficiency of the correction work by optimizing the timing between the presentation of the speaker's speech that was subjected to speech recognition and the presentation of the text data that is the speech recognition result.
The latter also converts speech recognition results into captions in real time. It divides the roles between a pointing terminal (pointing terminal device), which finds and selects speech recognition errors in the text data (speech recognition result) output from a text data output device (speech recognition device), and a correction terminal (correction terminal device), which corrects and replaces those errors, thereby improving the efficiency of correcting errors in the speech recognition result.
[0003]
[Patent Document 1]
JP 2001-142482 A (paragraphs 0018 to 0043, Fig. 1)
[Patent Document 2]
JP 2001-60192 A (paragraphs 0023 to 0043, Fig. 1)
[0004]
[Problems to be solved by the invention]
However, in conventional speech recognition error correction devices, errors in the speech recognition result are corrected by dividing the work between a discovery operator (hereinafter, "discoverer"), who finds speech recognition errors, and a correction operator (hereinafter, "corrector"), who corrects them. Because the discoverer and the corrector perform different tasks, the following problems arise.
[0005]
1. The balance between the numbers of discoverers and correctors must be considered, which causes inconvenience in personnel management such as staffing.
2. Discoverers and correctors must each be trained separately.
3. When errors in the speech recognition result are corrected using the speech recognition error correction device, the discoverer and the corrector must always work as a pair, so they are forced to work at the same time and in the same place, which makes the work inefficient.
In addition, the work location is constrained.
[0006]
Therefore, to solve these problems, an object of the present invention is to provide a speech recognition error correction device, method, and program that allow error correction work to be performed efficiently without considering the balance between the numbers of discoverers and correctors, without requiring separate training, and with minimal constraints on the work location.
[0007]
[Means for Solving the Problems]
The present invention has the following configuration to achieve the above object.
The invention according to claim 1 is a speech recognition error correction device that receives speech data that was the target of speech recognition and text data that is the speech recognition result, both output from a speech recognition device, and corrects speech recognition errors contained in the text data by means of a plurality of terminals, the device comprising: presentation means for presenting the speech data and the text data to the terminals; data receiving means for receiving, with respect to the text data presented to the terminals by the presentation means, pointing data indicating a correction character range pointed out by a terminal, and corrected text data obtained by correcting the text data in the correction character range; text data protection means for protecting, based on the pointing data received by the data receiving means, the correction character range from being pointed out by any terminal other than the pointing data transmitting terminal, which is the terminal that transmitted the pointing data, until correction by that pointing data transmitting terminal is complete; and output means for correcting the text data based on the corrected text data received by the data receiving means and outputting it to each of the terminals.
[0008]
With this configuration, the speech data and text data output from the speech recognition device are received and presented to the plurality of correction terminals. Pointing data indicating the correction character range in which a speech recognition error was pointed out, and the corrected text data, are then received from these terminals. The text data protection means prevents any terminal other than the pointing data transmitting terminal from pointing out the speech recognition error until correction by the pointing data transmitting terminal is complete. In other words, the pointing data transmitting terminal that pointed out the speech recognition error first corrects it with top priority; the corrected text data transmitted from that terminal is received, and the text data is corrected. The output means then sequentially transmits the latest correction information to each terminal, and the correction work continues.
If a corrector using a terminal mistakenly points out text data that is a correct speech recognition result as a correction character range, the result of that designation can be returned to its state before the operation by, for example, selecting the delete key.
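The first-come protection described in paragraph [0008] can be illustrated with a small sketch. It is not the patented implementation: the class name `RangeGuard`, the use of character offsets to identify a correction character range, and the treatment of partially overlapping ranges as conflicts are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RangeGuard:
    """Tracks which terminal currently holds each correction character range."""

    # Maps (start, end) character offsets to the ID of the terminal
    # that pointed out that range first.
    held: dict = field(default_factory=dict)

    def point(self, terminal_id, start, end):
        """Try to claim [start, end); return True if the claim succeeds."""
        for (s, e), owner in self.held.items():
            # Assumption: any overlap with a range held by another terminal
            # counts as pointing at the same location.
            if start < e and s < end and owner != terminal_id:
                return False
        self.held[(start, end)] = terminal_id
        return True

    def release(self, terminal_id, start, end):
        """Release a range when its holder finishes or undoes the pointing."""
        if self.held.get((start, end)) == terminal_id:
            del self.held[(start, end)]
            return True
        return False
```

In this sketch, a second terminal pointing at a held range is refused until the first terminal calls `release`, which stands in both for completing the correction and for undoing a mistaken pointing operation.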
[0009]
The invention according to claim 2 is the speech recognition error correction device according to claim 1, wherein the plurality of terminals are connected to the speech recognition error correction device via a network.
[0010]
With this configuration, the speech recognition error correction device is operated from the terminals connected via the network. This makes it possible to respond flexibly to where each terminal is installed, that is, to how the correction workplace is set up.
[0011]
The invention according to claim 3 is the speech recognition error correction device according to claim 1 or 2, wherein the presentation means comprises: a pointed character attribute changing function that changes, for the characters included in the pointing data, a character attribute including at least one of display color, display size, and character type, and thereby presents the change of character attribute also to terminals other than the pointing data transmitting terminal; and a corrected character attribute changing function that, while the pointing data is being corrected by the pointing data transmitting terminal, sets the character attribute of the pointing data being corrected to a character attribute different from the one applied by the pointed character attribute changing function.
[0012]
With this configuration, the pointed character attribute changing function changes the attribute of the characters in the text data that were pointed out as a speech recognition error, and the text data with the changed attribute is also presented to terminals other than the pointing data transmitting terminal that pointed out the error. The corrected character attribute changing function then changes the attribute of the characters being corrected to an attribute other than the one applied when the error was pointed out, and the text data with this changed attribute is also presented to terminals other than the one performing the correction.
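The distinction between pointed-out and in-correction characters can be sketched as a small state model. The enum `RangeState`, the concrete style values in `STYLE`, and the function `annotate` are hypothetical names chosen for illustration; the source only requires that the two states use different character attributes.

```python
from enum import Enum


class RangeState(Enum):
    NORMAL = "normal"
    POINTED = "pointed"        # pointed out, correction not yet started
    CORRECTING = "correcting"  # currently being corrected


# Illustrative styles only; the two marked states must simply differ.
STYLE = {
    RangeState.NORMAL: {"color": "black", "bold": False},
    RangeState.POINTED: {"color": "red", "bold": False},
    RangeState.CORRECTING: {"color": "blue", "bold": True},
}


def annotate(text, marks):
    """Split text into (segment, state) pairs for display on every terminal.

    marks: non-overlapping (start, end, state) tuples.
    """
    segments = []
    pos = 0
    for start, end, state in sorted(marks, key=lambda m: m[0]):
        if pos < start:
            segments.append((text[pos:start], RangeState.NORMAL))
        segments.append((text[start:end], state))
        pos = end
    if pos < len(text):
        segments.append((text[pos:], RangeState.NORMAL))
    return segments
```

Each terminal could then render every segment with `STYLE[state]`, so that correctors at other terminals can tell at a glance which ranges are claimed and which are actively being edited.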
[0013]
The invention according to claim 4 is the speech recognition error correction device according to claim 1 or 2, wherein the presentation means comprises a part-of-speech character attribute changing function that changes, for at least the grammatical particles among the parts of speech in the text data that is the speech recognition result, a character attribute including at least one of display color, display size, and character type.
[0014]
With this configuration, the character attribute changing function changes the above character attribute for at least the particles among the parts of speech in the text data that is the speech recognition result. This draws the corrector's attention to particles, where speech recognition errors occur especially often.
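As a rough illustration of particle emphasis, the sketch below enlarges tokens tagged as particles. It assumes the text has already been split into `(surface, part_of_speech)` pairs by some morphological analyzer, which the source does not specify; the tag string `"particle"`, the function name, and the scale factor are assumptions.

```python
def emphasize_particles(tagged_tokens, scale=1.5):
    """Assign a relative display size to each token, enlarging particles.

    tagged_tokens: iterable of (surface, part_of_speech) pairs, assumed to
    come from a morphological analyzer that is outside this sketch.
    """
    styled = []
    for surface, pos in tagged_tokens:
        size = scale if pos == "particle" else 1.0
        styled.append({"text": surface, "size": size})
    return styled
```

Enlarging rather than recoloring is chosen here because a larger target is also easier to select on a touch panel; color or typeface changes would fit the same structure.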
[0015]
The invention according to claim 5 is the speech recognition error correction device according to claim 1 or 2, wherein the presentation means comprises a display area setting function that allows at least one of the display width and the background color of the display area for the character string of the text data, which is the speech recognition result presented to the terminal, to be set arbitrarily.
[0016]
With this configuration, when the display area setting function sets (changes) at least one of the display width and the background color of the display area for the text data at the request of the terminal's user, the display area becomes easier to read and the legibility of the display screen improves.
[0017]
The invention according to claim 6 is the speech recognition error correction device according to claim 1 or 2, wherein the presentation means comprises an unnecessary character insertion function that inserts unnecessary characters into the text data that is the speech recognition result presented to the terminal.
[0018]
With this configuration, the unnecessary character insertion function inserts unnecessary characters into the text data that is the speech recognition result, so the corrector using the terminal has to delete them. The corrector's concentration on the correction work is thereby maintained even when the speech recognition error rate of the presented text data is low, little text needs correcting, and the work becomes monotonous.
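A minimal sketch of unnecessary character insertion follows. The asterisk marker matches the example given for the embodiment, but the function names, the random placement, and the assumption that the marker never occurs in genuine text are illustrative choices, not details from the source.

```python
import random


def insert_decoys(text, count, marker="＊", seed=None):
    """Insert `count` unnecessary characters at random positions in text."""
    rng = random.Random(seed)
    chars = list(text)
    for _ in range(count):
        # randint is inclusive at both ends, so the marker may also be appended.
        chars.insert(rng.randint(0, len(chars)), marker)
    return "".join(chars)


def strip_decoys(text, marker="＊"):
    """Remove all markers, e.g. before finalizing caption output.

    Assumes the marker never appears in genuine recognized text.
    """
    return text.replace(marker, "")
```

A helper like `strip_decoys` would be one way to ensure that markers a corrector missed never reach the caption output.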
[0019]
The invention according to claim 7 is a speech recognition error correction method that receives speech data that was the target of speech recognition and text data that is the speech recognition result, both output from a speech recognition device, and corrects speech recognition errors contained in the text data by means of a plurality of terminals, the method comprising: a presentation step of presenting the speech data and the text data to the terminals; a data receiving step of receiving, with respect to the text data presented to the terminals in the presentation step, pointing data indicating a correction character range pointed out by a terminal, and corrected text data obtained by correcting the text data in the correction character range; a text data protection step of protecting, based on the pointing data received in the data receiving step, the correction character range from being pointed out by any terminal other than the pointing data transmitting terminal, which is the terminal that transmitted the pointing data, until correction by that pointing data transmitting terminal is complete; and an output step of correcting the text data based on the corrected text data received in the data receiving step and outputting it to each of the terminals.
[0020]
According to this speech recognition error correction method, first, in the presenting step, the speech data output from the speech recognition device and the text data as the speech recognition result are presented to the terminals. Subsequently, in the data receiving step, the pointing data indicating the corrected character range in which a user of a terminal has pointed out a speech recognition error within the presented text data, and the corrected text data, are received. Then, in the text data protection step, based on the pointing data, the text data containing the speech recognition error is protected from being pointed out by any terminal other than the pointing data transmission terminal until the correction work by the pointing data transmission terminal is completed. The text data containing the speech recognition error is then corrected based on the corrected text data received in the data receiving step. In other words, the pointing data transmission terminal that first pointed out the speech recognition error corrects it with the highest priority, and the text data is corrected using the corrected text data received from that terminal. Next, in the output step, the corrected text data is sequentially transmitted to each terminal as the latest correction information, and the correction work continues.
[0021]
The invention according to claim 8 is a speech recognition error correction program for receiving the speech data subjected to speech recognition and the text data as the speech recognition result, both output from a speech recognition device, and having speech recognition errors included in the text data corrected by a plurality of terminals. The program causes a speech recognition error correction device to function as: presentation means for presenting the speech data and the text data to the terminals; data receiving means for receiving pointing data, which indicates a corrected character range pointed out by a terminal within the text data presented by the presentation means, and corrected text data obtained by correcting the text data in that corrected character range; text data protecting means for protecting, based on the pointing data received by the data receiving means, the corrected character range from being pointed out by any terminal other than the pointing data transmission terminal that transmitted the pointing data, until the correction by that terminal is completed; and output means for correcting the text data based on the corrected text data received by the data receiving means and outputting the result to each of the terminals.
[0022]
According to such a speech recognition error correction program, the functions of the speech recognition error correction device are realized and executed according to the processing procedure of the program. The speech data and text data output from the speech recognition device are received and presented to the plurality of correction terminals, and the pointing data indicating the corrected character range in which a user of one of these terminals has pointed out a speech recognition error is received together with the corrected text data. Based on the pointing data, terminals other than the terminal that transmitted the pointing data are prevented from pointing out the same speech recognition error until the correction work by the pointing data transmission terminal is completed. As a result, the pointing data transmission terminal that first pointed out the speech recognition error corrects it with the highest priority. The text data is then corrected based on the corrected text data transmitted from the pointing data transmission terminal, and the corrected text data is sequentially transmitted to each terminal as the latest correction information, so that the correction work continues.
[0023]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
(Outline of speech recognition error correction system)
First, an outline of a speech recognition error correction system will be described with reference to FIG.
FIG. 1 is a schematic diagram showing a configuration of a speech recognition error correction system according to an embodiment of the present invention.
[0024]
The speech recognition error correction system according to one embodiment of the present invention comprises a speech recognition error correcting device 1 and correction terminals 41. The speech recognition error correcting device 1 transmits, via a network, the text data that is the result of speech recognition by the speech recognition device 2 of speech uttered by a speaker, together with the speech data targeted for speech recognition, to a plurality of correction terminals 41, and integrates the corrected text data corrected by these correction terminals 41. Each correction terminal 41 is used to point out a corrected character range in the text data that is the speech recognition result and to correct the characters in that range to the correct characters.
In the present embodiment, the correction terminal 41 corresponds to the terminal described in the claims.
[0025]
Further, in the present embodiment, the speech recognition device 2 is used as an example of a device that outputs the text data to be corrected by the speech recognition error correction device 1. However, any device that outputs text data may be used, such as a personal computer equipped with a word processor function or a voice recognition function.
[0026]
In the present embodiment, the correction terminals 41 (41A, 41B, 41C) connected to the speech recognition error correction device 1 correct the text data into corrected text data, and the speech recognition error correction device 1 receives the corrected text data, corrects the text data accordingly, and outputs the corrected integrated text data in real time (this is the operation of the speech recognition error correction system).
The following description takes as an example the case where the corrected integrated text data is broadcast as a text broadcast, using the subtitle transmission device 43, via the broadcast communication network (broadcast wave EW) to the television 45 or to a personal computer equipped with the functions of the television 45.
[0027]
Hereinafter, in this embodiment, a correction terminal 41 that has transmitted to the speech recognition error correction device 1 pointing data (described in detail later), which points out a speech recognition error included in the text data transmitted from the speech recognition error correction device 1, is referred to as a pointing data transmission terminal 41.
[0028]
(Configuration of speech recognition error correction device)
Next, the configuration of the speech recognition error correction device 1 will be described with reference to FIGS.
FIG. 2 is a block diagram showing the configuration of the speech recognition error correction device 1 according to one embodiment of the present invention.
As shown in FIG. 1, the voice recognition error correction device 1 includes a text data correction unit 13 and a voice reproduction unit 3, and is connected to correction terminals 41 (41A, 41B, 41C) via a network.
[0029]
As shown in FIG. 2, the text data correction unit 13 comprises a character data reception unit 15, a correction information reception unit 17, a correction terminal operation determination unit 19, a character presentation speed variable unit 21, a character attribute change unit 23, a character attribute information transmission unit 25, a display area setting unit 27, an unnecessary character insertion unit 29, a presentation information transmission unit 31, a text data protection unit 33, a character string integration unit 35, a screen display information transmission unit 37, and a subtitle output configuration unit 39.
[0030]
In this embodiment, the character data receiving unit 15 and the correction information receiving unit 17 correspond to the data receiving means described in the claims, and the text data protection unit 33 corresponds to the text data protecting means described in the claims. The character presentation speed varying unit 21 and the character attribute changing unit 23, and likewise the character attribute information transmission unit 25 and the presentation information transmission unit 31, each correspond to elements described in the claims. The display area setting unit 27 corresponds to the display area setting function described in the claims, and the unnecessary character insertion unit 29 corresponds to the unnecessary character insertion function described in the claims.
[0031]
The character data receiving unit 15 receives text data as a result of speech recognition by the speech recognition device 2 (see FIG. 1).
The correction information receiving unit 17 receives, from the correction terminal 41, the corrected text ID (Identification), the correction terminal ID, the control information, and the corrected text data in which the correction terminal 41 has corrected a speech recognition error of the speech recognition device 2 to the correct text.
[0032]
Note that the “corrected text ID” is an identification symbol (pointing data, described later) that identifies the corrected character range, that is, the erroneous portion of the text data that is the speech recognition result, pointed out when the correction terminal 41 begins correction work. The “correction terminal ID” is an identification symbol that identifies the correction terminal 41 that pointed out that corrected character range when beginning correction work. The “control information” is information that the correction information receiving unit 17 receives from the correction terminal 41. It serves as the reference for control commands concerning the reproduction speed or stopping of the presentation of text data to the correction terminal 41, and for control commands concerning the reproduction speed or stopping of the presentation of audio data to the correction terminal 41.
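The information received by the correction information receiving unit 17 can be pictured as a single record per message. The following is a minimal sketch, not the patent's actual data format: the class name, the field names, and the string encoding of the control information are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CorrectionMessage:
    """Hypothetical record sent from a correction terminal 41 to unit 17."""

    corrected_text_id: str       # identifies the pointed-out character range
    correction_terminal_id: str  # identifies the terminal that pointed it out
    control_info: Optional[str] = None    # assumed encoding, e.g. "pause" or "resume"
    corrected_text: Optional[str] = None  # absent until the corrector submits a fix

    def is_pointing_out(self) -> bool:
        # A message without corrected text only points out a range.
        return self.corrected_text is None


# A terminal first points out a range, then later submits the corrected text.
pointing = CorrectionMessage("range-3", "41A")
fix = CorrectionMessage("range-3", "41A", corrected_text="noon")
```

Keeping the corrected text ID and the correction terminal ID in the same record mirrors the description of the pointing data as the pair (corrected text ID, correction terminal ID).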
[0033]
The correction terminal operation determination unit 19 transmits the correction terminal ID and the control information received from the correction information reception unit 17 to the character presentation speed variable unit 21 (described in detail later), and transmits the character presentation timing information, which serves as the timing reference for presenting voice data to the correction terminal 41, together with the control information, to the voice presentation speed variable unit 11 (described in detail later). When the presentation of the text data or the voice data to the correction terminal 41 has been temporarily stopped, the unit determines the delay caused by the pause and instructs high-speed presentation of the text data or the voice data to the correction terminal 41. Furthermore, when all of the correction terminals 41 have started correction work, the unit issues a command to temporarily stop the presentation of the text data and the voice data to the correction terminals 41 to the character presentation speed variable unit 21 (described later) and the voice presentation speed variable unit 11 (described later).
[0034]
Note that the “character presentation timing information” is reference information giving the timing at which voice data is presented to the correction terminal 41 in order to support the work of correcting the text data in the corrected character range to the correct characters. The voice presentation speed variable unit 11 (described later) controls the timing of the presentation of the voice data to the correction terminal 41 based on the character presentation timing information.
[0035]
The character presentation speed variable unit 21 transmits a control command concerning the reproduction speed or stopping of the presentation of text data, based on the correction terminal ID and the control information received from the correction terminal operation determination unit 19, to the correction terminal 41 identified by that correction terminal ID. When all the correction terminals 41 have started correction work, it receives from the correction terminal operation determination unit 19 an instruction to suspend the presentation of the text data and transmits it to all the correction terminals 41 (41A, 41B, 41C).
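The pause condition described above, namely that presentation is suspended once every correction terminal has started correction work, can be sketched as follows. The function name and the dictionary of per-terminal states are assumptions for illustration. How the embodiment actually tracks terminal state is not specified here.

```python
from typing import Dict


def should_pause_presentation(correcting: Dict[str, bool]) -> bool:
    """Return True when every known correction terminal is busy correcting.

    `correcting` maps a correction terminal ID to whether that terminal
    has started correction work (a hypothetical representation).
    """
    # With no terminals registered there is nobody to wait for, so do not pause.
    return bool(correcting) and all(correcting.values())


some_free = {"41A": True, "41B": True, "41C": False}
all_busy = {"41A": True, "41B": True, "41C": True}
```

In this sketch the presentation keeps running while at least one corrector is still free to watch for new errors, and pauses only when nobody is.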
[0036]
The character attribute changing unit 23 includes a pointed character attribute changing unit 23a, a corrected character attribute changing unit 23b, and a character part of speech attribute changing unit 23c.
In the present embodiment, the indicated character attribute changing unit 23a corresponds to the indicated character attribute changing function described in the claims, and the modified character attribute changing unit 23b corresponds to the modified character attribute changing function described in the claims. And the character part-of-speech attribute changing unit 23c corresponds to a character part-of-speech attribute changing function described in the claims.
[0037]
When a correction terminal 41 points out a corrected character range, that is, text data containing a speech recognition error, the indicated character attribute changing unit 23a changes a character attribute, including at least one of display color, display size, and character type, of the characters in the corrected character range identified by the corrected text ID transmitted from that correction terminal 41 to the correction information receiving unit 17, in the corrected text data received by the correction information receiving unit 17 from the correction terminal 41. The corrected text data in which the character attribute of the corrected character range has been changed is presented to the correction terminals 41 other than the correction terminal 41 that transmitted the pointing data (corrected text ID), that is, other than the pointing data transmission terminal 41, so that the pointing-out status can also be checked from those other correction terminals.
[0038]
The “pointing data” is data indicating the corrected character range pointed out by the pointing data transmission terminal 41, or indicating the pointing data transmission terminal 41 itself. Specifically, it is the corrected text ID or the correction terminal ID described above. The pointing data transmission terminal 41 that has transmitted the corrected text ID also transmits the correction terminal ID to the correction information receiving unit 17.
Character attributes may also be changed by, for example, changing the font, applying italic, bold, shadowed, three-dimensional, or outlined characters, underlining, or enclosing characters.
[0039]
The corrected character attribute changing unit 23b changes the characters in a corrected character range that is currently being corrected to a character attribute different from the one applied to the corrected character range by the indicated character attribute changing unit 23a. The corrected text data in which the character attribute of the characters being corrected has been changed is presented to the correction terminals 41 other than the terminal performing the correction (the pointing data transmission terminal 41 that is correcting the text data in the corrected character range), so that the correction status can be checked from those other correction terminals 41.
Note that the change of the character attribute is the same as the description in the pointed-out character attribute changing unit 23a, and therefore, the description thereof is omitted.
[0040]
As shown in FIG. 4, the character part-of-speech attribute changing unit 23c changes a character attribute, including at least one of display color, display size, and character type, of at least the particles among the parts of speech of the character strings in the text data transmitted from the character data receiving unit 15, and thereby changes the corrected text data received by the correction information receiving unit 17 from the correction terminal 41 (pointing data transmission terminal 41). Particles in particular occur frequently, and correctors very often overlook speech recognition errors in them. The attribute change is intended to prevent such oversights. Changing the character attribute so that particles are displayed larger is particularly effective when a touch panel is used as the pointing device, because it increases the accuracy of finding recognition errors in particles and makes them easier to point out (select) on the touch panel.
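As an illustration of the part-of-speech based attribute change, the sketch below enlarges the display size of tokens tagged as particles. It assumes the text has already been split into (surface, part-of-speech) pairs by some morphological analysis step. The tag name "particle", the `size` field, and the default scale factor are hypothetical and not taken from the embodiment.

```python
from typing import Dict, List, Tuple

PARTICLE = "particle"  # hypothetical part-of-speech tag


def emphasize_particles(
    tokens: List[Tuple[str, str]], scale: float = 1.5
) -> List[Dict[str, object]]:
    """Attach a relative display size to each token, enlarging particles."""
    styled = []
    for surface, pos in tokens:
        size = scale if pos == PARTICLE else 1.0
        styled.append({"text": surface, "pos": pos, "size": size})
    return styled


# Romanized example tokens: "ohiru" (noun), "no" (particle), "nyuusu" (noun).
styled = emphasize_particles(
    [("ohiru", "noun"), ("no", PARTICLE), ("nyuusu", "noun")]
)
```

Display color or character type could be changed in the same way by adding further fields to each styled token.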
[0041]
It should be noted that a control unit (not shown) may be initially set to transmit text data to the correction terminal 41 after changing the character attribute of at least particles of the part of speech of the character string in the text data.
Note that the change of the character attribute is the same as the description in the pointed-out character attribute changing unit 23a, and therefore, the description thereof is omitted.
[0042]
Returning to FIG. 2, the description will be continued.
The character attribute information transmitting unit 25 sends to each correction terminal 41 the character attribute information, that is, information on the character attributes changed by the indicated character attribute changing unit 23a, the corrected character attribute changing unit 23b, and the character part-of-speech attribute changing unit 23c.
[0043]
As shown in FIG. 5A, the display area setting unit 27 allows at least one of the width and the background color of the display area AR for the text data, which is the speech recognition result presented to the correction terminal 41, to be set arbitrarily in order to improve the visibility of the display screen.
[0044]
Returning to FIG. 2, the description will be continued.
The unnecessary character insertion unit 29 inserts unnecessary characters into text data that is the result of speech recognition presented to the correction terminal 41. FIG. 5B shows an example in which an asterisk mark “*” is inserted into text data as an unnecessary character.
When the speech recognition rate is high and speech recognition errors are few, the correction work becomes a monotonous search for speech recognition errors. As a result, the corrector's concentration declines and the accuracy of the correction work falls. To prevent this, unnecessary characters are inserted into the text data that is the speech recognition result, and the corrector is made to delete them, thereby maintaining concentration.
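The unnecessary character insertion can be sketched as below. The description does not state where or how often the characters are inserted, so the caller-supplied `positions` argument is an assumption. The marker "*" follows the asterisk example of FIG. 5B.

```python
from typing import Iterable


def insert_unnecessary_chars(
    text: str, positions: Iterable[int], marker: str = "*"
) -> str:
    """Insert `marker` before the characters at the given indices of `text`."""
    wanted = set(positions)
    out = []
    for index, ch in enumerate(text):
        if index in wanted:
            out.append(marker)
        out.append(ch)
    return "".join(out)


def remove_unnecessary_chars(text: str, marker: str = "*") -> str:
    """Model the corrector deleting every marker.

    This assumes the marker never occurs in the genuine text.
    """
    return text.replace(marker, "")


presented = insert_unnecessary_chars("noon news", [0, 5])
restored = remove_unnecessary_chars(presented)
```

Here `presented` is "*noon *news", and deleting the markers restores the original string.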
[0045]
Returning to FIG. 2, the description will be continued.
The presentation information transmission unit 31 transmits the display area setting information by the display area setting unit 27 and the unnecessary character insertion information by the unnecessary character insertion unit 29 to the correction terminal 41.
The text data protection unit 33 includes a correction guard terminal determination unit 33a and a correction guard text determination unit 33b.
[0046]
Based on the pointing data (corrected text ID, correction terminal ID) received by the correction information receiving unit 17, the text data protection unit 33 protects the text data containing the speech recognition error from being pointed out by any correction terminal 41 other than the pointing data transmission terminal 41 that transmitted the pointing data, until the correction work by the pointing data transmission terminal 41 is completed.
[0047]
An example of how the text data protection unit 33 protects text data will be described with reference to FIG. 3. In FIG. 3, the text data "frequent terrorist incident at that time", which is a speech recognition result, is presented to the correction terminal 41A, the correction terminal 41B, and the correction terminal 41C. Here, the portion "at that time" (the corrected character range) is a speech recognition error. When the corrector at the correction terminal 41A is the first to find and point out this error, the correctors at the other correction terminals 41, that is, at the correction terminal 41B and the correction terminal 41C, are guarded so that they cannot point out that same portion ("at that time").
[0048]
At this time, the correctors at the correction terminal 41B and the correction terminal 41C can still point out portions other than that portion. Moreover, once the corrector at the correction terminal 41A has finished the correction work, the correctors at the correction terminal 41B and the correction terminal 41C can again point out and correct even that same portion ("at that time").
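The guarding behavior in this example amounts to a first-come lock per corrected character range. The following is a minimal in-memory sketch under that reading. The class name, method names, and the use of a dictionary are assumptions. Concurrency control across a real network is outside its scope.

```python
from typing import Dict


class TextDataProtection:
    """Hypothetical first-come guard keyed by corrected text ID."""

    def __init__(self) -> None:
        # corrected text ID -> correction terminal ID currently holding the range
        self._owner: Dict[str, str] = {}

    def point_out(self, corrected_text_id: str, terminal_id: str) -> bool:
        """Try to claim a range; only the first terminal to ask succeeds."""
        owner = self._owner.setdefault(corrected_text_id, terminal_id)
        return owner == terminal_id

    def complete(self, corrected_text_id: str, terminal_id: str) -> bool:
        """Release a range once its owner has finished correcting it."""
        if self._owner.get(corrected_text_id) != terminal_id:
            return False
        del self._owner[corrected_text_id]
        return True


guard = TextDataProtection()
first = guard.point_out("at-that-time", "41A")   # 41A points the error out first
second = guard.point_out("at-that-time", "41B")  # 41B is guarded from the same range
other = guard.point_out("another-range", "41B")  # but may point out a different range
released = guard.complete("at-that-time", "41A")
again = guard.point_out("at-that-time", "41C")   # allowed once 41A has finished
```

The sketch reproduces the three properties of the example: the first terminal wins, other ranges stay available, and the range becomes available again after the correction is completed.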
[0049]
Next, returning to FIG. 2, the components of the text data protection unit 33 will be described.
The correction guard terminal determination unit 33a receives the correction terminal ID sent from the correction information receiving unit 17 (the identification symbol identifying the correction terminal 41 that pointed out a corrected character range, that is, an erroneous portion of the text data that is the speech recognition result, when starting correction work), identifies the correction terminal 41 (the correction terminal 41A) that pointed out the corrected character range ("at that time"), and transmits the identified information as correction guard terminal information to the other correction terminals 41 (the correction terminal 41B and the correction terminal 41C).
[0050]
The correction guard text determination unit 33b receives the corrected text ID sent from the correction information receiving unit 17 (the identification symbol identifying the corrected character range, that is, the erroneous portion of the text data that is the speech recognition result, pointed out when the correction terminal 41 starts correction work), identifies the corrected character range ("at that time") pointed out by the correction terminal 41 (the correction terminal 41A), and transmits the identified information as correction guard text information to the other correction terminals 41 (the correction terminal 41B and the correction terminal 41C).
[0051]
Next, the character string integrating unit 35 will be described.
The character string integrating unit 35 integrates the corrected text data from each correction terminal 41 received by the correction information receiving unit 17 into one unified sentence based on the corrected text ID. The corrected integrated text data produced by the character string integrating unit 35 is sequentially fed back to each correction terminal 41, so that the latest correction information (corrected integrated text data) is always reflected. The corrected integrated text data is used as caption broadcast data by the caption transmission device 43 (see FIG. 1).
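Integration by corrected text ID can be illustrated as follows. The sketch assumes the recognized text is held as an ordered list of segments, each labeled with an ID, and that corrections arrive as a mapping from ID to replacement text. This segment representation and the function name are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def integrate(segments: List[Tuple[str, str]], corrections: Dict[str, str]) -> str:
    """Build one sentence, replacing each segment that has a correction.

    `segments` is an ordered list of (corrected text ID, recognized text).
    `corrections` maps a corrected text ID to the corrected text.
    """
    return "".join(corrections.get(seg_id, text) for seg_id, text in segments)


segments = [
    ("s1", "Hello, this is the "),
    ("s2", "mihiru"),  # misrecognized portion
    ("s3", " news."),
]
integrated = integrate(segments, {"s2": "noon"})
```

Because each correction is keyed by ID, corrections arriving from different terminals for different ranges can be merged into the same sentence in any order.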
[0052]
The screen display information transmitting unit 37 sends to each of the correction terminals 41 (41A, 41B, 41C) the corrected integrated text data, in which the character string integrating unit 35 has integrated the corrected text data into one sentence based on the corrected text ID.
[0053]
The subtitle output forming unit 39 converts the corrected integrated text data, in which the character string integrating unit 35 has integrated the corrected text data into one sentence based on the corrected text ID, into subtitle output data so that it can be used as a text broadcast, and transmits the output data to the subtitle transmitting device 43 (see FIG. 1).
[0054]
Next, the audio reproducing unit 3 will be described.
The audio reproduction unit 3 includes an audio reception unit 5, an audio storage unit 7, an audio data transmission unit 9, and an audio presentation speed variable unit 11.
The voice reproduction unit 3 reproduces voice data that has been subjected to voice recognition.
The voice data is transmitted to the correction terminal 41 via the network in synchronization with the character presentation timing information. The corrector at each correction terminal 41 (41A, 41B, 41C) compares and collates the voice output from the headphones connected to the correction terminal 41 with the text data that is the voice recognition result, which supports the corrector's correction work.
[0055]
The voice receiving unit 5 receives voice data targeted for voice recognition from the voice recognition device 2 (see FIG. 1).
The voice accumulating unit 7 stores, temporarily or for the long term, the voice data targeted for voice recognition received by the voice receiving unit 5, for presentation to each correction terminal 41.
Generally, a main storage device (main memory) using semiconductor memory is used for temporary storage, and an external storage device (auxiliary storage device) such as a hard disk, a flexible disk, or a DAT (Digital Audio Tape Recorder) is used for long-term storage.
[0056]
The audio data transmission unit 9 receives the audio data from the audio storage unit 7 and transmits the audio data to each correction terminal 41 for presentation to each correction terminal 41.
[0057]
The voice presentation speed variable unit 11 issues control commands concerning the reproduction speed or stopping of the operation of presenting, to the correction terminal 41, the voice data subjected to voice recognition by the voice recognition device 2 (see FIG. 1), based on the character presentation timing information and the control information received from the correction terminal operation determination unit 19. When all the correction terminals 41 have started correction work, it receives from the correction terminal operation determination unit 19 an instruction to temporarily stop the presentation of the voice data and transmits it to all the correction terminals 41 (41A, 41B, 41C) via the audio data transmission unit 9.
[0058]
Therefore, the voice presentation speed variable unit 11 can present the voice data targeted for voice recognition by the voice recognition device 2 (see FIG. 1) to each of the correction terminals 41 simultaneously with the text data that is the voice recognition result of the voice recognition device 2, that is, in synchronization with the timing at which the text data is presented to the correction terminals 41 (41A, 41B, 41C). In addition, in synchronization with the character presentation timing information, it can pause the presentation of the voice data to the correction terminal 41 and thereby delay it, perform high-speed presentation to recover the delay caused by the pause, and perform a replay operation that presents the voice data repeatedly.
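As a worked illustration of recovering a pause by high-speed presentation, suppose the source keeps arriving in real time while the backlog is played at a constant speed factor s greater than 1. The backlog then shrinks by (s - 1) seconds per second, so a pause of d seconds is recovered after d / (s - 1) seconds. This constant-speed model and the function below are assumptions for illustration. The embodiment does not specify how the playback speed is chosen.

```python
def catch_up_seconds(pause_seconds: float, speed: float) -> float:
    """Time needed to recover a pause when playing at `speed` times real time.

    Assumes new material keeps arriving at real-time rate during catch-up.
    """
    if speed <= 1.0:
        raise ValueError("speed must exceed 1.0 to recover a delay")
    return pause_seconds / (speed - 1.0)


# A 6-second pause played back at 1.5x is recovered after 12 seconds.
recovery = catch_up_seconds(6.0, 1.5)
```

The same relation shows why speeds only slightly above real time take long to catch up: at 1.1x, the same 6-second pause would need about a minute.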
[0059]
Subsequently, returning to FIG. 1, the correction terminal 41 will be described.
The correction terminal 41 (41A, 41B, 41C) is a device for using the voice recognition error correction device 1. It includes a display, a keyboard, a touch panel for designating a corrected character range, headphones for outputting voice data, and a foot input interface (foot pedal) that sends switching signals for speeding up, pausing, or resuming the voice data to a control unit (not shown) in order to control reproduction of the voice data. Generally, a personal computer is used.
[0060]
The foot pedal includes a foot switch for controlling the stopping of audio data reproduction and an interface such as USB (Universal Serial Bus) or RS232C, and is connected to a personal computer via such an interface. The switch is turned ON or OFF by stepping on or releasing the foot switch (pedal). Foot pedals of this kind are used to control the reproduction of content (audio data) recorded on a recording tape.
[0061]
Further, the speech recognition error correction device 1 is connected to a LAN (Local Area Network) or a WAN (Wide Area Network), so the system can be designed freely regardless of where the correction terminals 41 are installed. Speech recognition errors can therefore also be corrected from remote locations.
[0062]
As described above, the network may take any form, such as a LAN or a WAN, and the connection may likewise take any form, such as network cable, wireless, or infrared.
However, a network cable is preferable from the viewpoint of security, such as protection against leakage of communication packets, and of processing speed.
Furthermore, a plurality of correctors (correction terminals 41) can be set by the speech recognition error correction device 1, and the correction character range can be pointed out and corrected by each corrector.
[0063]
(Operation of the speech recognition error correction system)
Next, an example of the operation of the speech recognition error correction system will be described with reference to the sequence chart of FIG. 6 (see FIG. 2 as appropriate).
First, the speech recognition device 2 (see FIG. 1) performs speech recognition on the speech uttered by the speaker, transmits the text data that is the result of the speech recognition to the text data correction unit 13, and transmits the audio data that was the target of speech recognition to the audio reproducing unit 3 (A1).
[0064]
Here, the following case will be described as an example: the speaker says "Hello, this is the noon news", and for this speech the speech recognition device 2 produces the erroneous speech recognition result "Hello, this is the mihiru news" as text data and transmits it to the speech recognition error correcting device 1 (the text data correcting unit 13 and the voice reproducing unit 3).
[0065]
Then, the text data correction unit 13 of the speech recognition error correction apparatus 1, having received this, transmits the text data "Hello, this is the mihiru news" to the correction terminal 41A, the correction terminal 41B, and the correction terminal 41C and, at the same time, sends the character presentation timing information and the control information to the audio reproduction unit 3 (B1). On receiving these, the audio reproduction unit 3 transmits the voice data "Hello, this is the noon news", which was the target of speech recognition, to the correction terminal 41A, the correction terminal 41B, and the correction terminal 41C based on the character presentation timing information and the control information (for example, simultaneously with the presentation of the text data that is the speech recognition result to those terminals) (B2).
[0066]
At each correction terminal 41 (41A, 41B, 41C) that has received the text data "Hello, this is the mihiru news" and the voice data "Hello, this is the noon news", the corrector points out the characters "mihiru" (the corrected character range), which are the speech recognition error portion, by touching the touch panel, and corrects them to the correct characters "noon" using the keyboard.
[0067]
At this time, for example, when the corrector at the correction terminal 41A is the earliest to point out the characters "mihiru" of the speech recognition error portion (C1), the pointing data (corrected text ID, correction terminal ID) is transmitted from the correction terminal 41A to the text data correcting unit 13 (C2). On receiving it, the text data correcting unit 13 transmits the correction guard terminal information and the correction guard text information to the correction terminals 41B and 41C (B3). Having received these, the correction terminal 41B and the correction terminal 41C cannot correct the pointed-out portion while the correction terminal 41A is correcting it, and only the correction terminal 41A can perform the correction, with priority.
[0068]
Note that, when the corrector of the correction terminal 41A carries out the correction work, the corrector listens through headphones to the voice data "Hello, noon news" sent from the audio reproduction unit 3 and compares it against the text data "Hello, your Hill is of news" presented on the correction terminal 41A, which increases the accuracy of the correction.
[0069]
Then, the correction terminal 41A transmits the corrected text data "noon" as the correction result to the text data correction unit 13 (C3). Upon receiving it, the text data correction unit 13 integrates the corrected text data "noon" into the sentence to create the corrected integrated text data "Hello, noon news". The corrected integrated text data is transmitted sequentially to the correction terminal 41A, the correction terminal 41B, and the correction terminal 41C, so that correction work is performed with the latest correction information presented on all terminals (in the example of the present embodiment, the correction terminal 41A, the correction terminal 41B, and the correction terminal 41C) (B4).
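The integration of corrected text data into the sentence can be illustrated with a short sketch. It assumes that the corrected character range is given as start and end character offsets, which the description does not specify; the function name `integrate_correction` and the romanized sample sentence are likewise hypothetical stand-ins for the example above.

```python
def integrate_correction(text: str, start: int, end: int, corrected: str) -> str:
    """Replace text[start:end] (the corrected character range) with the corrected text."""
    if not (0 <= start <= end <= len(text)):
        raise ValueError("corrected character range is outside the text")
    return text[:start] + corrected + text[end:]


# Simplified stand-in for the recognition result containing the error "mihiru".
recognized = "Hello, mihiru news"
start = recognized.index("mihiru")
integrated = integrate_correction(recognized, start, start + len("mihiru"), "noon")
print(integrated)
```

The integrated string would then be sent to every correction terminal, so that each one shows the latest state of the text.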
[0070]
Next, the presence or absence of corrected text data is determined (B5). If there is corrected text data (B5, YES), the corrected text data is integrated into the corrected integrated text data, and the corrected integrated text data is finalized (B6). Then, the corrected integrated text data "Hello, noon news" is transmitted from the text data correction unit 13 to the caption sending device 43 (B7).
[0071]
When there is no corrected text data (B5, NO), it is determined whether, for example, one minute has elapsed since the text data was transmitted to each correction terminal 41 (B8). If it has elapsed (B8, YES), the process proceeds to B6, and the corrected integrated text data is finalized. If it has not elapsed (B8, NO), the process returns to B5.
Although not shown in this flowchart, the steps from A1 to B7 are repeated by the speech recognition error correction system for as long as text data as a speech recognition result is presented, after which the operation of the speech recognition error correction system ends.
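The branch between B5, B8, and B6 can be written as a small decision function. This is a sketch only: the function name `next_step`, the use of seconds, and the treatment of exactly one minute as "elapsed" are assumptions, and the one-minute value itself is given in the description only as an example.

```python
TIMEOUT_SECONDS = 60.0  # "for example, one minute"; the description gives this only as an example


def next_step(has_corrected_text: bool, elapsed_seconds: float,
              timeout_seconds: float = TIMEOUT_SECONDS) -> str:
    """Return the label of the next step after the check at B5."""
    if has_corrected_text:                   # B5, YES
        return "B6"                          # integrate and finalize
    if elapsed_seconds >= timeout_seconds:   # B8, YES (assumed inclusive)
        return "B6"                          # finalize the text as it stands
    return "B5"                              # B8, NO: check again


print(next_step(False, 30.0))  # no correction yet, timeout not reached
print(next_step(False, 60.0))  # no correction, timeout reached
```

The timeout keeps text that nobody corrects from waiting indefinitely before it is passed on for transmission in B7.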
[0072]
Further, as an example of the present embodiment, the caption sending device 43, having received the corrected integrated text data transmitted in B7, transmits it to the television 45 via the broadcast wave EW (see FIG. 1) (D1), and the television 45 receives "Hello, noon news" as a teletext caption (E1). This case is also shown as an example in the flowchart (the portion inside the broken line at the lower left of FIG. 6).
[0073]
The finalization of the corrected integrated text data in B6 may also be made at the point when the text data correction unit 13 has received corrected text data from the correction terminal 41 that transmitted its corrected text data last and has created the corrected integrated text data by integrating it.
[0074]
The embodiments of the present invention have been described above, but the present invention is not limited to the above-described embodiments, but can be implemented in various forms.
The present invention also covers a speech recognition error correction method, a speech recognition error correction program for realizing such a speech recognition error correction method in the speech recognition error correction device 1, and a recording medium on which the speech recognition error correction program is recorded.
[0075]
【Effects of the Invention】
As described above, the present invention has the following effects.
According to the first, seventh and eighth aspects of the present invention, pointing data indicating the corrected character range, which marks a speech recognition error, and corrected text data are received from a terminal connected to the speech recognition error correction device. The text data protection means then prevents terminals other than the pointing data transmitting terminal, which is the terminal that transmitted the pointing data, from pointing out the speech recognition error and performing correction work until the correction by the pointing data transmitting terminal is completed.
[0076]
Therefore, the correction work can be performed by a corrector alone, without dividing the work between a discoverer and a corrector of the corrected character range, which is the speech recognition error portion. The efficiency of the correction work can thus be improved. In addition, personnel and labor management, such as staffing and securing personnel, becomes easier, and training can be organized with correctors as the unit, so that efficient training can be provided. It also becomes possible to design a labor-saving system.
In addition, by allowing the corrector (terminal) that first points out the characters of the speech recognition error portion to perform the correction work, the correction work can be made more efficient.
[0077]
According to the second aspect of the present invention, by connecting the speech recognition error correction device and the terminals via a network, the work location can be chosen flexibly and a simplified system can be designed. Therefore, an efficient speech recognition error correction system can be constructed.
[0078]
According to the third and fourth aspects of the present invention, by changing the character attribute of the characters in a speech recognition error portion (corrected character range) and presenting them to each terminal, the status of the pointing-out work or the correction work can be made clear. Therefore, the correction work can be made more efficient.
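As a rough illustration of this attribute change, the sketch below tags each character with display attributes according to whether its range is untouched, pointed out, or being corrected. The status names, the concrete colors, and the function `annotate` are hypothetical; the source only states that at least one of display color, display size, and character type is changed, and that the attribute during correction differs from the attribute for a pointed-out range.

```python
from enum import Enum
from typing import Dict, List, Tuple


class RangeStatus(Enum):
    NORMAL = "normal"
    POINTED_OUT = "pointed_out"          # a terminal has pointed out the range
    BEING_CORRECTED = "being_corrected"  # the pointing terminal is correcting it


# Hypothetical attribute values chosen only for illustration.
ATTRIBUTES: Dict[RangeStatus, Dict[str, str]] = {
    RangeStatus.NORMAL: {"color": "black"},
    RangeStatus.POINTED_OUT: {"color": "red"},
    RangeStatus.BEING_CORRECTED: {"color": "blue"},
}


def annotate(text: str, start: int, end: int,
             status: RangeStatus) -> List[Tuple[str, Dict[str, str]]]:
    """Pair every character with its attributes; text[start:end] gets the status attributes."""
    annotated = []
    for index, char in enumerate(text):
        in_range = start <= index < end
        attrs = ATTRIBUTES[status] if in_range else ATTRIBUTES[RangeStatus.NORMAL]
        annotated.append((char, attrs))
    return annotated


shown = annotate("Hello, mihiru news", 7, 13, RangeStatus.POINTED_OUT)
```

Sending the same annotated text to every terminal is what would let the other correctors see at a glance which range is already taken.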
[0079]
According to the fifth aspect of the present invention, at least one of the display width and the background color can be set for the display area of the text data as the speech recognition result based on a request from the terminal user. Thus, the display area of the text data becomes easy to see. Therefore, the visibility of the display screen can be improved.
[0080]
According to the sixth aspect of the present invention, unnecessary characters are inserted into the text data that is the speech recognition result, and the corrector deletes these unnecessary characters. The concentration of the corrector can thus be maintained even when the speech recognition accuracy is high. Therefore, the correction work can be kept highly accurate.
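The unnecessary character insertion can be sketched as follows. How the unnecessary character insertion unit 29 chooses characters and positions is not stated in this section, so the random placement, the marker character `*`, the fixed seed, and the function name are all assumptions made for illustration.

```python
import random
from typing import List, Tuple


def insert_unnecessary_characters(text: str, count: int, marker: str = "*",
                                  seed: int = 0) -> Tuple[str, List[int]]:
    """Insert `count` marker characters at random positions in `text`.

    Returns the padded text and the positions of the markers in it.
    Assumes `marker` is a single character that does not occur in `text`.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    chars = list(text)
    for _ in range(count):
        position = rng.randint(0, len(chars))  # inclusive bounds; len(chars) appends
        chars.insert(position, marker)
    padded = "".join(chars)
    positions = [i for i, c in enumerate(padded) if c == marker]
    return padded, positions


padded, positions = insert_unnecessary_characters("Hello, noon news", 3)
restored = padded.replace("*", "")  # what remains after the corrector deletes the markers
```

Deleting every inserted character restores the original text, so the extra work does not change what is finally transmitted.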
[Brief description of the drawings]
FIG. 1 is a schematic diagram illustrating a configuration of a speech recognition error correction system according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating a configuration of a speech recognition error correction device according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of an operation of the text data protection unit of the text data correction unit according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating an example of an operation of the character attribute change unit of the text data correction unit according to an embodiment of the present invention.
FIG. 5 (A) is a diagram illustrating an example of an operation of the unnecessary character insertion unit of the text data correction unit according to an embodiment of the present invention, and (B) is a diagram illustrating an example of an operation of the display area setting unit of the text data correction unit according to an embodiment of the present invention.
FIG. 6 is a sequence chart illustrating an example of the operation of the speech recognition error correction device according to an embodiment of the present invention.
[Explanation of symbols]
1 Speech recognition error correction device
2 Speech recognition device
3 Audio reproduction unit
5 Voice reception unit
7 Voice storage unit
9 Voice data transmission unit
11 Voice presentation speed variation unit
13 Text data correction unit
15 Character data reception unit (data receiving means)
17 Correction information reception unit (data receiving means)
19 Correction terminal operation determination unit
21 Character presentation speed variation unit (presentation means)
23 Character attribute change unit (presentation means)
23a Pointed-out character attribute change unit (pointed-out character attribute change function)
23b Corrected character attribute change unit (corrected character attribute change function)
23c Character part-of-speech attribute change unit (character part-of-speech attribute change function)
25 Character attribute information transmission unit (output means)
27 Display area setting unit (display area setting function)
29 Unnecessary character insertion unit (unnecessary character insertion function)
31 Presentation information transmission unit (output means)
33 Text data protection unit (text data protection means)
33a Correction guard terminal determination unit
33b Correction guard text determination unit
35 Character string integration unit
37 Screen display information transmission unit (output means)
39 Caption output unit (output means)
41 (41A-41C) Correction terminal, pointing data transmitting terminal (terminal)
43 Caption sending device
45 Television
AR Text data display area
AF Correction character input frame

Claims

A speech recognition error correction device that receives voice data output from a speech recognition device and subjected to speech recognition, together with text data that is the speech recognition result, and corrects a speech recognition error included in the text data by means of a plurality of terminals, the device comprising:
presentation means for presenting the voice data and the text data to the terminals;
data receiving means for receiving pointing data indicating a corrected character range pointed out by a terminal with respect to the text data presented to the terminals by the presentation means, and corrected text data obtained by correcting the text data in the corrected character range;
text data protection means for protecting, based on the pointing data received by the data receiving means, the corrected character range from being pointed out by any terminal other than the pointing data transmitting terminal, which is the terminal that transmitted the pointing data, until the correction by the pointing data transmitting terminal is completed; and
output means for correcting the text data based on the corrected text data received by the data receiving means and outputting the corrected text data to each of the terminals.

The speech recognition error correction device according to claim 1, wherein the device and the plurality of terminals are connected via a network.

The speech recognition error correction device according to claim 1, wherein the presentation means comprises:
a pointed-out character attribute change function that changes, for the characters in the corrected character range, a character attribute including at least one of display color, display size, and character type, and thereby also indicates the change of the character attribute to terminals other than the pointing data transmitting terminal; and
a corrected character attribute change function that, while the corrected character range is being corrected by the pointing data transmitting terminal, gives the corrected character range being corrected a character attribute different from the character attribute changed by the pointed-out character attribute change function.

The speech recognition error correction device according to claim 1 or 2, wherein the presentation means comprises a character part-of-speech attribute change function that changes a character attribute including at least one of display color, display size, and character type for at least the particles among the parts of speech of the text data that is the speech recognition result.

The speech recognition error correction device according to claim 1 or 2, wherein the presentation means comprises a display area setting function that can arbitrarily set at least one of the display width and the background color of the display area of the text data that is the speech recognition result presented to the terminal.

The speech recognition error correction device according to claim 1, wherein the presentation means comprises an unnecessary character insertion function of inserting unnecessary characters into the text data that is the speech recognition result presented to the terminal.

A speech recognition error correction method of receiving voice data output from a speech recognition device and subjected to speech recognition, together with text data that is the speech recognition result, and correcting a speech recognition error included in the text data by means of a plurality of terminals, the method comprising:
a presentation step of presenting the voice data and the text data to the terminals;
a data receiving step of receiving pointing data indicating a corrected character range pointed out by a terminal with respect to the text data presented to the terminals in the presentation step, and corrected text data obtained by correcting the text data in the corrected character range;
a text data protection step of protecting, based on the pointing data received in the data receiving step, the corrected character range from being pointed out by any terminal other than the pointing data transmitting terminal, which is the terminal that transmitted the pointing data, until the correction by the pointing data transmitting terminal is completed; and
an output step of correcting the text data based on the corrected text data received in the data receiving step and outputting the corrected text data to each of the terminals.

A speech recognition error correction program that causes a device, which receives voice data output from a speech recognition device and subjected to speech recognition, together with text data that is the speech recognition result, and which corrects a speech recognition error included in the text data by means of a plurality of terminals, to function as:
presentation means for presenting the voice data and the text data to the terminals;
data receiving means for receiving pointing data indicating a corrected character range pointed out by a terminal with respect to the text data presented to the terminals by the presentation means, and corrected text data obtained by correcting the text data in the corrected character range;
text data protection means for protecting, based on the pointing data received by the data receiving means, the corrected character range from being pointed out by any terminal other than the pointing data transmitting terminal, which is the terminal that transmitted the pointing data, until the correction by the pointing data transmitting terminal is completed; and
output means for correcting the text data based on the corrected text data received by the data receiving means and outputting the corrected text data to each of the terminals.