JP3986015B2

JP3986015B2 - Speech recognition error correction device, speech recognition error correction method, and speech recognition error correction program

Info

Publication number: JP3986015B2
Application number: JP2003017623A
Authority: JP
Inventors: 剛三島; 訓史大出; 篤今井; 徹都木
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2003-01-27
Filing date: 2003-01-27
Publication date: 2007-10-03
Anticipated expiration: 2023-01-27
Also published as: JP2004226910A

Description

【０００１】
【発明の属する技術分野】
本発明は、音声認識誤りを修正する音声認識誤り修正装置、音声認識誤り修正方法および音声認識誤り修正プログラムに関するものである。
【０００２】
【従来の技術】
従来、音声認識技術を用いた音声認識誤り修正装置には、音声字幕化装置や文字データ修正装置がある（特許文献１、特許文献２を参照）。
前者は、音声認識結果をリアルタイムで字幕化するために、音声認識誤りを修正する際に、音声認識に供された話者の音声の提示と、音声認識結果であるテキストデータの提示とのタイミングの適正化によって、音声認識誤りの修正作業の効率化を図ったものである。
後者は、音声認識結果をリアルタイムで字幕化するために、テキストデータ出力装置（音声認識装置）から出力されたテキストデータ（音声認識結果）の音声認識誤りを見つけて選択する指摘端末（ポインティング用端末装置）と、この音声認識の誤りを修正して置き換える修正端末（修正用端末装置）とにそれぞれ役割を分担することによって、音声認識結果の誤りの修正作業の効率化を図ったものである。
【０００３】
【特許文献１】
特開２００１−１４２４８２号公報（段落番号００１８〜００４３、第１図）
【特許文献２】
特開２００１−６０１９２号公報（段落番号００２３〜００４３、第１図）
【０００４】
【発明が解決しようとする課題】
しかしながら、従来の音声認識誤り修正装置では、音声認識の誤りを発見する発見オペレータ（以下、発見者という）と、その誤りを修正する修正オペレータ（以下、修正者という）とに作業の役割が分担されて音声認識結果の誤りが修正されるため、発見者と修正者の作業内容が異なることから生ずる次のような弊害があった。
【０００５】
１．発見者と修正者の人数の均衡を考慮しなければならず、人員配置などの人事労務管理の点で不都合が生じること。
２．発見者と修正者に対して各々別々に教育訓練をする必要があること。
３．音声認識誤り修正装置を用いて、音声認識結果の誤りを修正する際に、常に発見者と修正者が対で作業を行う必要があるため、同時に、かつ、同一箇所での作業をせざるを得ず、作業効率が良くないこと。
また、作業場所の制約も生じることなどの問題があった。
【０００６】
そこで、本発明はこのような問題を解決するために、発見者と修正者の人数の均衡を考慮することなく、また、別々に教育訓練をする必要がなく作業場所の制約も最小限にして、効率よく誤り修正作業をすることができる音声認識誤り修正装置、その方法およびそのプログラムを提供することを目的としたものである。
【０００７】
【課題を解決するための手段】
本発明は、前記した目的を達成するため、以下に示す構成とした。
請求項１に記載の発明は、音声認識装置から出力された音声認識の対象となった音声データおよび音声認識結果であるテキストデータを受信し、当該テキストデータに含まれている音声認識誤りを、複数の端末により修正する音声認識誤り修正装置であって、前記音声データおよび前記テキストデータを前記端末に提示する提示手段と、この提示手段によって前記端末に提示されたテキストデータに対して、前記端末により指摘されて表わされる修正文字範囲を示す指摘データおよび前記修正文字範囲のテキストデータを修正した修正テキストデータを受信するデータ受信手段と、このデータ受信手段で受信した前記指摘データに基づいて、前記指摘データを送信した前記端末である指摘データ送信端末による修正が完了するまで前記修正文字範囲を、当該指摘データ送信端末以外の端末による指摘から保護するテキストデータ保護手段と、前記データ受信手段で受信した修正テキストデータに基づいて前記テキストデータを修正し、前記端末それぞれに出力する出力手段と、を備えることを特徴とする。
【０００８】
かかる構成によれば、音声認識装置から出力された音声データおよびテキストデータを受信して複数の修正端末に提示される。そして、これら端末から音声認識誤りを指摘した修正文字範囲を示す指摘データおよび修正した修正テキストデータが受信され、テキストデータ保護手段によって、この指摘データを送信した指摘データ送信端末以外の端末は当該指摘データ送信端末による修正が完了するまでの間、音声認識誤りを指摘することができない。つまり、最も先に音声認識誤りを指摘した指摘データ送信端末によって最優先に当該音声認識誤りの修正作業がなされ、指摘データ送信端末から送信された修正テキストデータが受信され、テキストデータが修正される。そして出力手段により最新の修正情報が逐次各端末に送信されて通知され、修正作業が続行される。
なお、端末を使用する修正者が、誤って正しい音声認識結果であるテキストデータを修正文字範囲として指摘してしまった場合には、例えば、削除キーを選択するなどの方法により、当該修正文字範囲の指定操作結果を操作前の状態に戻すことができる。
【０００９】
請求項２に記載の発明は、請求項１に記載の音声認識誤り修正装置において、前記複数の端末は、ネットワークを介して前記音声認識誤り修正装置と接続されることを特徴とする。
【００１０】
かかる構成によれば、音声認識誤り修正装置は、ネットワークで接続される各端末により操作される。そのため、各端末の設置場所、すなわち、修正作業場の設定に柔軟に対応することができる。
【００１１】
請求項３に記載の発明は、請求項1または請求項２に記載の音声認識誤り修正装置において、前記提示手段は、前記指摘データに含まれる文字について、表示色、表示の大きさ、文字の種類のうち少なくとも一つを含む文字の属性の変更を行うことによって、前記指摘データ送信端末以外の端末にも当該文字属性の変更を提示する指摘文字属性変更機能と、前記指摘データ送信端末によって指摘データを修正中である場合に、修正中の前記指摘データの文字属性を前記指摘文字属性変更機能で変更した文字属性とは異なる文字属性とする修正文字属性変更機能と、を備えることを特徴とする。
【００１２】
かかる構成によれば、指摘文字属性変更機能によって、音声認識結果であるテキストデータのうち音声認識誤り部分を指摘した文字の属性が変更され、当該音声認識誤りを指摘した指摘データ送信端末以外の端末に対しても、この変更された文字属性のテキストデータが提示される。そして、修正文字属性変更機能によって、修正作業をしている文字の属性が音声認識誤りの文字の指摘時に変更された文字属性以外の文字属性に変更され、修正作業中の端末以外の端末に対しても、この変更された文字属性のテキストデータが提示される。
【００１３】
請求項４に記載の発明は、請求項1または請求項２に記載の音声認識誤り修正装置において、前記提示手段は、前記音声認識結果であるテキストデータに係る品詞のうち少なくとも助詞について、表示色、表示の大きさ、文字の種類のうち少なくとも一つを含む文字の属性を変更する文字品詞属性変更機能を備えることを特徴とする。
【００１４】
かかる構成によれば、文字属性変更機能によって、音声認識結果であるテキストデータの品詞のうち少なくとも助詞について前記した文字属性が変更されるため、特に音声認識誤りの発生頻度の高い助詞についての注意の喚起を促すことができる。
【００１５】
請求項５に記載の発明は、請求項1または請求項２に記載の音声認識誤り修正装置において、前記提示手段は、前記端末に提示している前記音声認識結果であるテキストデータに係る文字列の表示領域の表示幅と背景色との少なくとも一方を任意に設定できる表示領域設定機能を備えることを特徴とする。
【００１６】
かかる構成によれば、表示領域設定機能によって、音声認識結果であるテキストデータの表示領域について、端末の使用者からの要望に基づいて、表示幅と背景色との少なくとも一方を設定（変更）すれば、当該テキストデータの表示領域が見易くなり、表示画面の見易さが向上される。
【００１７】
請求項６に記載の発明は、請求項1または請求項２に記載の音声認識誤り修正装置において、前記提示手段は、前記端末に提示している前記音声認識結果であるテキストデータ中に不要文字を挿入する不要文字挿入機能を備えることを特徴とする。
【００１８】
かかる構成によれば、不要文字挿入機能によって、音声認識結果であるテキストデータ中に不要文字が挿入されるため、端末を使用する修正者は、この不要文字を削除する作業が必要となり、端末に提示されるテキストデータの音声認識誤り率が低く、修正すべきテキストデータが少なく単調作業が継続する場合においても、修正作業に対する集中力の維持がなされる。
【００１９】
請求項７に記載の発明は、音声認識装置から出力された音声認識の対象となった音声データおよび音声認識結果であるテキストデータを受信し、当該テキストデータに含まれている音声認識誤りを、複数の端末により修正する音声認識誤り修正方法であって、前記音声データおよび前記テキストデータを前記端末に提示する提示ステップと、この提示ステップによって前記端末に提示されたテキストデータに対して、前記端末により指摘されて表わされる修正文字範囲を示す指摘データおよび前記修正文字範囲のテキストデータを修正した修正テキストデータを受信するデータ受信ステップと、このデータ受信ステップで受信した前記指摘データに基づいて、前記指摘データを送信した前記端末である指摘データ送信端末による修正が完了するまで前記修正文字範囲を、当該指摘データ送信端末以外の端末による指摘から保護するテキストデータ保護ステップと、前記データ受信ステップで受信した修正テキストデータに基づいて前記テキストデータを修正し、前記端末それぞれに出力する出力ステップと、を含むことを特徴とする。
【００２０】
かかる音声認識誤り修正方法によれば、まず、提示ステップで音声認識装置から出力された音声データおよび音声認識結果であるテキストデータが前記した端末に提示される。続いてデータ受信ステップで、前記した提示ステップによって前記端末に提示されたテキストデータのうち音声認識誤りを前記端末の使用者によって指摘した修正文字範囲を示す指摘データおよび修正した修正テキストデータを受信する。そして、テキストデータ保護ステップで、当該指摘データに基づいて、前記した音声認識誤りのテキストデータを指摘データ送信端末以外の端末による指摘から当該指摘データ送信端末による修正作業が完了するまでの間、保護する。そして、データ受信ステップで受信した修正テキストデータに基づいて、音声認識誤りに係るテキストデータが修正される。つまり、最も先に音声認識誤りを指摘した指摘データ送信端末によって最優先に当該音声認識誤りの修正作業がなされ、指摘データ送信端末から送信された修正テキストデータが受信され、テキストデータが修正される。次に出力ステップで、当該修正されたテキストデータが最新の修正情報として逐次各端末に送信し、修正作業が続行される。
【００２１】
請求項８に記載の発明は、音声認識装置から出力された音声認識の対象となった音声データおよび音声認識結果であるテキストデータを受信し、当該テキストデータに含まれている音声認識誤りを、複数の端末により修正する音声認識誤り修正装置を、前記音声データおよび前記テキストデータを前記端末に提示する提示手段、この提示手段によって前記端末に提示されたテキストデータに対して、前記端末により指摘されて表わされる修正文字範囲を示す指摘データおよび前記修正文字範囲のテキストデータを修正した修正テキストデータを受信するデータ受信手段、このデータ受信手段で受信した前記指摘データに基づいて、前記指摘データを送信した前記端末である指摘データ送信端末による修正が完了するまで前記修正文字範囲を、当該指摘データ送信端末以外の端末による指摘から保護するテキストデータ保護手段、前記データ受信手段で受信した修正テキストデータに基づいて前記テキストデータを修正し、前記端末それぞれに出力する出力手段、として機能させることを特徴とする。
【００２２】
かかる音声認識誤り修正プログラムによれば、音声認識誤り修正装置としての機能を生じさせて、このプログラムの処理手順に従って実行させることができるので、音声認識装置から出力された音声データおよびテキストデータを受信して複数の修正端末に提示され、これら端末を使用する使用者が音声認識誤りを指摘した修正文字範囲を示す指摘データおよび修正した修正データを受信し、テキストデータ保護手段が、この指摘データに基づいて、当該指摘データを送信した端末以外の端末について、当該指摘データ送信端末による修正作業が完了するまでの間、音声認識誤りを指摘することを不可能とする。このことによって、最も先に音声認識誤りを指摘した指摘データ送信端末によって最優先に当該音声認識誤りの修正作業がなされる。そして、受信された指摘データ送信端末から送信された修正テキストデータに基づいて、テキストデータが修正されて最新の修正情報として逐次各端末に送信され、修正作業が続行される。
【００２３】
【発明の実施の形態】
以下、本発明の実施の形態について図面を参照しながら説明する。
（音声認識誤り修正システムの概略）
まず、図１を参照しながら、音声認識誤り修正システムの概略について説明する。
図１は、本発明の一実施形態に係る音声認識誤り修正システムの構成を示す概略図である。
【００２４】
本発明の一実施形態に係る音声認識誤り修正システムは、発声者が発声した音声を音声認識装置２によって音声認識した結果であるテキストデータおよび音声認識の対象となった音声データを、ネットワークを介して接続されている各修正端末４１に提示し、これら修正端末４１によって修正された修正テキストデータを統合する音声認識誤り修正装置１と、当該音声認識結果であるテキストデータから修正文字範囲を指摘して正確な文字に修正する修正端末４１により構成されている。
なお、本実施の形態では、修正端末４１が特許請求の範囲に記載の端末に相当する。
【００２５】
また、本実施の形態においては、音声認識誤り修正装置１の修正対象となるテキストデータを出力する装置として音声認識装置２を例とするが、テキストデータを出力する装置であれば、ワードプロセッサ機能や音声認識機能を搭載したパーソナルコンピュータ等であってもよい。
【００２６】
本実施の形態では、当該テキストデータを音声認識誤り修正装置１に接続されている修正端末４１（４１Ａ、４１Ｂ、４１Ｃ）によって修正テキストデータに修正し、当該修正テキストデータを音声認識誤り修正装置１が受信して音声認識誤り修正装置１上の当該テキストデータを、修正して統合した修正統合テキストデータをリアルタイムで出力する（ここまでが音声認識誤り修正システムの動作）。
そして、この修正統合テキストデータを、字幕送出装置４３を利用して、放送通信網（放送波ＥＷ）を介して、テレビジョン４５またはテレビジョン４５としての機能を搭載したパーソナルコンピュータ等へ文字放送として放送する場合を例として説明する。
【００２７】
以下、この実施の形態では、当該音声認識誤り修正装置１から送信されたテキストデータに含まれる音声認識誤りを指摘した指摘データ（詳細は後記）を当該音声認識誤り修正装置１に送信した、この修正端末４１を指摘データ送信端末４１と表記することとする。
【００２８】
（音声認識誤り修正装置の構成）
次に、図１、図２を参照しながら音声認識誤り修正装置１の構成について説明する。
図２は、本発明の一実施形態に係る音声認識誤り修正装置１の構成を示すブロック構成図である。
音声認識誤り修正装置１は、図１に示すように、テキストデータ修正部１３と、音声再生部３を備え、修正端末４１（４１Ａ、４１Ｂ、４１Ｃ）とネットワークを介して接続されている。
【００２９】
テキストデータ修正部１３は、図２に示すように、文字データ受信部１５、修正情報受信部１７、修正端末動作判定部１９、文字提示速度可変部２１、文字属性変更部２３、文字属性情報送信部２５、表示領域設定部２７、不要文字挿入部２９、提示情報送信部３１、テキストデータ保護部３３、文字列統合部３５、画面表示情報送信部３７および字幕出力構成部３９により構成されている。
【００３０】
なお、本実施の形態では、文字データ受信部１５および修正情報受信部１７が特許請求の範囲に記載のデータ受信手段に相当し、文字提示速度可変部２１および文字属性変更部２３が特許請求の範囲に記載の提示手段に相当し、テキストデータ保護部３３が特許請求の範囲に記載のテキストデータ保護手段に相当し、文字属性情報送信部２５および提示情報送信部３１が特許請求の範囲に記載の出力手段に相当し、表示領域設定部２７が特許請求の範囲に記載の表示領域設定機能に相当し、不要文字挿入部２９が特許請求の範囲に記載の不要文字挿入機能に相当するものである。
【００３１】
文字データ受信部１５は、音声認識装置２（図１参照）の音声認識結果であるテキストデータを受信するものである。
修正情報受信部１７は、修正端末４１から修正テキストＩＤ（Ｉｄｅｎｔｉｆｉｃａｔｉｏｎ）、修正端末ＩＤ、制御情報および音声認識装置２の音声認識誤りが修正端末４１によって正確なテキストデータに修正された修正テキストデータを受信するものである。
【００３２】
なお、「修正テキストＩＤ」とは、修正端末４１が修正作業に入る際に音声認識結果であるテキストデータの誤り部分を指摘した修正文字範囲を識別するための識別記号（指摘データ。後記する）のことである。「修正端末ＩＤ」とは、修正端末４１が修正作業に入る際に音声認識結果であるテキストデータの誤り部分である修正文字範囲を指摘した修正端末４１を識別するための識別記号のことである。「制御情報」とは、修正端末４１から修正情報受信部１７によって受信され、修正端末４１へのテキストデータの提示動作について、その再生速度または停止に関する制御命令の基準となる情報と、修正端末４１への音声データの提示動作について、その再生速度または停止に関する制御命令の基準となる情報とのことである。
【００３３】
修正端末動作判定部１９は、修正情報受信部１７から受信した修正端末ＩＤおよび制御情報を文字提示速度可変部２１（詳細は後記する）へ送出する機能と、音声データの修正端末４１への提示タイミングの基準となる文字提示タイミング情報および制御情報を音声提示速度可変部１１（詳細は後記する）へ送出する機能とを有するものである。また、修正端末４１へのテキストデータまたは音声データの提示を一時停止した場合において、その一時停止における遅延分を判定し、テキストデータまたは音声データの修正端末４１への高速提示命令を行うものである。さらに、全ての修正端末４１が修正作業に入ると、修正端末４１へのテキストデータおよび音声データの提示の一時停止命令を文字提示速度可変部２１（後記する）および音声提示速度可変部１１（後記する）へ送信するものである。
【００３４】
なお、「文字提示タイミング情報」とは、修正文字範囲のテキストデータを正しい文字に修正する作業を支援するための音声データを修正端末４１へ提示するタイミングとなる基準情報のことである。この文字提示タイミング情報によって音声提示速度可変部１１（後記する）は、修正端末４１への音声データの提示のタイミングを制御するものである。
【００３５】
文字提示速度可変部２１は、修正端末動作判定部１９から受信した修正端末ＩＤおよび制御情報に基づいて、テキストデータの修正端末４１への提示動作について、その再生速度または停止に関する制御命令を修正端末ＩＤに係る修正端末４１へ送信するものである。また、全ての修正端末４１が修正作業に着手した場合において、修正端末４１へのテキストデータの提示の一時停止命令を修正端末動作判定部１９から受信して、全ての修正端末４１（４１Ａ、４１Ｂ、４１Ｃ）へ当該命令を送信するものである。
【００３６】
文字属性変更部２３は、指摘文字属性変更部２３ａと、修正文字属性変更部２３ｂと、文字品詞属性変更部２３ｃとを備えている。
なお、本実施の形態では、指摘文字属性変更部２３ａが特許請求の範囲に記載の指摘文字属性変更機能に相当し、修正文字属性変更部２３ｂが特許請求の範囲に記載の修正文字属性変更機能に相当し、文字品詞属性変更部２３ｃが特許請求の範囲に記載の文字品詞属性変更機能に相当する。
【００３７】
指摘文字属性変更部２３ａは、修正端末４１によって音声認識誤りを指摘したテキストデータである修正文字範囲を指摘した際に修正端末４１から修正情報受信部１７へ送信する修正テキストＩＤに係る修正文字範囲の文字に関して、修正端末４１から修正情報受信部１７によって受信した修正テキストデータについて、表示色、表示の大きさ、文字の種類のうち少なくとも一つを含む文字属性の変更を行うものである。また、当該指摘データ（修正テキストＩＤ）を送信した修正端末４１（指摘データ送信端末４１）以外の修正端末４１に、当該修正文字範囲の文字属性が変更された修正テキストデータを提示することによって、当該指摘データに係る指摘状況を当該修正端末４１（指摘データ送信端末４１）以外の修正端末からも確認可能となる。
【００３８】
なお、「指摘データ」とは、指摘データ送信端末４１によって指摘した修正文字範囲または当該指摘データ送信端末４１を示すデータをいい、具体的には、前記した修正テキストＩＤまたは修正端末ＩＤを示す。この修正テキストＩＤを送信した指摘データ送信端末４１は、前記した修正端末ＩＤを修正情報受信部１７へ送信する。
また、文字属性の変更には、例えば、字体を斜体、ボールド（ｂｏｌｄ）、影付け、立体文字、袋文字等に変更したり、文字に網掛け、飾り網点等の模様を付けたり、その他、下線、文字囲み等の文字修飾を含むものである。
【００３９】
修正文字属性変更部２３ｂは、修正文字範囲を修正中の文字について、指摘文字属性変更部２３ａで修正文字範囲の文字属性を変更した文字属性以外の文字属性に変更するものである。また、当該修正文字範囲を修正中の端末４１（修正文字範囲のテキストデータを修正中の指摘データ送信端末４１）以外の修正端末４１に当該修正文字範囲の修正中の文字の文字属性が変更された修正テキストデータが提示されることによって、当該指摘データに係る修正状況を修正中の端末４１以外の修正端末４１からも確認可能となる。
なお、文字属性の変更については、指摘文字属性変更部２３ａでの説明と同様なので、その説明を省略する。
【００４０】
文字品詞属性変更部２３ｃは、文字データ受信部１５から送出されたテキストデータ中の文字列の品詞のうち少なくとも助詞について、図４に示すように、文字の表示色、表示の大きさ、文字の種類のうち少なくとも一つを含む文字属性を変更して修正端末４１（指摘データ送信端末４１）から修正情報受信部１７によって受信した修正テキストデータを変更するものである。特に助詞は発現頻度が高く、修正者による音声認識誤りの発見を看過することが非常に多いため、助詞の文字属性を変化させ、修正者に特に助詞の音声認識誤りの発見について注意の喚起を行うものである。助詞の文字表示を大きくする文字属性の変更は、ポインティングデバイスとしてタッチパネルを使用する場合に、助詞の音声認識誤りの発見の精度を高め、また、タッチパネル上で指摘（選択）し易くなるため特に有効である。
【００４１】
なお、図示していない制御部によって、テキストデータ中の文字列の品詞のうち少なくとも助詞について、予め文字属性の変更後に修正端末４１へテキストデータを送信するように初期設定することもできる。
なお、文字属性の変更については、指摘文字属性変更部２３ａでの説明と同様なので、その説明を省略する。
【００４２】
図２に戻って説明を続ける。
文字属性情報送信部２５は、前記した指摘文字属性変更部２３ａ、修正文字属性変更部２３ｂ、文字品詞属性変更部２３ｃによって、変更された文字属性の情報である文字属性情報を各修正端末４１へ送信するものである。
【００４３】
表示領域設定部２７は、表示画面の見易さを向上させるため、図５の（ａ）に示すように、修正端末４１に提示している音声認識結果であるテキストデータの表示領域ＡＲの表示幅と背景色との少なくとも一方を任意に設定可能とするものである。
【００４４】
再び図２に戻って説明を続ける。
不要文字挿入部２９は、修正端末４１に提示している音声認識結果であるテキストデータ中に不要文字を挿入するものである。図５の（ｂ）では、不要文字としてアスタリスクマーク「＊」をテキストデータ中に挿入した例を示している。音声認識率が高く、音声認識誤りが少ない場合、修正作業は音声認識誤りを発見するための単調な作業となる。その結果、修正者の修正作業に対する集中力の低下、修正作業の精度の低下を招く。そこでそれを防止するために、音声認識結果であるテキストデータに不要文字を挿入し、修正者に当該不要文字の削除作業をしてもらうことによって集中力を持続させるものである。
【００４５】
再び図２に戻って説明を続ける。
提示情報送信部３１は、表示領域設定部２７による表示領域設定情報、不要文字挿入部２９による不要文字挿入情報を修正端末４１へ送信するものである。
テキストデータ保護部３３は、修正ガード端末判定部３３ａ、修正ガードテキスト判定部３３ｂを備えている。
【００４６】
テキストデータ保護部３３は、修正情報受信部１７で受信した指摘データ（修正テキストＩＤ、修正端末ＩＤ）に基づいて、音声認識誤りのテキストデータを、当該指摘データを送信した指摘データ送信端末４１以外の修正端末４１による指摘から当該指摘データ送信端末４１による修正作業が完了するまでの間、保護するものである。
【００４７】
図３を参照しながら、テキストデータ保護部３３によるテキストデータの保護がどのように実現されるかについて、その一例を説明する。図３には、音声認識結果のテキストデータである「当時多発テロ事件」のテキストデータが、修正端末４１Ａ、修正端末４１Ｂおよび修正端末４１Ｃに提示されている。この場合、テキストデータ「当時」の部分（修正文字範囲）が音声認識誤りに相当するが、修正端末４１Ａの修正者が最も早くこの音声認識誤りを発見して指摘したときは、他の修正端末４１の修正者、すなわち、修正端末４１Ｂの修正者および修正端末４１Ｃの修正者が、修正文字範囲「当時」の部分と同一箇所（「当時」）については、指摘することができないようにガードされる。
【００４８】
この際、修正端末４１Ｂの修正者および修正端末４１Ｃの修正者は、同一箇所以外の部分については指摘可能である。また、同一箇所であっても修正端末４１Ａの修正者が当該修正作業を終了した後は、修正端末４１Ｂの修正者および修正端末４１Ｃの修正者は、当該同一箇所であった箇所（「当時」）について指摘し、修正作業を行うことができるものである。
【００４９】
次に、再び図２に戻って、テキストデータ保護部３３の構成要素について説明する。
修正ガード端末判定部３３ａは、修正情報受信部１７から送出された修正端末ＩＤ（修正端末４１が修正作業に入る際に音声認識結果であるテキストデータの誤り部分である修正文字範囲を指摘した修正端末４１を識別するための識別記号）を受信して、修正文字範囲（「当時」）を指摘した修正端末４１（修正端末４１Ａ）を識別し、その識別した情報を修正ガード端末情報として、修正端末４１（修正端末４１Ｂおよび修正端末４１Ｃ）へ送信するものである。
【００５０】
修正ガードテキスト判定部３３ｂは、修正情報受信部１７から送出された修正テキストＩＤ（修正端末４１が修正作業に入る際に音声認識結果であるテキストデータの誤り部分を指摘した修正文字範囲を識別するための識別記号）を受信して、修正端末４１（修正端末４１Ａ）によって指摘された修正文字範囲（「当時」）を識別し、その識別した情報を修正ガードテキスト情報として、修正端末４１（修正端末４１Ｂおよび修正端末４１Ｃ）へ送信するものである。
【００５１】
続いて文字列統合部３５に関して説明する。
文字列統合部３５は、修正情報受信部１７によって受信した各修正端末４１による修正テキストデータを修正テキストＩＤに基づいて、１つの統一された文章に統合するものである。この文字列統合部３５によって統合された修正テキスト統合データは、逐次各修正端末４１にフィードバックされ、常に最新の修正情報（修正統合テキストデータ）が反映される。また、この修正統合テキストデータは、字幕送出装置４３（図１参照）によって、字幕放送用のデータとして利用される。
【００５２】
画面表示情報送信部３７は、文字列統合部３５によって、修正テキストデータが修正テキストＩＤに基づいて、１つの文章に統合された修正統合テキストデータを各修正端末４１（４１Ａ、４１Ｂ、４１Ｃ）へ送信するものである。
【００５３】
字幕出力構成部３９は、文字列統合部３５によって、修正テキストデータが修正テキストＩＤに基づいて１つの文章に統合された修正統合テキストデータを文字放送として利用するために、修正統合テキストデータを字幕出力データとして、字幕送出装置４３（図１参照）へ送信するものである。
【００５４】
次に、音声再生部３について説明する。
音声再生部３は、音声受信部５、音声蓄積部７、音声データ送信部９、音声提示速度可変部１１から構成されている。
音声再生部３は、音声認識の対象となった音声データを再生するものである。
この音声データが文字提示タイミング情報に同期してネットワークを介して修正端末４１に送信され、修正端末４１に接続されているヘッドフォンで出力された音声と、音声認識結果であるテキストデータとを各修正端末４１（４１Ａ、４１Ｂ、４１Ｃ）の修正者が比較照合することによって、修正者の修正作業が支援される。
【００５５】
音声受信部５は、音声認識の対象となった音声データを音声認識装置２（図１参照）から受信するものである。
音声蓄積部７は、音声受信部５で受信した音声認識の対象となった音声データを各修正端末４１へ提示するために一時的にまたは長期保存用に蓄積するものである。
一般に、一時記憶用には半導体メモリを利用した主記憶装置（メインメモリ）が利用され、長期保存用にはハードディスク、フレキシブルディスク、ＤＡＴ（Digital Audio Tape recorder）などの外部記憶装置（補助記憶装置）が利用されている。
【００５６】
音声データ送信部９は、音声蓄積部７から音声データを受信して各修正端末４１に提示するために各修正端末４１へ送信するものである。
【００５７】
音声提示速度可変部１１は、音声認識装置２（図１参照）による音声認識の対象となった音声データの修正端末４１への提示動作について、修正端末動作判定部１９から受信した文字提示タイミング情報および制御情報に基づいて、その再生速度または停止に関する制御命令を修正端末４１へ送信するものである。また、全ての修正端末４１が修正作業に着手した場合において、修正端末４１への音声データの提示の一時停止命令を修正端末動作判定部１９から受信して、全ての修正端末４１（４１Ａ、４１Ｂ、４１Ｃ）へ当該命令を、音声データ送信部９を介して送信するものである。
【００５８】
したがって、音声提示速度可変部１１は、音声認識装置２（図１参照）による音声認識の対象となった音声データを音声認識装置２（図１参照）の音声認識結果であるテキストデータと同時に（テキストデータの修正端末４１（４１Ａ、４１Ｂ、４１Ｃ）への提示タイミングと同期して）各修正端末４１に提示させることもできるし、また、文字提示タイミング情報に同期して修正端末４１への音声データの提示に関して、一時停止させて遅く提示させることや、この一時停止による遅延を回復するための高速提示、繰り返し当該音声データの提示を行うリプレイ動作を行わせることもできるものである。
【００５９】
続いて、図１に戻って修正端末４１について説明する。
修正端末４１（４１Ａ、４１Ｂ、４１Ｃ）は、音声認識誤り修正装置１を使用するための装置であり、ディスプレイ、キーボード、修正文字範囲を指定等するためのタッチパネル、音声データを出力するためのヘッドフォン、音声データの再生について、その高速化または一時停止若しくは再開の切替え信号を図示していない制御部に送出して、音声データの再生動作をコントロールするための足入力インターフェース（フットペダル）等を備え、一般にパーソナルコンピュータが利用されている。
【００６０】
なお、フットペダルは、音声データの再生停止を制御する足踏スイッチとＵＳＢ（Universal Serial Bus）またはＲＳ２３２Ｃ等のインターフェースを備えており、これらのインターフェースを介してパーソナルコンピュータに接続している。このフットペダルは、足踏スイッチ（ペダル）の「踏込」または「放す」操作によって、スイッチＯＮ／ＯＦＦの切替操作を行うもので、記録テープに録音されている記録テープ内容（音声データ）の再生動作の制御をする際に利用されるものである。
【００６１】
また、音声認識誤り修正装置１は、ＬＡＮ（Local Area Network）またはＷＡＮ（Wide Area Network）で接続されており、修正端末４１の設置場所に制約されず、任意にシステム設計が行えるものである。よって、遠隔地間での音声認識誤り修正作業も可能である。
【００６２】
また、ネットワークについては、前記したようにＬＡＮ、ＷＡＮなどその形態を問わないが、ネットワークケーブル、無線、赤外線等、その方式も問わない。しかし、通信パケット漏れなどの安全面や高速処理の観点からネットワークケーブルを使用することが好ましい。
さらにまた、音声認識誤り修正装置１によって、修正者（修正端末４１）を複数設定でき、かつ、修正文字範囲の指摘および修正作業を修正者１人単位で行うことができるものである。
【００６３】
（音声認識誤り修正システムの動作）
次に、図６（適宜図２参照）のシーケンシャルチャートを参照しながら音声認識誤り修正システムの動作の一例について説明する。
まず、発話者の発声により生じた音声を音声認識装置２（図１参照）が音声認識を行い、その音声認識の結果であるテキストデータをテキストデータ修正部１３へ送信し、音声認識の対象となった音声データを音声再生部３へ送信する（Ａ１）。
【００６４】
なお、ここでは、発話者が「こんにちは、お昼のニュースです。」と発声し、この音声について、音声認識装置２が誤った音声認識結果として「こんにちは、御ひるのニュースです」のテキストデータを音声認識誤り修正装置１（テキストデータ修正部１３および音声再生部３）へ送信する場合を例に説明する。
【００６５】
次に、当該音声データを受信した音声認識誤り修正装置１のテキストデータ修正部１３は、当該テキストデータ「こんにちは、御ひるのニュースです」を修正端末４１Ａ、修正端末４１Ｂおよび修正端末４１Ｃへ送信すると同時に、音声再生部３へ文字提示タイミング情報および制御情報を送出する（Ｂ１）。これを受けた音声再生部３は、音声認識の対象となった音声データ「こんにちは、お昼のニュースです」を当該文字提示タイミング情報および制御情報に基づいて（例えば、音声認識の結果であるテキストデータの修正端末４１Ａ、修正端末４１Ｂおよび修正端末４１Ｃへの提示と同時に）、修正端末４１Ａ、修正端末４１Ｂおよび修正端末４１Ｃへ送信する（Ｂ２）。
【００６６】
当該テキストデータ「こんにちは、御ひるのニュースです」および当該音声データ「こんにちは、お昼のニュースです」を受信した各修正端末４１（４１Ａ、４１Ｂ、４１Ｃ）は、修正端末４１の修正者によって、音声認識の誤り部分の文字「御ひる」（修正文字範囲）がタッチパネルをタッチすることにより指摘され、キーボードで正確な文字「お昼」への修正作業が行われる。
【００６７】
このとき、例えば、修正端末４１Ａの修正者が最も早く音声認識誤り部分の文字「御ひる」を指摘した場合（Ｃ１）、指摘データ（修正テキストＩＤ、修正端末ＩＤ）が、修正端末４１Ａからテキストデータ修正部１３へ送信され（Ｃ２）、これを受信したテキストデータ修正部１３は、修正端末４１Ｂおよび修正端末４１Ｃへ修正ガード端末情報および修正ガードテキスト情報を送信する（Ｂ３）。そして、これを受信した修正端末４１Ｂおよび修正端末４１Ｃは、修正端末４１Ａの修正作業中は修正作業を行うことができず、修正端末４１Ａのみが優先的に修正作業が可能となる。
【００６８】
なお、修正端末４１Ａの修正者が修正作業を行う際に、音声再生部３から送出された音声データ「こんにちは、お昼のニュースです」を修正者がヘッドフォンで聴取し、修正端末４１Ａに提示されているテキストデータ「こんにちは、御ひるのニュースです」と比較照合することによって修正精度を高め、修正作業が行われる。
【００６９】
そして、修正端末４１Ａは、修正結果である修正テキストデータ「お昼」をテキストデータ修正部１３へ送信し（Ｃ３）、これを受信したテキストデータ修正部１３は、修正テキストデータ「お昼」を１つの文章に統合した修正統合テキストデータ「こんにちは、お昼のニュースです」を作成し、当該修正統合テキストデータが逐次、修正端末４１Ａ、修正端末４１Ｂおよび修正端末４１Ｃへ送信され最新の修正情報が全ての端末（本実施の形態の一例としては修正端末４１Ａ、修正端末４１Ｂおよび修正端末４１Ｃ）に反映され、修正作業が行われる（Ｂ４）。
【００７０】
次に、修正テキストデータの有無が判断され（Ｂ５）、修正テキストデータがある場合（Ｂ５、ＹＥＳ）は、当該修正テキストデータを修正統合テキストデータに統合して当該修正統合テキストデータが正しいテキストデータとして確定される（Ｂ６）。そして、テキストデータ修正部１３から字幕送出装置４３へ当該修正統合テキストデータ「こんにちは、お昼のニュースです」が送信される（Ｂ７）。
【００７１】
修正テキストデータがない場合（Ｂ５、ＮＯ）は、例えば、各修正端末４１へテキストデータを送信後、1分間を経過しているか否かを判断（Ｂ８）し、経過している場合（Ｂ８、ＹＥＳ）はＢ６へすすみ修正統合テキストデータが確定され、経過していない場合（Ｂ８、ＮＯ）はＢ５へ戻る。
なお、このフローチャートに示していないが、音声認識結果であるテキストデータの提示がある間、音声認識誤り修正システムとして、Ａ１からＢ７までのステップが繰り返され、このテキストデータがなくなった時点で音声認識誤り修正システムの動作が終了する。
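The B5/B8 decision described above (finalize as soon as corrected text exists, otherwise finalize once one minute has passed since the text was sent to the terminals) can be sketched roughly as follows. This is only an illustrative sketch: the function name `wait_for_finalization`, the polling interval, and the injectable `clock` and `sleep` parameters are assumptions introduced here and do not appear in the source.

```python
import time
from typing import Callable


def wait_for_finalization(
    has_correction: Callable[[], bool],
    presented_at: float,
    timeout_s: float = 60.0,          # the description gives one minute as an example
    poll_s: float = 0.1,              # hypothetical polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Loop over steps B5 and B8 until the text can be finalized (B6)."""
    while True:
        # B5: has any terminal sent corrected text data?
        if has_correction():
            return "corrected"
        # B8: has the waiting time since presentation elapsed?
        if clock() - presented_at >= timeout_s:
            return "timeout"
        sleep(poll_s)
```

Either return value corresponds to proceeding to B6; the string only records which branch led there.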
【００７２】
また、本実施の形態の一例として、Ｂ７で送信された修正統合テキストデータを受信した字幕送出装置４３から当該修正統合テキストデータをテレビジョン４５へ放送波ＥＷ（図１参照）を介して送信され（Ｄ１）、テレビジョン４５によって、「こんにちは、お昼のニュースです」の字幕付きの文字放送として受信される（Ｅ１）場合の一例をこのフローチャートに図示（図６の左下に示す破線より内側部分）しておく。
【００７３】
なお、Ｂ６における修正統合テキストデータの確定にあたっては、テキストデータ修正部１３が、最も遅く修正テキストデータを送信した修正端末４１から修正テキストデータを受信した後に修正テキストデータを統合し、この修正統合テキストデータを作成した時点をもって確定することもできる。
【００７４】
以上、本発明の実施の形態について説明したが、本発明は、前記した実施の形態に限定されることなく、様々な形態で実施可能である。
また、音声認識誤り修正方法と、このような音声認識誤り修正方法を音声認識誤り修正装置１に実現させる音声認識誤り修正プログラムと、音声認識誤り修正プログラムを記録した記録媒体も本発明の対象とするものである。
【００７５】
【発明の効果】
以上説明したように、本発明によれば以下の効果を奏する。
請求項１、７、８に記載の発明によれば、音声認識誤り修正装置に接続されている端末から音声認識誤りを指摘した修正文字範囲を示す指摘データおよび修正した修正テキストデータを受信する。そして、テキストデータ保護手段によって、この指摘データを送信した指摘データ送信端末以外の端末については、当該指摘データ送信端末による修正が完了するまでの間、音声認識誤りを指摘して修正作業をすることができないようにすることができる。
【００７６】
そのため、音声認識誤り部分である修正文字範囲の発見者と修正者とに修正作業を分担することなく、修正者一人単位で修正作業を行うことができる。そのため、修正作業の効率化を図ることができる。また、人員配置や人員確保など人事労務管理が容易となり、教育講習についても修正者単位でできるため、効率的に講習を行うことができる。そして省力化したシステム設計を行うことができる。
また、音声認識誤り部分の文字を最初に指摘した修正者（端末）に対して、優先的に修正作業を可能とすることにより、修正作業の効率化を実現することができる。
【００７７】
請求項２に記載の発明によれば、音声認識誤り修正装置と端末間をネットワークで接続することにより、作業場所について柔軟に対応することが可能となり、簡素化したシステム設計をすることができる。そのため、効率化した音声認識誤り修正システムの構築が実現できる。
【００７８】
請求項３および請求項４に記載の発明によれば、音声認識誤り部分（修正文字範囲）の文字について、その文字属性を変更して各端末に提示することにより、指摘作業または修正作業状況の明確化を図ることができる。そのため、修正作業を効率化することができる。
【００７９】
請求項５に記載の発明によれば、音声認識結果であるテキストデータの表示領域について、端末の使用者からの要望に基づいて、表示幅と背景色との少なくとも一方を設定することができるので、当該テキストデータの表示領域が見易くなる。そのため、表示画面の見易さを向上させることができる。
【００８０】
請求項６に記載の発明によれば、音声認識結果であるテキストデータ中に不要文字を挿入し、当該不要文字を修正者に削除してもらうことによって、音声認識誤りが少なく単調作業の傾向が強い場合においても、修正者の集中力を持続させることができる。そのため修正作業を高精度に維持することができる。
【図面の簡単な説明】
【図１】本発明の一実施形態に係る音声認識誤り修正システムの構成を示す概略図である。
【図２】本発明の一実施形態に係る音声認識誤り修正装置の構成を示すブロック構成図である。
【図３】本発明の一実施形態に係るテキストデータ修正部の誤りテキストデータ保護部の動作の一例を説明するための図である。
【図４】本発明の一実施形態に係るテキストデータ修正部の文字属性変更部の動作の一例を説明するための図である。
【図５】（ａ）本発明の一実施形態に係るテキストデータ修正部の不要文字挿入部の動作の一例を説明するための図である。
（ｂ）本発明の一実施形態に係るテキストデータ修正部の表示領域設定部の動作の一例を説明するための図である。
【図６】本発明の一実施形態に係る音声認識誤り修正装置の動作の一例を説明するためのシーケンシャルチャートである。
【符号の説明】
１音声認識誤り修正装置
２音声認識装置
３音声再生部
５音声受信部
７音声蓄積部
９音声データ送信部
１１音声提示速度可変部
１３テキストデータ修正部
１５文字データ受信部（データ受信手段）
１７修正情報受信部（データ受信手段）
１９修正端末動作判定部
２１文字提示速度可変部（提示手段）
２３文字属性変更部（提示手段）
２３ａ指摘文字属性変更部（指摘文字属性変更機能）
２３ｂ修正文字属性変更部（修正文字属性変更機能）
２３ｃ文字品詞属性変更部(文字品詞属性変更機能)
２５文字属性情報送信部（出力手段）
２７表示領域設定部（表示領域設定機能）
２９不要文字挿入部（不要文字挿入機能）
３１提示情報送信部（出力手段）
３３テキストデータ保護部（テキストデータ保護手段）
３３ａ修正ガード端末判定部
３３ｂ修正ガードテキスト判定部
３５文字列統合部
３７画面表示情報送信部（出力手段）
３９字幕出力構成部（出力手段）
４１（４１Ａ〜４１Ｃ）修正端末、指摘データ送信端末（端末）
４３字幕送出装置
４５テレビジョン
ＡＲテキストデータの表示領域
ＡＦ修正文字入力枠
[0001]
[Technical Field of the Invention]
The present invention relates to a speech recognition error correction device, a speech recognition error correction method, and a speech recognition error correction program for correcting speech recognition errors.
[0002]
[Prior art]
Conventional speech recognition error correction devices using speech recognition technology include speech captioning devices and character data correction devices (see Patent Document 1 and Patent Document 2).
The former, in order to convert speech recognition results into subtitles in real time, improves the efficiency of correcting speech recognition errors by optimizing the timing between presenting the speaker's voice that was subjected to speech recognition and presenting the text data that is the speech recognition result.
The latter, likewise in order to convert speech recognition results into subtitles in real time, improves the efficiency of correcting errors in the speech recognition result by dividing the roles between a pointing terminal (pointing terminal device), which finds and selects speech recognition errors in the text data (speech recognition result) output from a text data output device (speech recognition device), and a correction terminal (correction terminal device), which corrects and replaces those errors.
[0003]
[Patent Document 1]
Japanese Patent Laid-Open No. 2001-142482 (paragraph numbers 0018 to 0043, FIG. 1)
[Patent Document 2]
Japanese Patent Laid-Open No. 2001-60192 (paragraph numbers 0023 to 0043, FIG. 1)
[0004]
[Problems to be solved by the invention]
However, in conventional speech recognition error correction devices, errors in the speech recognition result are corrected by dividing the work between a discovery operator (hereinafter, the discoverer) who finds speech recognition errors and a correction operator (hereinafter, the corrector) who corrects them. Because the discoverer and the corrector perform different tasks, the following problems arose.
[0005]
1. The balance between the number of discoverers and the number of correctors must be considered, which is inconvenient for personnel and labor management such as staffing.
2. The discoverers and the correctors must be trained separately.
3. When a speech recognition error correction device is used to correct errors in speech recognition results, the discoverer and the corrector must always work as a pair. They are therefore forced to work at the same time and in the same place, and work efficiency is poor.
In addition, there were problems such as restrictions on the work location.
[0006]
Therefore, to solve these problems, an object of the present invention is to provide a speech recognition error correction device, a method, and a program that allow error correction work to be performed efficiently without having to consider the balance between the number of discoverers and correctors, without requiring separate education and training, and with minimal restrictions on the work location.
[0007]
[Means for Solving the Problems]
In order to achieve the above-described object, the present invention has the following configuration.
The invention according to claim 1 is a speech recognition error correction device that receives the speech data subjected to speech recognition and the text data that is the speech recognition result, both output from a speech recognition device, and corrects speech recognition errors contained in the text data by means of a plurality of terminals, the device comprising: presenting means for presenting the speech data and the text data to the terminals; data receiving means for receiving, with respect to the text data presented to the terminals by the presenting means, indication data indicating a correction character range pointed out by a terminal and corrected text data obtained by correcting the text data of the correction character range; text data protection means for protecting, based on the indication data received by the data receiving means, the correction character range from being pointed out by any terminal other than the indication data transmitting terminal, which is the terminal that transmitted the indication data, until correction by the indication data transmitting terminal is completed; and output means for correcting the text data based on the corrected text data received by the data receiving means and outputting it to each of the terminals.
[0008]
According to this configuration, the speech data and text data output from the speech recognition device are received and presented to a plurality of correction terminals. Indication data indicating the correction character range in which a speech recognition error was pointed out, and the corrected text data, are then received from these terminals. The text data protection means prevents terminals other than the indication data transmitting terminal that sent the indication data from pointing out the speech recognition error until that terminal has completed its correction. In other words, the indication data transmitting terminal that first pointed out the speech recognition error corrects it with the highest priority; the corrected text data it transmits is received, and the text data is corrected. The output means then sequentially transmits the latest correction information to each terminal, and correction work continues.
If the corrector using a terminal mistakenly points out correctly recognized text data as a correction character range, the designation can be undone and restored to the state before the operation by, for example, selecting a delete key.
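The first-come protection described above can be illustrated with a small sketch. It is not the patented implementation: the class name `TextDataProtector`, the use of character offsets to identify a range, and the decision to treat any overlapping range as "the same location" are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class TextDataProtector:
    """Tracks which terminal currently holds each indicated character range."""

    # Maps (start, end) offsets to the ID of the terminal that indicated them.
    _guards: Dict[Tuple[int, int], str] = field(default_factory=dict)

    def indicate(self, terminal_id: str, start: int, end: int) -> bool:
        """Try to claim [start, end); refuse if another terminal guards an overlapping range."""
        for (s, e), owner in self._guards.items():
            if start < e and s < end and owner != terminal_id:
                return False
        self._guards[(start, end)] = terminal_id
        return True

    def release(self, terminal_id: str, start: int, end: int) -> None:
        """Drop the guard once the owning terminal finishes or cancels its correction."""
        if self._guards.get((start, end)) == terminal_id:
            del self._guards[(start, end)]


protector = TextDataProtector()
first = protector.indicate("41A", 0, 2)    # terminal 41A points out the range first
second = protector.indicate("41B", 0, 2)   # terminal 41B is refused for the same range
other = protector.indicate("41B", 2, 4)    # a different range can still be indicated
protector.release("41A", 0, 2)             # 41A completes its correction
later = protector.indicate("41C", 0, 2)    # the range is available again
```

Calling `release` without sending corrected text would model the undo case in which a range was pointed out by mistake.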
[0009]
The invention according to claim 2 is the speech recognition error correction device according to claim 1, wherein the plurality of terminals are connected to the speech recognition error correction device via a network.
[0010]
According to this configuration, the speech recognition error correction device is operated from the terminals connected to it via the network. This allows flexibility in where each terminal is installed, that is, in how the correction workplace is set up.
[0011]
The invention according to claim 3 is the speech recognition error correction device according to claim 1 or claim 2, wherein the presenting means comprises: an indicated-character attribute changing function that changes, for the characters included in the indication data, a character attribute including at least one of display color, display size, and character type, and thereby presents the change in character attribute to terminals other than the indication data transmitting terminal as well; and a correction-character attribute changing function that, while the indication data transmitting terminal is correcting the indication data, sets the character attribute of the indication data being corrected to a character attribute different from the one applied by the indicated-character attribute changing function.
[0012]
According to this configuration, the indicated-character attribute changing function changes the attribute of the characters pointed out as a speech recognition error within the text data that is the speech recognition result, and the text data with the changed attribute is also presented to terminals other than the indication data transmitting terminal that pointed out the error. The correction-character attribute changing function then changes the attribute of the characters being corrected to one different from the attribute applied when the error was pointed out, and the text data with this changed attribute is also presented to terminals other than the one performing the correction.
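As a rough illustration of the two attribute-changing functions, the sketch below models a range as being in one of three display states and builds the update that would be sent to every terminal. The state names, the concrete colors, and the message layout are hypothetical; the source only requires that the "pointed out" and "being corrected" states differ in display color, display size, or character type.

```python
from enum import Enum


class RangeState(Enum):
    NORMAL = "normal"
    INDICATED = "indicated"      # pointed out, correction not yet started
    CORRECTING = "correcting"    # the indicating terminal is editing the range


# Hypothetical attribute table; only the difference between states matters.
ATTRIBUTES = {
    RangeState.NORMAL: {"color": "black", "bold": False},
    RangeState.INDICATED: {"color": "red", "bold": False},
    RangeState.CORRECTING: {"color": "blue", "bold": True},
}


def attribute_message(start: int, end: int, state: RangeState) -> dict:
    """Build the attribute update that would be presented to every terminal."""
    return {"start": start, "end": end, "state": state.value, **ATTRIBUTES[state]}


update = attribute_message(0, 2, RangeState.INDICATED)
```

Sending the same update to all terminals, not only the one that pointed out the error, is what lets the other correctors see which ranges are already taken.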
[0013]
The invention according to claim 4 is the speech recognition error correction device according to claim 1 or claim 2, wherein the presenting means comprises a character part-of-speech attribute changing function that changes, for at least particles among the parts of speech of the text data that is the speech recognition result, a character attribute including at least one of display color, display size, and character type.
[0014]
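According to this configuration, the character part-of-speech attribute changing function changes the character attribute described above for at least particles among the parts of speech of the text data that is the speech recognition result. This draws the corrector's attention to particles in particular, where speech recognition errors occur especially frequently.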
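The particle emphasis can be sketched as follows, assuming the text arrives already split into (surface, part-of-speech) pairs. The tag names and the bracket markers, which stand in for a real display attribute such as a larger font, are assumptions for this example; the source does not say how parts of speech are obtained.

```python
from typing import List, Tuple


def emphasize_particles(
    tokens: List[Tuple[str, str]],
    open_mark: str = "[",
    close_mark: str = "]",
) -> str:
    """Wrap every token tagged as a particle in markers that stand in for a display attribute."""
    parts = []
    for surface, pos in tokens:
        if pos == "particle":
            parts.append(f"{open_mark}{surface}{close_mark}")
        else:
            parts.append(surface)
    return "".join(parts)


# Hypothetical tagging of part of the example sentence used in the description.
tokens = [("お昼", "noun"), ("の", "particle"), ("ニュース", "noun"), ("です", "auxiliary")]
print(emphasize_particles(tokens))  # お昼[の]ニュースです
```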
[0015]
The invention according to claim 5 is the speech recognition error correction device according to claim 1 or claim 2, wherein the presenting means comprises a display area setting function that allows at least one of the display width and the background color of the display area for the character string of the text data, which is the speech recognition result presented to the terminal, to be set arbitrarily.
[0016]
According to this configuration, when the display area setting function sets (changes) at least one of the display width and the background color of the display area of the text data that is the speech recognition result, based on a request from the user of the terminal, the display area becomes easier to see and the legibility of the display screen improves.
[0017]
The invention according to claim 6 is the speech recognition error correction device according to claim 1 or claim 2, wherein the presenting means comprises an unnecessary character insertion function that inserts unnecessary characters into the text data that is the speech recognition result presented to the terminal.
[0018]
According to this configuration, the unnecessary character insertion function inserts unnecessary characters into the text data that is the speech recognition result, so the corrector using the terminal has to delete them. As a result, the corrector's concentration on the correction work is maintained even when the speech recognition error rate of the presented text data is low, there is little text to correct, and the work would otherwise remain monotonous.
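A minimal sketch of unnecessary character insertion is shown below. The asterisk marker follows the example given for Figure 5 in the description, but the random placement, the `count` and `seed` parameters, and the helper that strips the markers again are assumptions; the sketch also assumes the marker never occurs in genuine recognized text.

```python
import random
from typing import Optional


def insert_unnecessary_chars(
    text: str, count: int, marker: str = "*", seed: Optional[int] = None
) -> str:
    """Insert `count` marker characters at random positions in the presented text."""
    rng = random.Random(seed)
    chars = list(text)
    for _ in range(count):
        # randint is inclusive, so a marker may also be appended at the end.
        chars.insert(rng.randint(0, len(chars)), marker)
    return "".join(chars)


def strip_unnecessary_chars(text: str, marker: str = "*") -> str:
    """Remove any markers the corrector did not delete before the text is finalized."""
    return text.replace(marker, "")


original = "こんにちは、お昼のニュースです"
presented = insert_unnecessary_chars(original, count=2, seed=0)
```

Stripping leftover markers before output is a design choice of this sketch, so that a missed marker cannot reach the subtitles; the source itself only describes the corrector deleting them.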
[0019]
The invention according to claim 7 is a speech recognition error correction method for receiving the speech data subjected to speech recognition and the text data that is the speech recognition result, both output from a speech recognition device, and correcting speech recognition errors contained in the text data by means of a plurality of terminals, the method comprising: a presenting step of presenting the speech data and the text data to the terminals; a data receiving step of receiving, with respect to the text data presented to the terminals in the presenting step, indication data indicating a correction character range pointed out by a terminal and corrected text data obtained by correcting the text data of the correction character range; a text data protection step of protecting, based on the indication data received in the data receiving step, the correction character range from being pointed out by any terminal other than the indication data transmitting terminal, which is the terminal that transmitted the indication data, until correction by the indication data transmitting terminal is completed; and an output step of correcting the text data based on the corrected text data received in the data receiving step and outputting it to each of the terminals.
[0020]
According to this speech recognition error correction method, first, in the presenting step, the speech data output from the speech recognition device and the text data that is the speech recognition result are presented to the terminals. Next, in the data receiving step, indication data indicating the correction character range in which a terminal user pointed out a speech recognition error in the presented text data, and the corrected text data, are received. Then, in the text data protection step, the text data containing the speech recognition error is protected, based on the indication data, from being pointed out by terminals other than the indication data transmitting terminal until that terminal's correction work is completed. The text data containing the speech recognition error is then corrected based on the corrected text data received in the data receiving step. In other words, the indication data transmitting terminal that first pointed out the speech recognition error corrects it with the highest priority; the corrected text data it transmits is received, and the text data is corrected. Finally, in the output step, the corrected text data is sequentially transmitted to each terminal as the latest correction information, and correction work continues.
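The presenting, correcting, and output steps can be pictured with the following sketch, which applies one received correction to the shared text and then sends the updated text to every terminal. The class name `CorrectionSession`, the offset-based range, and the in-memory "outboxes" standing in for network transmission are assumptions for illustration; the protection step is deliberately left out here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CorrectionSession:
    text: str
    # Terminal ID -> list of texts sent to that terminal (stand-in for a network link).
    outboxes: Dict[str, List[str]] = field(default_factory=dict)

    def present(self, terminal_ids: List[str]) -> None:
        """Presenting step: send the recognized text to every terminal."""
        for terminal_id in terminal_ids:
            self.outboxes[terminal_id] = [self.text]

    def apply_correction(self, start: int, end: int, corrected: str) -> str:
        """Correct the indicated range, then output the latest text to all terminals."""
        self.text = self.text[:start] + corrected + self.text[end:]
        for messages in self.outboxes.values():
            messages.append(self.text)
        return self.text


# Example sentence from the description: "御ひる" (offsets 6 to 9) is corrected to "お昼".
session = CorrectionSession("こんにちは、御ひるのニュースです")
session.present(["41A", "41B", "41C"])
result = session.apply_correction(6, 9, "お昼")
```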
[0021]
The invention according to claim 8 is a speech recognition error correction program that causes a speech recognition error correction device, which receives the speech data subjected to speech recognition and the text data that is the speech recognition result, both output from a speech recognition device, and corrects speech recognition errors included in the text data by means of a plurality of terminals, to function as: presenting means for presenting the speech data and the text data to the terminals; data receiving means for receiving, for the text data presented to the terminals by the presenting means, indication data representing a correction character range pointed out by a terminal and corrected text data obtained by correcting the text data in that correction character range; text data protection means for protecting, based on the indication data received by the data receiving means, the correction character range from being pointed out by any terminal other than the indication data transmission terminal, which is the terminal that transmitted the indication data, until correction by the indication data transmission terminal is completed; and output means for correcting the text data based on the corrected text data received by the data receiving means and outputting it to each of the terminals.
[0022]
According to such a speech recognition error correction program, the functions of a speech recognition error correction device can be created and executed in accordance with the processing procedure of the program. The speech data and text data output from the speech recognition device are received and presented to the terminals. Indication data representing the correction character range in which a user of a terminal has pointed out a speech recognition error, and the corrected text data, are then received. Based on the received indication data, the text data protection means makes it impossible for any terminal other than the indication data transmission terminal to point out that speech recognition error until the correction work by the indication data transmission terminal is completed. As a result, the speech recognition error is corrected with the highest priority by the indication data transmission terminal that first pointed it out. The text data is then corrected based on the corrected text data received from the indication data transmission terminal and is sequentially transmitted to each terminal as the latest correction information, and the correction work continues.
[0023]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
(Outline of speech recognition error correction system)
First, an outline of a speech recognition error correction system will be described with reference to FIG.
FIG. 1 is a schematic diagram showing the configuration of a speech recognition error correction system according to an embodiment of the present invention.
[0024]
The speech recognition error correction system according to an embodiment of the present invention comprises the speech recognition error correction device 1 and the correction terminals 41. The speech recognition error correction device 1 transmits, via a network, the text data that is the result of speech recognition by the speech recognition device 2 of speech uttered by a speaker, together with the speech data subjected to speech recognition, and integrates the corrected text data corrected by the correction terminals 41. Each correction terminal 41 points out a correction character range in the text data that is the speech recognition result and corrects it to the accurate characters.
In the present embodiment, the correction terminal 41 corresponds to the terminal described in the claims.
[0025]
In the present embodiment, the speech recognition device 2 is taken as an example of a device that outputs the text data to be corrected by the speech recognition error correction device 1. However, any device that outputs text data may be used, such as a personal computer equipped with a word processor function or a speech recognition function.
[0026]
In the present embodiment, the text data is corrected into corrected text data at the correction terminals 41 (41A, 41B, 41C) connected to the speech recognition error correction device 1, the corrected text data is received by the speech recognition error correction device 1, and the corrected text data is integrated on the speech recognition error correction device 1 in real time (the operation up to this point is the operation of the speech recognition error correction system).
A case will then be described, as an example, in which the corrected integrated text data is broadcast as teletext by the caption transmission device 43 over a broadcast communication network (broadcast wave EW) to the television 45 or to a personal computer equipped with the functions of the television 45.
[0027]
Hereinafter, in this embodiment, a correction terminal 41 that transmits to the speech recognition error correction device 1 indication data (described in detail later) pointing out a speech recognition error included in the text data transmitted from the speech recognition error correction device 1 is referred to as the indication data transmission terminal 41.
[0028]
(Configuration of voice recognition error correction device)
Next, the configuration of the speech recognition error correction apparatus 1 will be described with reference to FIGS.
FIG. 2 is a block configuration diagram showing the configuration of the speech recognition error correction apparatus 1 according to one embodiment of the present invention.
As shown in FIG. 1, the speech recognition error correction device 1 includes a text data correction unit 13 and a voice reproduction unit 3, and is connected to correction terminals 41 (41A, 41B, 41C) via a network.
[0029]
As shown in FIG. 2, the text data correction unit 13 includes a character data receiving unit 15, a correction information receiving unit 17, a correction terminal operation determination unit 19, a character presentation speed variable unit 21, a character attribute changing unit 23, a character attribute information transmission unit 25, a display area setting unit 27, an unnecessary character insertion unit 29, a presentation information transmission unit 31, a text data protection unit 33, a character string integration unit 35, a screen display information transmission unit 37, and a caption output configuration unit 39.
[0030]
In this embodiment, the character data receiving unit 15 and the correction information receiving unit 17 correspond to the data receiving means described in the claims, and the text data protection unit 33 corresponds to the text data protection means described in the claims. The character presentation speed variable unit 21 and the character attribute changing unit 23, and likewise the character attribute information transmission unit 25 and the presentation information transmission unit 31, each correspond to means described in the claims. The display area setting unit 27 corresponds to the display area setting function described in the claims, and the unnecessary character insertion unit 29 corresponds to the unnecessary character insertion function described in the claims.
[0031]
The character data receiving unit 15 receives text data that is a voice recognition result of the voice recognition device 2 (see FIG. 1).
The correction information receiving unit 17 receives, from the correction terminal 41, the corrected text ID (Identification), the correction terminal ID, the control information, and the corrected text data obtained when the correction terminal 41 corrects a speech recognition error of the speech recognition device 2 into correct text data.
[0032]
The “corrected text ID” is an identification symbol (indication data, described later) for identifying the correction character range that a correction terminal 41 pointed out as an erroneous portion of the text data that is the speech recognition result when it entered correction work. The “correction terminal ID” is an identification symbol for identifying the correction terminal 41 that pointed out that correction character range when it entered correction work. The “control information” is information that the correction information receiving unit 17 receives from the correction terminal 41 and that serves as the basis for control commands concerning the playback speed or stopping of the presentation of text data to the correction terminal 41, and for control commands concerning the playback speed or stopping of the presentation of voice data to the correction terminal 41.
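As a minimal sketch of the three items just described, a message from a correction terminal to the correction information receiving unit 17 could be modelled as follows. The specification does not define a concrete data format, so the class and field names (`CorrectionMessage`, `CorrectedTextId`, `Control`) and the representation of a correction character range as a pair of character offsets are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Control(Enum):
    """Hypothetical control requests for text and voice presentation."""
    PLAY = "play"
    PAUSE = "pause"
    FAST = "fast"


@dataclass(frozen=True)
class CorrectedTextId:
    """Identifies a correction character range (assumed: character offsets)."""
    start: int  # index of the first character in the range
    end: int    # index one past the last character in the range


@dataclass(frozen=True)
class CorrectionMessage:
    """One message from a correction terminal (illustrative only)."""
    text_id: CorrectedTextId              # which range is pointed out
    terminal_id: str                      # which terminal pointed it out
    control: Optional[Control] = None     # optional presentation control
    corrected_text: Optional[str] = None  # None until the correction is done

    def is_indication(self) -> bool:
        """True while the range is only pointed out and not yet corrected."""
        return self.corrected_text is None


# Terminal 41A points out characters 19..25 and asks for a pause.
msg = CorrectionMessage(CorrectedTextId(19, 25), "41A", Control.PAUSE)
```

Under this assumed format, the same message type carries both the indication (no corrected text yet) and the later correction result.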
[0033]
The correction terminal operation determination unit 19 sends the correction terminal ID and the control information received from the correction information receiving unit 17 to the character presentation speed variable unit 21 (described in detail later), and sends the character presentation timing information, which serves as the timing reference for presenting the voice data to the correction terminal 41, together with the control information, to the voice presentation speed variable unit 11 (described in detail later). Further, when the presentation of text data or voice data to a correction terminal 41 has been paused, the unit determines the delay caused by the pause and issues a command for high-speed presentation of the text data or voice data to that correction terminal 41. Further, when all of the correction terminals 41 have entered correction work, it sends a command to pause the presentation of text data and voice data to the correction terminals 41 to the character presentation speed variable unit 21 (described later) and the voice presentation speed variable unit 11 (described later).
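The pause condition described above (presentation is paused once every correction terminal has entered correction work) can be sketched as follows. The class `OperationMonitor` and its method names are hypothetical; the specification states only the condition, not how it is implemented.

```python
class OperationMonitor:
    """Tracks which correction terminals are currently in correction work."""

    def __init__(self, terminal_ids):
        self.terminal_ids = set(terminal_ids)
        self.correcting = set()

    def start_correction(self, terminal_id):
        """Record that a known terminal has entered correction work."""
        if terminal_id not in self.terminal_ids:
            raise KeyError(f"unknown terminal: {terminal_id}")
        self.correcting.add(terminal_id)

    def finish_correction(self, terminal_id):
        """Record that a terminal has finished its correction work."""
        self.correcting.discard(terminal_id)

    def should_pause_presentation(self):
        """Pause text and voice presentation only when every terminal is busy."""
        return bool(self.terminal_ids) and self.correcting == self.terminal_ids


monitor = OperationMonitor(["41A", "41B", "41C"])
monitor.start_correction("41A")
```

With three terminals registered, the pause command would be issued only after all three have called `start_correction`, and withdrawn as soon as any one of them finishes.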
[0034]
“Character presentation timing information” is reference information giving the timing at which the voice data that supports the work of correcting the text data in the correction character range to the correct characters is presented to the correction terminal 41. The voice presentation speed variable unit 11 (described later) controls the timing of presentation of voice data to the correction terminal 41 based on the character presentation timing information.
[0035]
Based on the correction terminal ID and the control information received from the correction terminal operation determination unit 19, the character presentation speed variable unit 21 transmits a control command concerning the playback speed or stopping of the text data presentation to the correction terminal 41 identified by that correction terminal ID. In addition, when all of the correction terminals 41 have entered correction work, it receives from the correction terminal operation determination unit 19 the command to pause the presentation of text data and transmits that command to all of the correction terminals 41 (41A, 41B, 41C).
[0036]
The character attribute changing unit 23 includes an indication character attribute changing unit 23a, a modified character attribute changing unit 23b, and a character part-of-speech attribute changing unit 23c.
In the present embodiment, the indicated character attribute changing unit 23a corresponds to the indicated character attribute changing function described in the claims, the corrected character attribute changing unit 23b corresponds to the corrected character attribute changing function described in the claims, and the character part-of-speech attribute changing unit 23c corresponds to the character part-of-speech attribute changing function described in the claims.
[0037]
When a correction terminal 41 points out a correction character range, that is, text data in which a speech recognition error has been found, the indicated character attribute changing unit 23a changes a character attribute, including at least one of display color, display size, and character type, of the correction character range identified by the corrected text ID that the correction terminal 41 transmitted to the correction information receiving unit 17, and applies the change to the corrected text data that the correction information receiving unit 17 receives from the correction terminal 41. The text data in which the character attribute of the correction character range has been changed is presented to the correction terminals 41 other than the one that transmitted the indication data (corrected text ID), that is, other than the indication data transmission terminal 41, so that the indication status can be checked from those other correction terminals.
[0038]
The “indication data” is data indicating the correction character range pointed out by the indication data transmission terminal 41, or indicating the indication data transmission terminal 41 itself; specifically, it is the corrected text ID or the correction terminal ID described above. The indication data transmission terminal 41 that has transmitted a corrected text ID also transmits its correction terminal ID to the correction information receiving unit 17.
Changes of character attribute also include, for example, changing the typeface to italic, bold, shadowed, three-dimensional, or outlined characters; shading characters or decorating them with halftone dots; and character decorations such as underlining and boxing.
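The attribute change for a pointed-out range can be illustrated by splitting the presented text into styled spans. This is only a sketch: the `Span` type, the style names, and the concrete colors in `STYLE_ATTRIBUTES` are assumptions, since the specification requires only that at least one of display color, display size, and character type be changed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    text: str
    style: str  # "normal", "indicated", or "correcting"


# Hypothetical display attributes for each style.
STYLE_ATTRIBUTES = {
    "normal": {"color": "black", "underline": False},
    "indicated": {"color": "red", "underline": True},
    "correcting": {"color": "blue", "underline": True},
}


def style_spans(text, start, end, style):
    """Split text into spans, applying `style` to text[start:end]."""
    if style not in STYLE_ATTRIBUTES:
        raise ValueError(f"unknown style: {style}")
    if not (0 <= start < end <= len(text)):
        raise ValueError("range outside text")
    spans = [
        Span(text[:start], "normal"),
        Span(text[start:end], style),
        Span(text[end:], "normal"),
    ]
    # Drop empty leading or trailing spans.
    return [s for s in spans if s.text]


text = "terrorist attacks at that time"
start = text.index("at that time")
spans = style_spans(text, start, start + len("at that time"), "indicated")
```

A terminal other than the indication data transmission terminal would render the `"indicated"` span with the changed attribute, which is how its operator can see that the range has already been claimed.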
[0039]
The corrected character attribute changing unit 23b changes the characters of a correction character range that is being corrected to a character attribute different from the one applied to that range by the indicated character attribute changing unit 23a. The text data in which the character attribute of the characters under correction has been changed is presented to the correction terminals 41 other than the terminal that is correcting the range (the indication data transmission terminal 41 correcting the text data in the correction character range), so that the correction status of the indicated range can be confirmed from those other correction terminals 41.
Since the change of the character attribute is the same as that described for the indicated character attribute changing unit 23a, its description is omitted.
[0040]
As shown in FIG. 4, the character part-of-speech attribute changing unit 23c changes a character attribute, including at least one of display color, display size, and character type, for at least the particles among the parts of speech of the character strings in the text data sent from the character data receiving unit 15, and applies the change to the corrected text data that the correction information receiving unit 17 receives from the correction terminal 41 (indication data transmission terminal 41). Particles occur with particularly high frequency, and correctors very often overlook speech recognition errors in them. The character attribute of the particles is therefore changed to draw the corrector's attention to speech recognition errors in particles. Changing the character attribute so that particles are displayed larger is particularly effective when a touch panel is used as the pointing device, because it both improves the accuracy of finding speech recognition errors in particles and makes them easier to point out (select) on the touch panel.
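The enlargement of particles can be sketched as below. The sketch assumes that the text has already been split into tokens tagged with a part of speech by some morphological analyzer; the tag name `"particle"`, the scale factor, and the sample tokens are illustrative assumptions and do not come from the specification.

```python
def emphasize_particles(tokens, scale=1.5):
    """Return (surface, size) pairs, enlarging tokens tagged as particles.

    tokens: iterable of (surface, part_of_speech) pairs (assumed pre-tagged).
    scale:  hypothetical display-size multiplier applied to particles.
    """
    return [
        (surface, scale if pos == "particle" else 1.0)
        for surface, pos in tokens
    ]


# Illustrative pre-tagged tokens for a short Japanese phrase.
tokens = [
    ("お昼", "noun"),
    ("の", "particle"),
    ("ニュース", "noun"),
    ("です", "auxiliary"),
]
sized = emphasize_particles(tokens)
```

Only the particle is enlarged, which gives it a larger touch target as well as greater visual weight.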
[0041]
Note that a control unit (not shown) can be initially set so that the character attribute of at least the particles in the text data is changed in advance, before the text data is transmitted to the correction terminal 41.
Since the change of the character attribute is the same as that described for the indicated character attribute changing unit 23a, its description is omitted.
[0042]
Returning to FIG. 2, the description will be continued.
The character attribute information transmitting unit 25 transmits to each correction terminal 41 the character attribute information, that is, information on the character attributes changed by the indicated character attribute changing unit 23a, the corrected character attribute changing unit 23b, and the character part-of-speech attribute changing unit 23c.
[0043]
In order to improve the visibility of the display screen, the display area setting unit 27 can arbitrarily set at least one of the width and the background color of the display area AR of the text data that is the speech recognition result presented to the correction terminal 41, as shown in FIG.
[0044]
Returning to FIG. 2 again, the description will be continued.
The unnecessary character insertion unit 29 inserts unnecessary characters in the text data that is the voice recognition result presented to the correction terminal 41. FIG. 5B shows an example in which an asterisk mark “*” is inserted in the text data as an unnecessary character. When the voice recognition rate is high and there are few voice recognition errors, the correction work is a monotonous work for finding the voice recognition error. As a result, the concentration of the corrector on the correction work is reduced and the accuracy of the correction work is reduced. Therefore, in order to prevent this, concentration is maintained by inserting unnecessary characters into the text data that is the speech recognition result and having the corrector delete the unnecessary characters.
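A minimal sketch of the unnecessary character insertion follows. The specification gives the asterisk as an example of an unnecessary character but does not say how many are inserted or where, so the random placement, the `count` parameter, and the function names are assumptions.

```python
import random


def insert_dummy_chars(text, count, marker="*", seed=None):
    """Insert `count` copies of `marker` at distinct random positions in text.

    Raises ValueError (from random.sample) if count exceeds the number of
    available insertion positions, len(text) + 1.
    """
    rng = random.Random(seed)
    # Insert from right to left so earlier positions stay valid.
    positions = sorted(rng.sample(range(len(text) + 1), count), reverse=True)
    chars = list(text)
    for pos in positions:
        chars.insert(pos, marker)
    return "".join(chars)


def strip_dummy_chars(text, marker="*"):
    """Remove every marker, as the corrector is expected to do by hand."""
    return text.replace(marker, "")


original = "Hello, this is the noon news."
padded = insert_dummy_chars(original, 3, seed=0)
```

The sketch assumes the marker does not otherwise occur in the recognized text; if it could, a different marker would have to be chosen.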
[0045]
Returning to FIG. 2 again, the description will be continued.
The presentation information transmission unit 31 transmits display area setting information by the display area setting unit 27 and unnecessary character insertion information by the unnecessary character insertion unit 29 to the correction terminal 41.
The text data protection unit 33 includes a correction guard terminal determination unit 33a and a correction guard text determination unit 33b.
[0046]
Based on the indication data (corrected text ID, correction terminal ID) received by the correction information receiving unit 17, the text data protection unit 33 protects the text data containing the speech recognition error from being pointed out by any correction terminal 41 other than the indication data transmission terminal 41 that transmitted the indication data, until the correction work by the indication data transmission terminal 41 is completed.
[0047]
An example of how text data protection by the text data protection unit 33 is realized will be described with reference to FIG. 3. In FIG. 3, the text data "terrorist attacks at that time", which is the text data of the speech recognition result, is presented to the correction terminal 41A, the correction terminal 41B, and the correction terminal 41C. In this case, the portion "at that time" (the correction character range) is a speech recognition error. When the corrector at the correction terminal 41A is the first to find and point out this speech recognition error, the correctors at the other correction terminals 41, that is, the corrector at the correction terminal 41B and the corrector at the correction terminal 41C, are guarded so that they cannot point out that same portion ("at that time").
[0048]
At this time, the corrector at the correction terminal 41B and the corrector at the correction terminal 41C can still point out portions other than that same portion. Moreover, once the corrector at the correction terminal 41A has finished the correction work, the corrector at the correction terminal 41B and the corrector at the correction terminal 41C can again point out and correct the portion that was guarded ("at that time").
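The guarding behaviour in this example amounts to holding one owner per pointed-out range until that owner finishes. The following sketch shows one way this could work; the class `RangeGuard`, the use of character offsets, and the decision to treat any overlapping range as a conflict are assumptions, since the specification speaks only of the "same portion".

```python
class RangeGuard:
    """Keeps one owner per pointed-out character range until it is corrected."""

    def __init__(self):
        self._locks = {}  # (start, end) -> terminal_id

    def point_out(self, terminal_id, start, end):
        """Try to claim text[start:end]; return True if the claim succeeds."""
        if start >= end:
            raise ValueError("empty range")
        for (s, e), owner in self._locks.items():
            overlaps = start < e and s < end
            if overlaps and owner != terminal_id:
                return False  # guarded by another terminal
        self._locks[(start, end)] = terminal_id
        return True

    def complete(self, terminal_id, start, end):
        """Release the range once its owner has finished correcting it."""
        if self._locks.get((start, end)) == terminal_id:
            del self._locks[(start, end)]
            return True
        return False


guard = RangeGuard()
first = guard.point_out("41A", 18, 30)   # 41A claims "at that time"
second = guard.point_out("41B", 18, 30)  # 41B is refused the same range
```

After `guard.complete("41A", 18, 30)`, the same call from terminal 41B would succeed, matching the behaviour described for the period after terminal 41A finishes.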
[0049]
Next, returning to FIG. 2 again, components of the text data protection unit 33 will be described.
The correction guard terminal determination unit 33a receives the correction terminal ID sent from the correction information receiving unit 17 (the identification symbol for identifying the correction terminal 41 that pointed out the correction character range, that is, the erroneous portion of the text data that is the speech recognition result, when it entered correction work), identifies the correction terminal 41 (correction terminal 41A) that pointed out the correction character range ("at that time"), and transmits the identified information as correction guard terminal information to the other correction terminals 41 (the correction terminal 41B and the correction terminal 41C).
[0050]
The correction guard text determination unit 33b receives the corrected text ID sent from the correction information receiving unit 17 (the identification symbol for identifying the correction character range that the correction terminal 41 pointed out as the erroneous portion of the text data that is the speech recognition result when it entered correction work), identifies the correction character range ("at that time") pointed out by the correction terminal 41 (correction terminal 41A), and transmits the identified information as correction guard text information to the other correction terminals 41 (the correction terminal 41B and the correction terminal 41C).
[0051]
Next, the character string integration unit 35 will be described.
The character string integration unit 35 integrates the corrected text data that the correction information receiving unit 17 has received from the correction terminals 41 into one unified sentence based on the corrected text ID. The corrected integrated text data produced by the character string integration unit 35 is sequentially fed back to each correction terminal 41, so that the latest correction information (corrected integrated text data) is always reflected. The corrected integrated text data is used as data for subtitle broadcasting by the caption transmission device 43 (see FIG. 1).
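The integration step can be illustrated by applying each corrected fragment to the range it identifies. This sketch assumes, for illustration only, that a corrected text ID resolves to a pair of character offsets into the original recognition result and that ranges do not overlap; the function name `integrate` is hypothetical.

```python
def integrate(text, corrections):
    """Apply corrections to text and return the integrated sentence.

    corrections: dict mapping (start, end) offsets in the ORIGINAL text to
    the replacement string. Ranges must lie inside the text and not overlap.
    """
    result = text
    prev_start = len(text)
    # Work from right to left so offsets of earlier ranges stay valid.
    for (start, end), replacement in sorted(corrections.items(), reverse=True):
        if not (0 <= start <= end <= prev_start):
            raise ValueError("overlapping or out-of-range correction")
        result = result[:start] + replacement + result[end:]
        prev_start = start
    return result


recognized = "Hello, this is the Mihiru news."
s = recognized.index("Mihiru")
integrated = integrate(recognized, {(s, s + len("Mihiru")): "noon"})
```

Here `integrated` is the sentence with the erroneous range replaced, which is what would be fed back to every correction terminal and passed on for captioning.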
[0052]
The screen display information transmitting unit 37 transmits to each correction terminal 41 (41A, 41B, 41C) the corrected integrated text data in which the character string integration unit 35 has integrated the corrected text data into one sentence based on the corrected text ID.
[0053]
The subtitle output configuration unit 39 transmits the corrected integrated text data, in which the character string integration unit 35 has integrated the corrected text data into one sentence based on the corrected text ID, to the caption transmission device 43 (see FIG. 1) as output data so that it can be used as subtitles.
[0054]
Next, the voice reproduction unit 3 will be described.
The voice reproduction unit 3 includes a voice reception unit 5, a voice storage unit 7, a voice data transmission unit 9, and a voice presentation speed variable unit 11.
The voice reproduction unit 3 reproduces voice data that is a target of voice recognition.
The voice data is transmitted to the correction terminals 41 via the network in synchronization with the character presentation timing information. The corrector at each correction terminal 41 (41A, 41B, 41C) compares the voice output from the headphones connected to the terminal with the text data that is the speech recognition result, which supports the corrector's correction work.
[0055]
The voice receiving unit 5 receives voice data that is a target of voice recognition from the voice recognition device 2 (see FIG. 1).
The voice storage unit 7 stores the voice data received by the voice receiving unit 5 as the target of speech recognition, either temporarily or for long-term storage, for presentation to each correction terminal 41.
In general, a main storage device (main memory) using semiconductor memory is used for temporary storage, and an external storage device (auxiliary storage device) such as a hard disk, a flexible disk, or a DAT (Digital Audio Tape Recorder) is used for long-term storage.
[0056]
The voice data transmission unit 9 receives voice data from the voice storage unit 7 and transmits the voice data to each correction terminal 41 for presentation to each correction terminal 41.
[0057]
Based on the character presentation timing information and the control information received from the correction terminal operation determination unit 19, the voice presentation speed variable unit 11 transmits to the correction terminal 41 a control command concerning the playback speed or stopping of the presentation to that terminal of the voice data subjected to speech recognition by the speech recognition device 2 (see FIG. 1). When all of the correction terminals 41 have entered correction work, it receives from the correction terminal operation determination unit 19 the command to pause the presentation of voice data and transmits that command to all of the correction terminals 41 (41A, 41B, 41C) via the voice data transmission unit 9.
[0058]
Therefore, the voice presentation speed variable unit 11 can present the voice data subjected to speech recognition by the speech recognition device 2 (see FIG. 1) to each correction terminal 41 simultaneously with the text data that is the speech recognition result (that is, in synchronization with the timing at which the text data is presented to the correction terminals 41 (41A, 41B, 41C)). In synchronization with the character presentation timing information, it can also pause the presentation of voice data to a correction terminal 41, present it slowly, present it at high speed to recover the delay caused by a pause, and replay it so that the voice data is presented repeatedly.
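The specification does not state how long high-speed presentation must last to recover a delay. As a worked illustration under stated assumptions (new speech keeps arriving in real time, and playback runs at a constant multiple `rate` of normal speed), the lag shrinks by `rate - 1` seconds per second of playback, so a delay of `delay_s` seconds is absorbed after `delay_s / (rate - 1)` seconds. The function name is hypothetical.

```python
def catch_up_time(delay_s, rate):
    """Seconds of playback at `rate` x speed needed to absorb `delay_s` of lag.

    Assumes speech keeps arriving in real time, so the lag shrinks by
    (rate - 1) seconds for every second of fast playback.
    """
    if delay_s < 0:
        raise ValueError("delay must be non-negative")
    if rate <= 1.0:
        raise ValueError("rate must exceed 1.0 to catch up")
    return delay_s / (rate - 1.0)


# A 10-second pause recovered at 1.5x speed.
needed = catch_up_time(10, 1.5)
```

For example, a 10-second pause played back at 1.5 times normal speed takes 20 seconds to recover, whereas at 2 times normal speed it takes 10 seconds.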
[0059]
Next, returning to FIG. 1, the correction terminal 41 will be described.
The correction terminal 41 (41A, 41B, 41C) is a device for using the speech recognition error correction device 1. It includes a display, a keyboard, a touch panel for pointing out a correction character range, headphones for outputting voice data, and, for voice data playback, a foot input interface (foot pedal) or the like that controls the playback operation by sending a switching signal for high-speed playback, pause, or resumption to a control unit (not shown). A personal computer is generally used.
[0060]
The foot pedal includes a foot switch for controlling the stopping of voice data playback and an interface such as USB (Universal Serial Bus) or RS232C, and is connected to a personal computer via such an interface. The foot pedal switches ON and OFF when the foot switch (pedal) is depressed or released, and is of the kind used to control the playback of recorded content (audio data) on a recording tape.
[0061]
The speech recognition error correction device 1 is connected to a LAN (Local Area Network) or a WAN (Wide Area Network), so the system can be designed freely without being constrained by where the correction terminals 41 are installed. Speech recognition error correction work between remote locations is therefore also possible.
[0062]
As described above, the network may take any form, such as a LAN or a WAN, and may use any medium, such as network cable, wireless, or infrared. However, network cable is preferable from the standpoint of security, for example against leakage of communication packets, and of high-speed processing.
Furthermore, a plurality of correctors (correction terminals 41) can be set up for the speech recognition error correction device 1, and a single corrector can both point out and correct a correction character range.
[0063]
(Operation of speech recognition error correction system)
Next, an example of the operation of the speech recognition error correction system will be described with reference to the sequence chart of FIG. 6 (see FIG. 2 as appropriate).
First, the speech recognition device 2 (see FIG. 1) performs speech recognition on the speech uttered by the speaker, transmits the text data that is the speech recognition result to the text data correction unit 13, and transmits the voice data that was subjected to speech recognition to the voice reproduction unit 3 (A1).
[0064]
Here, a case will be described as an example in which the speaker utters "Hello, this is the noon news." and, for this utterance, the erroneous text data "Hello, this is the Mihiru news." is transmitted as the speech recognition result of the speech recognition device 2 to the speech recognition error correction device 1 (text data correction unit 13 and voice reproduction unit 3).
[0065]
Then, the text data correction unit 13 of the speech recognition error correction device 1, having received this, transmits the text data "Hello, this is the Mihiru news." to the correction terminals 41A, 41B, and 41C, and at the same time sends the character presentation timing information and control information to the voice reproduction unit 3 (B1). Having received these, the voice reproduction unit 3 transmits the voice data "Hello, this is the noon news." that was subjected to speech recognition to the correction terminals 41A, 41B, and 41C based on the character presentation timing information and control information (for example, simultaneously with the presentation of the text data that is the speech recognition result to the correction terminals 41A, 41B, and 41C) (B2).
[0066]
At each correction terminal 41 (41A, 41B, 41C) that has received the text data "Hello, this is the Mihiru news." and the voice data "Hello, this is the noon news.", the corrector points out the erroneous characters "Mihiru" (the correction character range) by touching the touch panel and corrects them to the correct characters "noon" using the keyboard.
[0067]
At this time, if, for example, the corrector at the correction terminal 41A is the first to point out the erroneous characters "Mihiru" (C1), the indication data (corrected text ID, correction terminal ID) is transmitted from the correction terminal 41A to the text data correction unit 13 (C2), and the text data correction unit 13, having received it, transmits the correction guard terminal information and the correction guard text information to the correction terminals 41B and 41C (B3). Having received this information, the correction terminals 41B and 41C cannot correct that portion while the correction terminal 41A is correcting it, and only the correction terminal 41A can perform the correction, with priority.
[0068]
When the corrector at the correction terminal 41A performs the correction work, the corrector listens through headphones to the voice data "Hello, this is the noon news." sent from the voice reproduction unit 3 and compares it with the text data "Hello, this is the Mihiru news." presented on the correction terminal 41A, which increases the accuracy of the correction.
[0069]
Then, the correction terminal 41A transmits the corrected text data "noon" as the correction result to the text data correction unit 13 (C3). Having received the corrected text data "noon", the text data correction unit 13 creates the corrected integrated text data "Hello, this is the noon news." by integrating it into one sentence, and sequentially transmits this corrected integrated text data to the correction terminals 41A, 41B, and 41C. The latest correction information is thus reflected on all terminals (in this example, the correction terminals 41A, 41B, and 41C) as the correction work proceeds (B4).
[0070]
Next, it is determined whether there is corrected text data (B5). If there is corrected text data (B5, YES), it is integrated into the corrected integrated text data, and the corrected integrated text data is confirmed as the correct text data (B6). Then, the corrected integrated text data "Hello, this is the noon news" is transmitted from the text data correction unit 13 to the subtitle sending device 43 (B7).
[0071]
If there is no corrected text data (B5, NO), it is determined whether a set time, for example one minute, has elapsed since the text data was transmitted to each correction terminal 41 (B8). If it has elapsed (B8, YES), the process proceeds to B6 and the corrected integrated text data is confirmed. If it has not elapsed (B8, NO), the process returns to B5.
Although not shown in this flowchart, the speech recognition error correction system repeats steps A1 to B7 while text data that is the speech recognition result is being presented, and the operation of the speech recognition error correction system ends when there is no more text data.
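The decision loop of steps B5, B8, and B6 can be sketched as below. This is an illustrative reading only: the function name `confirm_text`, the `poll` callback that returns a (wrong, right) pair, the polling interval, and the use of a simple string replacement for integration are all assumptions. The one-minute limit is the example value given in the description.

```python
import time
from typing import Callable, Optional, Tuple


def confirm_text(
    integrated: str,
    poll: Callable[[], Optional[Tuple[str, str]]],
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    timeout_s: float = 60.0,
    interval_s: float = 0.1,
) -> str:
    """Return the confirmed text (B6) once a correction arrives or time runs out."""
    sent_at = now()  # moment the text data was sent to the terminals
    while True:
        correction = poll()  # B5: is there corrected text data?
        if correction is not None:
            wrong, right = correction
            # B5 YES: integrate the correction, then confirm (B6).
            return integrated.replace(wrong, right, 1)
        if now() - sent_at >= timeout_s:
            # B8 YES: no correction within the limit, confirm as is (B6).
            return integrated
        sleep(interval_s)  # B8 NO: go back to B5
```

The clock and sleep functions are passed in as parameters so that the timeout branch can be exercised without actually waiting for a minute.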
[0072]
Also, as an example of the present embodiment, the corrected integrated text data transmitted in B7 is received by the subtitle sending device 43 and transmitted from there to the television 45 via the broadcast wave EW (see FIG. 1) (D1), and the television 45 receives "Hello, this is the noon news" as a teletext subtitle (E1). This case is shown as an example in the flowchart (the portion inside the broken line at the lower left of FIG. 6).
[0073]
In addition, when confirming the corrected integrated text data in B6, the text data correction unit 13 may instead confirm it at the point when it has received corrected text data from the correction terminal 41 that was the last to transmit corrected text data, has integrated that corrected text data, and has created the corrected integrated text data.
[0074]
Although an embodiment of the present invention has been described above, the present invention is not limited to the above-described embodiment and can be implemented in various forms.
Further, a speech recognition error correction method, a speech recognition error correction program for causing the speech recognition error correction apparatus 1 to implement such a method, and a recording medium on which the speech recognition error correction program is recorded are also subject matter of the present invention.
[0075]
[Effects of the invention]
As described above, the present invention has the following effects.
According to the inventions described in claims 1, 7, and 8, indication data indicating the corrected character range that contains a speech recognition error, and corrected text data, are received from a terminal connected to the speech recognition error correction apparatus. The text data protection means then prevents terminals other than the indication data transmission terminal, that is, the terminal that transmitted the indication data, from correcting that speech recognition error until the correction by the indication data transmission terminal is completed.
[0076]
Therefore, each corrector can carry out the whole correction work alone, without dividing the work on a corrected character range (the speech recognition error part) between a discoverer and a corrector, and the efficiency of the correction work can be improved. In addition, personnel and labor management, such as assigning and securing personnel, becomes easier, and training can be given per corrector and therefore conducted efficiently. A labor-saving system design also becomes possible.
In addition, the efficiency of the correction work can be improved by giving priority for the correction work to the corrector (terminal) that first pointed out the characters of the speech recognition error part.
[0077]
According to the invention described in claim 2, connecting the speech recognition error correction apparatus and the terminals via a network makes it possible to cope flexibly with the work place and to simplify the system design. Therefore, an efficient speech recognition error correction system can be constructed.
[0078]
According to the inventions described in claims 3 and 4, the character attributes of the characters of the speech recognition error part (corrected character range) are changed and presented to each terminal, which makes the status of the indication work or the correction work clear. Therefore, the correction work can be made efficient.
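As a small illustration of this effect, the mapping from the status of a character range to its presented attributes could be sketched as follows. The three states, the names `RangeState`, `CharAttr`, and `attribute_for`, and the concrete colors, sizes, and typefaces are all assumptions chosen for the example; the description only requires that a range being corrected is shown with attributes different from those of a range that has merely been pointed out.

```python
from dataclasses import dataclass
from enum import Enum


class RangeState(Enum):
    NORMAL = "normal"                    # not pointed out by any terminal
    POINTED_OUT = "pointed_out"          # pointed out, correction not yet started
    BEING_CORRECTED = "being_corrected"  # the pointing terminal is correcting it


@dataclass(frozen=True)
class CharAttr:
    color: str      # display color
    size_pt: int    # display size
    typeface: str   # character type


# Assumed values; any scheme works as long as the three states are distinguishable.
ATTRS = {
    RangeState.NORMAL: CharAttr("black", 12, "regular"),
    RangeState.POINTED_OUT: CharAttr("red", 12, "bold"),
    RangeState.BEING_CORRECTED: CharAttr("blue", 14, "bold"),
}


def attribute_for(state: RangeState) -> CharAttr:
    """Attributes presented on every terminal for a range in the given state."""
    return ATTRS[state]
```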
[0079]
According to the invention described in claim 5, at least one of the display width and the background color of the display area of the text data that is the speech recognition result can be set in response to a request from the user of the terminal, so the display area of the text data becomes easy to see. Therefore, the visibility of the display screen can be improved.
[0080]
According to the invention described in claim 6, unnecessary characters are inserted into the text data that is the speech recognition result and the corrector is made to delete them, so the corrector's concentration can be maintained even when there are few speech recognition errors and the work tends to become monotonous. Therefore, the accuracy of the correction work can be kept high.
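One possible form of such an insertion is sketched below. The description does not state which characters are inserted, how many, or where, so the marker character, the random placement, and the function name `insert_unnecessary_chars` are assumptions made only for this example. The sketch also assumes that the marker does not occur in the original text.

```python
import random
from typing import Optional


def insert_unnecessary_chars(
    text: str,
    marker: str = "#",
    count: int = 1,
    rng: Optional[random.Random] = None,
) -> str:
    """Insert `count` copies of `marker` at random positions in `text`."""
    rng = rng or random.Random()
    chars = list(text)
    for _ in range(count):
        # randint is inclusive, so the marker may also land at the very end.
        position = rng.randint(0, len(chars))
        chars.insert(position, marker)
    return "".join(chars)


# The corrector's task is to delete the markers again.
presented = insert_unnecessary_chars("Hello, this is the noon news", count=2,
                                     rng=random.Random(0))
restored = presented.replace("#", "")
```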
[Brief description of the drawings]
FIG. 1 is a schematic diagram showing the configuration of a speech recognition error correction system according to an embodiment of the present invention.
FIG. 2 is a block configuration diagram showing a configuration of a speech recognition error correction apparatus according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining an example of an operation of an error text data protection unit of a text data correction unit according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining an example of an operation of a character attribute changing unit of a text data correcting unit according to an embodiment of the present invention.
FIG. 5(A) is a diagram for explaining an example of an operation of an unnecessary character insertion unit of a text data correction unit according to an embodiment of the present invention.
FIG. 5(B) is a diagram for explaining an example of an operation of a display area setting unit of a text data correction unit according to an embodiment of the present invention.
FIG. 6 is a sequence chart for explaining an example of the operation of the speech recognition error correction apparatus according to the embodiment of the present invention.
[Explanation of symbols]
1 Voice recognition error correction device
2 Voice recognition device
3 Audio playback unit
5 Audio receiver
7 Sound storage unit
9 Audio data transmitter
11 Voice presentation speed variable part
13 Text data correction part
15 Character data receiver (data receiver)
17 Correction information receiving unit (data receiving means)
19 Modified terminal operation determination unit
21 Character presentation speed variable part (presentation means)
23 Character attribute change part (presentation means)
23a Specified character attribute change part (pointed character attribute change function)
23b Modified character attribute changing section (modified character attribute changing function)
23c Character part-of-speech attribute change section (character part-of-speech attribute change function)
25 Character attribute information transmitter (output means)
27 Display area setting section (Display area setting function)
29 Unnecessary character insertion part (unnecessary character insertion function)
31 Presentation information transmission unit (output means)
33 Text data protection section (text data protection means)
33a Modified guard terminal determination unit
33b Modified guard text determination unit
35 Character string integration part
37 Screen display information transmitter (output means)
39 Subtitle output component (output means)
41 (41A-41C) Correction terminal, indication data transmission terminal (terminal)
43 Subtitle sending device
45 Television
AR text data display area
AF correction character input frame

Claims

A speech recognition error correction apparatus that receives speech data that was the target of speech recognition and text data that is the speech recognition result, both output from a speech recognition apparatus, and in which speech recognition errors included in the text data are corrected by a plurality of terminals, the apparatus comprising:
presenting means for presenting the speech data and the text data to the terminals;
data receiving means for receiving, with respect to the text data presented to the terminals by the presenting means, indication data indicating a corrected character range pointed out by a terminal, and corrected text data obtained by correcting the text data in the corrected character range;
text data protection means for protecting, based on the indication data received by the data receiving means, the corrected character range from being pointed out by any terminal other than the indication data transmission terminal, which is the terminal that transmitted the indication data, until correction by the indication data transmission terminal is completed; and
output means for correcting the text data based on the corrected text data received by the data receiving means and outputting the corrected text data to each of the terminals.

The speech recognition error correction apparatus according to claim 1, wherein the plurality of terminals are connected via a network.

The speech recognition error correction apparatus according to claim 1, wherein the presenting means comprises:
a pointed-out character attribute changing function that changes, for the characters in the corrected character range, a character attribute including at least one of a display color, a display size, and a character type, and thereby presents the change of the character attribute also to terminals other than the indication data transmission terminal; and
a corrected character attribute changing function that, while the corrected character range is being corrected by the indication data transmission terminal, sets the character attribute of the corrected character range being corrected to a character attribute different from the character attribute changed by the pointed-out character attribute changing function.

The speech recognition error correction apparatus according to claim 1, wherein the presenting means comprises a character part-of-speech attribute changing function that changes, for at least one part of speech in the text data that is the speech recognition result, a character attribute including at least one of a display color, a display size, and a character type.

The speech recognition error correction apparatus according to claim 1 or 2, wherein the presenting means comprises a display area setting function capable of arbitrarily setting at least one of a display width and a background color of a display area of the text data that is the speech recognition result presented to the terminals.

The speech recognition error correction apparatus according to claim 1, wherein the presenting means comprises an unnecessary character insertion function for inserting an unnecessary character into the text data that is the speech recognition result presented to the terminals.

A speech recognition error correction method in which speech data that was the target of speech recognition and text data that is the speech recognition result, both output from a speech recognition apparatus, are received and speech recognition errors included in the text data are corrected by a plurality of terminals, the method comprising:
a presentation step of presenting the speech data and the text data to the terminals;
a data reception step of receiving, with respect to the text data presented to the terminals in the presentation step, indication data indicating a corrected character range pointed out by a terminal, and corrected text data obtained by correcting the text data in the corrected character range;
a text data protection step of protecting, based on the indication data received in the data reception step, the corrected character range from being pointed out by any terminal other than the indication data transmission terminal, which is the terminal that transmitted the indication data, until correction by the indication data transmission terminal is completed; and
an output step of correcting the text data based on the corrected text data received in the data reception step and outputting the corrected text data to each of the terminals.

A speech recognition error correction program that causes an apparatus, which receives speech data that was the target of speech recognition and text data that is the speech recognition result, both output from a speech recognition apparatus, and in which speech recognition errors included in the text data are corrected by a plurality of terminals, to function as:
presenting means for presenting the speech data and the text data to the terminals;
data receiving means for receiving, with respect to the text data presented to the terminals by the presenting means, indication data indicating a corrected character range pointed out by a terminal, and corrected text data obtained by correcting the text data in the corrected character range;
text data protection means for protecting, based on the indication data received by the data receiving means, the corrected character range from being pointed out by any terminal other than the indication data transmission terminal, which is the terminal that transmitted the indication data, until correction by the indication data transmission terminal is completed; and
output means for correcting the text data based on the corrected text data received by the data receiving means and outputting the corrected text data to each of the terminals.