JP3682922B2

JP3682922B2 - Real-time character correction device and real-time character correction program

Info

Publication number: JP3682922B2
Application number: JP2002121712A
Authority: JP
Inventors: 章中村; 正人河口
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2002-04-24
Filing date: 2002-04-24
Publication date: 2005-08-17
Anticipated expiration: 2022-04-24
Also published as: JP2003316384A

Description

【０００１】
【発明の属する技術分野】
本発明は、誤字脱字の生じた箇所をリアルタイムで修正するリアルタイム文字修正装置およびリアルタイム文字修正プログラムに関する。
【０００２】
【従来の技術】
従来、テレビ番組などの番組内で出演者が話した音声を字幕により表示することが行われている。この音声情報から文字情報に変換して表示する場合において、テレビ番組が録画放送である場合は、音声情報から変換ソフトを用いてテキストデータに変換し、その変換されたテキストデータの誤りを修正するために通常、ワードプロセッサやエディタ等のアプリケーションソフトウェアが用いられている。そして、テキストの誤りを修正する手順として、マウス等のポインティングデバイス（タッチパネルの場合は、指示ペンあるは手指）を用いて誤って認識された箇所を指定し、正しい単語を入力した後、誤認識された箇所を削除することが行われる。
【０００３】
一方、テレビ番組や講演などの内容に関してリアルタイムで字幕を制作する場合、そのテレビ番組や講演などの内容の音声をもとにコンピュータによって音声認識手段を介してテキスト化する方法が知られている。
【０００４】
このとき、誤字脱字を一字一句チェックしながら修正を行うと、そのテレビ番組や講演などの進行から大幅に遅れ、テンポの速い番組や講演等では、内容の把握が困難になる場合がある。また、リアルタイム性を重んじた場合、誤字脱字が生じても修正を施すことなく、そのままの状態で放置することも多々あった。そのため、従来、文字データ修正装置が提案されている。
【０００５】
この文字データ修正装置は、音声入力された音声データから文字データに変換して、その変換された文字データの誤り箇所を指摘して、その指摘された文字データの誤り箇所を修正するものであり、音声認識装置等のテキストデータ出力装置から出力されるテキストデータから間違いを見つけて選択するポイント用端末装置と、誤っている単語を修正する修正用端末装置とを備え、役割を分担することで修正時間を短縮するものである。
【０００６】
【発明が解決しようとする課題】
しかし、従来の文字データ修正装置では、ポイント用端末装置と修正用端末装置とより役割を分担して修正速度を上げているが、さらなる改良する余地が存在した。
すなわち、文字データ修正装置では、修正用端末装置により訂正する箇所の入力をキーボードからの入力に頼っているため、修正に対する時間がかかり、さらに時間の短縮が可能となる修正をリアルタイムで行うことが望まれていた。
【０００７】
本発明は前記事情に鑑みてなされたものであり、誤字脱字が認められる箇所である指定箇所あるいは誤字脱字が認められる箇所を含む句読点間で挟まれたセンテンス単位で、削除し、音声認識手段により再度再現することにより、リアルタイム性を損なうことなく修正可能な、リアルタイム文字修正装置、およびリアルタイム文字修正プログラムを提供することを目的とする。
【０００８】
【課題を解決するための手段】
本発明は、前記目的を達成するために、以下のように構成した。すなわち、発話者によって入力される音声を、音声認識手段によってテキストに変換して字幕表示するために、モニタ画面に画面生成するためのテキスト画面生成表示手段を用いて、誤字脱字の生じた箇所を、音声入力中にリアルタイムで修正するリアルタイム文字修正装置であって、
変換された前記テキストの誤字脱字がある箇所についてオペレータによる指定に基づき、前記誤字脱字を生じている指定箇所を検出する誤字脱字位置検出手段と、この誤字脱字位置検出手段で検出された誤字脱字があるテキストの箇所に基づき、前記誤字脱字を生じている箇所を含む文頭から読点あるいは句点まで、または、読点あるいは句点から文末、または、読点あるいは句点で挟まれたセンテンスを抽出する句読点間センテンス抽出手段と、この句読点間センテンス抽出手段により抽出した前記センテンスを削除する誤センテンス削除手段と、この誤センテンス削除手段により削除された前記センテンスに対する発話者の音声入力を促がし、そのセンテンスの音声入力を得て音声認識を行いその結果を、削除した前記センテンスの位置に反映させる再認識結果反映手段と、前記誤センテンス削除手段および前記再認識結果反映手段から送信される切替信号により前記音声認識手段からのルートを前記再認識結果反映手段または前記テキスト画面生成表示手段に切り替える切替部と、を備えるリアルタイム文字修正装置とした。
【０００９】
請求項１に記載の発明によれば、リアルタイム文字修正装置は、音声認識されたテキストの誤字脱字がある箇所の指定に基づき、その誤字脱字を生じている箇所を含む読点あるいは句点を基準とするセンテンスを抽出して削除し、その削除されたセンテンスの音声入力を促がすことにより再度音声認識された結果を反映させてリアルタイム修正する。なお、リアルタイム文字修正装置は、切替手段により、前記誤センテンス削除手段および前記再認識結果反映手段からの切替信号により前記音声認識手段からのルートを前記再認識結果反映手段または前記テキスト画面生成表示手段に切り替えている。また、読点あるいは句点で挟まれたセンテンスとは、句点と句点、句点と読点、読点と句点、読点と読点で挟まれたセンテンスである。さらに、変換したテキストのセンテンスに発生する誤字脱字を指定する場合は、発話者本人あるいは誤字脱字をチェックするチェッカー（オペレータ）がモニタ画面により行なっている。
【００１０】
また、発話者によって入力される音声を、音声認識手段によってテキストに変換して字幕表示するために、モニタ画面に画面生成するためのテキスト画面生成表示手段を用いて、誤字脱字の生じた箇所を、音声入力中にリアルタイムで修正するために、コンピュータを、以下の各手段として機能させるリアルタイム文字修正プログラムとした。すなわち、前記各手段とは、誤字脱字位置検出手段、句読点間センテンス抽出手段、誤センテンス削除手段、再認識結果反映手段、切替手段、である。
【００１１】
このように構成したことにより、リアルタイム文字修正プログラムは、句読点間センテンス抽出手段により、変換された前記テキストの誤字脱字がある箇所の指定に基づき、前記誤字脱字を生じている箇所を含む文頭から読点あるいは句点まで、または、読点あるいは句点から文末、または、読点あるいは句点で挟まれたセンテンスを抽出する。また、誤センテンス削除手段により、句読点間センテンス抽出手段で抽出した前記センテンスを削除し、再認識結果反映手段により、誤センテンス削除手段で削除された前記センテンスに対する発話者の音声入力を促がし、そのセンテンスの音声入力を得て音声認識を行いその結果を、削除した前記センテンスの位置に反映させている。さらに、切替手段により、前記誤センテンス削除手段および前記再認識結果反映手段からの切替信号により前記音声認識手段からのルートを前記再認識結果反映手段または前記テキスト画面生成表示手段に切り替えている。
【００１８】
【発明の実施の形態】
図１は、本発明のリアルタイム文字修正装置を含むシステムの構成ならびに概略動作を説明するために引用したブロック図である。
本発明のシステムは、音声認識手段としての音声認識装置１と、リアルタイム文字修正装置２と、字幕変換装置３で構成される。
【００１９】
音声認識装置１は、発話者からのマイクなどを介して音声を受信し（▲１▼）、内蔵する音声認識辞書に基づく類似度比較演算等により、テキストに変換してリアルタイム文字修正装置２へ供給する（▲２▼）。このとき、変換されたテキストには誤字脱字が含まれることがある。なお、発話者の認識できる位置には、発話者の音声がテキストに変換されて文字としてのセンテンスとなった状態を確認できるように、モニタ画面１０が設置されている。
【００２０】
リアルタイム文字修正装置２は、音声認識されたテキストの誤字脱字がある箇所の指定（オペレータがタッチパネル画面を確認することにより指定）に基づき、その誤字脱字を生じている指定箇所（図４の斜線で示す「化膿」）、あるいは、誤字脱字を生じている箇所を含む文頭から読点あるいは句点までのセンテンス、または、読点あるいは句点から文末までのセンテンス、または、読点あるいは句点で挟まれたセンテンスを検出し抽出して削除し（▲３▼）、その削除されたセンテンスの音声入力を促がす（▲４▼）ことにより再度音声認識された結果を反映させてリアルタイム修正し、字幕変換装置３へ引き渡す。
【００２１】
なお、リアルタイム文字修正装置２は、タッチパネル２０を備え、このタッチパネル２０を介して確認画面が表示され、また、誤字脱字部分の指定がなされることとする。このリアルタイム文字修正装置２に接続される字幕変換装置３は、リアルタイム文字修正装置３によって出力されるテキストと放送番組とを合成して字幕放送番組を制作して出力する。
【００２２】
図２は、図１に示すリアルタイム文字修正装置２の内部構成を機能展開して示したブロック図である。
以下に示す各ブロックは、具体的には、ＣＰＵならびにメモリを含む周辺ＬＳＩで構成され、ＣＰＵがメモリに記録されたプログラムを読み出し実行することにより、そのブロックが持つ機能を実現するものとする。なお、ここでは、テキストのセンテンスに対する検出、抽出、および削除を行う構成のリアルタイム文字修正装置として説明するが、テキストの文字に対する検出、抽出、および削除を行う構成のリアルタイム文字修正装置であっても構わない。
【００２３】
リアルタイム文字修正装置２は、テキスト画面生成表示部２１（テキスト画面生成表示手段）と、誤字脱字位置検出部２２（誤字脱字位置検出手段）と、句読点間センテンス抽出部２３（句読点間センテンス抽出手段）と、誤センテンス削除部２４（誤センテンス削除手段）と、再認識結果反映部（再認識結果反映手段）２５と、切替部２６で構成される。
【００２４】
テキスト画面生成表示部２１は、音声認識装置１によって出力されるテキストに基づく画面生成を行い、そしてタッチパネル２０に表示する機能を持ち、ここに表示された内容は、オペレータのチェックを経て字幕変換装置３へ供給される。
【００２５】
誤字脱字位置検出部２２は、音声認識されたテキストの誤字脱字がある箇所の指定に基づき、その指定箇所の単語を検出して抽出する機能を持ち、ここではタッチパネルを使用するために表示画面上に実装されたタブレットにより検出し抽出される。なお、この誤字脱字位置検出部２２により指定されたセンテンス内の位置箇所について、誤字脱字の位置であることを音声認識装置１にフィードバックすることで、音声認識装置１側の正しい文字の変換作業をするための再現確率を高めることができる。
【００２６】
誤字脱字位置検出部２２により検出される座標データは、句読点間センテンス抽出部２３に供給される。この句読点間センテンス抽出部２３は、誤字脱字を生じている箇所を含む読点、あるいは句点から後段のセンテンス、または、読点、あるいは句点で挟まれたセンテンスを抽出する機能を持ち、ここで検出されたセンテンスは、その領域を反転あるいは点滅、または、括弧などで範囲指定した状態で示すようにして、誤センテンス削除部２４へ供給される。なお、読点、あるいは句点で挟まれるセンテンスとは、読点と読点、読点と句点、句点と読点、句点と句点により挟まれる位置の文字群をいう。
【００２７】
誤センテンス削除部２４は、オペレータの操作に基づき誤センテンスを削除する機能を持ち、ここで削除される内容はテキスト画面生成表示部２１に反映され、範囲指定されたセンテンスの表示が、タッチパネル２０から消える。なお、この誤センテンス削除部２４は、タッチパネル２０に表示されているセンテンスが削除されると、切替部２６のルートを切り替えるための切替信号を送信するように構成されている。
【００２８】
再認識結果反映部２５は、削除されたセンテンスの音声入力を促がすことにより音声入力を得て、音声認識装置１による再音声認識を行い、その結果についてテキスト画面生成表示部２１を介して反映させる機能を持つ。この再認識結果反映部２５は、音声入力を促す手段として、話者が認識できるモニタ画面１０においてその話者が認識できるようにセンテンスが反転などして指定された箇所が判断できるようにしても良いし、また、チェッカーから話者にイヤホンを介して音声によって行っても構わない。なお、この再認識結果反映部２５は、音声認識装置１で再認識された結果をテキスト画面生成表示部２１に反映させると、切替部２６にルートを切り替えるための切替信号を送信するように構成されている。
【００２９】
切替部２６は、誤センテンス削除部２４および再認識結果反映部２５からの切替信号により動作するように構成されている。この切替部２６は、通常、音声認識装置１からの出力がテキスト画面生成表示部２１に供給されるように接続制御するが、誤センテンス削除部２４からの切替信号によりルートが切り替えられ、音声認識装置１からの出力が再認識結果反映部２５に供給されるように接続制御され、音声認識装置１で再認識された結果をテキスト画面生成表示部２１に反映させている。
【００３０】
また、この切替手段２６は、再認識結果反映部２５によりセンテンス（単語）が修正されテキスト画面生成表示部２１にセンテンスが表示された場合に、再認識結果反映部２５から発生される切替信号によりもとのルートに切り替える動作を行うように構成されている。また、再認識結果反映部２５によりセンテンスが再度修正を必要とし、同じ位置のセンテンスが再び指定された場合、切替部２６は、誤センテンス削除部２４から切替信号が送られ再認識結果反映部２５側のルートに再び切り替えられることとなる。
【００３１】
つぎに、リアルタイム文字修正装置を含むシステムの動作説明をする。図３、図４は、図１、図２に示す本発明実施形態の動作を説明するために引用した図であり、それぞれ動作フローチャート、タッチパネルに表示される画面構成の一例を示す。なお、図３に示す動作フローチャートは、具体的には、本発明のリアルタイム文字修正プログラムの処理手順を示す。
以下、図３、図４を参照しながら図１、図２に示す本発明実施形態の動作について詳細に説明する。
【００３２】
まず、アナウンサーやキャスターなどの発話者による音声入力が行なわれる（Ｓ３１）。このことにより、音声認識装置１は、その入力音声をもとに、音声辞書による類似度計算に基づく音声認識を行いテキスト化する（Ｓ３２）。ここで認識されたテキストは、テキスト画面生成表示部２１を介してタッチパネル２０上に即座に表示される（Ｓ３３）。
【００３３】
ここで、そのテキストに誤字脱字がある場合、オペレータは、その誤字脱字に相当するタッチパネル２０の表示位置（指定箇所）にタッチする。これを、誤字脱字位置検出部２２を介して検知し（Ｓ３４）、句読点間センテンス抽出部２３により抽出される誤字脱字を生じている箇所を含む読点あるいは句点を基準とするセンテンスの範囲について発話者が認識できるように範囲指定した後、誤センテンス削除部２４が一気に削除する（Ｓ３５、Ｓ３６）。誤り箇所を含むセンテンスが削除されたことはテキスト画面生成表示部２１を介してタッチパネル２０に反映される。それとともに、誤センテンス削除部２４から切替部２６に切替信号が送られてルートを切り替える。
【００３４】
そして、発話者にそのセンテンス部分についての発声を再度促がし、その発声が図２の切替部２６を介して音声認識装置１に再入力され（Ｓ３７）、再認識結果反映部２５によりその音声認識結果がテキスト画面生成表示部２１を介してタッチパネル２０（モニタ画面１０）上に表示される。そして、テキスト画面生成表示部２１によりタッチパネル２０（モニタ画面１０）上にセンテンスが表示されると、この再認識結果反映部２５から切替信号が切替部２６に送られ、その切替部２６がもとのルートに切替られる。なお、このとき、このように誤センテンスが補われることにより、修正に殆ど時間を要さず、リアルタイムで誤字脱字を修正することができる。ここで、誤字脱字が修正されていない場合は、再度発声を促し（Ｓ３７：ｙｅｓ）Ｓ３２以降の処理を繰り返す。
【００３５】
なお、音声認識が適正にいかなかった場合、リアルタイム性は確保できなくなるが、手入力による修正、あるいはリアルタイム性が重視される場合は放置することになる（Ｓ３９）。
【００３６】
また、音声認識装置１による音声認識は、「言いよどみ」、「曖昧な発音」等発声上の問題により誤認識が発声する可能性が高く、この場合は、再度発音を注意しながら発声させることにより正解率がアップする。また、焦って難しい単語を発声し、その単語が音声認識辞書に存在しないため、誤認識が生じた場合は、再度発声するときにその難しい単語を簡単な単語に置換して発生させることにより認識率の向上が見込める。単語を置換して発生させる場合は、発話者がどの文字が音声認識装置１に認識させることが困難であったかを判断できるように、誤字脱字位置検出部２２によりその誤字脱字の位置を指定することができる。
【００３７】
最終的に誤字脱字がなくなったことが確認されたら（Ｓ３４：ｎｏ）、字幕変換装置３による字幕変換処理を起動する（Ｓ３８）。字幕変換装置３により字幕放送用フォーマット（ＮＡＢフォーマット：社団法人日本民間放送連名「Ｔ０２７−１９９６」）に変換され、映像を合成された状態で字幕放送として放送される。
【００３８】
図４にタッチパネル２０に表示される画面イメージが示されている。ここでは、誤字が生じた箇所（“化膿”）をオペレータがタッチすることにより、この単語を含む句読点間（初めて生字幕放送を化膿としました）が範囲指定された後、削除される。そして、削除されたセンテンスに対応する言葉の発声を再度行うことにより、リアルタイム文字修正を実現するものである。なお、どの単語に誤字が発生したかを誤字脱字位置検出部２２により指定することで、その誤字の情報を音声認識装置１にフィードバックさせ、同じ文章が読まれたときに、同じ変換を起こさないようにすることが好都合である。
【００３９】
以上説明のように本発明は、誤字脱字が認められる箇所を含む句読点間で挟まれたセンテンスを抽出して削除することにより、リアルタイム性を損なうことなく修正可能となるものである。
なお、上記した本発明実施形態は、音声認識を用いてテレビ番組や講演等の内容をテキスト化する場合の修正についてのみ説明したが、他に、電子的な速記を用いてテレビ番組や講演等の内容を電子テキスト化する場合の修正にも同様に応用可能である。また、本発明は、リアルタイムで放送を行う字幕制作の分野以外に、難聴者を対象に講演等で字幕をつける分野、あるいは音声認識を用いて事前収録番組について字幕制作を行う分野についても有効である。
【００４０】
また、図２に示すテキスト画面生成表示部２１、誤字脱字位置検出部２２、句読点間センテンス抽出部２３、誤センテンス削除部２４、再認識結果反映部２５、切替部２６のそれぞれで実行される手順をコンピュータ読み取り可能な記録媒体に記録し、この記録媒体に記録されたプログラムをコンピュータシステムに読み込ませ、実行することにより本発明のリアルタイム文字修正システムが実現されるものとする。ここでいうコンピュータシステムとは、ＯＳや周辺機器等のハードウエアを含むものである。
【００４１】
さらに、「コンピュータ読み取り可能な記録媒体」とは、フレキシブルディスク、光磁気ディスク、ＣＤ−ＲＯＭ等の可搬媒体、またはＲＯＭ、コンピュータシステムに内蔵されるハードディスク等の記憶装置のことをいう。そして、この「コンピュータ読み取り可能な記録媒体」とは、インターネット等のネットワークや電話回線等の通信回線を介してプログラムが送信された場合のシステムやクライアントとなるコンピュータシステム内部の揮発性メモリ（ＲＡＭ）のように、一定時間プログラムを保持しているものも含むものとする。
【００４２】
なお、発話者がモニタ画面１０においてテキストのセンテンスを確認しながら、音声入力作業を行うことができる場合は、発話者のタイミングに合わせて調整することができ、テキストのセンテンスの誤字脱字をチェックするチェッカーとあわせて迅速な修正を行ったうえで、ミスのない文章を提供することができる。また、チェッカーがいない状態でも迅速な修正を行ったうえで、ミスの少ない文章を提供できる。
また、切替手段２６の切替信号は、ここでは誤センテンス削除部２４および再認識結果反映部２５から送られる構成として説明したが、切替部２６の切替動作が適切にできる構成であれば、特に限定されるものではない。さらに、リアルタイム文字修正装置は、テキストに変換されたデータを記憶するハードディスクなどの記憶手段を備える構成としても構わない。
【００４３】
【発明の効果】
以上説明したように、本発明に係るリアルタイム文字修正装置およびリアルタイム文字修正プログラムは、以下に示す優れた効果を奏する。
本発明の請求項１、２によれば、テレビ番組や講演、ＶＴＲなどの記録媒体に収録された音声からテキスト化する場合において、誤字脱字の生じた箇所をリアルタイムに修正することを実現し、修正に殆ど時間を要しないため、修正にリアルタイム性が重視されるアプリケーションへの適用が考えられる。
【００４４】
特に、テレビ番組や講演などの内容についてリアルタイムで字幕を制作する場合に、そのテレビ番組や講演などの内容の音声をもとにコンピュータによって音声認識する手法に対して、音声認識ミスによって誤字脱字が生じても、この誤字脱字が認められる箇所を認識あるいは指示することにより、その誤字脱字がある単語を含む句読点間で挟まれたセンテンスを抽出して削除し、再度、その削除された部分を音声で繰り返し発声してテキスト化することで、迅速な修正を行うことにより、字幕へのリアルタイムでの反映が可能となる。
【００４５】
また、本発明は上記した音声認識による字幕制作の他に、電子速記による字幕制作による場合でも同様、リアルタイム修正を可能とする。
【００４６】
さらに、本発明の請求項１、２によれば、音声認識手段に対してテキストのセンテンス内においてセンテンス単位で、どの文字の変換が間違っていたかを指摘することができるため、音声認識手段に対して同じ指定部分について発話者が音声入力しても、間違った変換文字を繰り返すことがなく、音声による迅速な修正を行うことが可能となる。
【図面の簡単な説明】
【図１】本発明のリアルタイム文字修正システムの構成ならびに概略動作を説明するために引用したブロック図である。
【図２】図１に示すリアルタイム文字修正装置の内部構成を機能展開して示したブロック図である。
【図３】本発明実施形態の動作を説明するために引用した図であり、動作フローチャートを示す図である。
【図４】本発明実施形態の動作を説明するために引用した図であり、タッチパネルに表示される画面構成の一例を示す図である。
【符号の説明】
１音声認識装置
２リアルタイム文字修正装置
３字幕変換装置
２１テキスト画面生成表示部
２２誤字脱字位置検出部
２３句読点間センテンス抽出部
２４誤センテンス削除部
２５再認識結果反映部
２６切替部
[0001]
[Technical Field of the Invention]
The present invention relates to a real-time character correction device and a real-time character correction program that correct, in real time, locations where typographical errors or omissions have occurred.
[0002]
[Prior art]
Conventionally, speech by performers in programs such as television programs has been displayed as captions. When this audio information is converted into text information for display and the television program is a recorded broadcast, the audio information is converted into text data using conversion software, and application software such as a word processor or an editor is usually used to correct errors in the converted text data. The procedure for correcting a text error is to specify the misrecognized location with a pointing device such as a mouse (or, on a touch panel, a stylus or a finger), enter the correct word, and then delete the misrecognized portion.
[0003]
On the other hand, when captions are produced in real time for the content of a television program, a lecture, or the like, a method is known in which a computer converts the audio of that content into text via speech recognition means.
[0004]
In this case, if every typographical error is checked and corrected character by character, the captions fall far behind the progress of the television program or lecture, and for a fast-paced program or lecture it may become difficult to follow the content. Conversely, when priority was given to real-time performance, typographical errors were often left as they were without correction. For this reason, a character data correction device has been proposed.
[0005]
This character data correction device converts input voice data into character data, has the erroneous portions of the converted character data pointed out, and corrects the portions so pointed out. It comprises a pointing terminal device for finding and selecting errors in the text data output from a text data output device such as a speech recognition device, and a correction terminal device for correcting the erroneous words. Dividing the roles in this way shortens the correction time.
[0006]
[Problems to be solved by the invention]
However, although the conventional character data correction device increases the correction speed by dividing roles between the pointing terminal device and the correction terminal device, there was room for further improvement.
Specifically, because the character data correction device relies on keyboard input at the correction terminal device to enter the corrected text, correction takes time, and a way of making corrections in real time that shortens this time further has been desired.
[0007]
The present invention has been made in view of the above circumstances. Its object is to provide a real-time character correction device and a real-time character correction program that can make corrections without impairing real-time performance, by deleting either a designated location where a typographical error is found or the sentence unit bounded by punctuation marks that contains such a location, and reproducing the deleted text again through the speech recognition means.
[0008]
[Means for Solving the Problems]
In order to achieve the above object, the present invention is configured as follows. Namely, it is a real-time character correction device that corrects, in real time during voice input, locations where typographical errors have occurred, using text screen generation/display means for generating a screen on a monitor screen so that speech input by a speaker is converted into text by speech recognition means and displayed as captions, the device comprising:
typographical error position detection means for detecting, based on a designation by an operator of a location in the converted text that contains a typographical error, the designated location where the typographical error occurs; inter-punctuation sentence extraction means for extracting, based on the location detected by the typographical error position detection means, the sentence that contains the location of the typographical error and that runs from the beginning of a sentence to a comma or period, from a comma or period to the end of a sentence, or between commas or periods; erroneous sentence deletion means for deleting the sentence extracted by the inter-punctuation sentence extraction means; re-recognition result reflection means for prompting the speaker to input by voice the sentence deleted by the erroneous sentence deletion means, obtaining the voice input of that sentence, performing speech recognition, and reflecting the result at the position of the deleted sentence; and a switching unit for switching the route from the speech recognition means to either the re-recognition result reflection means or the text screen generation/display means in response to switching signals transmitted from the erroneous sentence deletion means and the re-recognition result reflection means.
[0009]
According to the invention of claim 1, based on the designation of a location containing a typographical error in the speech-recognized text, the real-time character correction device extracts and deletes the sentence, delimited by commas or periods, that contains the location of the error, and corrects the text in real time by prompting voice input of the deleted sentence and reflecting the result of the renewed speech recognition. The switching means of the real-time character correction device switches the route from the speech recognition means to either the re-recognition result reflection means or the text screen generation/display means in response to switching signals from the erroneous sentence deletion means and the re-recognition result reflection means. A sentence bounded by commas or periods means a sentence between a period and a period, a period and a comma, a comma and a period, or a comma and a comma. The designation of a typographical error in a sentence of the converted text is made on the monitor screen either by the speaker or by a checker (operator) who checks for typographical errors.
[0010]
The invention also provides a real-time character correction program that causes a computer to function as the following means, in order to correct, in real time during voice input, locations where typographical errors have occurred, using text screen generation/display means for generating a screen on a monitor screen so that speech input by a speaker is converted into text by speech recognition means and displayed as captions. These means are the typographical error position detection means, the inter-punctuation sentence extraction means, the erroneous sentence deletion means, the re-recognition result reflection means, and the switching means.
[0011]
With this configuration, the real-time character correction program uses the inter-punctuation sentence extraction means to extract, based on the designation of a location containing a typographical error in the converted text, the sentence that contains the error and runs from the beginning of a sentence to a comma or period, from a comma or period to the end of a sentence, or between commas or periods. The erroneous sentence deletion means deletes the sentence extracted by the inter-punctuation sentence extraction means, and the re-recognition result reflection means prompts the speaker to input by voice the sentence deleted by the erroneous sentence deletion means, obtains that voice input, performs speech recognition, and reflects the result at the position of the deleted sentence. Further, the switching means switches the route from the speech recognition means to either the re-recognition result reflection means or the text screen generation/display means in response to switching signals from the erroneous sentence deletion means and the re-recognition result reflection means.
[0018]
[Embodiments of the Invention]
FIG. 1 is a block diagram illustrating the configuration and general operation of a system that includes the real-time character correction device of the present invention.
The system of the present invention includes a speech recognition device 1 as a speech recognition means, a real-time character correction device 2, and a caption conversion device 3.
[0019]
The speech recognition device 1 receives speech from a speaker via a microphone or the like (1), converts it into text by, for example, a similarity comparison operation based on a built-in speech recognition dictionary, and supplies the text to the real-time character correction device 2 (2). The converted text may contain typographical errors. A monitor screen 10 is installed where the speaker can see it, so that the speaker can check the sentences into which his or her speech has been converted.
[0020]
Based on a designation of a location containing a typographical error in the speech-recognized text (made by the operator while checking the touch panel screen), the real-time character correction device 2 detects, extracts, and deletes (3) either the designated location of the error (the hatched word "化膿" in FIG. 4) or the sentence containing the error, that is, the sentence from the beginning of a sentence to a comma or period, from a comma or period to the end of a sentence, or between commas or periods. It then prompts voice input of the deleted sentence (4), reflects the result of the renewed speech recognition to correct the text in real time, and passes the text to the caption conversion device 3.
[0021]
The real-time character correction device 2 includes a touch panel 20, on which the confirmation screen is displayed and through which the location of a typographical error is designated. The caption conversion device 3, which is connected to the real-time character correction device 2, combines the text output by the real-time character correction device 2 with the broadcast program to produce and output a captioned broadcast program.
[0022]
FIG. 2 is a block diagram showing the internal configuration of the real-time character correction device 2 of FIG. 1, broken down by function.
Each block described below is implemented, specifically, by a CPU and peripheral LSIs including memory; the CPU realizes the function of each block by reading and executing a program stored in the memory. The description here concerns a real-time character correction device configured to detect, extract, and delete sentences of the text, but the device may instead be configured to detect, extract, and delete individual characters of the text.
[0023]
The real-time character correction device 2 comprises a text screen generation/display unit 21 (text screen generation/display means), a typographical error position detection unit 22 (typographical error position detection means), an inter-punctuation sentence extraction unit 23 (inter-punctuation sentence extraction means), an erroneous sentence deletion unit 24 (erroneous sentence deletion means), a re-recognition result reflection unit 25 (re-recognition result reflection means), and a switching unit 26.
[0024]
The text screen generation/display unit 21 generates a screen based on the text output by the speech recognition device 1 and displays it on the touch panel 20. The displayed content is supplied to the caption conversion device 3 after being checked by the operator.
[0025]
The typographical error position detection unit 22 detects and extracts the word at a designated location, based on a designation of a location containing a typographical error in the speech-recognized text. Because a touch panel is used here, detection and extraction are performed by a tablet mounted on the display screen. In addition, by feeding back to the speech recognition device 1 that the position within the sentence designated via the typographical error position detection unit 22 is the position of a typographical error, the probability that the speech recognition device 1 reproduces the correct characters on conversion can be increased.
[0026]
The coordinate data detected by the typographical error position detection unit 22 is supplied to the inter-punctuation sentence extraction unit 23. The inter-punctuation sentence extraction unit 23 extracts the sentence that follows a comma or period and contains the location of the typographical error, or the sentence bounded by commas or periods. The extracted sentence is indicated by inverting or blinking its region, or by marking its range with brackets or the like, and is supplied to the erroneous sentence deletion unit 24. A sentence bounded by commas or periods means the group of characters located between a comma and a comma, a comma and a period, a period and a comma, or a period and a period.
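As an illustration only, the extraction described above can be sketched in Python. The function name `extract_span`, the delimiter set, and the choice to exclude the delimiters themselves from the extracted span are assumptions made for this sketch; the specification does not prescribe an implementation, and the prefix "本日は、" in the usage lines is a hypothetical addition around the FIG. 4 example.

```python
from typing import Tuple

# Assumption: the Japanese comma (読点) and period (句点) are the only delimiters.
DELIMITERS = "、。"


def extract_span(text: str, index: int) -> Tuple[int, int]:
    """Return (start, end) of the delimiter-bounded span that contains index.

    The span runs from just after the previous delimiter (or the start of
    the text) up to, but not including, the next delimiter (or the end of
    the text).
    """
    if not 0 <= index < len(text):
        raise IndexError("index is outside the text")
    start = index
    while start > 0 and text[start - 1] not in DELIMITERS:
        start -= 1
    end = index
    while end < len(text) and text[end] not in DELIMITERS:
        end += 1
    return start, end


# Hypothetical usage with the misrecognized word from FIG. 4.
text = "本日は、初めて生字幕放送を化膿としました。"
start, end = extract_span(text, text.index("化膿"))
print(text[start:end])  # 初めて生字幕放送を化膿としました
```

Because the search stops at the text boundaries, the same function covers all three cases named in the text: beginning of a sentence to a delimiter, delimiter to end of a sentence, and delimiter to delimiter.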
[0027]
The erroneous sentence deletion unit 24 deletes the erroneous sentence based on an operation by the operator. The deletion is reflected in the text screen generation/display unit 21, and the range-marked sentence disappears from the touch panel 20. The erroneous sentence deletion unit 24 is configured to transmit a switching signal for switching the route of the switching unit 26 when the sentence displayed on the touch panel 20 is deleted.
[0028]
The re-recognition result reflection unit 25 obtains voice input by prompting the speaker to re-input the deleted sentence, has the speech recognition device 1 recognize it again, and reflects the result via the text screen generation/display unit 21. As the means of prompting voice input, the designated location may be made identifiable to the speaker on the monitor screen 10, which the speaker can see, for example by inverting the sentence, or the checker may prompt the speaker by voice through an earphone. The re-recognition result reflection unit 25 is configured to transmit a switching signal for switching the route to the switching unit 26 once the result re-recognized by the speech recognition device 1 has been reflected in the text screen generation/display unit 21.
[0029]
The switching unit 26 operates in response to switching signals from the erroneous sentence deletion unit 24 and the re-recognition result reflection unit 25. Normally, the switching unit 26 connects the output of the speech recognition device 1 to the text screen generation/display unit 21. When a switching signal arrives from the erroneous sentence deletion unit 24, the route is switched so that the output of the speech recognition device 1 is supplied to the re-recognition result reflection unit 25, and the result re-recognized by the speech recognition device 1 is reflected in the text screen generation/display unit 21.
[0030]
The switching unit 26 is also configured to switch back to the original route in response to the switching signal generated by the re-recognition result reflection unit 25, once that unit has corrected the sentence (word) and the sentence has been displayed by the text screen generation/display unit 21. If the sentence corrected via the re-recognition result reflection unit 25 needs correcting again and the sentence at the same position is designated again, a switching signal is sent from the erroneous sentence deletion unit 24 and the switching unit 26 switches once more to the route on the re-recognition result reflection unit 25 side.
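The routing behavior of the switching unit 26 can be pictured as a two-state switch. The following Python sketch is illustrative only: the class name `SwitchingUnit`, its method names, and the use of callables for the two destinations are assumptions, not part of the specification.

```python
from enum import Enum


class Route(Enum):
    # Normal route: recognizer output goes to the display unit 21.
    DISPLAY = "text screen generation/display unit 21"
    # Correction route: recognizer output goes to the reflection unit 25.
    RERECOGNITION = "re-recognition result reflection unit 25"


class SwitchingUnit:
    """Hypothetical model of switching unit 26."""

    def __init__(self):
        self.route = Route.DISPLAY

    def on_sentence_deleted(self):
        # Switching signal from the erroneous sentence deletion unit 24.
        self.route = Route.RERECOGNITION

    def on_result_reflected(self):
        # Switching signal from the re-recognition result reflection unit 25.
        self.route = Route.DISPLAY

    def deliver(self, recognized_text, display, reflector):
        # Forward recognizer output to whichever destination is selected.
        if self.route is Route.DISPLAY:
            display(recognized_text)
        else:
            reflector(recognized_text)


# Hypothetical usage: lists stand in for the two destination units.
shown, reflected = [], []
switch = SwitchingUnit()
switch.deliver("first", shown.append, reflected.append)
switch.on_sentence_deleted()
switch.deliver("respoken", shown.append, reflected.append)
switch.on_result_reflected()
switch.deliver("next", shown.append, reflected.append)
print(shown, reflected)  # ['first', 'next'] ['respoken']
```

If the same sentence is designated again after a correction, calling `on_sentence_deleted` once more selects the correction route again, which matches the repeated-correction case described above.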
[0031]
Next, the operation of the system including the real-time character correction device will be described. FIGS. 3 and 4 are referred to in explaining the operation of the embodiment shown in FIGS. 1 and 2; FIG. 3 is an operation flowchart and FIG. 4 is an example of the screen layout displayed on the touch panel. Specifically, the flowchart of FIG. 3 shows the processing procedure of the real-time character correction program of the present invention.
The operation of the embodiment shown in FIGS. 1 and 2 is described in detail below with reference to FIGS. 3 and 4.
[0032]
First, a speaker such as an announcer or a newscaster provides voice input (S31). The speech recognition device 1 then performs speech recognition on the input speech, based on similarity calculation against a speech dictionary, and converts it into text (S32). The recognized text is immediately displayed on the touch panel 20 via the text screen generation/display unit 21 (S33).
[0033]
If the text contains a typographical error, the operator touches the display position (designated location) on the touch panel 20 that corresponds to the error. The touch is detected via the typographical error position detection unit 22 (S34), the range of the sentence delimited by commas or periods that contains the error, as extracted by the inter-punctuation sentence extraction unit 23, is marked so that the speaker can recognize it, and the erroneous sentence deletion unit 24 then deletes it in one operation (S35, S36). The deletion of the sentence containing the error is reflected on the touch panel 20 via the text screen generation/display unit 21. At the same time, a switching signal is sent from the erroneous sentence deletion unit 24 to the switching unit 26 to switch the route.
[0034]
The speaker is then prompted to utter that sentence again. The utterance is re-input to the speech recognition device 1 via the switching unit 26 of FIG. 2 (S37), and the re-recognition result reflection unit 25 displays the speech recognition result on the touch panel 20 (monitor screen 10) via the text screen generation/display unit 21. When the text screen generation/display unit 21 has displayed the sentence on the touch panel 20 (monitor screen 10), the re-recognition result reflection unit 25 sends a switching signal to the switching unit 26, which switches back to the original route. Because the erroneous sentence is filled in this way, correction takes almost no time and typographical errors can be corrected in real time. If the typographical error has not been corrected, the speaker is prompted to utter the sentence again (S37: yes) and the processing from S32 onward is repeated.
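The delete-then-refill behavior described in steps S35 to S37 can be sketched as a small text buffer that remembers where a sentence was removed. This Python sketch is a minimal illustration under stated assumptions: the class `CaptionBuffer` and its methods are hypothetical names, only one correction is assumed to be in progress at a time, and the respoken result "可能" is an assumed correct reading chosen for the example.

```python
class CaptionBuffer:
    """Hypothetical holder for the text shown on the touch panel 20."""

    def __init__(self, text=""):
        self.text = text
        self.pending = None  # position of the deleted sentence, if any

    def delete_span(self, start, end):
        # Corresponds to the erroneous sentence deletion (S35, S36).
        if self.pending is not None:
            raise RuntimeError("a correction is already in progress")
        self.text = self.text[:start] + self.text[end:]
        self.pending = start

    def reflect(self, recognized):
        # Re-recognized text is placed where the sentence was deleted (S37).
        if self.pending is None:
            raise RuntimeError("no deleted sentence to fill")
        pos = self.pending
        self.text = self.text[:pos] + recognized + self.text[pos:]
        self.pending = None


# Hypothetical usage; the surrounding text is invented for the example.
buf = CaptionBuffer("本日は、初めて生字幕放送を化膿としました。")
start = buf.text.index("初")
end = buf.text.index("。")
buf.delete_span(start, end)
print(buf.text)  # 本日は、。
buf.reflect("初めて生字幕放送を可能としました")
print(buf.text)  # 本日は、初めて生字幕放送を可能としました。
```

Storing the deletion position rather than the deleted text keeps the sketch close to the description, in which the result is reflected "at the position of the deleted sentence".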
[0035]
If speech recognition still does not succeed, real-time performance can no longer be ensured; the error is then either corrected by manual input or, where real-time performance is the priority, left as it is (S39).
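The retry-and-fallback flow (S37 repeated, then S39) can be summarized in a few lines of Python. This is a sketch under assumptions: the function name `respeak_until_correct`, the callables `respeak` and `is_correct`, and the attempt limit `max_attempts` are hypothetical; the specification does not state how many retries are allowed or how success is judged.

```python
def respeak_until_correct(respeak, is_correct, max_attempts=3):
    """Ask for the sentence again until it is accepted or attempts run out.

    respeak:      hypothetical callable returning the newly recognized text
    is_correct:   hypothetical callable standing in for the operator's check
    max_attempts: assumed limit; not taken from the specification

    Returns (text, attempts) on success, or (None, attempts) when the caller
    should fall back to manual input or leave the error as it is (S39).
    """
    for attempt in range(1, max_attempts + 1):
        candidate = respeak()
        if is_correct(candidate):
            return candidate, attempt
    return None, max_attempts


# Hypothetical usage: the first re-utterance is misrecognized again.
results = iter(["化膿", "可能"])
print(respeak_until_correct(lambda: next(results), lambda t: t == "可能"))
# ('可能', 2)
```

Returning `None` rather than raising an exception reflects that leaving the error uncorrected is an accepted outcome in the described flow, not a failure of the program.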
[0036]
Misrecognition by the speech recognition device 1 is most likely to be caused by problems in the utterance itself, such as hesitation or unclear pronunciation; in such cases the accuracy improves when the speaker utters the sentence again while paying attention to pronunciation. If the misrecognition occurred because the speaker, in haste, uttered a difficult word that is not in the speech recognition dictionary, the recognition rate can be expected to improve if the speaker replaces the difficult word with a simpler one when uttering the sentence again. When a word is to be replaced, the position of the typographical error can be designated via the typographical error position detection unit 22 so that the speaker can tell which characters the speech recognition device 1 had difficulty recognizing.
[0037]
Once it has finally been confirmed that no typographical errors remain (S34: no), caption conversion by the caption conversion device 3 is started (S38). The caption conversion device 3 converts the text into the format for caption broadcasting (the NAB format: "T027-1996" of the National Association of Commercial Broadcasters in Japan), and the text is combined with the video and broadcast as a caption broadcast.
[0038]
FIG. 4 shows an example of the screen displayed on the touch panel 20. Here, when the operator touches the location of the misrecognized word ("化膿"), the span between punctuation marks that contains this word ("初めて生字幕放送を化膿としました") is marked as a range and then deleted. Real-time character correction is achieved by uttering again the words corresponding to the deleted sentence. It is also advantageous to designate via the typographical error position detection unit 22 which word was misrecognized, so that this information is fed back to the speech recognition device 1 and the same conversion is not produced when the same sentence is read again.
[0039]
As described above, the present invention makes correction possible without impairing real-time performance by extracting and deleting the sentence, bounded by punctuation marks, that contains a location where a typographical error is found.
The embodiment above was described only for correction when the content of a television program, lecture, or the like is converted into text using speech recognition, but the invention is equally applicable to correction when such content is converted into electronic text using electronic stenography. Furthermore, the present invention is effective not only in the field of caption production for real-time broadcasting, but also in providing captions at lectures and similar events for people with hearing loss, and in producing captions for pre-recorded programs using speech recognition.
[0040]
The real-time character correction system of the present invention can be realized by recording, as a program on a computer-readable recording medium, the procedures executed by each of the text screen generation display unit 21, the typographical error position detection unit 22, the punctuation sentence extraction unit 23, the erroneous sentence deletion unit 24, the re-recognition result reflection unit 25, and the switching unit 26 shown in FIG. 2, and by having a computer system read and execute the program recorded on the recording medium. The computer system here includes an OS and hardware such as peripheral devices.
[0041]
Further, the "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, or a CD-ROM, or to a storage device such as a ROM or a hard disk built into the computer system. The "computer-readable recording medium" also includes a medium that holds the program for a certain period of time, such as a volatile memory (RAM) inside a computer system serving as a server or client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.
[0042]
If the speaker performs voice input while checking the text on the monitor screen 10, the input can be matched to the speaker's own timing, and sentences free of mistakes can be provided after quick corrections are made together with a checker who checks the text for typographical errors. Even without a checker, sentences with few mistakes can be provided after quick corrections.
The switching signal of the switching unit 26 has been described here as being sent from the erroneous sentence deletion unit 24 and the re-recognition result reflection unit 25, but the configuration is not particularly limited as long as the switching operation of the switching unit 26 can be performed appropriately. Furthermore, the real-time character correction device may include a storage unit such as a hard disk for storing the data converted into text.
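The routing performed by the switching unit 26 can be pictured with the following Python sketch. It is an illustration under stated assumptions only: the class `SwitchingUnit`, the enum `Route`, the method names, and the representation of the displayed text as a list of segments are hypothetical and do not appear in the patent.

```python
from enum import Enum, auto
from typing import List, Optional


class Route(Enum):
    DISPLAY = auto()          # normal path to the text screen generation display unit 21
    RE_RECOGNITION = auto()   # correction path to the re-recognition result reflection unit 25


class SwitchingUnit:
    """Routes each recognition result to the display or to the pending gap."""

    def __init__(self) -> None:
        self.route = Route.DISPLAY
        self.segments: List[str] = []        # text segments currently displayed
        self.pending: Optional[int] = None   # index of the deleted segment

    def on_sentence_deleted(self, index: int) -> None:
        # Switching signal assumed to come from the erroneous sentence deletion unit 24.
        self.segments[index] = ""
        self.pending = index
        self.route = Route.RE_RECOGNITION

    def on_recognized(self, text: str) -> None:
        if self.route is Route.RE_RECOGNITION and self.pending is not None:
            # Reflect the re-uttered text at the position of the deleted segment,
            # then switch back (signal assumed to come from unit 25).
            self.segments[self.pending] = text
            self.pending = None
            self.route = Route.DISPLAY
        else:
            self.segments.append(text)


unit = SwitchingUnit()
unit.on_recognized("first segment")
unit.on_recognized("secnd segment")
unit.on_sentence_deleted(1)
unit.on_recognized("second segment")  # fills the gap
unit.on_recognized("third segment")   # appended as usual
```

Because only one route is active at a time, a re-uttered sentence cannot be appended to the end of the displayed text by mistake.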
[0043]
[Effects of the invention]
As described above, the real-time character correction device and the real-time character correction program according to the present invention have the following excellent effects.
According to Claims 1 and 2 of the present invention, when the content of a TV program, a lecture, or a recording medium such as a VTR is converted into text, a portion where a typographical error has occurred can be corrected in real time. Since the correction takes almost no time, the invention can be applied to uses in which real-time performance is important.
[0044]
In particular, when subtitles are produced in real time for the contents of a TV program or lecture by having a computer perform speech recognition on the audio, typographical errors may occur due to recognition errors. Even then, once the location of the typographical error is pointed out, the sentence between punctuation marks that contains the erroneous word is extracted and deleted, and the deleted part is uttered again and converted into text, so that quick corrections can be reflected in the subtitles in real time.
[0045]
Further, in addition to the subtitle production by speech recognition described above, the present invention enables real-time correction in the same way for subtitle production by electronic shorthand.
[0046]
Further, according to Claims 1 and 2 of the present invention, it is possible to indicate to the speech recognition means, sentence by sentence within the text, which character conversion is wrong. Therefore, even when the speaker inputs the same designated portion by voice again, a quick correction can be made by voice without the wrong conversion being repeated.
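One way to picture "not repeating a wrong conversion" is to filter the recognizer's output against the strings the operator has already marked as wrong. The Python sketch below assumes that the recognizer returns a ranked list of candidate transcriptions; the patent does not state this, and the function name `pick_candidate` and the sample strings are hypothetical.

```python
from typing import List, Optional, Set


def pick_candidate(candidates: List[str], rejected: Set[str]) -> Optional[str]:
    """Return the highest-ranked candidate that contains no rejected string.

    candidates: transcriptions ordered from most to least likely (assumed).
    rejected:   strings the operator has already pointed out as erroneous.
    """
    for candidate in candidates:
        if not any(bad in candidate for bad in rejected):
            return candidate
    return None  # every candidate repeats a known error


# Hypothetical re-utterance: the top candidate repeats the earlier mistake.
choice = pick_candidate(
    ["subtitel broadcasting", "subtitle broadcasting"],
    {"subtitel"},
)
```

When every candidate is rejected, the function returns `None`, which a caller could treat as a cue to prompt the speaker again or to suggest a simpler word.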
[Brief description of the drawings]
FIG. 1 is a block diagram cited for explaining the configuration and schematic operation of a real-time character correction system of the present invention.
FIG. 2 is a block diagram showing the internal configuration of the real-time character correction device shown in FIG. 1, expanded by function.
FIG. 3 is a flowchart cited for explaining the operation of the embodiment of the present invention.
FIG. 4 is a diagram cited for explaining the operation of the embodiment of the present invention, and is a diagram showing an example of a screen configuration displayed on the touch panel.
[Explanation of symbols]
1 Speech recognition device
2 Real-time character correction device
3 Subtitle conversion device
21 Text screen generation display unit
22 Typographical error position detection unit
23 Punctuation sentence extraction unit
24 Erroneous sentence deletion unit
25 Re-recognition result reflection unit
26 Switching unit

Claims

A real-time character correction device that, in order to convert speech input by a speaker into text by speech recognition means and display it as captions, uses text screen generation display means for generating a screen on a monitor screen, and corrects a location where a typographical error has occurred in real time during speech input, the device comprising:
typographical error position detection means for detecting, based on a designation by an operator of a location in the converted text where there is a typographical error, the designated location where the typographical error has occurred;
punctuation sentence extraction means for extracting, based on the location of the typographical error in the text detected by the typographical error position detection means, a sentence that contains the location where the typographical error has occurred and that runs from the beginning of a sentence to a comma or period, from a comma or period to the end of a sentence, or between commas or periods;
erroneous sentence deletion means for deleting the sentence extracted by the punctuation sentence extraction means;
re-recognition result reflection means for prompting the speaker to input by voice the sentence deleted by the erroneous sentence deletion means, obtaining the voice input of that sentence, performing speech recognition, and reflecting the result at the position of the deleted sentence; and
a switching unit that, in response to a switching signal transmitted from the erroneous sentence deletion means and the re-recognition result reflection means, switches the route from the speech recognition means to either the re-recognition result reflection means or the text screen generation display means.

A real-time character correction program that, in order to convert speech input by a speaker into text by speech recognition means and display it as captions, using text screen generation display means for generating a screen on a monitor screen, and to correct a location where a typographical error has occurred in real time during speech input, causes a computer to function as:
typographical error position detection means for detecting, based on a designation by an operator of a location in the converted text where there is a typographical error, the designated location where the typographical error has occurred;
punctuation sentence extraction means for extracting, based on the location of the typographical error in the text detected by the typographical error position detection means, a sentence that contains the location where the typographical error has occurred and that runs from the beginning of a sentence to a comma or period, from a comma or period to the end of a sentence, or between commas or periods;
erroneous sentence deletion means for deleting the sentence extracted by the punctuation sentence extraction means;
re-recognition result reflection means for prompting the speaker to input by voice the sentence deleted by the erroneous sentence deletion means, obtaining the voice input of that sentence, performing speech recognition, and reflecting the result at the position of the deleted sentence; and
switching means for switching, in response to a switching signal transmitted from the erroneous sentence deletion means and the re-recognition result reflection means, the route from the speech recognition means to either the re-recognition result reflection means or the text screen generation display means.