JP4229627B2 - Dictation device, method and program

Info

Publication number: JP4229627B2
Application number: JP2002091846A
Authority: JP
Inventors: 真人矢島; 幸弘福永
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2002-03-28
Filing date: 2002-03-28
Publication date: 2009-02-25
Anticipated expiration: 2022-03-28
Also published as: JP2003288098A

Description

【０００１】
【発明の属する技術分野】
本発明は、感嘆符や疑問符等の符号の挿入を容易にしたディクテーション装置、方法及びプログラムに関する。
【０００２】
【従来の技術】
近年、音声認識技術の進歩に伴い、音声入力によってテキストを生成するディクテーションシステムが開発され、実用化に至っている。ディクテーションシステムでは、従来キーボード等を利用して入力していたテキストを音声によって入力する。基本的には、音響解析部でとらえた音声入力文字列(ひらがな列)を、単語と単語のつながりやすさを表現する言語モデルを用いた言語処理部により、適切なかな漢字交じりの文字列に変換することで、ユーザがしゃべった通りの文章をテキスト化する。
【０００３】
音声−テキスト変換に際して、文章に含まれる句読点については、句点として「まる」、読点として「てん」等の読みを振り当て、ユーザに「まる」、「てん」を発音させることで、指定した文章の区切りに句読点を挿入させている。例えば、特開２０００−２９４９６には連続音声認識で句読点を自動的に生成する装置が示されている。
【０００４】
しかし、日常の会話や文章の朗読等で本来発声することがない句読点を読ませることは、ユーザの負担となる。また、講演やインタビュー等の予め録音された音声を再生して音声認識エンジンに入力させる場合には、句読点が一切入らないテキストが生成されてしまう。
【０００５】
そこで、最近、単語と単語のつながりやすさ同様、単語と句読点のつながりやすさを言語モデルで表現することにより、句読点が入りやすい位置を予測して、適切な文章の区切りに句読点を自動挿入する技術が開発されている。
【０００６】
【発明が解決しようとする課題】
ところで、句読点と同様に文章の区切りを示す符号として感嘆符や疑問符等がある。ところが、自動挿入技術では、これらの符号を適切に選択してテキストに挿入することは極めて困難である。しかも、肯定文の構造をしていながら、文章末尾を強めたり上げたりすることで疑問の意味を表現する文章もある。ディクテーションシステムでは、イントネーションが考慮されないことから、このような文章の認識では、感嘆符や疑問符についての自動挿入処理は一層困難である。
【０００７】
従って、感嘆符や疑問符等の符号を適切な文章の区切りとして挿入するためには、テキスト化された文章中のカーソルをキーボードを使用して挿入位置まで移動させ、挿入する符号に相当するキーを操作する必要がある。本来キーボードにふれることなく音声のみでテキスト入力を行うディクテーションであるにも拘わらず、ユーザは煩雑なキー入力操作を行う必要がある。
【０００８】
本発明はかかる問題点に鑑みてなされたものであって、グラフィカルユーザインタフェースを用いて、テキスト化する文章に簡単な操作で所定の符号を挿入可能にすると共に、挿入する符号に応じた言語処理を可能にすることにより音声認識精度を向上させることができるディクテーション装置、方法及びプログラムを提供することを目的とする。
【０００９】
【課題を解決するための手段】
本発明に係るディクテーション装置は、音声を取込む音声取込み手段と、前記音声取込み手段が取込んだ音声を音声認識して認識文字列に変換する音声認識手段と、前記認識文字列の候補系列を評価して選択する系列評価手段と、前記系列評価手段で選択された認識文字列の文章の区切り位置を検出し、前記区切り位置に符号を挿入する符号自動挿入手段と、前記認識文字列を表示画面上に表示させるための表示制御手段と、前記表示画面上に表示される前記区切り位置に挿入された前記符号を変更するために、又は前記区切り位置に挿入される感嘆符又は疑問符を挿入するために、前記感嘆符又は疑問符を設定するためのグラフィカルユーザインタフェースと、を具備し、前記符号自動挿入手段は、前記グラフィカルユーザインタフェースによる前記感嘆符又は疑問符の設定に従って前記区切り位置に挿入されている前記符号を変更、又は前記グラフィカルユーザインタフェースによる設定に従った前記感嘆符又は疑問符を前記区切り位置に挿入し、前記系列評価手段は、前記グラフィカルユーザインタフェースによる前記感嘆符又は疑問符の設定に従って前記区切り位置に挿入されている前記符号を変更する場合には、挿入される感嘆符又は疑問符を考慮して前記認識文字列の候補系列を再評価して認識文字列の選択をし直すことを特徴とするディクテーション装置である。
【００１０】
なお、装置に係る本発明は方法に係る発明としても成立する。
【００１１】
また、装置に係る本発明は、コンピュータに当該発明に相当する処理を実行させるためのプログラムとしても成立する。
【００１２】
【発明の実施の形態】
以下、図面を参照して本発明の実施の形態について詳細に説明する。図１は本発明の第１の実施の形態に係るディクテーション装置を示すブロック図である。
【００１３】
本実施の形態は音声認識文字列に対して自動挿入された句読点等の符号を、ＧＵＩ（グラフィカルユーザインタフェース）を用いた操作によって、所望の符号に切り替えることを可能にしたものである。
【００１４】
図１のディクテーション装置は、例えばパーソナルコンピュータ等の情報処理装置によって実現されるものである。音声入力部１は、音声を入力するためのもので、例えばマイクロフォン等からなる入力装置である。操作入力部２は、画面上に表示されているアプリケーションを操作するためのもので、例えばキーボード、マウス、タブレット等によって構成されている。特に、マウス及びタブレット等は、画面上にＧＵＩ表示されたボタン等を操作することができるようになっている。また、操作入力部２として、表示部４の表示画面上に構成されたタッチパネルが採用されることもある。
【００１５】
制御部３は、装置全体を司るもので、例えば中央処理ユニット（ＣＰＵ）によって構成される。制御部３は、ディクテーションを実行するために、音声認識部５、系列評価部６、符号自動挿入部７、テキスト格納部８及び符号切替部９の制御を行う。
【００１６】
表示部４は、音声認識により変換されたテキストの文字列等を表示するためのもので、例えばＣＲＴディスプレイ装置又は液晶表示装置等のフラットパネルディスプレイ装置によって構成される。
【００１７】
音声認識部５、系列評価部６、符号自動挿入部７及びテキスト格納部８は、ディクテーション処理に必要な機能要素であり、夫々固有の処理ルーチンと当該処理ルーチンを実行するＣＰＵ等によって実現することができる。
【００１８】
音声認識部５は、音声入力部１から入力された音声を音声認識して、テキスト文字列の候補の組み合わせ認識系列に変換して出力する。系列評価部６は、音声認識部５で変換されたテキスト文字列の認識系列を言語的に評価し、適切な認識系列を選びだすために順位をつけて最適な認識系列を出力する。
【００１９】
符号自動挿入部７は、系列評価部６で選択された最適な認識系列の文章の区切り位置を検出し、区切り位置に通常の符号（句読点）を挿入する。
【００２０】
テキスト格納部８は、ディクテーション処理における各段階のテキスト文字列を格納する。例えば、テキスト格納部８は、符号自動挿入部７から出力された符号付きの認識系列であるテキスト文字列を格納する。
【００２１】
符号切替部９は、テキスト格納部８に格納された符号付き認識系列テキストの文章の区切り位置に挿入する符号を通常の符号（句点）から別の符号である疑問符、感嘆符等に切り替えてテキスト格納部８に格納させるようになっている。
【００２２】
次に、このように構成された実施の形態の動作について図２のフローチャート及び図３の説明図を参照して説明する。
【００２３】
音声入力部１を介して音声が入力されると、ステップＳ1 において、制御部３は取込んだ入力音声を音声認識部５に与える。音声認識部５は、入力された音声を認識して認識文字列の候補を作成して系列評価部６に出力する（ステップＳ２）。
【００２４】
音声認識部５は、基本的には、単語毎に入力音声と類似している単語文字列（ひらがな）を推定する。例えば「きょうはてんきがいい」という音声が入力されると、音声認識部５では「きょう」の音声に対して、「今日」、「京」、「起用」及び「器用」等の単語候補を作成し、また、「てんき」の音声に対して「天気」、「転機」及び「延期」等の単語候補を作成する。図３はこの場合における認識文字列の候補を線でつないで示している。
【００２５】
系列評価部６は、音声認識部５から与えられた認識文字列の候補を言語的に評価し、単語同士の単語の繋がりやすさ等を評価して、選択する単語候補としての評価順位をつけ、評価順位が１位の候補を最適な認識文字列の候補として選択する。例えば、図３に示す認識文字列の候補が音声認識部４から出力された場合には、各単語を結ぶ線によって示す各認識文字列候補の組み合わせについて評価順位をつける。系列評価部６は、評価順位として１位をつけた認識文字列候補の組み合わせである「今日は天気がいい」を最適な認識文字列候補として選択して、符号自動挿入部７に与える（ステップＳ3 ）。
【００２６】
符号自動挿入部７は、系列評価部６から渡された最適な認識文字列候補の文章の区切り位置を検出し、最適な符号（句読点）を挿入する（ステップＳ4 ）。例えば、図３の例では、系列評価部６から「今日は天気がいい」という認識文字列候補が渡されると、符号自動挿入部７は、「いい」の後方位置に句点の挿入位置を検出して「今日は天気がいい。」という自動挿入を行う。
【００２７】
制御部３は、音声−テキスト化の各段階におけるテキスト文字列をテキスト格納部８に格納しており、符号自動挿入部７からの符号付きの認識文字列についてもテキスト格納部８に格納する。また、制御部３は、テキスト格納部８に格納された音声−テキスト化の各段階におけるテキスト文字列を表示部４に与えて表示させるようになっている。
【００２８】
例えば、制御部３は、「きょうはてんきがいい」という音声入力に対して、先ず、音声の取込みに応じて順次「きょうはてんきがいい」というひらがな文字列を表示し、更に、系列評価部６によって評価順位として１位を付された最適な認識文字列候補である「今日は天気がいい」を表示させる。更に、制御部３は、符号自動挿入部７における符号の自動挿入結果である「今日は天気がいい。」を表示画面上に表示させる。
【００２９】
ユーザによる符号切替指示及び音声−テキスト化における音声の取込み終了指示が発生していない場合には、処理がステップＳ5 ，Ｓ8 を介してステップＳ9 に移行し、次の音声入力の待機状態となる。音声入力部１を介して次の音声が入力されると、処理はステップＳ1 に戻って同様の処理が繰返され、テキスト格納部８には次々と音声認識文字列が格納されて、表示部４の表示画面上に認識文字列が表示される。また、ユーザによる取込み終了指示が発生すると、ステップＳ8 から処理を終了する。
【００３０】
いま、ユーザが認識結果である認識文字列を表示部４の表示画面上で確認し、自動挿入された符号を変更しようとするものとする。例えば、図３の例における画面上の「今日は天気がいい。」という認識文字列結果に対して、読点を疑問符に変更するものとする。この場合には、ユーザは、操作入力部２である例えばタッチパネルをタッチペンで操作することにより、例えば画面上に表示されている疑問符挿入用のボタンを押す。
【００３１】
制御部３は、ステップＳ5 において符号切替指示が発生したことを検出し、処理をステップＳ6 からステップＳ7 に移行させる。こうして、ステップＳ7 において、読点「。」を疑問「？」に変更する。変更後の認識文字列「今日は天気がいい？」は、テキスト格納部８に格納されると共に、表示部４に与えられて画面表示される。
【００３２】
なお、ユーザは操作入力部２を操作することで、疑問符だけでなく、感嘆符やその他の任意の符号を挿入可能である。例えば、特定の複数のボタンをそれぞれ「疑問符」ボタン、「感嘆詞」ボタンのように用意することで、所望の符号を挿入するようにしてもよい。また、例えば、１つのボタンをループ処理のスイッチとして用い、「句点→疑問符→感嘆詞→句点→…」のように複数回ボタンを押すことで、挿入する符号を切り替えるようにしてもよい。
【００３３】
このように、本実施の形態においては、自動挿入された符号を、所望の符号に変更することができる。符号の変更は、ユーザのＧＵＩ操作によって行われ、変更後のテキストがテキスト格納部８に格納されると共に、表示画面上に表示される。キーボード上のキーを打鍵することなく文章の区切り位置に、所望の符号を挿入することができ、操作性に優れている。
【００３４】
図４は本発明の第２の実施の形態を示すブロック図である。図４において図１と同一の構成要素には同一符号を付して説明を省略する。
【００３５】
第１の実施の形態においては、ユーザが画面表示された認識文字列を参照して、符号の切替えを指示した。これに対し、本実施の形態は、音声入力の前に、入力する文章が肯定文であるか、疑問文であるか、感嘆文であるか等の文のモードを指定可能にして、指定されたモードに対応した符号を自動挿入するようにしたものである。
【００３６】
本実施の形態は符号切替部９に代えて符号指定部１２を採用すると共に、符号自動挿入部７とは別の動作の符号自動挿入部１１を採用した点が第１の実施の形態と異なる。符号指定部１２は、ユーザの操作入力部２の操作に応答して、音声−テキスト化する文章について、認識文字列に挿入する符号を音声入力の前に予め決定するようになっている。
【００３７】
符号自動挿入部１１は、系列評価部６で選択された最適な認識系列の文章から区切り位置を検出し、検出した区切り位置に符号を挿入する。この場合には、符号自動挿入部１１は、符号指定部１２によって挿入する符号が指定されていない場合には、通常の符号を挿入し、指定されている場合には、指定された符号を挿入するようになっている。
【００３８】
次に、このように構成された実施の形態の動作について図５のフローチャートを参照して説明する。
【００３９】
制御部３は、音声入力前に、文章のモードを指定する操作入力が発生したか否かを判断する（ステップＳ11）。音声の入力がない状態で、操作入力部１によってモードが指定されると、指定されたモードを保持していない場合には（ステップＳ12）、符号指定部１２は、ステップＳ13においてモードを保持する。
【００４０】
本実施の形態においても、ユーザは、ＧＵＩ操作によってモードの指定が可能である。操作入力部２である例えばタッチパネルをタッチペンで操作し、例えば画面上に表示されているモード設定用のボタンを押すことで、モードを指定する。モード設定用のボタンとしては、例えば、疑問文を「？」で表し、感嘆文を「！」で表したボタン等が考えられる。ユーザがタッチペンで「？」を指定することで、「疑問文モード」の状態が保持される。
【００４１】
また、本実施の形態においても、モードの設定用のボタンとしては、種々のボタンが考えられる。例えば、特定の複数のボタンをそれぞれ「疑問文」ボタン、「感嘆文」ボタンのように用意することで、所望の符号を挿入するようにしてもよい。また、例えば、１つのボタンをループ処理のスイッチとして用い、「肯定文→疑問文→感嘆文→肯定文→…」のように複数回ボタンを押すことで、挿入する符号を切り替えるようにしてもよい。
【００４２】
符号指定部１２は、モードの指定に従って、感嘆符や疑問符等の対応する符号を決定して、系列評価部６と符号自動挿入部１１に該当する符号が指定されたことを参照できるように、符号状態を保持しておく。
【００４３】
ステップＳ15において音声入力が発生したことが検出されるまで、モードの設定を受け付ける。ステップＳ15において音声入力が発生すると、制御部３は取込んだ音声を音声認識部５に与える。ステップＳ1 〜Ｓ3 の音声認識及び認識文字列候補の選択処理は第１の実施の形態と同様である。
【００４４】
本実施の形態においては、符号自動挿入部１１は、符号の挿入位置を検出すると共に、ステップＳ16においてモードが保持されているか否かを判定し、モードが保持されていない場合には通常の符号を自動挿入し、モードが保持されている場合にはモードに応じた符号を自動挿入する。
【００４５】
例えば、図３の例において、ステップＳ11でモード指定が行われていない場合には、符号自動挿入部１１は、「いい」の後に「。」を挿入して、「今日は天気がいい。」の認識文字列をテキスト格納部８に格納させる。一方、図３の例において、ステップＳ11で例えば疑問文のモード指定が行われている場合には、符号自動挿入部１１は、「いい」の後に「？」を挿入して、「今日は天気がいい？」の認識文字列をテキスト格納部８に格納させる。また、テキスト格納部８に格納された認識文字列は、表示部４に与えられて画面表示される。
【００４６】
以後同様の動作が繰返されて、文章毎にモードの指定が受け付けられて、ユーザが希望する符号が文章中に挿入される。
【００４７】
このように本実施の形態においては、ユーザが音声入力前に文章のモードを指定しておくことで、ユーザが希望する符号を文章に挿入することが可能である。この場合においても、モードの指定はＧＵＩを利用した極めて簡単な操作で行われる。
【００４８】
なお、本実施の形態においては、音声入力前に指定されたモードに応じて符号を挿入する例について説明したが、音声入力後であっても符号を自動挿入する前であれば、モードの指定を受け付けて、指示されたモードを保持し、保持したモードに応じた符号を自動挿入するようにしてもよい。
【００４９】
更に、第１の実施の形態と組み合わせて、符号を自動挿入した後であっても、次の音声入力前に符号切替指示があれば、符号を指示された符号に切り替えるようにしてもよいことは明らかである。
【００５０】
図６は本発明の第３の実施の形態に係るディクテーション方法を示すフローチャートである。なお、本実施の形態におけるハードウェア構成は図１に示す第１の実施の形態と同様のものを採用することができる。図６において図２と同一の処理には同一符号を付して説明を省略する。
【００５１】
本実施の形態は、ユーザによる符号切替えの指示を、最適候補列の選択に反映させることにより、音声認識精度を向上させるようにしたものである。即ち、符号切替えの指示が発生すると、制御部３は、この指示に基づいて最適候補系列の再選択を行わせるようになっている。
【００５２】
図６はステップＳ21〜Ｓ23の処理を付加した点が図２のフローと異なる。ステップＳ21は、ユーザによる符号切替指示が発生した場合に、指定された符号状態を保持する処理である。ステップＳ22では、系列評価部６は、指定された符号状態に従って、候補系列を再度評価し、最適候補系列を再選択するようになっている。ステップＳ23では、符号自動挿入部７によって、符号状態に応じた符号を自動挿入するようになっている。
【００５３】
次に、このように構成された実施の形態の動作について図７の説明図を参照して説明する。
【００５４】
ステップＳ1 〜Ｓ6 の処理は第１の実施の形態と同様である。即ち、音声入力部１を介して音声が入力されると、制御部３は、入力音声を音声認識部５に与えて認識文字列の候補を作成させる。音声認識部５は、基本的には、単語毎に入力の音声と類似している単語文字列（ひらがな）を推定する。
【００５５】
いま、例えば「はわいにいきますか」という音声が入力されるものとする。図７はこの場合の音声認識部５における認識結果の一例を示している。図７では、音声認識部５は、「はわい」の発声に対して、「ハワイ」及び「河合」の文字列を単語候補として作成し、「いきます」の発声に対して、「行きます」及び「来ます」の文字列を単語候補として作成し、「か」の発声に対して、「か」及び「が」の文字列を単語候補として作成したことを示している。
【００５６】
系列評価部６は、音声認識部５からの認識文字列の候補を言語的に評価し、つながりやすさ等を評価して評価順位をつける。系列評価部６は最適な認識文字列の候補を選択して、符号自動挿入部７に与える。
【００５７】
図７の認識文字列の候補が音声認識部４から出力された場合には、系列評価部６は、各認識文字列候補の組み合わせに対して評価順位をつけることにより、例えば、結果的に「ハワイに行きますが」を最適な認識文字列候補として選択する。符号自動挿入部７は、系列評価部６からの最適な認識文字列候補の文章としての区切り位置を検出し、最適な符号（句読点）を挿入して、制御部３に渡す。
【００５８】
例えば、符号自動挿入部７は、系列評価部６からの最適認識文字列が「ハワイに行きますが」であった場合には、「が」の後方位置を読点の挿入位置であるものと判定して、読点「、」を「が」の後に自動挿入して、「ハワイに行きますが、」という文字列をテキスト格納部８に記憶させる。
【００５９】
ここで、ユーザが操作入力部２を操作して、例えばＧＵＩ画面上の疑問符の表示を選択して疑問符を指示するものとする。制御部は３、ステップＳ21において疑問符の符号状態を保持し、ステップＳ22において、系列評価部６に認識文字列の候補の評価のやり直しを指示する。
【００６０】
系列評価部６は、再度、認識文字列の候補を言語的に評価する。この場合には、系列評価部６は、文字列「ハワイに行きますが」の後に疑問符が付加されるべきであること、即ち、文字列「ハワイに行きますが」が疑問文であることを、認識文字列候補の評価に使用する。例えば、系列評価部６は、文章の最後の文字「が」を、疑問文の最後の文字として妥当な「か」に置き換えた最適認識文字列「ハワイに行きますか」を得る。
【００６１】
次に、符号自動挿入部７は、ステップＳ23において、系列評価部６が決定した最適認識文字列の符号の挿入位置を検出し、この位置に、ユーザが指定した疑問符を挿入する。こうして、疑問符が付加された文字列「ハワイに行きますか？」がテキスト格納部８に格納されると共に、表示部４の画面上に表示される。
【００６２】
このように、本実施の形態においては、ユーザが符号切替ボタン等を操作して、文章に付加する符号を指定した場合には、この指定に応じて、系列評価部６に再度、認識文字列の系列候補の評価をやり直させている。これにより、ユーザが希望する符号が付加されるだけでなく、認識結果をユーザが指定した符号に応じた文章に切り替えることができる。こうして、音声認識精度を向上させることが可能である。
【００６３】
図８は本発明の第４の実施の形態に係るディクテーション方法を示すフローチャートである。なお、本実施の形態におけるハードウェア構成は図４に示す第２の実施の形態と同様のものを採用することができる。
【００６４】
本実施の形態においても、ユーザによる符号切替えの指示を最適候補列の選択に反映させることにより、音声認識精度を向上させるようにしたものである。
【００６５】
図８はステップＳ3 に代えてステップＳ25の処理を採用した点が図５のフローと異なる。図８において図５と同一の処理には同一符号を付して説明を省略する。
【００６６】
ステップＳ25は、ユーザによるモードの指定指示が発生した場合に、指定されたモードに従って、候補系列を評価し、最適候補系列を選択するようになっている。
【００６７】
次に、このように構成された実施の形態の動作について説明する。
【００６８】
ステップＳ11〜Ｓ15及びステップＳ1 ，Ｓ2 までの処理は第２の実施の形態と同様である。即ち、制御部３は、音声の入力がない状態で、操作入力部１によってモードが指定されると、符号指定部１２によってモードを保持させる。符号指定部１２は、モードの指定に従って、感嘆符や疑問符等の対応する符号を決定して、系列評価部６と符号自動挿入部１１に該当する符号が指定されたことを参照できるように、符号状態を保持しておく。
【００６９】
ステップＳ15において音声入力が発生すると、音声認識部５は音声認識を行い、認識文字列候補を作成する。本実施の形態においては、次のステップＳ25において、系列評価部６は、認識文字列候補の評価に際して、モードの指定を参照する。
【００７０】
例えば、「はわいにいきますか」という音声入力に対して、図７に示す認識結果が得られたものとする。モードの指定がない場合には、系列評価部６は、音声認識部５からの認識文字列の候補を言語的に評価し、つながりやすさ等に応じた評価順位をつけ、最適な認識文字列の候補を選択する。例えば、図７の例では、系列評価部６は、「ハワイに行きますが」を最適な認識文字列候補として選択する。
【００７１】
これに対し、ユーザがモードとして疑問文を指定しているものとする。この場合には、系列評価部６は、音声認識文字列が疑問文を構成するものであると判断して、言語的な評価を行う。そうすると、系列評価部６は、例えば、「ハワイに行きますか」という最適認識文字列を得る。
【００７２】
更に、符号自動挿入部７は、系列評価部６が決定した最適認識文字列の符号の挿入位置を検出し、この位置に、ユーザが指定した疑問符を挿入する。こうして、疑問符が付加された文字列「ハワイに行きますか？」がテキスト格納部８に格納されると共に、表示部４の画面上に表示される。
【００７３】
このように、本実施の形態においては、文章のモードを指定することにより、系列評価部６の評価時に文章のモードが考慮される。これにより、認識文字列の系列候補の評価が適切なものとなり、音声認識精度を向上させることが可能である。
【００７４】
【発明の効果】
以上説明したように本発明によれば、グラフィカルユーザインタフェースを用いて、テキスト化する文章に簡単な操作で所定の符号を挿入可能にすると共に、挿入する符号に応じた言語処理を可能にすることにより音声認識精度を向上させることができるという効果を有する。
【図面の簡単な説明】
【図１】本発明の第１の実施の形態に係るディクテーション装置を示すブロック図。
【図２】第１の実施の形態の動作を説明するためのフローチャート。
【図３】第１の実施の形態の動作を説明するための説明図。
【図４】本発明の第２の実施の形態を示すブロック図。
【図５】第２の実施の形態の動作を説明するためのフローチャート。
【図６】本発明の第３の実施の形態に係るディクテーション方法を示すフローチャート。
【図７】第３の実施の形態の動作を説明するための説明図。
【図８】本発明の第４の実施の形態に係るディクテーション方法を示すフローチャート。
【符号の説明】
１…音声入力部、２…操作入力部、３…制御部、４…表示部、５…音声認識部、６…系列評価部、７…符号自動挿入部、８…テキスト格納部。
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a dictation apparatus, method, and program that facilitate insertion of codes such as exclamation marks and question marks.
[0002]
[Prior art]
In recent years, with advances in speech recognition technology, dictation systems that generate text from speech input have been developed and put into practical use. In a dictation system, text that was conventionally entered with a keyboard or the like is entered by voice. Basically, the spoken character string (a hiragana string) captured by the acoustic analysis unit is converted into an appropriate mixed kana-kanji character string by a language processing unit that uses a language model expressing how readily words connect to one another, so that the sentence is converted into text just as the user spoke it.
[0003]
In speech-to-text conversion, punctuation marks contained in a sentence are handled by assigning readings to them, such as “maru” for the period and “ten” for the comma, and having the user pronounce “maru” or “ten” so that the punctuation marks are inserted at the designated sentence breaks. For example, Japanese Patent Laid-Open No. 2000-29496 discloses an apparatus that automatically generates punctuation marks in continuous speech recognition.
[0004]
However, making the user read out punctuation marks, which are not normally uttered in everyday conversation or when reading text aloud, places a burden on the user. In addition, when prerecorded speech such as a lecture or an interview is played back and fed to a speech recognition engine, the generated text contains no punctuation marks at all.
[0005]
Therefore, a technique has recently been developed that expresses in the language model the ease of connection between words and punctuation marks, in the same way as the ease of connection between words, thereby predicting the positions where punctuation marks are likely to occur and automatically inserting them at appropriate sentence breaks.
[0006]
[Problems to be solved by the invention]
Exclamation marks, question marks, and the like are also codes that, like punctuation marks, indicate sentence breaks. With the automatic insertion technique, however, it is extremely difficult to select these codes appropriately and insert them into the text. Moreover, some sentences have the structure of an affirmative sentence yet express a question by stressing or raising the end of the sentence. Since a dictation system does not take intonation into account, automatic insertion of exclamation marks and question marks is even more difficult when recognizing such sentences.
[0007]
Therefore, in order to insert a code such as an exclamation mark or a question mark at an appropriate sentence break, the user must use the keyboard to move the cursor in the converted text to the insertion position and then operate the key corresponding to the code to be inserted. Even though dictation is meant to allow text input by voice alone, without touching the keyboard, the user has to perform cumbersome key operations.
[0008]
The present invention has been made in view of these problems. Its object is to provide a dictation apparatus, method, and program that use a graphical user interface to allow a predetermined code to be inserted, by a simple operation, into a sentence being converted into text, and that improve speech recognition accuracy by enabling language processing that takes the inserted code into account.
[0009]
[Means for Solving the Problems]
The dictation apparatus according to the present invention comprises: voice capturing means for capturing voice; voice recognition means for performing voice recognition on the voice captured by the voice capturing means and converting it into a recognized character string; sequence evaluation means for evaluating and selecting among candidate sequences of the recognized character string; automatic code insertion means for detecting a sentence break position in the recognized character string selected by the sequence evaluation means and inserting a code at the break position; display control means for displaying the recognized character string on a display screen; and a graphical user interface for setting an exclamation mark or a question mark, either in order to change the code inserted at the break position displayed on the display screen or in order to insert the exclamation mark or question mark at the break position. The automatic code insertion means changes the code inserted at the break position in accordance with the setting of the exclamation mark or question mark made through the graphical user interface, or inserts at the break position the exclamation mark or question mark set through the graphical user interface. When the code inserted at the break position is changed in accordance with the setting of the exclamation mark or question mark made through the graphical user interface, the sequence evaluation means re-evaluates the candidate sequences of the recognized character string in consideration of the exclamation mark or question mark to be inserted and reselects the recognized character string.
[0010]
Note that the present invention relating to an apparatus is also established as an invention relating to a method.
[0011]
Further, the present invention relating to the apparatus is also realized as a program for causing a computer to execute processing corresponding to the present invention.
[0012]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. FIG. 1 is a block diagram showing a dictation apparatus according to a first embodiment of the present invention.
[0013]
In this embodiment, a code such as a punctuation mark automatically inserted into a speech recognition character string can be switched to a desired code by an operation using a GUI (graphical user interface).
[0014]
The dictation apparatus in FIG. 1 is realized by an information processing apparatus such as a personal computer. The voice input unit 1 is for inputting voice, and is an input device including a microphone or the like, for example. The operation input unit 2 is for operating an application displayed on the screen, and is configured by, for example, a keyboard, a mouse, a tablet, and the like. In particular, a mouse, a tablet, and the like can operate buttons and the like displayed on a GUI. Further, a touch panel configured on the display screen of the display unit 4 may be employed as the operation input unit 2.
[0015]
The control unit 3 controls the entire apparatus, and is configured by, for example, a central processing unit (CPU). The control unit 3 controls the speech recognition unit 5, the sequence evaluation unit 6, the code automatic insertion unit 7, the text storage unit 8, and the code switching unit 9 in order to execute dictation.
[0016]
The display unit 4 displays, for example, the character strings of text converted by voice recognition, and is configured by, for example, a CRT display device or a flat panel display device such as a liquid crystal display device.
[0017]
The speech recognition unit 5, the sequence evaluation unit 6, the automatic code insertion unit 7, and the text storage unit 8 are functional elements necessary for the dictation process, and each can be realized by its own processing routine and a CPU or the like that executes that routine.
[0018]
The speech recognition unit 5 performs speech recognition on the speech input from the speech input unit 1, converts it into recognition sequences formed by combinations of candidate text character strings, and outputs them. The sequence evaluation unit 6 linguistically evaluates the recognition sequences produced by the speech recognition unit 5, ranks them in order to pick out an appropriate recognition sequence, and outputs the optimum recognition sequence.
[0019]
The automatic code insertion unit 7 detects the sentence break positions in the optimum recognition sequence selected by the sequence evaluation unit 6 and inserts a normal code (punctuation mark) at each break position.
[0020]
The text storage unit 8 stores the text character string at each stage of the dictation process. For example, the text storage unit 8 stores the text character string output from the automatic code insertion unit 7, that is, the recognition sequence with codes inserted.
[0021]
The code switching unit 9 switches the code inserted at a sentence break position of the code-bearing recognition sequence text stored in the text storage unit 8 from the normal code (a period) to a different code such as a question mark or an exclamation mark, and stores the result in the text storage unit 8.
[0022]
Next, the operation of the embodiment configured as described above will be described with reference to the flowchart of FIG. 2 and the explanatory diagram of FIG. 3.
[0023]
When a voice is input via the voice input unit 1, the control unit 3 gives the input voice to the voice recognition unit 5 in step S1. The speech recognition unit 5 recognizes the input speech, creates a recognition character string candidate, and outputs it to the sequence evaluation unit 6 (step S2).
[0024]
The speech recognition unit 5 basically estimates, for each word, word character strings (in hiragana) that resemble the input speech. For example, when the speech “kyō wa tenki ga ii” (きょうはてんきがいい) is input, the speech recognition unit 5 creates word candidates such as 今日 (“today”), 京 (“Kyō”, capital), 起用 (“appointment”), and 器用 (“dexterous”) for the sound “kyō”, and word candidates such as 天気 (“weather”), 転機 (“turning point”), and 延期 (“postponement”) for the sound “tenki”. FIG. 3 shows the recognized character string candidates in this case, connected by lines.
[0025]
The sequence evaluation unit 6 linguistically evaluates the recognized character string candidates supplied by the speech recognition unit 5, assessing factors such as how readily the words connect to one another, assigns an evaluation rank to each candidate, and selects the candidate ranked first as the optimum recognized character string candidate. For example, when the candidates shown in FIG. 3 are output from the speech recognition unit 5, an evaluation rank is assigned to each combination of candidates indicated by the lines connecting the words. The sequence evaluation unit 6 selects 今日は天気がいい (“the weather is good today”), the combination ranked first, as the optimum recognized character string candidate and passes it to the automatic code insertion unit 7 (step S3).
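The ranking step can be illustrated with a minimal sketch. The candidate words follow the FIG. 3 example, but the connection scores in `BIGRAM`, the fallback value `DEFAULT`, and the function names `score` and `rank` are all hypothetical; the patent does not specify how the language model scores a sequence, so a simple product of pairwise scores is assumed here.

```python
from itertools import product

# Word candidates per spoken segment, following the FIG. 3 example.
LATTICE = [
    ["今日", "京", "起用", "器用"],  # candidates for "kyō"
    ["は"],                          # "wa"
    ["天気", "転機", "延期"],        # candidates for "tenki"
    ["が"],                          # "ga"
    ["いい"],                        # "ii"
]

# Hypothetical connection scores (higher = the pair connects more readily).
BIGRAM = {
    ("今日", "は"): 0.9, ("京", "は"): 0.3, ("起用", "は"): 0.4, ("器用", "は"): 0.2,
    ("は", "天気"): 0.8, ("は", "転機"): 0.3, ("は", "延期"): 0.3,
    ("天気", "が"): 0.9, ("転機", "が"): 0.5, ("延期", "が"): 0.4,
    ("が", "いい"): 0.9,
}
DEFAULT = 0.01  # assumed score for word pairs missing from the table


def score(seq):
    """Multiply the connection scores of all adjacent word pairs."""
    total = 1.0
    for a, b in zip(seq, seq[1:]):
        total *= BIGRAM.get((a, b), DEFAULT)
    return total


def rank(lattice):
    """Return every combination of candidates, best score first."""
    return sorted(product(*lattice), key=score, reverse=True)


best = rank(LATTICE)[0]
print("".join(best))  # 今日は天気がいい
```

Enumerating every combination is only practical for a toy lattice like this one; a real recognizer would presumably search the lattice more efficiently.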
[0026]
The automatic code insertion unit 7 detects the sentence break position in the optimum recognized character string candidate passed from the sequence evaluation unit 6 and inserts the most suitable code (punctuation mark) there (step S4). In the example of FIG. 3, when the candidate 今日は天気がいい is passed from the sequence evaluation unit 6, the automatic code insertion unit 7 detects the position after いい (“good”) as the insertion position for a period and automatically inserts one, yielding 今日は天気がいい。 (“The weather is good today.”).
[0027]
The control unit 3 stores the text character string at each stage of speech-to-text conversion in the text storage unit 8, including the recognized character string with codes inserted that is output from the automatic code insertion unit 7. The control unit 3 also supplies the text character strings stored in the text storage unit 8 at each stage of speech-to-text conversion to the display unit 4 for display.
[0028]
For example, in response to the speech input “kyō wa tenki ga ii”, the control unit 3 first displays the hiragana string きょうはてんきがいい progressively as the speech is captured, and then displays 今日は天気がいい, the optimum recognized character string candidate ranked first by the sequence evaluation unit 6. The control unit 3 then displays on the display screen 今日は天気がいい。, the result of automatic code insertion by the automatic code insertion unit 7.
[0029]
If the user has issued neither a code switching instruction nor an instruction to end voice capture for speech-to-text conversion, the process proceeds via steps S5 and S8 to step S9 and waits for the next voice input. When the next voice is input through the voice input unit 1, the process returns to step S1 and the same processing is repeated; recognized character strings are stored one after another in the text storage unit 8 and displayed on the display screen of the display unit 4. When the user issues an instruction to end capture, the process ends from step S8.
[0030]
Now suppose that the user checks the recognized character string on the display screen of the display unit 4 and wants to change the automatically inserted code. For example, suppose the period in the recognition result 今日は天気がいい。 shown on the screen in the example of FIG. 3 is to be changed to a question mark. In this case, the user operates the operation input unit 2, for example a touch panel, with a touch pen and presses, for example, a question mark insertion button displayed on the screen.
[0031]
The control unit 3 detects in step S5 that a code switching instruction has occurred and moves the process from step S6 to step S7. In step S7, the period “。” is changed to the question mark “？”. The changed recognized character string 今日は天気がいい？ (“Is the weather good today?”) is stored in the text storage unit 8 and also supplied to the display unit 4 and displayed on the screen.
[0032]
Note that by operating the operation input unit 2 the user can insert not only a question mark but also an exclamation mark or any other code. For example, a desired code may be inserted by preparing several dedicated buttons such as a “question mark” button and an “exclamation mark” button. Alternatively, a single button may be used as a cycling switch, so that pressing it repeatedly switches the code to be inserted in the order “period → question mark → exclamation mark → period → ...”.
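The single-button variant can be sketched as follows. This is only an illustration under simple assumptions: the break position is taken to be the end of the stored string, and the names `MARKS`, `next_mark`, and `switch_mark` are hypothetical rather than taken from the patent.

```python
# Cycle order for the single-button variant: period -> question -> exclamation.
MARKS = ["。", "？", "！"]


def next_mark(current):
    """Return the mark that follows `current` in the cycle."""
    i = MARKS.index(current)
    return MARKS[(i + 1) % len(MARKS)]


def switch_mark(text, new_mark):
    """Replace the mark at the sentence break (assumed to be the last character)."""
    if text and text[-1] in MARKS:
        return text[:-1] + new_mark
    return text + new_mark


stored = "今日は天気がいい。"                          # result of automatic insertion
stored = switch_mark(stored, next_mark(stored[-1]))  # one button press
print(stored)  # 今日は天気がいい？
```

With dedicated buttons instead of a cycling button, each button would call `switch_mark` directly with its own mark.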
[0033]
Thus, in the present embodiment, the automatically inserted code can be changed to a desired code. The change of the code is performed by the user's GUI operation, and the changed text is stored in the text storage unit 8 and displayed on the display screen. A desired code can be inserted at a sentence break position without hitting a key on the keyboard, and the operability is excellent.
[0034]
FIG. 4 is a block diagram showing a second embodiment of the present invention. In FIG. 4, the same components as those in FIG. 1 are denoted by the same reference numerals, and their description is omitted.
[0035]
In the first embodiment, the user instructs code switching while referring to the recognized character string displayed on the screen. In the present embodiment, by contrast, the mode of the sentence to be input, such as affirmative sentence, question sentence, or exclamation sentence, can be designated before voice input, and the code corresponding to the designated mode is automatically inserted.
[0036]
The present embodiment differs from the first embodiment in that a code designating unit 12 is used instead of the code switching unit 9, and an automatic code insertion unit 11, which operates differently from the automatic code insertion unit 7, is adopted. In response to the user's operation of the operation input unit 2, the code designating unit 12 determines, before voice input, the code to be inserted into the recognized character string for the sentence to be converted from speech to text.
[0037]
The automatic code insertion unit 11 detects a break position in the sentence of the optimum recognition sequence selected by the sequence evaluation unit 6 and inserts a code at the detected break position. If no code to be inserted has been designated by the code designating unit 12, the automatic code insertion unit 11 inserts the normal code; if a code has been designated, it inserts the designated code.
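A minimal sketch of this behavior is shown below. The class `CodeDesignator` stands in for the code designating unit 12 and the function `auto_insert` for the automatic code insertion unit 11; both names, the `Mode` enumeration, and the assumption that the break position is the end of the sentence are illustrative choices, not details given in the patent.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    """Sentence modes and the code each one maps to."""
    AFFIRMATIVE = "。"
    QUESTION = "？"
    EXCLAMATION = "！"


class CodeDesignator:
    """Holds the mode chosen through the GUI before voice input (hypothetical)."""

    def __init__(self) -> None:
        self.mode: Optional[Mode] = None

    def designate(self, mode: Mode) -> None:
        self.mode = mode

    def clear(self) -> None:
        self.mode = None


def auto_insert(sentence: str, designator: CodeDesignator, default_mark: str = "。") -> str:
    """Append the designated code, or the normal code when no mode is held."""
    mark = designator.mode.value if designator.mode is not None else default_mark
    return sentence + mark


designator = CodeDesignator()
print(auto_insert("今日は天気がいい", designator))  # 今日は天気がいい。
designator.designate(Mode.QUESTION)
print(auto_insert("今日は天気がいい", designator))  # 今日は天気がいい？
```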
[0038]
Next, the operation of the embodiment configured as described above will be described with reference to the flowchart of FIG. 5.
[0039]
The control unit 3 determines whether an operation input designating a sentence mode has occurred before voice input (step S11). When a mode is designated through the operation input unit 2 while no voice is being input, and the designated mode is not already held (step S12), the code designating unit 12 holds the mode in step S13.
[0040]
Also in the present embodiment, the user can designate a mode by a GUI operation. The mode is designated by operating the operation input unit 2, for example a touch panel, with a touch pen and pressing, for example, a mode setting button displayed on the screen. The mode setting buttons may be, for example, a button showing “?” for a question sentence and a button showing “!” for an exclamation sentence. When the user selects “?” with the touch pen, the “question sentence mode” state is held.
[0041]
Also in the present embodiment, various kinds of buttons are conceivable as the mode setting buttons. For example, a desired code may be inserted by preparing several dedicated buttons such as a “question sentence” button and an “exclamation sentence” button. Alternatively, a single button may be used as a cycling switch, so that pressing it repeatedly switches the code to be inserted in the order “affirmative sentence → question sentence → exclamation sentence → affirmative sentence → ...”.
[0042]
The code designating unit 12 determines the corresponding code, such as an exclamation mark or a question mark, according to the designated mode, and holds the code state so that the sequence evaluation unit 6 and the automatic code insertion unit 11 can refer to the fact that the code has been designated.
[0043]
Mode setting is accepted until it is detected in step S15 that voice input has occurred. When a voice input is generated in step S15, the control unit 3 gives the captured voice to the voice recognition unit 5. The speech recognition and recognition character string candidate selection processes in steps S1 to S3 are the same as those in the first embodiment.
[0044]
In the present embodiment, the automatic code insertion unit 11 detects the insertion position of the code and determines in step S16 whether a mode is held. If no mode is held, it automatically inserts the normal code; if a mode is held, it automatically inserts the code corresponding to that mode.
[0045]
For example, in the example of FIG. 3, when no mode has been designated in step S11, the automatic code insertion unit 11 inserts “。” after いい and stores the recognized character string 今日は天気がいい。 in the text storage unit 8. On the other hand, when, for example, the question sentence mode has been designated in step S11, the automatic code insertion unit 11 inserts “？” after いい and stores the recognized character string 今日は天気がいい？ in the text storage unit 8. The recognized character string stored in the text storage unit 8 is supplied to the display unit 4 and displayed on the screen.
[0046]
Thereafter, the same operation is repeated so that the mode designation is accepted for each sentence, and the code desired by the user is inserted into the sentence.
[0047]
As described above, in the present embodiment, it is possible to insert a code desired by the user into the sentence by designating the mode of the sentence before the user inputs a voice. Even in this case, the mode is specified by an extremely simple operation using the GUI.
[0048]
In this embodiment, an example has been described in which the code is inserted according to the mode designated before voice input. However, even after voice input, as long as the code has not yet been automatically inserted, a mode designation may be accepted, the designated mode held, and the code corresponding to the held mode automatically inserted.
[0049]
Further, it is clear that, in combination with the first embodiment, even after a code has been automatically inserted, the code may be switched to the designated code if a code switching instruction is given before the next voice input.
[0050]
FIG. 6 is a flowchart showing a dictation method according to the third embodiment of the present invention. Note that the hardware configuration in this embodiment can be the same as that in the first embodiment shown in FIG. 1. In FIG. 6, the same processes as those in FIG. 2 are denoted by the same step numbers, and their description is omitted.
[0051]
In this embodiment, the voice recognition accuracy is improved by reflecting the code switching instruction by the user in the selection of the optimum candidate string. That is, when a code switching instruction is generated, the control unit 3 is configured to reselect an optimum candidate sequence based on the instruction.
[0052]
FIG. 6 differs from the flow of FIG. 2 in that the processing of steps S21 to S23 is added. Step S21 is processing for retaining the designated code state when a code switching instruction is issued by the user. In step S22, the sequence evaluation unit 6 re-evaluates the candidate sequence according to the designated code state, and reselects the optimum candidate sequence. In step S23, the code automatic insertion unit 7 automatically inserts a code corresponding to the code state.
[0053]
Next, the operation of the embodiment configured as described above will be described with reference to the explanatory diagram of FIG. 7.
[0054]
The processing in steps S1 to S6 is the same as that in the first embodiment. That is, when a voice is input via the voice input unit 1, the control unit 3 gives the input voice to the voice recognition unit 5 to create a recognized character string candidate. The speech recognition unit 5 basically estimates a word character string (Hiragana) that is similar to the input speech for each word.
[0055]
For example, it is assumed that the voice “Would you like to go to Hawaii?” is input. FIG. 7 shows an example of a recognition result in the voice recognition unit 5 in this case. In FIG. 7, the voice recognition unit 5 creates the character strings “Hawaii” and “Kawai” as word candidates for the utterance “hawai”, the character strings “goes” and “coming” as word candidates for the utterance “iki”, and the character strings “ka” and “ga” as word candidates for the utterance “ka”.
[0056]
The series evaluation unit 6 linguistically evaluates the recognition character string candidates from the speech recognition unit 5, evaluates the ease of connection, and gives an evaluation rank. The sequence evaluation unit 6 selects an optimum recognized character string candidate and gives it to the code automatic insertion unit 7.
[0057]
When the recognition character string candidates of FIG. 7 are output from the speech recognition unit 5, the series evaluation unit 6 assigns an evaluation rank to each combination of recognition character string candidates and, as a result, selects, for example, “I will go to Hawaii” as the optimum recognition character string candidate. The automatic code insertion unit 7 detects the sentence break position of the optimum recognized character string candidate from the sequence evaluation unit 6, inserts the optimum code (punctuation mark), and passes the result to the control unit 3.
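The ranking of candidate combinations can be sketched as follows. This is only an illustration under stated assumptions: the numeric "ease of connection" values in `CONNECTION` are invented, the scoring rule (a product of pairwise scores) is one simple choice rather than the evaluation actually used by the series evaluation unit 6, and the word glosses are taken from the FIG. 7 example.

```python
from itertools import product

# Word candidates per utterance segment, following the FIG. 7 example.
CANDIDATES = [["Hawaii", "Kawai"], ["goes", "coming"], ["ka", "ga"]]

# Hypothetical "ease of connection" scores between adjacent words (higher is better).
CONNECTION = {
    ("Hawaii", "goes"): 0.9,
    ("Hawaii", "coming"): 0.4,
    ("Kawai", "goes"): 0.3,
    ("Kawai", "coming"): 0.2,
    ("goes", "ga"): 0.6,
    ("goes", "ka"): 0.5,
    ("coming", "ga"): 0.3,
    ("coming", "ka"): 0.3,
}


def score(sequence):
    """Multiply the connection scores of each adjacent word pair."""
    total = 1.0
    for left, right in zip(sequence, sequence[1:]):
        total *= CONNECTION.get((left, right), 0.01)
    return total


def rank(candidates):
    """Return every candidate sequence, best first (the evaluation rank)."""
    return sorted(product(*candidates), key=score, reverse=True)


best = rank(CANDIDATES)[0]
print(best)
```

With these assumed scores the sequence ending in “ga” ranks first, which corresponds to the statement-like result selected before any code is designated.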
[0058]
For example, when the optimum recognition character string from the sequence evaluation unit 6 is “I will go to Hawaii”, the automatic code insertion unit 7 determines that the position after “ga” is the insertion position of a comma. The comma “,” is then automatically inserted after “ga”, and the character string “I will go to Hawaii,” is stored in the text storage unit 8.
[0059]
Here, it is assumed that the user operates the operation input unit 2 and designates the question mark, for example by selecting the question mark displayed on the GUI screen. The control unit 3 retains the code state of the question mark in step S21, and instructs the sequence evaluation unit 6 to re-evaluate the recognition character string candidates in step S22.
[0060]
The series evaluation unit 6 linguistically evaluates the recognized character string candidates again. In this case, the series evaluation unit 6 uses the fact that a question mark is to be added after the character string “I will go to Hawaii”, that is, that this character string is a question sentence, in evaluating the recognition character string candidates. For example, the series evaluation unit 6 obtains the optimum recognition character string “Would you like to go to Hawaii”, in which the last character “ga” of the sentence is replaced with “ka”, which is reasonable as the last character of a question sentence.
[0061]
Next, in step S23, the code automatic insertion unit 7 detects the code insertion position of the optimally recognized character string determined by the sequence evaluation unit 6, and inserts a question mark designated by the user at this position. In this way, the character string “Would you like to go to Hawaii?” With the question mark added is stored in the text storage unit 8 and displayed on the screen of the display unit 4.
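Steps S21 to S23 can be sketched in Python as below. All scores are invented for illustration, and treating the designated code as one more element in the connection scoring is an assumption about how "considering the inserted mark" might be realized; the specification does not fix a particular scoring scheme. The function names are hypothetical.

```python
from itertools import product

# Word candidates per utterance segment, following the FIG. 7 example.
CANDIDATES = [["Hawaii", "Kawai"], ["goes", "coming"], ["ka", "ga"]]

# Hypothetical word-to-word connection scores (higher is better).
WORD_SCORE = {("Hawaii", "goes"): 0.9, ("goes", "ga"): 0.6, ("goes", "ka"): 0.5}
# Hypothetical scores for how well a final word connects to a sentence-end code.
END_SCORE = {("ka", "?"): 0.9, ("ga", "?"): 0.1}
FALLBACK = 0.05  # score for pairs not listed above


def evaluate(sequence, code=None):
    """Score a sequence; when a code is designated, include the final-word/code pair."""
    total = 1.0
    for pair in zip(sequence, sequence[1:]):
        total *= WORD_SCORE.get(pair, FALLBACK)
    if code is not None:
        total *= END_SCORE.get((sequence[-1], code), FALLBACK)
    return total


def select(code=None):
    """Pick the best candidate sequence for the given code state."""
    return max(product(*CANDIDATES), key=lambda seq: evaluate(seq, code))


def on_code_switch(designated_code):
    """Steps S21 to S23: hold the code state, re-evaluate, then insert the code."""
    held_code = designated_code          # S21: retain the designated code state
    best = select(held_code)             # S22: re-evaluate with the code considered
    return " ".join(best) + held_code    # S23: insert the designated code


print(" ".join(select()))     # initial selection, no code designated
print(on_code_switch("?"))    # after the user designates a question mark
```

Under these assumed scores the initial selection ends in “ga”, and designating the question mark flips the final word to “ka”, which is the behavior the third embodiment describes.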
[0062]
As described above, in the present embodiment, when the user operates the code switching button or the like and designates a code to be added to the sentence, the series evaluation unit 6 re-evaluates the recognition character string candidate series according to this designation. Thus, not only is the code desired by the user added, but the recognition result can also be switched to a sentence that matches the designated code, so that the voice recognition accuracy can be improved.
[0063]
FIG. 8 is a flowchart showing a dictation method according to the fourth embodiment of the present invention. Note that the hardware configuration in this embodiment can be the same as that in the second embodiment shown in FIG. 4.
[0064]
Also in the present embodiment, the voice recognition accuracy is improved by reflecting the code switching instruction by the user in the selection of the optimum candidate string.
[0065]
FIG. 8 differs from the flow of FIG. 5 in that the process of step S25 is adopted instead of step S3. In FIG. 8, the same processes as those in FIG. 5 are denoted by the same step numbers, and their description is omitted.
[0066]
In step S25, when a user designates a mode, a candidate sequence is evaluated according to the designated mode and an optimum candidate sequence is selected.
[0067]
Next, the operation of the embodiment configured as described above will be described.
[0068]
The processes from steps S11 to S15 and steps S1 and S2 are the same as those in the second embodiment. That is, when the mode is designated through the operation input unit 2 in a state in which no voice is input, the control unit 3 causes the code designation unit 12 to hold the mode. The code designation unit 12 determines the corresponding code, such as an exclamation mark or a question mark, according to the designated mode, and holds the code state so that the corresponding code is designated to the sequence evaluation unit 6 and the automatic code insertion unit 11.
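The role of the code designation unit 12 can be pictured with a small state holder. This is a sketch only: the class name `CodeDesignation`, the mode names, and the explicit `clear` method (resetting after each sentence so that the next sentence accepts a fresh designation) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mode names and their corresponding codes.
MODE_CODES = {"question": "?", "exclamation": "!"}


@dataclass
class CodeDesignation:
    """Holds the mode designated via the GUI until the sentence has been processed."""

    held_mode: Optional[str] = None

    def designate(self, mode: str) -> None:
        """Hold a mode chosen on the GUI; reject names that have no code."""
        if mode not in MODE_CODES:
            raise ValueError(f"unknown mode: {mode}")
        self.held_mode = mode

    def code(self) -> Optional[str]:
        """Code handed to the evaluation and insertion stages, or None if no mode is held."""
        return MODE_CODES.get(self.held_mode) if self.held_mode else None

    def clear(self) -> None:
        """Assumed reset after one sentence, so the next designation starts fresh."""
        self.held_mode = None


designation = CodeDesignation()
designation.designate("question")
print(designation.code())
```

Returning `None` when nothing is held lets the downstream stages fall back to their normal behavior without a separate flag.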
[0069]
When voice input is generated in step S15, the voice recognition unit 5 performs voice recognition and creates a recognized character string candidate. In the present embodiment, in the next step S25, the sequence evaluation unit 6 refers to the mode designation when evaluating the recognized character string candidates.
[0070]
For example, it is assumed that the recognition result shown in FIG. 7 is obtained for the voice input “Would you like to go to Hawaii?”. When no mode is designated, the series evaluation unit 6 evaluates the recognition character string candidates from the speech recognition unit 5 linguistically, assigns an evaluation rank according to the ease of connection and the like, and selects the optimum recognition character string candidate. For example, in the example of FIG. 7, the series evaluation unit 6 selects “I will go to Hawaii” as the optimum recognition character string candidate.
[0071]
On the other hand, it is assumed that the user specifies a question sentence as a mode. In this case, the sequence evaluation unit 6 determines that the speech recognition character string constitutes a question sentence, and performs linguistic evaluation. Then, the sequence evaluation unit 6 obtains the optimum recognition character string “Would you like to go to Hawaii”, for example.
[0072]
Further, the automatic code insertion unit 7 detects the insertion position of the code of the optimum recognition character string determined by the sequence evaluation unit 6, and inserts a question mark designated by the user at this position. Thus, the character string “Would you like to go to Hawaii?” With the question mark added is stored in the text storage unit 8 and displayed on the screen of the display unit 4.
[0073]
As described above, in the present embodiment, by specifying the text mode, the text mode is taken into consideration at the time of evaluation by the series evaluation unit 6. Thereby, the evaluation of the recognition character string sequence candidates becomes appropriate, and the speech recognition accuracy can be improved.
[0074]
[Effect of the invention]
As described above, according to the present invention, it is possible to insert a predetermined code into a sentence to be converted into text by a simple operation using a graphical user interface and to enable language processing according to the inserted code. Thus, the voice recognition accuracy can be improved.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a dictation apparatus according to a first embodiment of the present invention.
FIG. 2 is a flowchart for explaining the operation of the first embodiment.
FIG. 3 is an explanatory diagram for explaining the operation of the first embodiment.
FIG. 4 is a block diagram showing a second embodiment of the present invention.
FIG. 5 is a flowchart for explaining the operation of the second embodiment.
FIG. 6 is a flowchart showing a dictation method according to a third embodiment of the present invention.
FIG. 7 is an explanatory diagram for explaining the operation of the third embodiment.
FIG. 8 is a flowchart showing a dictation method according to a fourth embodiment of the present invention.
[Explanation of symbols]
1 ... Voice input unit, 2 ... Operation input unit, 3 ... Control unit, 4 ... Display unit, 5 ... Speech recognition unit, 6 ... Series evaluation unit, 7 ... Code automatic insertion unit, 8 ... Text storage unit.

Claims

Audio capturing means for capturing audio;
Speech recognition means for recognizing and converting the speech captured by the speech capture means into a recognized character string;
A sequence evaluation means for evaluating and selecting a candidate sequence of the recognized character string;
A code automatic insertion means for detecting a sentence break position of the recognized character string selected by the series evaluation means and inserting a code at the break position;
Display control means for displaying the recognized character string on a display screen;
A graphical user interface for setting an exclamation mark or question mark, in order to change the code inserted at the break position displayed on the display screen or to insert the exclamation mark or question mark at the break position;
Comprising
A dictation apparatus characterized in that the code automatic insertion means changes the code inserted at the break position according to the setting of the exclamation mark or question mark by the graphical user interface, or inserts the exclamation mark or question mark set by the graphical user interface at the break position, and
the sequence evaluation means, when the code inserted at the break position is changed according to the setting of the exclamation mark or question mark by the graphical user interface, re-evaluates the candidate sequence of the recognized character string in consideration of the inserted exclamation mark or question mark and reselects the recognized character string.

Audio capturing means for capturing audio;
Speech recognition means for recognizing and converting the speech captured by the speech capture means into a recognized character string;
A sequence evaluation means for evaluating and selecting a candidate sequence of the recognized character string;
A code automatic insertion means for detecting a sentence break position of the recognized character string selected by the series evaluation means and inserting a code at the break position;
Display control means for displaying the recognized character string on a display screen;
A graphical user interface for setting an exclamation mark or question mark, in order to change the code inserted at the break position displayed on the display screen or to insert the exclamation mark or question mark at the break position;
Comprising
A dictation device characterized in that, when the exclamation mark or question mark is inserted at the break position according to the setting by the graphical user interface, the sequence evaluation means evaluates the candidate sequence of the recognized character string in consideration of the inserted exclamation mark or question mark and selects the recognized character string.

Audio capturing means for capturing audio;
Speech recognition means for recognizing the speech captured by the speech capture means and converting it into a recognition character string candidate sequence;
A sequence evaluation means for evaluating and selecting a candidate sequence of the recognized character string;
A code automatic insertion means for detecting a sentence break position of the recognized character string selected by the series evaluation means and inserting a code at the break position;
A graphical user interface for setting an exclamation mark or a question mark to be inserted at the separation position selected by the series evaluation means;
Code switching means for switching and inserting the code inserted by the code automatic insertion means to the exclamation mark or question mark set by the graphical user interface;
Comprising
A dictation device characterized in that, when the exclamation mark or question mark set by the graphical user interface is inserted at the break position, the sequence evaluation means re-evaluates the recognition character string candidate sequence in consideration of the inserted exclamation mark or question mark and reselects the recognized character string.

Audio capturing means for capturing audio;
A graphical user interface for setting an exclamation mark or question mark to indicate the sentence break to be inserted at the string break position;
Speech recognition means for recognizing the speech captured by the speech capture means and converting it into a recognition character string candidate sequence;
A sequence evaluation means for evaluating and selecting a candidate sequence of the recognized character string;
A code automatic insertion means for detecting a sentence break position of the recognized character string selected by the series evaluation means and inserting a code at the break position;
Comprising
A dictation device characterized in that, when the exclamation mark or question mark set by the graphical user interface is inserted at the break position, the sequence evaluation means evaluates the recognition character string candidate sequence in consideration of the exclamation mark or question mark to be inserted and selects the recognized character string.

A dictation method using speech capturing means, speech recognition means, sequence evaluation means, code automatic insertion means, graphical user interface, and code switching means,
A voice capturing step in which the voice capturing means captures a voice;
A speech recognition step in which the speech recognition means recognizes the speech captured in the speech capture step and converts it into a candidate sequence of recognized character strings;
A sequence evaluation step in which the sequence evaluation means evaluates and selects a candidate sequence of the recognized character string;
A code automatic insertion step in which the code automatic insertion means detects a sentence break position of the recognized character string selected by the sequence evaluation means and automatically inserts a code at the break position;
A code switching step in which, when an exclamation mark or question mark is set for the break position by the graphical user interface for setting the code to be inserted at the break position, the code switching means switches the code inserted in the code automatic insertion step to the set exclamation mark or question mark and inserts it;
Comprising
The dictation method further comprising a sequence re-evaluation step in which, when an exclamation mark or question mark is set for the break position by the graphical user interface for setting a code to be inserted at the break position, the sequence evaluation means re-evaluates the recognition character string candidate sequence in consideration of the inserted exclamation mark or question mark and reselects the recognized character string.

A dictation method using speech capturing means, graphical user interface, setting means, speech recognition means, sequence evaluation means, and code automatic insertion means,
A step in which the setting means detects that an exclamation mark or question mark to be inserted at the break position has been set by the graphical user interface for setting a code indicating a sentence break to be inserted at the break position of a character string, and holds the setting;
A voice capturing step in which the voice capturing means captures a voice;
A speech recognition step in which the speech recognition means recognizes the speech captured in the speech capture step and converts it into a candidate sequence of recognized character strings;
A sequence evaluation step in which the sequence evaluation means evaluates and selects a candidate sequence of the recognized character string;
A code automatic insertion step in which the code automatic insertion means detects a sentence break position of the recognized character string selected by the sequence evaluation means and automatically inserts the code at the break position;
Comprising
A dictation method characterized in that, when the exclamation mark or question mark set by the graphical user interface is inserted at the break position, the sequence evaluation means evaluates the recognition character string candidate sequence in consideration of the exclamation mark or question mark to be inserted and selects the recognized character string.

On the computer,
Audio capture processing to capture audio,
A speech recognition process for recognizing the speech captured in the speech capture processing and converting it into a recognition character string candidate sequence;
A sequence evaluation process for evaluating and selecting a candidate sequence of the recognized character string;
A code automatic insertion process for detecting a sentence break position of the recognized character string selected in the sequence evaluation process and inserting a code at the break position;
A code switching process in which, when the exclamation mark or question mark to be inserted at the break position is set by the graphical user interface for setting the exclamation mark or question mark to be inserted at the break position selected in the sequence evaluation process, the code inserted in the code automatic insertion process is switched to the set exclamation mark or question mark and inserted; and
A sequence re-evaluation process in which, when the graphical user interface for setting an exclamation mark or question mark indicating a sentence break to be inserted at the break position sets an exclamation mark or question mark to be inserted at the break position, the recognition character string candidate sequence is re-evaluated in consideration of the exclamation mark or question mark to be inserted and the recognized character string is reselected;
A dictation program for causing the computer to execute the above processes.

On the computer,
A process of detecting that an exclamation mark or question mark to be inserted at the break position has been set by the graphical user interface for setting a code indicating a sentence break to be inserted at the break position of a character string, and holding the setting;
Audio capture processing to capture audio,
A speech recognition process for recognizing the speech captured in the speech capture processing and converting it into a recognition character string candidate sequence;
A sequence evaluation process for evaluating and selecting a candidate sequence of the recognized character string;
A code automatic insertion process for detecting a sentence break position of the recognized character string selected in the series evaluation process and inserting the code at the break position;
A dictation program for causing the computer to execute the above processes,
wherein, in the sequence evaluation process, when the exclamation mark or question mark set by the graphical user interface is inserted at the break position, the recognition character string candidate sequence is evaluated in consideration of the inserted exclamation mark or question mark and the recognized character string is selected.