JP4131586B2 - Voice recognition device

Info

Publication number: JP4131586B2
Application number: JP18513098A
Authority: JP
Inventors: 紀子小山; 幸弘福永
Original assignee: Toshiba Corp; Toshiba Digital Media Engineering Corp
Current assignee: Toshiba Corp; Toshiba Development and Engineering Corp
Priority date: 1998-06-30
Filing date: 1998-06-30
Publication date: 2008-08-13
Anticipated expiration: 2018-06-30
Also published as: JP2000020085A

Description

【０００１】
【発明の属する技術分野】
本発明は、入力された音声信号を文字情報に変換する音声認識装置に係り、特に数字列を音声入力する場合に用いて好適な音声認識装置に関する。
【０００２】
【従来の技術】
従来、音声認識装置では、「千」、「万」などの位付きで数字列が音声入力された場合には、認識が終了した文字までを順に漢数字として表示するか、あるいは、その数字入力の終了後にまとめて画面上に表示する方法を採っていた。
【０００３】
【発明が解決しようとする課題】
しかしながら、上記した従来技術においては、本来、アラビア数字で入力すべき帳票等への入力の場合でも、漢数字表記で入力されることになる。このため、違和感があり、後に当該漢数字表記をアラビア数字に修正するなどの面倒な操作が必要であった。
【０００４】
また、数字入力の終了後にまとめて表示する方法では、入力途中でリアルタイムにその数字のアラビア表記を確認できないなどの問題があった。
なお、「千」、「万」などの位付きで数字列を入力せずに、例えば「１」、「０」、「０」、「０」…といったように、先頭から順に数字を読み上げるようにすれば、リアルタイムでアラビア数字を得ることができるが、文字列の桁数が多い場合に間違えて入力してしまう可能性がある。
【０００５】
また、例えば「キロ」などの単位接頭（単位に付けられる接頭語）を付けて数字を音声入力した場合、従来の音声認識装置では、認識結果として「キロ」といった単語としての表示しかできず、それを数値化した値で認識結果を確認することはできなかった。
【０００６】
本発明は上記のような点に鑑みなされたもので、「千」、「万」などの位付きで音声入力された数字列を認識する場合に、その認識結果をアラビア数字でリアルタイムに表示することのできる音声認識装置を提供することを目的とする。
【０００８】
本発明に係る音声認識装置は、複数の桁からなる数字列を位付けで先頭から順に音声入力する入力手段と、前記入力手段によって入力された音声を音声信号に変換し、この音声信号から数値を示す単語の読みと、位を示す単語の読みとを認識処理する音声認識手段と、数値を示す単語の読みとこの読みに対応する数値及び位を示す単語の読みが記憶された記憶手段と、前記音声認識手段によって認識処理された単語の読みが数値を示す単語の読みの場合には、前記記憶手段を参照して右詰めとなるように数値化し、位を示す単語の読みの場合、それ以前に数値化した値がある場合には、前記記憶手段を参照して位を示す単語の読みを数値化して前記それ以前に数値化した値と演算処理して数値を確定する数値表現手段と、前記数値表現手段によって数値化された数値を逐次アラビア数字で表示すると共に、当該数値の中で位が確定されている部分と、右詰めで表示される確定されていない部分とを区別して表示する表示手段とを具備したことを特徴としている。
【０００９】
このような構成によれば、音声入力された数字列の認識結果を常に確認することができ、また、アラビア数字を用いることで表中などの漢数字が不適な場所においても違和感なく表示することが可能となる。
【００１２】
【発明の実施の形態】
以下、図面を参照して本発明の実施の形態について説明する。
（第１の実施形態）
図１は本発明の第１の実施形態に係る音声認識装置の構成を示すブロック図である。なお、本装置は、例えば磁気ディスク等の記録媒体に記録されたプログラムを読み込み、このプログラムによって動作が制御されるコンピュータによって実現される。
【００１３】
図１において、入力装置１０１は、オペレータが発声した音声信号を電気的な音声信号に変換するマイクロフォン等である。入力装置１０１により得られた音声信号は音声認識部１０２にて、該当する単語として文字コードに変換される。この際、音声認識部１０２では、音声信号と単語の文字コードとの対応を記述した認識辞書１０３を参照する。
【００１４】
数値表現制御部１０４は、音声認識部１０２より出力される単語列から、「まん（万）」、「ひゃく（百）」等の位を示す数値表現を取り出す。この際、数値表現制御部１０４では、数値・位の読み、表記と共に、その値、例えば「おく（億）」であれば、「１００００００００」が格納されている数値表現テーブル１０５を参照する。
【００１５】
次に、数値表現処理部１０４では、一連の数値表現単語を入力時間の古い方から順にチェックし、０〜９の数字を表す単語であれば、それらの数値を確定前バッファＡ１０７に格納する。また、「千」、「百」、「十」の位を表す単語であれば、数値演算部１０６において確定前バッファＡ１０７に格納されている値とそれぞれの位を数値化したものとの積を取り、その結果を確定前バッファＢ１０８に格納すると共に確定前バッファＡ１０７の内容を初期化する。
【００１６】
ここで、位の直前が数値でない場合、すなわち、確定前バッファＡ１０７が空の場合には、１を代入した上で上記の演算を行う。
更に、「兆」、「億」、「万」の位を表す単語であれば、確定前バッファＡ１０７と確定前バッファＢ１０８の値の和とそれぞれの位を数値化したものとの積を取り、その結果を確定済みバッファ１０９に格納すると共に確定前バッファＡ１０７、確定前バッファＢ１０８の内容を初期化する。
【００１７】
また、１単語が処理される毎に確定前バッファＡ１０７、確定前バッファＢ１０８、確定済みバッファ１０９の値の和を表示用バッファ１１１に格納する。この際、表示属性付加部１１０は確定済みバッファ１０９より得た数字列に対しては確定済みである文字属性（例えば強調文字）を付与、確定前バッファＡ１０７および確定前バッファＢ１０８の数字列に対しては未確定であることを示す文字属性（例えば下線付き）を付与する。
【００１８】
表示装置１１２は、表示用バッファ１１１の情報に従って画面上に、認識結果として得られた数字列の確定済み部分と未確定部分をその表示属性に基づいて区別表示する。
【００１９】
次に、第１の実施形態の動作を説明する。
図２および図３は本発明の第１の実施形態における数字列の入力を行う際の処理の流れを示したフローチャートである。
【００２０】
入力装置１０１を通じてオペレータが音声入力した音声信号は認識バッファ１１３に格納される（ステップ２０１，２０２）。音声入力が終了であれば（ステップ２０３のｙｅｓ）、ここでの処理を終了する。それ以外であれば（ステップ２０３のｎｏ）、認識バッファ１１３に格納された音声信号を音声認識部１０２に与えて、認識辞書１０３を参照して単語に変換する（ステップ２０４）。
【００２１】
ここで、該単語が数値表現であった場合（ステップ２０５のｙｅｓ）、数値表現制御部１０４で該単語の直前の単語を調べ、それが数値表現以外である場合には（ステップ２０６のｙｅｓ）、レジスタＫ１、確定前バッファＡ１０７および確定前バッファＢ１０８、確定済みバッファ１０９を初期化する（ステップ２０７）。数値表現とは、数値表現テーブル１０５に記述されている単語であって、０〜９の数字の他に「十」、「百」、「千」、「万」などの位を表す単語も含まれる。
【００２２】
次に、入力された数値表現が数字であるか、「十」、「百」、「千」、「万」などの位であるかをチェックする（ステップ２０８）。数字であった場合には（ステップ２０８のｙｅｓ）、直前も数字であったかを調べ（ステップ２１９）、位等の数字以外であった場合には（ステップ２１９のｎｏ）、該数字を確定前バッファＡ１０７に格納する（ステップ２２０）。
【００２３】
一方、上記ステップ２０８の判定において、数値表現が数字以外、すなわち、位であった場合は（ステップ２０８のｎｏ）、レジスタＫ０に該単語を数値化した値、すなわち、例えば「百」の場合は「１００」といった数値を格納する（ステップ２０９）。
【００２４】
次に、数値表示制御部１０４では、レジスタＫ０の値が「１０」、「１００」、「１０００」の何れかであるか、それ以外であるかをチェックする（ステップ２１０）。
【００２５】
その結果、レジスタＫ０が「１０」、「１００」、「１０００」の何れかであった場合は、更に後述するレジスタＫ１に格納されている値が「１００００」以下であるかをチェックする（ステップ２１１）。そして、レジスタＫ１が未設定か「１００００」より大きい値であった場合には（ステップ２１１のｎｏ）、確定前バッファＡ１０７の格納データを調べ（ステップ２１２）、確定前バッファＡ１０７が空の場合には（ステップ２１２のｎｏ）、確定前バッファＡ１０７に１を代入する（ステップ２１３）。
【００２６】
数値演算部１０６では、確定前バッファＡ１０７とレジスタＫ０の積を計算し、その結果を確定前バッファＢ１０８に格納されている値に加える（ステップ２１４）。計算結果の格納が終わると、次の計算のために確定前バッファＡ１０７を初期化する（ステップ２１５）。
【００２７】
一方、上記ステップ２１０の判定が偽であったか、上記ステップ２１１の判定が真であった場合には、レジスタＫ１にレジスタＫ０の値を代入すると共に（ステップ２１６）、確定前バッファＡ１０７と確定前バッファＢ１０８の和とレジスタＫ１の積を確定済みバッファ１０９に格納されている値に加える（ステップ２１７）。また、確定前バッファＡ１０７および確定前バッファＢ１０８は初期化する（ステップ２１８）。
【００２８】
ステップ２１５、ステップ２１８またはステップ２２０の終了後、数値表現制御部１０４は、確定前バッファＡ１０７および確定前バッファＢ１０８、確定済みバッファ１０９に格納されている全ての値の和を数値演算部１０６において計算し、表示用バッファ１１１に格納する（ステップ２２７）。
【００２９】
同時に、表示属性付加部１１０では、レジスタＫ１に値が格納されているかをチェックする（ステップ２２８）。その結果、格納済みの場合には（ステップ２２８のｎｏ）、表示用バッファ１１１のレジスタＫ１以上の桁の文字に確定済みであることを示す表示属性を付与し（ステップ２２９）、確定済み表示属性が付与されなかった文字には未確定表示属性を付与する（ステップ２３０）。
【００３０】
表示装置１１２では、表示用バッファ１１１に格納された数字列を各文字に付与されている表示属性に従って画面上に表示する（ステップ２３４）。つまり、表示属性に基づいて位が確定されている部分と確定されていない部分とを区別表示する。
【００３１】
また、ステップ２０５において数値入力が終了したと判断された場合（ステップ２３１のｙｅｓ）、確定前バッファＡ１０７、確定前バッファＢ１０８と確定済みバッファ１０９に格納されている各値の和を数値演算部１０６により演算し、その結果を表示用バッファ１１１に格納する（ステップ２３２）。同時に、表示用バッファ１１１の文字列に対して確定済み表示属性を付与する（ステップ２３３）。表示用バッファ１１１の情報は、表示装置１１２において出力される（ステップ２３４）。
【００３２】
一方、ステップ２１９の判定において、数字が連続したと判断される場合、例えば小数点以下の数値入力の場合、確定前バッファＡ１０７をチェックする（ステップ２２１）。その結果、確定前バッファＡ１０７に数値が格納されている場合には（ステップ２１１のｙｅｓ）、確定済みバッファ１０９を１桁左シフトを行って確定前バッファＡ１０７の値を加え（ステップ２２２）、同時に確定前バッファＡ１０７は初期化する（ステップ２２３）。
【００３３】
確定前バッファＡ１０７の初期化終了後、およびステップ２２１において確定前バッファＡ１０７が空であった場合、上記同様に、確定済みバッファ１０９を１桁左シフトした上で、入力された数字を加える（ステップ２２４）。この確定済みバッファ１０９の値は表示用バッファ１１１に格納され（ステップ２２５）、その値に確定済み表示属性が付与され（ステップ２２６）、表示装置１１２に表示される（ステップ２３４）。
【００３４】
図４は数値表現テーブル１０５の構成例である。
ここで、数値表現の単語に関する読みと表記が格納されており、音声認識部１０２から出力される認識結果が、読み・表記のいずれの場合でも数字を見つけることが可能である。この場合、数字に関する表記としては、アラビア数字、漢数字、大字の３種類を有する。例えば、読み「いち」に対しては、アラビア数字の「１」、漢数字の「一」、大字の「壱」が対応付けられている。表示用バッファ１１１には、この表記が用いられる。
【００３５】
また、数値表現制御部１０４において、各バッファ（確定前バッファＡ１０７、確定前バッファＢ１０８、確定済みバッファ１０９）には、この数値表現テーブル１０５に記述されている数値が用いられる。つまり、演算を行う際には、この数値が用いられる。
【００３６】
図５は数値を音声入力した際の確定前バッファＡ１０７、確定前バッファＢ１０８、確定済みバッファ１０９および表示用バッファ１１１のデータ格納例である。なお、表示用バッファ１１１の下線文字は未確定文字列、下線のない文字は確定済みであることを示し、表示装置１１２上にも同様の表示が行われる。
【００３７】
今、「５２０３０４」といった複数の桁からなる数字例を位付きで音声入力する場合を例にして、各バッファのデータの流れを説明する。
（１）まず、「ご」といった数字を示す単語を音声入力する。この場合、「ご」だけでは、次に「千」、「万」等の位がくる可能性があり、その数値を確定することはできない。したがって、「ご」を認識した際に、数値「５」を確定前バッファＡ１０７に格納し、表示用バッファ１１１に下線付きで表記「５」（ここではアラビア数字を第１表記とする）を格納する。これにより、表記「５」が未確定状態で表示装置１１２に表示される。
【００３８】
（２）次に、位を示す単語「じゅう」を音声入力する。この段階でも、単に「５０」なのか、その後に「万」、「億」、「兆」の位がくるのか分からない。したがって、前回入力された「５」と今回入力された「１０」との積である数値「５０」を確定前バッファＢ１０８に格納する。
【００３９】
また、表示用バッファ１１１に下線付きで表記「５０」を格納し、同表記を表示装置１１２に表示する。
（３）次に、「に」といった数字を示す単語を音声入力した場合にも、まだ確定できないため、「に」を認識した際に、数値「２」を確定前バッファＡ１０７に格納する。また、表示用バッファ１１１に下線付きで表記「５２」を格納し、同表記「５２」を表示装置１１２に表示する。
【００４０】
（４）次に、位を示す単語「まん」を音声入力する。ここで、単位の「万」を境にして、それ以上の単位、つまり、「万」、「億」、「兆」…が入力された時点で数値の位を確定することができる。これに対し、「十」、「百」、「千」は、その後に「万」、「億」、「兆」…がくる可能性があるため、数値の位を確定することはできない。
【００４１】
したがって、今、「まん」が入力された時点で、「万」の位まで数値を確定することができ、数値「５２００００」を確定済みバッファ１０９に格納する。また、表示用バッファ１１１に表記「５２００００」を格納すると共にその下４桁に下線を付し、これを表示装置１１２に表示する。
【００４２】
（５）次に、「さん」といった数字を示す単語を音声入力する。この場合には、その後に「千」、「百」、「十」といった位が入る可能性があるため、数値を確定できない。したがって、確定前バッファＡ１０７に「さん」に対応する数値「３」を格納し、表示用バッファ１１１に表記「５２０００３」を格納すると共に、その下４桁に下線を付して、これを表示装置１１２に表示する。
【００４３】
（６）次に、位を示す単語「びゃく」を音声入力する。この場合、既に「万」の位が確定しているため、「百」の位を確定することができる。したがって、直前に入力された「３」を「３００」と演算し、確定済みバッファ１０９に「５２０３００」を格納する。
【００４４】
また、表示用バッファ１１１に表記「５２０３００」を格納し、その下２桁に下線を付して、これを表示装置１１２に表示する。
（７）次に、「よん」といった数字を示す単語を音声入力する。この場合には、その後に「十」の位が入る可能性があるため、数値を確定できない。したがって、確定前バッファＡ１０７に「よん」に対応する数値「４」を格納し、表示用バッファ１１１に表記「５２０３０４」を格納すると共に、その下１桁に下線を付して、これを表示装置１１２に表示する。
【００４５】
このようにして、「５２０３０４」といった数字例を順次音声認識し、その認識結果をアラビア数字でリアルタイムに表示することができる。なお、最後の「４」については、「十」の位が入る可能性があるため、下線を付して表示するものとする。
【００４６】
同様に、「７００００８００１０５００」といった数字例を位付きで音声入力する場合でも、内部演算によって得られた数値を用いて、その認識結果をアラビア数字でリアルタイムに表示することができる。この場合、「なな」の次に「ちょう」と発声した時点で、「兆」の位を確定することができる。
【００４７】
また、「８０５０３．６４」といった小数点を含む数字例を位付きで音声入力する場合でも対応できる。この場合、小数点以下の数字は「ろく」、「よん」といったように、数値を読み上げる形となるため、それをそのままアラビア数字で表示すれば良い。
【００４８】
なお、例えば「いち」、「おく」、「ちょう」といったように、位を誤って入力した場合には、その誤り部分で文字列の認識が区切られる。つまり、この例では、「いちおく」と「ちょう」に分けられ、「いちおく」については「１００００００００」と表示されるが、「ちょう」については例えば「帳」や「丁」などのように、別の単語として認識されることになる。
【００４９】
（第２の実施形態）
次に、本発明の第２の実施形態を説明する。
図６は本発明の第２の実施形態に係る音声認識装置の構成を示すブロック図である。なお、本装置は、例えば磁気ディスク等の記録媒体に記録されたプログラムを読み込み、このプログラムによって動作が制御されるコンピュータによって実現される。
【００５０】
図６に示す入力装置５０１から認識バッファ５１３は、図１の入力装置１０１から認識バッファ１１３に相当するため、ここではその説明を省略するものとする。
【００５１】
本実施形態では、数値表現制御部５０４において、数字、位等の数値表現以外にも、単位接頭テーブル５１４に格納された単語も数値表現の一部と見なす。単位接頭テーブル５１４は数値表現テーブル５０５と同様の構造であり、ここでは「キロ」、「ミリ」等が格納されている。
【００５２】
数値表現テーブル５０５の単語が入力された場合、確定前バッファＡ５０７、確定前バッファＢ５０８を確定済みバッファ５０９の値に加えると共に、表示用バッファ５１１に格納する。また、表示用バッファ５１１の最後尾には単位接頭テーブル５１４を参照し、入力された単位接頭の表記を追加し、表示属性付加部５１０により、表示用バッファ５１１の全文字に対して確定済み表示属性を与える。
【００５３】
更に、確定済みバッファ５０９の値と、単位接頭を数値化した値の積を確定済みバッファ５０９および書き換えバッファ５１５に格納する。オペレータが書き換え指示を出すことにより、表記選択手段５１６により書き換えバッファ５１５に格納された値（つまり、単位接頭を数値化表現した値）を入力結果として使用することができる。
【００５４】
次に、第２の実施形態の動作を説明する。
図７乃至図９は本発明の第２の実施形態における単位接頭を含む数字列の入力を行う際の処理の流れを示したフローチャートである。なお、図中のステップ６０１からステップ６０５までは、図２および図３に示すフローチャート（第１の実施形態）のステップ２０１からステップ２０５までに相当し、入力された音声信号が認識され、文字コードからなる単語として出力される。
【００５５】
ここで、該単語が数値表現であるかを判断するが、数値表現テーブル５０５に記述されている数字・位に加え、単位接頭テーブル５１４に記述される「キロ」等の単位接頭も数値表現であるとする（ステップ６０５）。
【００５６】
数値表現でなかった場合には、ステップ６３１に進む。ステップ６３１からステップ６３３までは、図２および図３に示すフローチャートのステップ２３１からステップ２３３に相当する。
【００５７】
数値表現であった場合は（ステップ６０５のｙｅｓ）、その直前の単語をチェックする（ステップ６０６）。そして、該単語が単位接頭であるか、数値表現以外であった場合には（ステップ６０６のｙｅｓ）、レジスタＫ１、確定前バッファＡ５０７、確定前バッファＢ５０８および確定済みバッファ５０９を初期化する（ステップ６０７）。
【００５８】
次に、入力された単語が数字であるかを調べ（ステップ６０８）、数字の場合はステップ６１９に進む。ステップ６１９からステップ６２６までの処理は、図２および図３に示すフローチャートのステップ２１９からステップ２２６と同様である。
【００５９】
また、数字でなかった場合は、該単語が単位接頭であるかをチェックする（ステップ６３５）。単位接頭でない場合には（ステップ６３５のｎｏ）、ステップ６０９からステップ６１８（図２および図３に示すフローチャートのステップ２０９からステップ２１８と同一の処理）、ステップ６２７からステップ６３０（図２および図３に示すフローチャートのステップ２２７からステップ２３０と同一の処理）へ進む。
【００６０】
単位接頭であった場合は（ステップ６３５のｙｅｓ）、レジスタＫ０に単位接頭テーブル５１４を参照して得た該単語の倍率、例えば「キロ」の場合には「１０００」を、「ミリ」の場合には「０．００１」を格納する（ステップ６３６）。
【００６１】
次に、確定前バッファＡ５０７、確定済みバッファＢ５０８、確定済みバッファ５０９の和を確定済みバッファ５０９および表示用バッファ５１１に格納する（ステップ６３７）。表示用バッファ５１１には単位接頭の表記、例えば「ｋ」等を追加し（ステップ６３８）、確定済み表示属性を付与する（ステップ６３９）。更に、書き換えバッファ５１５には確定済みバッファ５０９とレジスタＫ０の積を格納する（ステップ６４０）。
【００６２】
表示用バッファ５１１の内容は、表示装置５１２において画面上に表示され（ステップ６３４）、一方、書き換えバッファ５１５は、オペレータの書き換え指示がなされた際に、認識結果と入れ替えられる。
【００６３】
図１０は単位接頭テーブル５１４の構成例である。
図４に示した数値表現テーブル１０５とほぼ同様の構成であるが、「じゅう（十）」などが、その語自体で数値的な意味を持つのに対して、単位接頭はその語自体には数値的な意味は持たないことから、ここでは倍率で表す。
【００６４】
すなわち、読み「ナノ」といった単位接頭の表記は「ｎ」であるが、これをむ数値化表現する際には、倍率「０．００００００００１」を用いる。同様に、「マイクロ」−「μ」の数値化表現では、倍率「０．０００００１」を用い、「ミリ」−「ｍ」の数値化表現では、倍率「０．００１」を用いる。
【００６５】
また、「キロ」−「ｋ」の数値化表現では、倍率「１０００」を用い、「メガ」−「Ｍ」の数値化表現では、倍率「１００００００」を用い、「ギガ」−「Ｇ」の数値化表現では、倍率「１０００００００００」を用いる。
【００６６】
図１１は単位接頭を含んだ数値を音声入力した際の確定前バッファＡ５０７、確定前バッファＢ５０８、確定済みバッファ５０９、表示用バッファ５１１、書き換えバッファ５１５のデータ格納例である。なお、表示用バッファ５１１の下線文字は未確定文字列、下線のない文字は確定済みであることを示し、表示装置５１２上にも同様の表示が行われる。
【００６７】
今、「３８００００ｋ」といった複数の桁からなる数字例を位付きで、かつ、単位接頭付きで音声入力する場合を例にして、各バッファのデータの流れを説明する。
【００６８】
（１）まず、「さん」といった数字を示す単語を音声入力する。この場合、「さん」だけでは、次に「千」、「万」等の位がくる可能性があり、その数値を確定することはできない。したがって、認識結果として「さん」に対応する数値「３」を確定前バッファＡ５０７に格納し、表示用バッファ５１１に下線付きで読み「さん」に対応する表記「３」（ここではアラビア数字を第１表記とする）を格納する。これにより、表記「３」が未確定状態で表示装置５１２に表示される。
【００６９】
（２）次に、位を示す単語「じゅう」を音声入力する。この段階でも、単に「３０」なのか、その後に「万」、「億」、「兆」の位がくるのか分からない。したがって、前回入力された「３」と今回入力された「１０」との積である数値「３０」を確定前バッファＢ５０８に格納する。
【００７０】
（３）次に、「はち」といった数字を示す単語を音声入力した場合にも、まだ確定できないため、その「はち」に対応する数値「８」を確定前バッファＡ５０７に格納する。また、表示用バッファ５１１に下線付きで表記「３８」を格納し、同表記「３８」を表示装置５１２に表示する。
【００７１】
（４）次に、位を示す単語「まん」を音声入力する。ここで、単位の「万」を境にして、それ以上の単位、つまり、「万」、「億」、「兆」…が入力された時点で数値の位を確定することができる。これに対し、「十」、「百」、「千」は、その後に「万」、「億」、「兆」…がくる可能性があるため、数値の位を確定することはできない。
【００７２】
したがって、今、「まん」が入力された時点で、「万」の位まで数値を確定することができ、数値「３８００００」を確定済みバッファ５０９に格納する。また、表示用バッファ５１１に表記「３８００００」を格納すると共にその下４桁に下線を付し、これを表示装置５１２に表示する。
【００７３】
（５）次に、「きろ」といった単位接頭を音声入力すると、その時点ですべての数値が確定され、表示用バッファ５１１に「ｋ」を追加して、「３８００００ｋ」といった表記になる。
【００７４】
ここで、書き換えを指示すると、「ｋ」に対応する倍率「１０００」を適用して、表記「３８００００ｋ」を「３８０００００００」に置き換えて、書き換えバッファ５１５に格納する。これにより、「３８０００００００」といった表記で認識結果を得ることができる。
【００７５】
このように、音声により入力された数字を逐次画面に表示することにより、認識結果をオペレータが常に確認することができる。また、アラビア数字を用いることで、表中などの漢数字が不適な場所においても違和感なく表示することが可能となる。
【００７６】
更に、単位接頭を付与して数字列を音声入力した際には、該単位接頭を使った表記だけでなく、数値化した表記も書き換え候補として選択することができる。したがって、小数点以下の桁数などを意識することなく簡単に入力することが可能となる。
【００７７】
なお、桁数が多い数値の入力では、キーボードでは「０」を数多く入力する必要があったが、本方式を用いれば、「万」、「億」、「兆」等の位を音声で簡単に入力することができるため、短時間での数値入力が可能となり、表計算ソフト等数値入力を多用するアプリケーションにおいては、作業時間の短縮化を図ることができる。
【００７８】
なお、上述した各実施形態において記載した手法は、コンピュータに実行させることのできるプログラムとして、例えば磁気ディスク（フロッピーディスク、ハードディスク等）、光ディスク（ＣＤ−ＲＯＭ、ＤＶＤ等）、半導体メモリなどの記録媒体に書き込んで各種装置に適用したり、通信媒体により伝送して各種装置に適用することも可能である。本装置を実現するコンピュータは、記録媒体に記録されたプログラムを読み込み、このプログラムによって動作が制御されることにより、上述した処理を実行する。
【００７９】
【発明の効果】
以上のように本発明によれば、音声入力された数字列を逐次画面に表示することができ、その認識結果をオペレータが常に確認することができる。また、アラビア数字を用いることで表中などの漢数字が不適な場所においても違和感なく表示することが可能となる。
【図面の簡単な説明】
【図１】本発明の第１の実施形態に係る音声認識装置の構成を示すブロック図。
【図２】上記第１の実施形態における数字列の入力を行う際の処理の流れを示したフローチャート。
【図３】上記第１の実施形態における数字列の入力を行う際の処理の流れを示したフローチャート。
【図４】上記第１の実施形態における数値表現テーブルの構成例。
【図５】上記第１の実施形態における数値を音声入力した際の各バッファのデータ格納例。
【図６】本発明の第２の実施形態に係る音声認識装置の構成を示すブロック図。
【図７】上記第２の実施形態における単位接頭を含む数字列の入力を行う際の処理の流れを示したフローチャート。
【図８】上記第２の実施形態における単位接頭を含む数字列の入力を行う際の処理の流れを示したフローチャート。
【図９】上記第２の実施形態における単位接頭を含む数字列の入力を行う際の処理の流れを示したフローチャート。
【図１０】上記第２の実施形態における単位接頭テーブルの構成例。
【図１１】上記第２の実施形態における単位接頭を含んだ数値を音声入力した際の各バッファのデータ格納例。
【符号の説明】
１０１…入力装置
１０２…音声認識部
１０３…認識辞書
１０４…数値表現制御部
１０５…数値表現テーブル
１０６…数値演算部
１０７…確定前バッファＡ
１０８…確定前バッファＢ
１０９…確定済みバッファ
１１０…表示属性付加部
１１１…表示用バッファ
１１２…表示装置
[0001]
[Technical field of the invention]
The present invention relates to a speech recognition device that converts an input speech signal into character information, and in particular to a speech recognition device well suited to voice input of numeric strings.
[0002]
[Prior art]
Conventionally, when a numeric string is spoken with place words such as “thousand” or “ten thousand”, a speech recognition device either displays the characters recognized so far, in order, as Chinese numerals (kanji), or displays the whole number on the screen at once after the numeric input has finished.
[0003]
[Problems to be solved by the invention]
However, with the above prior art, the number is entered in Chinese numerals even when it is being entered into a form or similar document that calls for Arabic numerals. The result looks out of place, and the operator must later perform troublesome operations such as correcting the Chinese numerals to Arabic numerals.
[0004]
In addition, the method of displaying the number all at once after the numeric input has finished has the problem that the Arabic notation of the number cannot be checked in real time while it is being entered.
If, instead of speaking the number with place words such as “thousand” or “ten thousand”, the operator reads the digits out one by one from the beginning, for example “1”, “0”, “0”, “0”, and so on, Arabic numerals can be obtained in real time. When the string has many digits, however, the operator may enter it incorrectly.
[0005]
Further, when a number is spoken with a unit prefix (a prefix attached to a unit) such as “kilo”, a conventional speech recognition device can only display the prefix as a word, such as “kilo”, in the recognition result. The operator cannot check the recognition result as a value in which the prefix has been converted to a number.
[0006]
The present invention has been made in view of the above points, and its object is to provide a speech recognition device that, when recognizing a numeric string spoken with place words such as “thousand” or “ten thousand”, can display the recognition result in Arabic numerals in real time.
[0008]
A speech recognition device according to the present invention comprises: input means for voice input of a numeric string consisting of a plurality of digits, spoken with place words in order from the beginning; speech recognition means for converting the voice input through the input means into a speech signal and recognizing, from this speech signal, readings of words indicating numerical values and readings of words indicating places; storage means storing the readings of words indicating numerical values, the numerical values corresponding to those readings, and the readings of words indicating places; numerical expression means which, when the reading recognized by the speech recognition means is that of a word indicating a numerical value, refers to the storage means and converts it into a number so that it is right-justified, and which, when the reading is that of a word indicating a place and a previously converted value exists, refers to the storage means, converts the reading of the place word into a number, and determines the numerical value by an arithmetic operation with the previously converted value; and display means for successively displaying the numerical value produced by the numerical expression means in Arabic numerals while distinguishing the portion of the value whose place has been confirmed from the unconfirmed portion displayed right-justified.
[0009]
With this configuration, the recognition result of a spoken numeric string can be checked at all times, and because Arabic numerals are used, the number can be displayed without looking out of place even where Chinese numerals are inappropriate, such as inside a table.
[0012]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described below with reference to the drawings.
(First embodiment)
FIG. 1 is a block diagram showing the configuration of the speech recognition apparatus according to the first embodiment of the present invention. The apparatus is realized by a computer that reads a program recorded on a recording medium such as a magnetic disk and whose operation is controlled by the program.
[0013]
In FIG. 1, an input device 101 is a microphone or the like that converts the voice uttered by an operator into an electrical speech signal. The speech signal obtained by the input device 101 is converted by the speech recognition unit 102 into the character code of the corresponding word. In doing so, the speech recognition unit 102 refers to the recognition dictionary 103, which describes the correspondence between speech signals and the character codes of words.
[0014]
The numerical expression control unit 104 extracts numerical expressions, including place words such as “man” (ten thousand) and “hyaku” (hundred), from the word string output by the speech recognition unit 102. In doing so, it refers to the numerical expression table 105, which stores the reading and notation of each number and place word together with its value, for example “100000000” for “oku” (hundred million).
[0015]
Next, the numerical expression control unit 104 checks the series of numerical expression words in order from the earliest input. If a word represents a digit from 0 to 9, its value is stored in the pre-confirmation buffer A 107. If a word represents the place “thousand”, “hundred”, or “ten”, the numerical operation unit 106 multiplies the value stored in the pre-confirmation buffer A 107 by the numerical value of that place, stores the result in the pre-confirmation buffer B 108, and initializes the contents of the pre-confirmation buffer A 107.
[0016]
Here, when the place word is not immediately preceded by a number, that is, when the pre-confirmation buffer A 107 is empty, 1 is substituted into it before the above calculation is performed.
Further, if the word represents the place “trillion”, “hundred million”, or “ten thousand”, the sum of the values in the pre-confirmation buffer A 107 and the pre-confirmation buffer B 108 is multiplied by the numerical value of that place, the result is stored in the confirmed buffer 109, and the contents of the pre-confirmation buffer A 107 and the pre-confirmation buffer B 108 are initialized.
[0017]
Each time one word is processed, the sum of the values of the pre-confirmation buffer A 107, the pre-confirmation buffer B 108, and the confirmed buffer 109 is stored in the display buffer 111. At this time, the display attribute addition unit 110 gives the digits obtained from the confirmed buffer 109 a character attribute indicating that they are confirmed (for example, emphasized characters), and gives the digits from the pre-confirmation buffer A 107 and the pre-confirmation buffer B 108 a character attribute indicating that they are unconfirmed (for example, underlined).
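The buffer handling in paragraphs [0015] to [0017] can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `feed`, `SMALL_PLACES`, and `LARGE_PLACES` and the dictionary layout are assumptions made for the example, and products are accumulated into buffer B in the way step 214 of the flowchart describes.

```python
# Hypothetical sketch of the three buffers (A 107, B 108, confirmed 109).
SMALL_PLACES = {10, 100, 1000}           # ten, hundred, thousand
LARGE_PLACES = {10**4, 10**8, 10**12}    # ten thousand, hundred million, trillion


def feed(state, token):
    """Process one recognized value and return the sum to be displayed.

    `state` has keys "a", "b", "confirmed"; "a" is None while buffer A is empty.
    `token` is a digit 0-9 or one of the place values listed above.
    """
    if token in SMALL_PLACES:
        a = 1 if state["a"] is None else state["a"]  # bare "hundred" counts as 1 x 100
        state["b"] += a * token                      # product goes into buffer B
        state["a"] = None                            # buffer A is initialized
    elif token in LARGE_PLACES:
        a = state["a"] or 0
        state["confirmed"] += (a + state["b"]) * token
        state["a"], state["b"] = None, 0             # both pre-confirmation buffers cleared
    else:
        state["a"] = token                           # a digit waits in buffer A
    # Paragraph [0017]: the display buffer receives the sum of all three buffers.
    return (state["a"] or 0) + state["b"] + state["confirmed"]


state = {"a": None, "b": 0, "confirmed": 0}
shown = [feed(state, t) for t in (5, 10, 2, 10**4)]  # "go", "juu", "ni", "man"
# shown is [5, 50, 52, 520000]
```

Each element of `shown` is the value that would be placed in the display buffer after the corresponding word.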
[0018]
In accordance with the information in the display buffer 111, the display device 112 shows the numeric string obtained as the recognition result on the screen, distinguishing its confirmed and unconfirmed parts by their display attributes.
[0019]
Next, the operation of the first embodiment will be described.
FIGS. 2 and 3 are flowcharts showing the flow of processing when inputting a numeric string in the first embodiment of the present invention.
[0020]
The speech signal input by the operator through the input device 101 is stored in the recognition buffer 113 (steps 201 and 202). If the voice input has ended (yes in step 203), the processing ends here. Otherwise (no in step 203), the speech signal stored in the recognition buffer 113 is passed to the speech recognition unit 102 and converted into a word with reference to the recognition dictionary 103 (step 204).
[0021]
Here, when the word is a numerical expression (yes in step 205), the numerical expression control unit 104 checks the word immediately preceding it, and if that word is not a numerical expression (yes in step 206), it initializes the register K1, the pre-confirmation buffer A 107, the pre-confirmation buffer B 108, and the confirmed buffer 109 (step 207). A numerical expression is a word listed in the numerical expression table 105; besides the digits 0 to 9, this includes place words such as “ten”, “hundred”, “thousand”, and “ten thousand”.
[0022]
Next, it is checked whether the input numerical expression is a digit or a place word such as “ten”, “hundred”, “thousand”, or “ten thousand” (step 208). If it is a digit (yes in step 208), it is checked whether the immediately preceding word was also a digit (step 219). If the preceding word was something other than a digit, such as a place word (no in step 219), the digit is stored in the pre-confirmation buffer A 107 (step 220).
[0023]
On the other hand, if step 208 determines that the numerical expression is not a digit, that is, it is a place word (no in step 208), the numerical value of the word, for example “100” for “hundred”, is stored in the register K0 (step 209).
[0024]
Next, the numerical expression control unit 104 checks whether the value of the register K0 is one of “10”, “100”, and “1000”, or some other value (step 210).
[0025]
As a result, if the register K0 holds “10”, “100”, or “1000”, it is further checked whether the value stored in the register K1, described later, is “10000” or less (step 211). If the register K1 is unset or holds a value greater than “10000” (no in step 211), the contents of the pre-confirmation buffer A 107 are examined (step 212), and if the pre-confirmation buffer A 107 is empty (no in step 212), 1 is substituted into it (step 213).
[0026]
The numerical operation unit 106 calculates the product of the pre-confirmation buffer A 107 and the register K0 and adds the result to the value stored in the pre-confirmation buffer B 108 (step 214). Once the result has been stored, the pre-confirmation buffer A 107 is initialized for the next calculation (step 215).
[0027]
On the other hand, if the determination in step 210 is false or the determination in step 211 is true, the value of the register K0 is copied into the register K1 (step 216), and the sum of the pre-confirmation buffer A 107 and the pre-confirmation buffer B 108, multiplied by the register K1, is added to the value stored in the confirmed buffer 109 (step 217). The pre-confirmation buffer A 107 and the pre-confirmation buffer B 108 are then initialized (step 218).
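The role of the registers K0 and K1 in steps 209 to 218 can be illustrated with the sketch below. It is only a reading of the flowchart text under stated assumptions: the function name `feed_place` and the dictionary-based state are hypothetical, and an unset K1 is modeled as `None`.

```python
def feed_place(state, k0):
    """Apply one place word whose numeric value is k0 (steps 209 to 218).

    `state` has keys "a", "b", "confirmed", "k1"; "a" and "k1" are None when unset.
    """
    small = k0 in (10, 100, 1000)                     # step 210
    k1 = state["k1"]
    if small and (k1 is None or k1 > 10**4):          # step 211 answered "no"
        a = 1 if state["a"] is None else state["a"]   # steps 212 and 213
        state["b"] += a * k0                          # step 214
        state["a"] = None                             # step 215
    else:
        state["k1"] = k0                              # step 216
        total = (state["a"] or 0) + state["b"]
        state["confirmed"] += total * k0              # step 217
        state["a"], state["b"] = None, 0              # step 218
    return state


# "go juu ni man san byaku": digits are written into buffer A directly here.
s = {"a": None, "b": 0, "confirmed": 0, "k1": None}
s["a"] = 5
feed_place(s, 10)        # K1 unset: 50 waits in buffer B
s["a"] = 2
feed_place(s, 10**4)     # confirmed becomes 520000, K1 becomes 10000
s["a"] = 3
feed_place(s, 100)       # K1 is 10000, so 300 is confirmed at once
# s["confirmed"] is now 520300 and s["k1"] is 100
```

The last call shows why K1 is kept: once “ten thousand” has been confirmed, a following “hundred” no longer has to wait in buffer B. As in the flowchart text, the substitution of 1 for an empty buffer A happens only on the “no” branch of step 211.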
[0028]
After step 215, step 218, or step 220, the numerical expression control unit 104 has the numerical operation unit 106 calculate the sum of all the values stored in the pre-confirmation buffer A 107, the pre-confirmation buffer B 108, and the confirmed buffer 109, and stores it in the display buffer 111 (step 227).
[0029]
At the same time, the display attribute addition unit 110 checks whether a value is stored in the register K1 (step 228). If a value is stored (no in step 228), the characters in the display buffer 111 at digit positions at or above the place held in the register K1 are given a display attribute indicating that they are confirmed (step 229), and the characters that did not receive the confirmed attribute are given the unconfirmed display attribute (step 230).
[0030]
The display device 112 displays the numeric string stored in the display buffer 111 on the screen according to the display attribute given to each character (step 234). In other words, the portion whose place has been confirmed and the portion whose place has not been confirmed are displayed distinguishably on the basis of the display attributes.
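One way to picture the attribute assignment of steps 228 to 230 is to split the displayed digits at the place held in K1. The helper below is a hypothetical sketch (the name `split_by_attribute` is not from the patent) and assumes K1 is either unset (`None`) or a power of ten.

```python
def split_by_attribute(display_value, k1):
    """Return (confirmed_digits, unconfirmed_digits) for the display buffer.

    Digits at or above the place held in K1 count as confirmed; the rest,
    shown right-justified, count as unconfirmed (underlined in FIG. 5).
    """
    digits = str(display_value)
    if k1 is None:                       # nothing confirmed yet
        return "", digits
    below_k1 = len(str(k1)) - 1          # number of digit positions below K1
    cut = max(len(digits) - below_k1, 0)
    return digits[:cut], digits[cut:]


confirmed, pending = split_by_attribute(520000, 10**4)
# confirmed is "52" and pending is "0000", i.e. the lower four digits are underlined
```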
[0031]
If it is determined in step 205 that the numeric input has ended (yes in step 231), the numerical operation unit 106 calculates the sum of the values stored in the pre-confirmation buffer A 107, the pre-confirmation buffer B 108, and the confirmed buffer 109, and the result is stored in the display buffer 111 (step 232). At the same time, the confirmed display attribute is given to the character string in the display buffer 111 (step 233). The information in the display buffer 111 is output on the display device 112 (step 234).
[0032]
On the other hand, if step 219 determines that digits have been input consecutively, for example when digits after a decimal point are being input, the pre-confirmation buffer A 107 is checked (step 221). If a numerical value is stored in the pre-confirmation buffer A 107 (yes in step 221), the confirmed buffer 109 is shifted one digit to the left and the value of the pre-confirmation buffer A 107 is added to it (step 222), and the pre-confirmation buffer A 107 is initialized (step 223).
[0033]
After the pre-confirmation buffer A 107 has been initialized, or when the pre-confirmation buffer A 107 was found to be empty in step 221, the confirmed buffer 109 is likewise shifted one digit to the left and the input digit is added to it (step 224). The value of the confirmed buffer 109 is stored in the display buffer 111 (step 225), the confirmed display attribute is given to that value (step 226), and it is displayed on the display device 112 (step 234).
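The handling of consecutive digits in steps 219 to 224 amounts to a decimal left shift followed by an addition. The following sketch shows that reading; `feed_digit` and `read_digits` are hypothetical names, and the one-digit left shift is modeled as multiplication by 10.

```python
def feed_digit(state, digit, previous_was_digit):
    """Process one digit word (steps 219 to 224).

    `state` has keys "a" and "confirmed"; "a" is None while buffer A is empty.
    """
    if not previous_was_digit:                       # step 219 answered "no"
        state["a"] = digit                           # step 220
        return state
    if state["a"] is not None:                       # step 221
        state["confirmed"] = state["confirmed"] * 10 + state["a"]   # step 222
        state["a"] = None                            # step 223
    state["confirmed"] = state["confirmed"] * 10 + digit            # step 224
    return state


def read_digits(digits):
    """Feed a run of digits read out one by one and return the resulting value."""
    state = {"a": None, "confirmed": 0}
    for i, d in enumerate(digits):
        feed_digit(state, d, previous_was_digit=(i > 0))
    return (state["a"] or 0) + state["confirmed"]


value = read_digits([1, 0, 0, 0])   # "ichi", "zero", "zero", "zero"
# value is 1000
```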
[0034]
FIG. 4 is a configuration example of the numerical expression table 105.
The table stores the readings and notations of the numerical expression words, so the number can be looked up whether the recognition result output from the speech recognition unit 102 is a reading or a notation. There are three types of notation for numerals: Arabic numerals, Chinese numerals (kanji), and formal numerals (daiji). For example, the reading “ichi” is associated with the Arabic numeral “1”, the Chinese numeral “一”, and the formal numeral “壱”. These notations are used for the display buffer 111.
[0035]
In the numerical expression control unit 104, the numerical values listed in the numerical expression table 105 are used for each buffer (the pre-confirmation buffer A 107, the pre-confirmation buffer B 108, and the confirmed buffer 109). In other words, these values are the ones used in the calculations.
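A table of this kind might be modeled as below. Only the “ichi” row and the value for “oku” are stated in the description; the other rows, the field names, and the `lookup` helper are assumptions added for illustration, since the actual contents of FIG. 4 are not reproduced in the text.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    reading: str            # kana reading
    arabic: str             # Arabic-numeral notation
    kanji: str              # Chinese-numeral notation
    daiji: Optional[str]    # formal numeral, left as None where not assumed
    value: int              # value used in the buffer calculations


# Hypothetical excerpt of a numerical expression table.
TABLE = [
    Entry("いち", "1", "一", "壱", 1),
    Entry("に", "2", "二", "弐", 2),
    Entry("さん", "3", "三", "参", 3),
    Entry("じゅう", "10", "十", "拾", 10),
    Entry("ひゃく", "100", "百", None, 100),
    Entry("まん", "10000", "万", None, 10**4),
    Entry("おく", "100000000", "億", None, 10**8),
]


def lookup(word):
    """Find an entry by reading or by any of its notations; None if absent."""
    for entry in TABLE:
        if word in (entry.reading, entry.arabic, entry.kanji, entry.daiji):
            return entry
    return None


by_reading = lookup("いち")     # found through the reading
by_notation = lookup("壱")      # the same entry, found through a notation
```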
[0036]
FIG. 5 shows an example of the data stored in the pre-confirmation buffer A 107, the pre-confirmation buffer B 108, the confirmed buffer 109, and the display buffer 111 when a numerical value is input by voice. Underlined characters in the display buffer 111 are unconfirmed, characters without an underline are confirmed, and the display device 112 shows them in the same way.
[0037]
The flow of data through each buffer is now described, taking as an example a multi-digit number such as “520304” spoken with place words.
(1) First, a word indicating a digit, “go” (five), is spoken. With “go” alone, a place word such as “thousand” or “ten thousand” may still follow, so the value cannot yet be confirmed. Therefore, when “go” is recognized, the numerical value “5” is stored in the pre-confirmation buffer A 107, and the notation “5” is stored underlined in the display buffer 111 (Arabic numerals are taken as the primary notation here). As a result, the notation “5” is displayed on the display device 112 in the unconfirmed state.
[0038]
(2) Next, the place word “juu” (ten) is spoken. At this stage it is still unknown whether the number is simply “50” or whether “ten thousand”, “hundred million”, or “trillion” will follow. Therefore, the numerical value “50”, the product of the previously input “5” and the newly input “10”, is stored in the pre-confirmation buffer B 108.
[0039]
The notation “50” is also stored underlined in the display buffer 111 and displayed on the display device 112.
(3) Next, a word indicating a digit, “ni” (two), is spoken. The value still cannot be confirmed, so when “ni” is recognized, the numerical value “2” is stored in the pre-confirmation buffer A 107. The notation “52” is stored underlined in the display buffer 111 and displayed on the display device 112.
[0040]
(4) Next, the place word “man” (ten thousand) is spoken. The unit “ten thousand” is the boundary: the place of the number can be confirmed at the moment a unit at or above it, that is, “ten thousand”, “hundred million”, “trillion”, and so on, is input. By contrast, “ten”, “hundred”, and “thousand” may still be followed by “ten thousand”, “hundred million”, “trillion”, and so on, so they do not allow the place of the number to be confirmed.
[0041]
Therefore, at the moment “man” is input, the number can be confirmed down to the “ten thousand” place, and the numerical value “520000” is stored in the confirmed buffer 109. The notation “520000” is stored in the display buffer 111 with its lower four digits underlined and is displayed on the display device 112.
[0042]
(5) Next, a word indicating a digit, “san” (three), is spoken. Because a place word such as “thousand”, “hundred”, or “ten” may follow, the value cannot be confirmed. Therefore, the numerical value “3” corresponding to “san” is stored in the pre-confirmation buffer A 107, and the notation “520003” is stored in the display buffer 111 with its lower four digits underlined and displayed on the display device 112.
[0043]
(6) Next, the place word “byaku” (hundred) is spoken. Because the “ten thousand” place has already been confirmed, the “hundred” place can be confirmed. Therefore, the “3” input immediately before is calculated as “300”, and “520300” is stored in the confirmed buffer 109.
[0044]
The notation “520300” is stored in the display buffer 111 with its lower two digits underlined and is displayed on the display device 112.
(7) Next, a word indicating a digit, “yon” (four), is spoken. Because the place word “ten” may follow, the value cannot be confirmed. Therefore, the numerical value “4” corresponding to “yon” is stored in the pre-confirmation buffer A 107, and the notation “520304” is stored in the display buffer 111 with its lowest digit underlined and displayed on the display device 112.
[0045]
In this way, a number such as “520304” is recognized word by word and the recognition result is displayed in Arabic numerals in real time. The final “4” is displayed underlined because the place word “ten” may still follow.
[0046]
Similarly, when a number such as “7000080010500” is spoken with place words, the recognition result can be displayed in Arabic numerals in real time using the values obtained by the internal calculation. In this case, the “trillion” place can be confirmed at the moment “chou” (trillion) is spoken after “nana” (seven).
[0047]
It is also possible to handle a numeric string that includes a decimal point, such as “80503.64”, input with place words. In this case, because the digits after the decimal point are read out one by one, as in “roku”, “yon”, they can be displayed as Arabic numerals as they are.
[0048]
Note that if place words are input in an invalid order, such as “ichi”, “oku”, “cho”, recognition of the numeric string is delimited at the point of the error. That is, in this example the input is divided into “ichi oku” and “cho”; “ichi oku” is displayed as “100000000”, whereas “cho” is recognized as a different word, for example “Book” or “Ding”.
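One way to picture this delimiting is the sketch below. The rule used here (cut at the first large place word that has no value in front of it, or at the first word that is not a numerical expression) is an assumption chosen to reproduce the “ichi oku” / “cho” example; the function name and romanized readings are hypothetical as well.

```python
# Hypothetical romanized readings, for illustration only.
DIGITS = {"ichi", "ni", "san", "yon", "go", "roku", "nana", "hachi", "kyu"}
MINOR_PLACES = {"ju", "hyaku", "sen"}
MAJOR_PLACES = {"man", "oku", "cho"}


def split_at_place_error(words):
    """Return (numeric_part, rest), where rest starts at the first invalid word."""
    has_value = False  # a digit or small place word is waiting for a large place word
    for i, word in enumerate(words):
        if word in DIGITS or word in MINOR_PLACES:
            has_value = True
        elif word in MAJOR_PLACES:
            if not has_value:
                # Large place word with nothing to multiply: delimit here.
                return words[:i], words[i:]
            has_value = False
        else:
            # Not a numerical expression at all: delimit here.
            return words[:i], words[i:]
    return words, []


numeric_part, rest = split_at_place_error(["ichi", "oku", "cho"])
```

Here `numeric_part` holds “ichi”, “oku” and `rest` holds “cho”, which would then be handed back to ordinary word recognition.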
[0049]
(Second Embodiment)
Next, a second embodiment of the present invention will be described.
FIG. 6 is a block diagram showing a configuration of a speech recognition apparatus according to the second embodiment of the present invention. The apparatus is realized by a computer that reads a program recorded on a recording medium such as a magnetic disk and whose operation is controlled by the program.
[0050]
Since the input device 501 to the recognition buffer 513 shown in FIG. 6 correspond to the input device 101 to the recognition buffer 113 of FIG. 1, the description thereof is omitted here.
[0051]
In the present embodiment, the numerical expression control unit 504 considers the words stored in the unit prefix table 514 as a part of the numerical expression in addition to numerical expressions such as numbers and digits. The unit prefix table 514 has the same structure as the numerical expression table 505, and stores “kilo”, “milli”, and the like here.
[0052]
When a word in the numerical expression table 505 is input, the values of the pre-confirmation buffer A 507 and the pre-confirmation buffer B 508 are added to the value of the confirmed buffer 509, and the sum is stored in the display buffer 511. In addition, the unit prefix table 514 is referred to, the notation of the input unit prefix is appended to the end of the display buffer 511, and the display attribute addition unit 510 gives the confirmed display attribute to all characters in the display buffer 511.
[0053]
Further, the product of the value of the confirmed buffer 509 and the value obtained by digitizing the unit prefix is stored in the confirmed buffer 509 and the rewrite buffer 515. When the operator issues a rewrite instruction, the notation selection unit 516 allows the value stored in the rewrite buffer 515 (that is, the value in which the unit prefix has been expressed numerically) to be used as the input result.
[0054]
Next, the operation of the second embodiment will be described.
FIGS. 7 to 9 are flowcharts showing the flow of processing when a numeric string including a unit prefix is input in the second embodiment of the present invention. Steps 601 to 605 in the figures correspond to steps 201 to 205 in the flowcharts of the first embodiment shown in FIGS. 2 and 3: the input voice signal is recognized and output as a word consisting of character codes.
[0055]
Here, it is determined whether or not the word is a numerical expression (step 605). In addition to the numbers and place words described in the numerical expression table 505, unit prefixes such as “kilo” described in the unit prefix table 514 are also treated as numerical expressions.
[0056]
If it is not a numerical expression, the process proceeds to step 631. Steps 631 to 633 correspond to steps 231 to 233 in the flowcharts shown in FIGS. 2 and 3.
[0057]
If it is a numerical expression (Yes in step 605), the word immediately before it is checked (step 606). If that word is a unit prefix or is not a numerical expression (Yes in step 606), the register K1, the pre-confirmation buffer A 507, the pre-confirmation buffer B 508, and the confirmed buffer 509 are initialized (step 607).
[0058]
Next, it is checked whether or not the input word is a number (step 608). The processing from step 619 to step 626 is the same as that from step 219 to step 226 in the flowcharts shown in FIGS. 2 and 3.
[0059]
If it is not a number, it is checked whether the word is a unit prefix (step 635). If it is not a unit prefix (No in step 635), the process proceeds to steps 609 to 618 (the same processing as steps 209 to 218 in the flowcharts shown in FIGS. 2 and 3) and steps 627 to 630 (the same processing as from step 227 onward in the flowcharts shown in FIGS. 2 and 3).
[0060]
If it is a unit prefix (Yes in step 635), the magnification of the word, obtained by referring to the unit prefix table 514, is stored in the register K0: for example, “1000” for “kilo” and “0.001” for “milli” (step 636).
[0061]
Next, the sum of the pre-confirmation buffer A 507, the pre-confirmation buffer B 508, and the confirmed buffer 509 is stored in the confirmed buffer 509 and the display buffer 511 (step 637). The unit prefix notation, for example “k”, is appended to the display buffer 511 (step 638), and the confirmed display attribute is given (step 639). Further, the product of the confirmed buffer 509 and the register K0 is stored in the rewrite buffer 515 (step 640).
[0062]
The contents of the display buffer 511 are displayed on the screen of the display device 512 (step 634). When the operator gives a rewrite instruction, the recognition result is replaced with the contents of the rewrite buffer 515.
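Steps 636 to 640 can be sketched as a single function. This is an illustrative reading of the steps, not the patented implementation: the function name, the two-entry table excerpt, and the romanized readings “kiro” and “miri” are assumptions, and the display attribute of step 639 is not modeled.

```python
from decimal import Decimal

# Hypothetical excerpt of the unit prefix table: reading -> (notation, magnification).
UNIT_PREFIXES = {
    "kiro": ("k", Decimal("1000")),
    "miri": ("m", Decimal("0.001")),
}


def apply_unit_prefix(word, buffer_a, buffer_b, confirmed):
    """Return (confirmed, display, rewrite) after a unit prefix word is input."""
    symbol, k0 = UNIT_PREFIXES[word]              # step 636: magnification into register K0
    confirmed = confirmed + buffer_b + buffer_a   # step 637: everything becomes confirmed
    display = f"{confirmed}{symbol}"              # step 638: append the prefix notation
    rewrite = confirmed * k0                      # step 640: value used on a rewrite instruction
    return confirmed, display, rewrite


confirmed, display, rewrite = apply_unit_prefix("kiro", 0, 0, 380000)
```

`Decimal` is used for the magnification so that fractional prefixes such as 0.001 stay exact; a floating-point register would also work for a sketch but could introduce rounding in the rewritten value.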
[0063]
FIG. 10 is a configuration example of the unit prefix table 514.
Its configuration is almost the same as that of the numerical expression table 105 shown in FIG. 4. However, whereas a word such as “ju” (ten) has a numerical meaning in itself, a unit prefix has no numerical meaning in the word itself, so it is expressed here as a magnification.
[0064]
That is, the unit prefix notation such as “nano” is “n”, but when expressing numerically including this, the magnification “0.000000001” is used. Similarly, in the numerical expression of “micro”-“μ”, the magnification “0.000001” is used, and in the numerical expression of “milli”-“m”, the magnification “0.001” is used.
[0065]
Likewise, the magnification “1000” is used in the numerical expression of “kilo”-“k”, the magnification “1000000” in that of “mega”-“M”, and the magnification “1000000000” in that of “giga”-“G”.
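The table entries listed above could be held in a structure like the following sketch. The dictionary layout, the English keys, and the `expand` helper are assumptions for illustration; only the notations and magnifications come from the description of FIG. 10.

```python
from decimal import Decimal

# prefix name -> (notation, magnification), following the values described for FIG. 10
UNIT_PREFIX_TABLE = {
    "nano":  ("n", Decimal("0.000000001")),
    "micro": ("μ", Decimal("0.000001")),
    "milli": ("m", Decimal("0.001")),
    "kilo":  ("k", Decimal("1000")),
    "mega":  ("M", Decimal("1000000")),
    "giga":  ("G", Decimal("1000000000")),
}


def expand(value, prefix):
    """Return the plain Arabic-numeral string for an integer value with a unit prefix."""
    _, magnification = UNIT_PREFIX_TABLE[prefix]
    # The "f" format keeps fixed-point notation, avoiding output such as "3E-9".
    return format(Decimal(value) * magnification, "f")


kilo_text = expand(380000, "kilo")
nano_text = expand(3, "nano")
```

Storing the magnifications as exact decimals rather than floats keeps small values such as the “nano” magnification printable digit for digit.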
[0066]
FIG. 11 shows an example of data storage in the pre-confirmation buffer A 507, the pre-confirmation buffer B 508, the confirmed buffer 509, the display buffer 511, and the rewrite buffer 515 when a numerical value including a unit prefix is input. Underlined characters in the display buffer 511 indicate an undetermined character string, characters without an underline indicate a confirmed one, and the same display is performed on the display device 512.
[0067]
The flow of data in each buffer will now be described, taking as an example the case where a multi-digit number with a unit prefix, such as “380000k”, is input by voice.
[0068]
(1) First, a word indicating a number, such as “san”, is input by voice. In this case, “san” alone may be followed by a place word such as “thousand” or “ten thousand”, so the numerical value cannot be determined. Therefore, the numerical value “3” corresponding to “san” is stored in the pre-confirmation buffer A 507 as the recognition result, and the notation “3” (here, a single Arabic numeral) is stored in the display buffer 511 with an underline. As a result, the notation “3” is displayed on the display device 512 in an unconfirmed state.
[0069]
(2) Next, the place word “ju” (ten) is input by voice. Even at this stage, it is not known whether the value is simply “30” or will be followed by “ten thousand”, “hundred million”, or “trillion”. Therefore, the numerical value “30”, the product of the “3” input previously and the “10” input this time, is stored in the pre-confirmation buffer B 508.
[0070]
(3) Next, a word indicating a number, such as “hachi”, is input by voice. Because the value still cannot be confirmed, the numerical value “8” corresponding to “hachi” is stored in the pre-confirmation buffer A 507. Further, the notation “38” is stored in the display buffer 511 with an underline, and the notation “38” is displayed on the display device 512.
[0071]
(4) Next, the place word “man” (ten thousand) is input by voice. Here, the “ten thousand” unit is the boundary: when a unit of that size or larger, that is, “ten thousand”, “hundred million”, “trillion”, and so on, is input, the place of the numerical value can be determined. For “ten”, “hundred”, and “thousand”, on the other hand, “ten thousand”, “hundred million”, “trillion”, or the like may still follow, so the place of the numerical value cannot be determined.
[0072]
Therefore, when “man” is input, the numerical value can be fixed at the “ten thousand” place, and the numerical value “380000” is stored in the confirmed buffer 509. Further, the notation “380000” is stored in the display buffer 511 with the lower four digits underlined, and this is displayed on the display device 512.
[0073]
(5) Next, when a unit prefix such as “kiro” (kilo) is input by voice, all the numerical values are confirmed at that point, and “k” is appended to the display buffer 511 so that it reads “380000k”.
[0074]
When rewriting is instructed here, the magnification “1000” corresponding to “k” is applied, and the notation “380000k” is replaced with the value “380000000” stored in the rewrite buffer 515. As a result, the recognition result can be obtained with the notation “380000000”.
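As a check on the figures in this example, the arithmetic for “san”, “ju”, “hachi”, “man” followed by the “kilo” magnification can be written out:

```latex
\begin{align*}
(3 \times 10 + 8) \times 10^{4} &= 380000 \\
380000 \times 10^{3} &= 380000000
\end{align*}
```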
[0075]
Thus, by sequentially displaying the numbers input by voice on the screen, the operator can always confirm the recognition result. In addition, by using Arabic numerals, the result can be displayed without a sense of incongruity even in places, such as tables, where Chinese numerals are inappropriate.
[0076]
Further, when a numeric string is input by voice with a unit prefix added, not only the notation using the unit prefix but also the purely numerical notation can be selected as a rewriting candidate. Input is therefore easy, without the operator having to be aware of the number of digits after the decimal point.
[0077]
When inputting a number with many digits on a keyboard, it used to be necessary to type many “0”s. With this method, however, the number can be input simply by speaking “man” (ten thousand), “oku” (hundred million), “cho” (trillion), and so on. Numerical values can therefore be input in a short time, and in applications that make heavy use of numerical input, such as spreadsheet software, working time can be shortened.
[0078]
The method described in each of the above embodiments can be written, as a program executable by a computer, to a recording medium such as a magnetic disk (floppy disk, hard disk, etc.), an optical disk (CD-ROM, DVD, etc.), or a semiconductor memory and applied to various devices, or it can be transmitted via a communication medium and applied to various devices. A computer that implements this apparatus reads the program recorded on the recording medium and executes the above-described processing with its operation controlled by this program.
[0079]
[Effect of the invention]
As described above, according to the present invention, a numeric string input by voice is displayed sequentially on the screen, so the operator can always check the recognition result. In addition, by using Arabic numerals, the result can be displayed without a sense of incongruity even in places, such as tables, where Chinese numerals are inappropriate.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a speech recognition apparatus according to a first embodiment of the present invention.
FIG. 2 is a flowchart showing a process flow when inputting a numeric string in the first embodiment.
FIG. 3 is a flowchart showing a flow of processing when inputting a numeric string in the first embodiment.
FIG. 4 is a configuration example of a numerical expression table in the first embodiment.
FIG. 5 shows an example of data storage in each buffer when a numerical value is inputted by voice in the first embodiment.
FIG. 6 is a block diagram showing a configuration of a speech recognition apparatus according to a second embodiment of the present invention.
FIG. 7 is a flowchart showing a processing flow when inputting a numeric string including a unit prefix in the second embodiment.
FIG. 8 is a flowchart showing a processing flow when inputting a numeric string including a unit prefix in the second embodiment.
FIG. 9 is a flowchart showing a processing flow when inputting a numeric string including a unit prefix in the second embodiment.
FIG. 10 shows a configuration example of a unit prefix table in the second embodiment.
FIG. 11 shows an example of data storage in each buffer when a numerical value including a unit prefix in the second embodiment is inputted by voice.
[Explanation of symbols]
101 ... Input device
102 ... Voice recognition unit
103 ... Recognition dictionary
104 ... Numerical expression control unit
105 ... Numerical expression table
106 ... Numerical calculation unit
107 ... Pre-confirmation buffer A
108 ... Pre-confirmation buffer B
109 ... Confirmed buffer
110 ... Display attribute addition unit
111 ... Display buffer
112 ... Display device

Claims

A speech recognition apparatus comprising:
input means for inputting by voice, with place words and in order from the head, a numeric string consisting of a plurality of digits;
speech recognition means for converting the voice input by the input means into a speech signal and recognizing, from the speech signal, readings of words indicating numerical values and readings of words indicating places;
storage means in which readings of words indicating numerical values, the numerical values corresponding to those readings, and readings of words indicating places are stored;
numerical expression means which, when the reading of a word recognized by the speech recognition means is the reading of a word indicating a numerical value, converts it into a numerical value so as to be right-aligned with reference to the storage means, and which, when the reading is that of a word indicating a place and a previously converted value exists, converts the reading of the word indicating the place into a numerical value with reference to the storage means and determines the numerical value by arithmetic processing with the previously converted value; and
display means for sequentially displaying, in Arabic numerals, the numerical values converted by the numerical expression means, and for displaying the portion of the numerical value whose place has been determined so that it is distinguished from the portion that has not been determined.