JP3815110B2

JP3815110B2 - Voice input device and voice input method

Info

Publication number: JP3815110B2
Application number: JP10216299A
Authority: JP
Inventors: 俊之小高
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1999-04-09
Filing date: 1999-04-09
Publication date: 2006-08-30
Anticipated expiration: 2019-04-09
Also published as: JP2000293195A

Description

[0001]
[Technical Field of the Invention]
The present invention relates to man-machine interfaces, and more particularly to speech recognition, which converts speech into character information or the like as one means of inputting data into an apparatus.
[0002]
[Prior art]
In a device that uses speech recognition, recognition errors are unavoidable, so any practical implementation must provide a means of correcting the recognition result. One way to designate the location to be corrected is with a pointing device (a mouse or touch panel), but the present invention assumes a situation in which only speech is used, or only speech can be used. In such a situation the location of a recognition error could not conventionally be specified, so the user corrected it by repeating the previous utterance in full.
[0003]
[Problems to be solved by the invention]
Correcting a recognition error by speech alone is tedious if the entire utterance must be repeated. In particular, repeating a long utterance in full because one part of it was misrecognized is very inconvenient and inefficient. Examples of such utterances include a many-digit number string or an address.
[0004]
An object of the present invention is to provide a voice input device equipped with a man-machine interface that lets the user correct the unavoidable recognition errors of speech recognition efficiently, with relatively little effort and by speech alone. A further object is to provide various information processing devices that include such a voice input device.
[0005]
[Means for Solving the Problems]
The present invention solves the above problem by providing two means:
- recognition target generation means, which generates a recognition target dictionary that also accepts input intended as a correction;
- result holding means, which holds the previous recognition result and, when a correction intent is detected, corrects the held result and outputs it.
[0006]
The voice input device of the present invention comprises:
- speech input means for inputting speech;
- speech recognition means for recognizing the input speech within the range of a designated recognition target dictionary;
- result holding means for holding the recognition result of the speech recognition means;
- recognition result output means for outputting the recognition result;
- recognition target generation means for generating the recognition target dictionary used by the speech recognition means from, among other things, a basic recognition target dictionary specified in advance.

The recognition target generation means may receive from the result holding means the recognition result of the previous utterance, which may be the target of a correction. In that case it generates a correction target dictionary from the basic recognition target dictionary and that recognition result. It then creates a new recognition target dictionary by combining the correction target dictionary, a correction term dictionary specified in advance, and the basic recognition target dictionary.

The result holding means determines whether the recognition result contains a correction term. If it does, the result holding means applies to the held content the correction corresponding to the intent of the correction term. It then outputs the corrected result to the recognition result output means and the recognition target generation means. If it does not, it outputs the result unchanged to those same two means.
[0007]
A voice input device according to another configuration of the present invention likewise comprises speech input means, speech recognition means, result holding means, recognition result output means, and recognition target generation means. Here the recognition target generation means creates a new recognition target dictionary by combining the basic recognition target dictionary with a correction term dictionary specified in advance. When the recognition result of the speech recognition means contains a correction term, the result holding means outputs the result of applying to the held content the correction corresponding to the intent of that term.
[0008]
According to another configuration of the present invention, a voice input device comprises means for inputting speech, recognition means for recognizing the input speech, and output means for outputting the recognition result of the input speech. When part of the recognition result is a misrecognized word, the user inputs a correction through the input means. The correction consists of the misrecognized word, the correct word, and a phrase indicating which of the two is correct and which is wrong. The recognition means recognizes this correction. It then outputs, as the final result, the recognition result with the misrecognized part replaced by the correct word.
[0009]
According to another configuration of the present invention, a voice input device comprises means for inputting spoken numbers, recognition means for recognizing the input speech, and output means for outputting the recognition result of the input speech. When a number in the recognition result is misrecognized, the user inputs a correction through the input means. The correction is a phrase that expresses the arithmetic difference between the misrecognized number and the correct number as a combination of an operator and a number. The recognition means recognizes this correction. It then outputs, as the final result, the number obtained by applying the indicated operation to the misrecognized number.
[0010]
[Embodiments of the Invention]
A first embodiment of the present invention will be described.
[0011]
FIG. 1 shows a block diagram of the voice input device. The speech input means digitizes spoken input; it corresponds to the chain from the microphone through A/D conversion. The speech recognition means recognizes the digitized speech within the range of a recognition target dictionary that is either given in advance or specified externally each time, and outputs the recognition result. The result holding means holds the recognition result and presents it to the user as appropriate, on screen or by voice, via the recognition result output means.

The recognition target generation means generates a recognition target dictionary from the basic recognition target dictionary. At the same time, it uses the correction term dictionary to reconfigure that dictionary so that user utterances containing a correction intent can be recognized. When the recognition target generation means also receives a recognition result from the result holding means, it uses that result as well in reconfiguring the dictionary for the same purpose.
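The feedback loop in FIG. 1 can be illustrated with a minimal Python sketch of the result holding means. This is a sketch under stated assumptions, not the patented implementation. Actual speech recognition is out of scope, so already-recognized text is passed in directly. The text form "<target> to <replacement>" is hypothetical, as are the names `ResultHolder`, `receive`, and `count_positions`.

```python
CORRECTION_TERM = " to "  # hypothetical text rendering of the spoken particle "を"


def count_positions(text: str, sub: str) -> int:
    """Number of positions at which `sub` occurs in `text` (overlaps allowed)."""
    return sum(1 for i in range(len(text) - len(sub) + 1) if text.startswith(sub, i))


class ResultHolder:
    """Sketch of the result holding means: keeps the previous recognition result."""

    def __init__(self) -> None:
        self.held = ""

    def receive(self, recognized: str) -> str:
        """Take one recognition result and return what would be output."""
        if CORRECTION_TERM in recognized:
            # Correction utterance "<target> to <replacement>": edit the held result.
            target, replacement = recognized.split(CORRECTION_TERM, 1)
            if count_positions(self.held, target) != 1:
                raise ValueError(f"{target!r} does not occur exactly once in {self.held!r}")
            self.held = self.held.replace(target, replacement, 1)
        else:
            # Ordinary utterance: it becomes the new held result.
            self.held = recognized
        # In FIG. 1 this value goes both to the recognition result output means
        # and back to the recognition target generation means.
        return self.held
```

Calling `receive("080")` and then `receive("8 to 1")` returns "010", which mirrors the exchange in [0013]. Because the correction edits the held result in place, corrections can be applied repeatedly.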
[0012]
As an example, consider recognizing digit strings of unlimited length. For simplicity, the basic recognition target dictionary here contains only arbitrary digit strings. This is not a limitation, since an actual application system may add various control commands and the like.
[0013]
Denoting a user utterance by U and a device output by S, the user input and device output might be, for example:
U “010” (spoken by the user)
S “080” (the device makes a substitution error, recognizing “1” as “8”)
U “8 to 1” (the user speaks with the intent to correct)
S “010” (the device corrects the “8” in the previous recognition result to “1”)
[0014]
To recognize the user's second utterance, “8 to 1”, the recognition target generation means reconstructs the recognition target dictionary. Besides the basic recognition target dictionary, the new dictionary contains a second alternative in parallel: the sequence “correction target dictionary” + “correction term” + “basic recognition target dictionary corresponding to the correction target dictionary”.

In this example the correction term is the Japanese particle “を” (wo), rendered “to” in the translated examples. It is not limited to this; “から” (kara) or “より” (yori), both meaning “from”, may also be used.

The correction target dictionary consists of every combination of one or more consecutive digits of the recognition result. For the result “080” these are “0”, “8”, “08”, “80”, and “080”. However, “0” alone cannot identify whether the first or the third digit is meant, so it is better not to include “0” in the correction target dictionary.
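The following is a minimal sketch of how the correction target dictionary for a digit string might be built. It generalizes the “0” rule above by excluding any substring that occurs at more than one position as ambiguous; the text itself does not state this generalization. The function name `correction_targets` is hypothetical.

```python
from __future__ import annotations


def correction_targets(result: str) -> set[str]:
    """Contiguous substrings of `result` that occur at exactly one position.

    Substrings occurring more than once (like "0" in "080") are left out,
    because a correction naming them could not tell which occurrence is meant.
    """
    substrings = {
        result[i:j] for i in range(len(result)) for j in range(i + 1, len(result) + 1)
    }

    def positions(sub: str) -> int:
        # Count overlapping occurrences, unlike str.count.
        return sum(1 for k in range(len(result) - len(sub) + 1) if result.startswith(sub, k))

    return {s for s in substrings if positions(s) == 1}
```

For "080" this yields {"8", "08", "80", "080"}: the listed candidates minus the ambiguous "0".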
[0015]
Correction is more reliable if the “basic recognition target dictionary corresponding to the correction target dictionary” (the replacement part) is defined as arbitrary digit strings excluding the correction target itself. For example, “8 to 1”, “8 to 88”, and “8 to 11” may be included in the recognition targets, but “8 to 8” should not be.
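The combined recognition target dictionary can be approximated as a membership test on text. This sketch is hypothetical: a real recognizer would compile these alternatives into its grammar rather than check strings afterward. The function `is_acceptable` and the " to " rendering are illustrative assumptions.

```python
from __future__ import annotations

import re

DIGITS = re.compile(r"[0-9]+")


def is_acceptable(utterance: str, targets: set[str]) -> bool:
    """Would `utterance` be in the reconstructed recognition target dictionary?

    Accepted forms (hypothetical text rendering of the spoken forms):
      * a plain digit string (basic recognition target dictionary), or
      * "<target> to <digits>", where <target> is in the correction target
        dictionary and <digits> is any digit string other than <target>.
    """
    if DIGITS.fullmatch(utterance):
        return True
    parts = utterance.split(" to ")
    if len(parts) != 2:
        return False
    target, replacement = parts
    return (
        target in targets
        and DIGITS.fullmatch(replacement) is not None
        and replacement != target  # "8 to 8" would be a no-op, so it is excluded
    )
```

With the targets {"8", "08", "80", "080"}, the utterances "8 to 1" and "8 to 88" are accepted, while "8 to 8" and "0 to 1" are rejected.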
[0016]
The previous example shows a substitution error (one digit misrecognized as another) in digit-string recognition. Two other error types can be corrected in the same way:
- insertion errors, where a digit that was not actually spoken appears in the recognition result;
- deletion errors, where a digit that was actually spoken is missing from the recognition result.

An example of an insertion error:
U “4100”
S “42100” (the device inserts a spurious “2”)
U “421 to 41” (utterance with correction intent)
S “4100” (the device corrects the previous recognition result)
An example of a deletion error:
U “0123”
S “023” (the device drops the “1”)
U “02 to 012” (utterance with correction intent)
S “0123” (the device corrects the previous recognition result)
[0017]
For simplicity, these examples use short strings of only three or four digits. As the number of digits grows, the burden on the user is clearly lower than repeating the whole string. Corrections can also be repeated using the same method.

As described above, the voice input device of the present application comprises:
- speech input means for inputting speech;
- speech recognition means for recognizing the input speech within the range of a designated recognition target dictionary;
- result holding means for holding the recognition result of the speech recognition means;
- recognition result output means for outputting the recognition result;
- recognition target generation means for generating the recognition target dictionary used by the speech recognition means from, among other things, a basic recognition target dictionary specified in advance.

The recognition target generation means combines the basic recognition target dictionary with a correction term dictionary specified in advance to create a new recognition target dictionary. When the recognition result of the speech recognition means contains a correction term, the result holding means outputs the result of applying to the held content the correction corresponding to the intent of that term.

The basic recognition target dictionary may also include two dictionaries: one for digit-by-digit readings, which contain no place values, and one for place-value readings, which do. In that case, the recognition result may contain both a correction term and a place-value reading. The result holding means then identifies the position to be corrected from the place information in the reading. It applies the correction corresponding to the intent of the correction term to the held content and outputs the corrected result.
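The text does not specify the utterance format for place-value corrections, so the following is only a sketch under assumptions. The recognizer is assumed to have already extracted three things: the place value (for example 100 from “hyaku”), the digit currently held at that place, and the new digit. The function `correct_at_place` is hypothetical. A place value also disambiguates digits such as the repeated “0” in “080”, which the digit-by-digit correction target dictionary had to exclude.

```python
def correct_at_place(held: str, place: int, old: str, new: str) -> str:
    """Replace the digit at a given place value (1, 10, 100, ...) of `held`.

    `place` is assumed to come from a place-value reading such as
    "hyaku" (100) or "jū" (10); `old` must match the digit held there.
    """
    exponent = len(str(place)) - 1  # e.g. 100 -> 2
    if place != 10 ** exponent:
        raise ValueError("place must be a power of ten")
    index = len(held) - 1 - exponent  # position counted from the left
    if not 0 <= index < len(held) or held[index] != old:
        raise ValueError(f"no digit {old!r} at place {place} in {held!r}")
    return held[:index] + new + held[index + 1:]
```

For example, `correct_at_place("67190", 100, "1", "8")` returns "67890". `correct_at_place("080", 100, "0", "1")` changes only the leading zero and returns "180".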
[0018]
Furthermore, the present invention can also be used when the recognition target is a place name such as an address. For example,
U “Higashi-Koigakubo, Kokubunji City, Tokyo”
S “Nishi-Koigakubo, Kokubunji City, Tokyo” (the device makes a substitution error, recognizing “Higashi-Koigakubo” as “Nishi-Koigakubo”)
U “Nishi-Koigakubo to Higashi-Koigakubo” (utterance with correction intent)
S “Higashi-Koigakubo, Kokubunji City, Tokyo” (the device corrects the previous recognition result)
The correction is made in this way. Here the basic recognition target dictionary is, for example, all addresses in Japan, and the correction term is “を” (wo). The correction target dictionary is derived from the basic recognition target dictionary and the recognition result. It may, for example, consist of the three words “Tokyo”, “Kokubunji City”, and “Nishi-Koigakubo”.

Alternatively, the units that can be misrecognized may be derived by taking real-address constraints into account (for example, Kokubunji City exists only in Tokyo). In that case the correction target dictionary becomes “Nishi-Koigakubo, Kokubunji City, Tokyo”, “Nishi-Koigakubo, Kokubunji City”, “Nishi-Koigakubo”, and “Nishi” (west).
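With the simpler of the two correction target dictionaries above (one entry per recognized unit), the correction reduces to replacing a single address unit. This is a minimal sketch, with units listed from largest to smallest as in the Japanese address. The name `correct_address` is hypothetical.

```python
from __future__ import annotations


def correct_address(units: list[str], target: str, replacement: str) -> list[str]:
    """Apply a "<target> to <replacement>" correction to one address unit.

    `units` is the held recognition result split into units (prefecture,
    city, district); `target` must name exactly one of them.
    """
    if units.count(target) != 1:
        raise ValueError(f"{target!r} is not a unique unit of {units!r}")
    return [replacement if unit == target else unit for unit in units]
```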
[0019]
A second embodiment of the present invention will be described. FIG. 2 shows a block diagram of the voice input device. The difference from the block diagram of FIG. 1 is that the recognition target generation means does not use the recognition result.
[0020]
In this embodiment, “OK” or a similar word is used as the correction term.
[0021]
For example,
U “010”
S “080” (the device misrecognizes “1” as “8”)
U “OK, 1” (the first digit is left as is; the second digit becomes “1”)
S “010” (the device corrects the previous recognition result)
The number of “OK”s specifies the position to be corrected.
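The “OK” convention can be sketched as a cursor that walks over the units of the held result. This sketch rests on two assumptions the text does not state. First, each non-“OK” token overwrites exactly one unit (a digit or an address component). Second, units after the last token are left unchanged. `apply_ok_correction` is a hypothetical name.

```python
from __future__ import annotations

OK = "OK"  # the correction term of the second embodiment


def apply_ok_correction(units: list[str], tokens: list[str]) -> list[str]:
    """Each "OK" keeps the current unit and moves on; any other token
    overwrites the current unit. Units not reached stay unchanged."""
    corrected = list(units)
    position = 0
    for token in tokens:
        if position >= len(corrected):
            raise ValueError("correction runs past the end of the held result")
        if token != OK:
            corrected[position] = token
        position += 1
    return corrected
```

Passing `list("080")` with the tokens `["OK", "1"]` gives the digits of "010". The same function covers the address example that follows if address units are passed instead of digits.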
[0022]
The same applies when the recognition target is a place name. For example,
U “Higashi-Koigakubo, Kokubunji City, Tokyo”
S “Nishi-Koigakubo, Kokubunji City, Tokyo” (the device makes a substitution error, recognizing “Higashi-Koigakubo” as “Nishi-Koigakubo”)
U “OK, OK, Higashi-Koigakubo” (utterance with correction intent)
S “Higashi-Koigakubo, Kokubunji City, Tokyo” (the device corrects the previous recognition result)
[0023]
Next, a third embodiment of the present invention will be described. In the third embodiment the recognition target is limited to numbers; the block diagram of the voice input device is the same as in FIG. 2. Correction terms such as “add” (tasu), “subtract” (hiku), “plus”, and “minus” are used.
[0024]
For example, using “add”:
U “67890”
S “67190” (the device misrecognizes “8” as “1”)
U “Add 700 (nanahyaku)” (utterance with correction intent)
S “67890” (the device corrects the previous recognition result)
Using “subtract”:
U “010”
S “080” (the device misrecognizes “1” as “8”)
U “Subtract 70 (nanajū)” (utterance with correction intent)
S “010” (the device corrects the previous recognition result)
The key point is that the user's correction utterance includes place-value words such as “hyaku” (hundred) and “jū” (ten). This place-value information specifies the position to be corrected.
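The arithmetic correction can be sketched as follows. The sketch assumes the recognizer has already turned the place-value reading into an integer (“nanahyaku” into 700) and the correction term into an operator. The romanized keys in `OPERATORS` are assumptions of this sketch. So is the zero-padding to the held length, which is needed so that “080” minus 70 yields “010” rather than “10”.

```python
import operator

# Hypothetical mapping from spoken correction terms to arithmetic operators.
OPERATORS = {
    "tasu": operator.add,   # 足す (add)
    "plus": operator.add,   # プラス
    "hiku": operator.sub,   # 引く (subtract)
    "minus": operator.sub,  # マイナス
}


def apply_arithmetic_correction(held: str, term: str, amount: int) -> str:
    """Apply "<term> <amount>" to the held digit string.

    The result is zero-padded to the held length so that leading zeros
    of the original input survive (e.g. "080" - 70 -> "010").
    """
    value = OPERATORS[term](int(held), amount)
    if value < 0:
        raise ValueError("correction would make the number negative")
    return str(value).zfill(len(held))
```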
[0025]
As the above embodiments show, a user entering a correction under the present invention does not merely re-enter the word to be corrected. The user always adds an auxiliary word, the correction term, which indicates how the correction is to be made. The correction term can take several forms:
- The user may utter the correction target and the correct content together, joined by a directional word such as “を” (wo), “から” (kara), or “より” (yori) that indicates which is the target and which is the correct content.
- For numbers, the correction term “OK” may be associated with each digit and used as a signal to move to the next digit. The correction term is of course not limited to “OK”.
- For number recognition, recognition and arithmetic can be combined. An operator serves as the correction term, and the spoken number is applied to the misrecognized number according to that operator. The result is taken as the correct recognition result.
[0026]
The present invention does not restrict the speech recognition algorithm, but an HMM (Hidden Markov Model), for example, can be used to implement it. Details of HMM-based speech recognition are given in, for example, Seiichi Nakagawa, “Speech Recognition by Probabilistic Models” (確率モデルによる音声認識), IEICE, 1988, and are omitted here.
[0027]
[Effects of the Invention]
According to the present invention, recognition results, particularly those of long utterances, can easily be corrected by speech alone, which makes voice input efficient.
[0028]
Furthermore, a correction can be a partial input, or it can use correction terms to adopt an input method different from the previous input. This makes a variety of corrections possible and yields a voice input device that is easy for the user to operate. Beyond simplifying input, the device also improves the final recognition result.
[Brief description of the drawings]
FIG. 1 is a block diagram showing an embodiment of a voice input device according to the present invention.
FIG. 2 is a block diagram showing another embodiment of the voice input device according to the present invention.
[Explanation of Symbols]
101 ... Voice input means, 102 ... Voice recognition means, 103 ... Result output means, 104 ... Recognition result output means, 105 ... Recognition target generation means, 106 ... Basic recognition target dictionary, 107 ... Correction term dictionary

Claims

A voice input device comprising:
input means for inputting speech;
recording means for recording a basic recognition target dictionary that stores at least numbers, and a correction dictionary that records operators as correction terms;
recognition means for recognizing the input speech using the basic recognition target dictionary and the correction dictionary; and
output means for outputting the recognition result of the input speech,
wherein, when a number in the recognition result is misrecognized, the input means receives a correction input consisting of a phrase that expresses the arithmetic difference between the misrecognized number and the correct number as a combination of an operator and a number, and
the recognition means recognizes the number and the operator of the correction input and outputs, as the final result, the number obtained by performing the operation indicated by the operator on the number in the recognition result and the recognized number of the correction input.

A voice input method executed in a voice input device having:
input means for inputting speech;
recording means for recording a basic recognition target dictionary that stores at least numbers, and a correction dictionary that records operators as correction terms;
recognition means for recognizing the input speech using the basic recognition target dictionary and the correction dictionary; and
output means for outputting a recognition result of the input speech,
the method comprising:
when a number in the output recognition result is misrecognized, receiving via the input means a correction input consisting of a phrase that expresses the arithmetic difference between the number in the recognition result and the correct number as a combination of an operator and a number;
recognizing the number and the operator of the correction input using the recognition means; and
outputting, as the final result, the number obtained by performing the operation indicated by the operator on the misrecognized number in the recognition result and the recognized number of the correction input.