JP2000293195A

JP2000293195A - Voice inputting device

Info

Publication number: JP2000293195A
Application number: JP11102162A
Authority: JP
Inventors: Toshiyuki Odaka; 俊之小高
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1999-04-09
Filing date: 1999-04-09
Publication date: 2000-10-20
Anticipated expiration: 2019-04-09
Also published as: JP3815110B2

Abstract

PROBLEM TO BE SOLVED: To allow a recognition result, especially that of a long utterance, to be corrected simply and by voice alone, so that voice input can be performed efficiently. SOLUTION: When recognition target generating means 105 receives from result holding means 103 the recognition result of a previous utterance that may be subject to correction, it generates a correction target dictionary from basic recognition target dictionary 106 and that recognition result. It then generates a new recognition target dictionary by combining the correction target dictionary, a pre-specified correction term dictionary 107, and dictionary 106. Result holding means 103 determines whether the recognition result contains a correction term. If it does, a correction corresponding to the intention of the correction term is applied to the held content, and the corrected result is output to recognition result output means 104 and recognition target generating means 105.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to man-machine interfaces, and in particular to speech recognition, which converts speech into character information or the like and is one means of inputting data to a device.

[0002]

2. Description of the Related Art

In a device that uses speech recognition, recognition errors are unavoidable. To implement such a device, it is therefore essential to provide a means of correcting the recognition result. One way to specify the portion to be corrected is to designate it with a pointing device (a mouse or touch panel), but the present invention assumes a situation in which correction is performed by voice alone, or in which only voice is available. In such a case, because the portion to be corrected could not be specified, recognition errors have conventionally been corrected by repeating the same utterance as before.

[0003]

PROBLEMS TO BE SOLVED BY THE INVENTION

When a recognition error is to be corrected by voice alone, restating the entire utterance is burdensome. In particular, for a long utterance such as a digit string with many digits or an address, having to restate the whole utterance because one part was misrecognized is very inconvenient and inefficient.

[0004] An object of the present invention is to provide a voice input device with a man-machine interface that lets the user correct errors efficiently, with relatively little effort and by voice alone, even when a recognition error occurs, as is unavoidable in a voice input device using speech recognition. A further object is to provide various information processing devices that include this voice input device.

[0005]

MEANS FOR SOLVING THE PROBLEMS

The present invention solves the above problem by providing recognition target generating means, which generates a recognition target dictionary that also accepts input intended as a correction, and result holding means, which holds the previous recognition result and, when an intention to correct is detected, corrects the held recognition result and outputs it.

[0006] The voice input device of the present invention comprises voice input means for inputting voice; voice recognition means for recognizing the input voice within the range of a specified recognition target dictionary; result holding means for holding the recognition result of the voice recognition means; recognition result output means for outputting the recognition result; and recognition target generating means for generating the recognition target dictionary used by the voice recognition means from a pre-specified basic recognition target dictionary and the like. When the recognition target generating means receives from the result holding means the recognition result of a previous utterance that may be subject to correction, it generates a correction target dictionary from the basic recognition target dictionary and that recognition result, and creates a new recognition target dictionary by combining the correction target dictionary, a pre-specified correction term dictionary, and the basic recognition target dictionary. The result holding means determines whether the recognition result contains a correction term. If it does, the result holding means applies to the held content the correction corresponding to the intention of that correction term and outputs the corrected result to the recognition result output means and the recognition target generating means. If it does not, the result holding means outputs the result unchanged to the recognition result output means and the recognition target generating means.

[0007] A voice input device of another configuration of the present invention comprises voice input means for inputting voice; voice recognition means for recognizing the input voice within the range of a specified recognition target dictionary; result holding means for holding the recognition result of the voice recognition means; recognition result output means for outputting the recognition result; and recognition target generating means for generating the recognition target dictionary used by the voice recognition means from a pre-specified basic recognition target dictionary and the like. The recognition target generating means creates a new recognition target dictionary by combining the basic recognition target dictionary with a pre-specified correction term dictionary. When the recognition result of the voice recognition means contains a correction term, the result holding means applies to the held content the correction corresponding to the intention of that correction term and outputs the result.

[0008] A voice input device of another configuration of the present invention comprises means for inputting voice, recognition means for recognizing the input voice, and output means for outputting the recognition result of the input voice. When part of the recognition result is a misrecognized word, a correction consisting of the misrecognized word, the correct word, and a phrase indicating which of those words is correct and which is wrong is input from the input means. The recognition means recognizes this correction and outputs, as the final result, the recognition result with the misrecognized word in that part replaced by the correct word.

[0009] A voice input device of another configuration of the present invention comprises means for inputting spoken numbers, recognition means for recognizing the input voice, and output means for outputting the recognition result of the input voice. When a number in the recognition result has been misrecognized, a phrase expressing the arithmetic difference between the misrecognized number and the correct number as a combination of an operator and a number is input from the input means as a correction. The recognition means recognizes this correction, applies the operation it specifies to the misrecognized number, and outputs the resulting number as the final result.

[0010]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

A first embodiment of the present invention will be described.

[0011] FIG. 1 shows a block diagram of the voice input device. Uttered speech is converted into a digital signal by the voice input means (corresponding to everything from the microphone through A/D conversion). The voice recognition means recognizes the digitized speech within the range of a recognition target dictionary that is either given in advance or specified externally each time, and outputs the recognition result. The recognition result is held by the result holding means and is presented to the user as appropriate, on screen or by voice, via the recognition result output means. The recognition target generating means generates the recognition target dictionary from the basic recognition target dictionary, and at the same time uses the correction term dictionary to reconstruct the recognition target dictionary so that user utterances that include an intention to correct can be recognized. When the recognition target generating means receives a recognition result from the result holding means, it also uses that recognition result in reconstructing the recognition target dictionary so that such utterances can be recognized.

[0012] The description below takes as an example the case where the recognition target is a digit string of unlimited length. For simplicity, the basic recognition target dictionary consists only of arbitrary digit strings. In an actual application system, however, various control commands and the like may be added, so the dictionary is not limited to this.

[0013] With the user's utterance denoted U and the device's output denoted S, the user's input and the device's output proceed, for example, as follows:

U: "010" (the user speaks)
S: "080" (the device substitutes "8" for "1")
U: "8 to 1" (Japanese "8 を 1"; the user speaks with the intention of correcting)
S: "010" (the device corrects the "8" in the previous recognition result to "1")

[0014] To recognize the user's second input, "8 to 1", the recognition target generating means reconstructs the recognition target dictionary. In addition to the "basic recognition target dictionary", the reconstructed dictionary contains, in parallel, the sequence "correction target dictionary" + "correction term" + "basic recognition target dictionary corresponding to the correction target dictionary". Here the correction term is "wo" (を), but it is not limited to this; "kara" (から), "yori" (より), and the like may also be used. The correction target dictionary consists of every run of one or more consecutive digits in the recognition result. In this example, for "080" it consists of "0", "8", "08", "80", and "080". However, "0" alone cannot distinguish the first digit from the third, so it is better to exclude "0" from the correction target dictionary.
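The construction of the correction target dictionary described in [0014] can be sketched as follows. This is only an illustration under stated assumptions: the recognition result is treated as a plain text string of digits, and the function name `correction_targets` is hypothetical and does not appear in the patent. A substring is kept only if it occurs at exactly one position, which is one way to drop ambiguous entries such as the lone "0".

```python
from collections import Counter
from typing import List


def correction_targets(result: str) -> List[str]:
    """Return every run of consecutive characters in `result` that occurs
    at exactly one position (hypothetical helper, not from the patent)."""
    # Enumerate all contiguous substrings, one per (start, end) position pair.
    spans = [
        result[start:end]
        for start in range(len(result))
        for end in range(start + 1, len(result) + 1)
    ]
    # A substring seen at two or more positions cannot identify the place
    # to correct, so it is excluded.
    counts = Counter(spans)
    return [span for span in spans if counts[span] == 1]


print(correction_targets("080"))
```

For the recognition result "080", the lone "0" is dropped because it appears at two positions, while "8", "08", "80", and "080" remain.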

[0015] Corrections can be made more reliable by defining the "basic recognition target dictionary corresponding to the correction target dictionary" as arbitrary digit strings with the correction target itself removed. For example, "8 to 1", "8 to 88", and "8 to 11" are included in the recognition targets, but "8 to 8" should be excluded.
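The dictionary composition in [0014] and the exclusion rule in [0015] can be approximated with a small text-level grammar. This is a sketch under loud assumptions: a real recognizer would constrain acoustic decoding, whereas here the utterance is assumed to be already transcribed as text such as "8を1"; the names `build_recognition_pattern` and `parse` are hypothetical.

```python
import re
from typing import List, Optional, Tuple


def build_recognition_pattern(targets: List[str], terms: List[str],
                              basic: str = r"[0-9]+") -> "re.Pattern[str]":
    """Combine: basic dictionary | (correction target + term + basic)."""
    if not targets or not terms:
        # Without a held result, only the basic dictionary is recognizable.
        return re.compile(rf"(?P<plain>{basic})")
    # Try longer targets first so "080" is preferred over "08" or "8".
    target_alt = "|".join(
        re.escape(t) for t in sorted(targets, key=len, reverse=True))
    term_alt = "|".join(re.escape(t) for t in terms)
    return re.compile(
        rf"(?:(?P<target>{target_alt})(?:{term_alt})(?P<new>{basic})"
        rf"|(?P<plain>{basic}))"
    )


def parse(utterance: str, pattern: "re.Pattern[str]") -> Optional[Tuple[str, ...]]:
    """Classify an utterance as new input, a correction, or unrecognized."""
    match = pattern.fullmatch(utterance)
    if match is None:
        return None
    if match.group("plain") is not None:
        return ("input", match.group("plain"))
    if match.group("new") == match.group("target"):
        return None  # "8 to 8" corrects nothing, so it is not accepted
    return ("correct", match.group("target"), match.group("new"))


pattern = build_recognition_pattern(["8", "08", "80", "080"], ["を", "から", "より"])
print(parse("8を1", pattern))
print(parse("8を8", pattern))
```

Rejecting the identical replacement after matching, rather than inside the pattern, keeps the grammar simple; an actual implementation might instead remove the target from the dictionary itself, as the text describes.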

[0016] The preceding example is a substitution error (one digit misrecognized as another) for a digit-string recognition target, but insertion errors (digits that were not spoken appear in the recognition result) and deletion errors (digits that were spoken are missing from the recognition result) can be corrected in the same way.

Example of an insertion error:

U: "4100"
S: "42100" (the device inserts a spurious "2")
U: "421 to 41" (utterance including the intention of correction)
S: "4100" (the device corrects the previous recognition result)

Example of a deletion error:

U: "0123"
S: "023" (the device drops the "1")
U: "02 to 012" (utterance including the intention of correction)
S: "0123" (the device corrects the previous recognition result)
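Substitution, insertion, and deletion errors can all be handled by the same replacement step in the result holding means. The sketch below assumes the held result and the recognized correction are available as text strings; `apply_correction` is a hypothetical name, and the uniqueness and non-identity checks are assumptions that follow the recommendations in [0014] and [0015].

```python
def apply_correction(held: str, target: str, replacement: str) -> str:
    """Replace the single occurrence of `target` in `held` with `replacement`.

    Hypothetical helper: raises ValueError when the correction is a no-op
    or when `target` does not identify exactly one position.
    """
    if target == replacement:
        raise ValueError("replacement is identical to the correction target")
    first = held.find(target)
    # A second match starting anywhere after `first` (including overlapping
    # ones) means the position to correct is ambiguous.
    if first == -1 or held.find(target, first + 1) != -1:
        raise ValueError("correction target must occur exactly once")
    return held[:first] + replacement + held[first + len(target):]


print(apply_correction("080", "8", "1"))       # substitution error
print(apply_correction("42100", "421", "41"))  # insertion error
print(apply_correction("023", "02", "012"))    # deletion error
```

Because the replacement may be shorter or longer than the target, one operation covers all three error types.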

[0017] For simplicity, the examples here use short strings of three or four digits, but it is clear that as the number of digits grows, the burden on the user is smaller than when everything must be restated. Corrections can also be applied repeatedly by the same method.

[0018] The present invention can also be used when the recognition target is a place name such as an address. For example, a correction can be made as follows:

U: "Tokyo-to Kokubunji-shi Higashi-Koigakubo" (東京都国分寺市東恋ヶ窪)
S: "Tokyo-to Kokubunji-shi Nishi-Koigakubo" (東京都国分寺市西恋ヶ窪; the device substitutes "Nishi-Koigakubo" for "Higashi-Koigakubo")
U: "Nishi-Koigakubo to Higashi-Koigakubo" (utterance including the intention of correction)
S: "Tokyo-to Kokubunji-shi Higashi-Koigakubo" (the device corrects the previous recognition result)

In this case the basic recognition target dictionary is, for example, the set of addresses throughout Japan, and the correction term is "wo" (を). The correction target dictionary is derived from the basic recognition target dictionary and the recognition result, and may consist, for example, of the three words "Tokyo-to" (東京都), "Kokubunji-shi" (国分寺市), and "Nishi-Koigakubo" (西恋ヶ窪). Alternatively, the units that can be misrecognized may be derived by taking into account the constraints of real addresses (for example, Kokubunji-shi exists only in Tokyo-to). In that case the correction target dictionary consists of "Tokyo-to Kokubunji-shi Nishi-Koigakubo" (東京都国分寺市西恋ヶ窪), "Kokubunji-shi Nishi-Koigakubo" (国分寺市西恋ヶ窪), "Nishi-Koigakubo" (西恋ヶ窪), and "Nishi" (西).
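The address-constrained correction units can be partly sketched as trailing runs of the recognized address components. This is an interpretation rather than something the patent states: the assumption is that a misrecognized higher-level component also invalidates the components below it. The function name `address_correction_targets` is hypothetical, and the sub-word unit "西" from the text is deliberately left out because deriving it would need knowledge of how place names in the address dictionary differ from one another.

```python
from typing import List


def address_correction_targets(components: List[str]) -> List[str]:
    """Return each trailing run of address components, longest first
    (hypothetical helper; sub-word units such as "西" are not generated)."""
    return ["".join(components[start:]) for start in range(len(components))]


for unit in address_correction_targets(["東京都", "国分寺市", "西恋ヶ窪"]):
    print(unit)
```

This reproduces the first three of the four units listed in the text.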

[0019] A second embodiment of the present invention will be described. FIG. 2 shows a block diagram of the voice input device. It differs from the block diagram of FIG. 1 in that the recognition target generating means does not use the recognition result.

[0020] "OK" or the like is used as the correction term.

[0021] For example:

U: "010"
S: "080" (the device substitutes "8" for "1")
U: "OK, 1" (leave the first digit as is; change the second digit to "1")
S: "010" (the device corrects the previous recognition result)

The position to be corrected is specified by the number of "OK"s.
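The positional correction of this embodiment can be sketched as follows, assuming the held result and the correcting utterance have already been split into units (digits here). The name `apply_ok_correction` is hypothetical, and leaving any units after the last spoken token unchanged is an assumption consistent with the "OK, 1" example.

```python
from typing import List


def apply_ok_correction(held: List[str], spoken: List[str],
                        ok: str = "OK") -> List[str]:
    """Walk the held units in order: each `ok` keeps the unit at that
    position, any other token overwrites it (hypothetical helper)."""
    if len(spoken) > len(held):
        raise ValueError("correction is longer than the held result")
    corrected = list(held)
    for position, token in enumerate(spoken):
        if token != ok:
            corrected[position] = token
    # Units beyond the spoken tokens are left as they were.
    return corrected


print("".join(apply_ok_correction(list("080"), ["OK", "1"])))
```

Because the position comes from counting "OK"s, this scheme needs no correction target dictionary built from the previous result, which matches the difference between FIG. 2 and FIG. 1.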

[0022] The same approach applies when the recognition target is a place name. For example:

U: "Tokyo-to Kokubunji-shi Higashi-Koigakubo" (東京都国分寺市東恋ヶ窪)
S: "Tokyo-to Kokubunji-shi Nishi-Koigakubo" (東京都国分寺市西恋ヶ窪; the device substitutes "Nishi-Koigakubo" for "Higashi-Koigakubo")
U: "OK, OK, Higashi-Koigakubo" (utterance including the intention of correction)
S: "Tokyo-to Kokubunji-shi Higashi-Koigakubo" (the device corrects the previous recognition result)

[0023] Next, a third embodiment of the present invention will be described. The third embodiment is an example in which the recognition target is limited to numbers; the block diagram of the voice input device is the same as FIG. 2. Words such as "tasu" (足す, add), "hiku" (引く, subtract), "purasu" (プラス, plus), and "mainasu" (マイナス, minus) are used as correction terms.

[0024] For example, using "add":

U: "67890"
S: "67190" (the device substitutes "1" for "8")
U: "add 700 (nanahyaku)" (utterance including the intention of correction)
S: "67890" (the device corrects the previous recognition result)

Using "subtract":

U: "010"
S: "080" (the device substitutes "8" for "1")
U: "subtract 70 (nanajuu)" (utterance including the intention of correction)
S: "010" (the device corrects the previous recognition result)

The important point here is that the user's correcting utterance includes a place value such as "hyaku" (hundred) or "juu" (ten), and this place-value information specifies the position to be corrected.
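The arithmetic correction can be sketched as below. The operator names and the function `apply_arithmetic_correction` are hypothetical, and zero-padding the result back to the width of the held string is an assumption: the patent's example output "010" keeps its leading zero, but the text does not say how.

```python
def apply_arithmetic_correction(held: str, operator: str, amount: int) -> str:
    """Apply 'add'/'plus' or 'subtract'/'minus' to the held digit string,
    keeping its original width (hypothetical helper)."""
    value = int(held)  # int() accepts leading zeros in a string
    if operator in ("add", "plus"):
        value += amount
    elif operator in ("subtract", "minus"):
        value -= amount
    else:
        raise ValueError(f"unknown operator: {operator}")
    if value < 0 or len(str(value)) > len(held):
        raise ValueError("corrected value does not fit the held digit string")
    # Restore leading zeros so "080" minus 70 yields "010", not "10".
    return str(value).zfill(len(held))


print(apply_arithmetic_correction("67190", "add", 700))
print(apply_arithmetic_correction("080", "subtract", 70))
```

The place value spoken by the user (700, 70) determines which digit changes, so no separate position marker is needed.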

[0025] As the above embodiments show, the present invention is characterized in that a user entering a correction does not simply re-enter the word or phrase to be corrected, but always adds an auxiliary word (a correction term). In other words, the correction term indicates the method of correction. The user may utter the correction target and the correct content together, using a directional word such as "wo" (を), "kara" (から), or "yori" (より) to indicate which is the correction target and which is the correct content. Alternatively, for numbers, a correction term such as "OK" may be associated with each digit and used as a cue to move to the next digit. The correction term is of course not limited to "OK". For number recognition, recognition can also be combined with arithmetic: an operator is used as the correction term, the spoken number is applied to the misrecognized number according to the operator attached to it, and the result of that operation is taken as the correct recognition result.

[0026] The present invention does not limit the speech recognition algorithm in any particular way, but it can be implemented using, for example, an HMM (Hidden Markov Model). Details of HMM-based speech recognition are given in, for example, Seiichi Nakagawa, "Speech Recognition by Stochastic Models" (確率モデルによる音声認識), IEICE (電子情報通信学会), 1988, and are omitted here.

[0027]

EFFECTS OF THE INVENTION

According to the present invention, the recognition result of an utterance, especially a long one, can be corrected easily by voice alone, so that voice input can be performed efficiently.

[0028] In addition, because a correction can be given as a partial input, or by an input method that differs from the previous input through the use of correction terms, a variety of corrections can be realized and a voice input device that is easy for the user to operate can be provided. The invention also provides a voice input device that not only simplifies input but also improves the final recognition result.

[Brief description of the drawings]

FIG. 1 is a block diagram showing one embodiment of the voice input device according to the present invention.

FIG. 2 is a block diagram showing another embodiment of the voice input device according to the present invention.

[Explanation of symbols]

101: voice input means, 102: voice recognition means, 103: result holding means, 104: recognition result output means, 105: recognition target generating means, 106: basic recognition target dictionary, 107: correction term dictionary

Continued from the front page: (51) Int. Cl.⁷, identification symbol, FI, theme code (reference): G10L 3/00 551P 561E

Claims

1. A voice input device comprising: voice input means for inputting voice; voice recognition means for recognizing the input voice within the range of a specified recognition target dictionary; result holding means for holding a recognition result of the voice recognition means; recognition result output means for outputting the recognition result; and recognition target generating means for generating the recognition target dictionary used by the voice recognition means from a pre-specified basic recognition target dictionary and the like, wherein the recognition target generating means, on receiving from the result holding means the recognition result of a previous utterance that may be subject to correction, generates a correction target dictionary from the basic recognition target dictionary and the recognition result, and creates a new recognition target dictionary by combining the correction target dictionary, a pre-specified correction term dictionary, and the basic recognition target dictionary, and wherein the result holding means determines whether the recognition result contains a correction term, applies to the held content, when the recognition result contains a correction term, a correction corresponding to the intention of the correction term and outputs the corrected result to the recognition result output means and the recognition target generating means, and outputs the result unchanged to the recognition result output means and the recognition target generating means when the recognition result contains no correction term.

2. A voice input device comprising: voice input means for inputting voice; voice recognition means for recognizing the input voice within the range of a specified recognition target dictionary; result holding means for holding a recognition result of the voice recognition means; recognition result output means for outputting the recognition result; and recognition target generating means for generating the recognition target dictionary used by the voice recognition means from a pre-specified basic recognition target dictionary and the like, wherein the recognition target generating means creates a new recognition target dictionary by combining the basic recognition target dictionary with a pre-specified correction term dictionary, and wherein the result holding means, when the recognition result of the voice recognition means contains a correction term, applies to the held content a correction corresponding to the intention of the correction term and outputs the result.

3. The voice input device according to claim 2, wherein the basic recognition target dictionary comprises a dictionary for recognizing numbers read digit by digit without place values and a dictionary for recognizing numbers read with place values, and wherein, when the recognition result of the voice recognition means contains a correction term and a number read with place values, the result holding means identifies the correction position from the place-value information of that reading, applies to the held content a correction corresponding to the intention of the correction term, and outputs the corrected result.

4. A telephone, a car navigation system, a personal information terminal, a game machine, or various personal computers comprising the voice input device according to claim 1.

5. A voice input device comprising: means for inputting voice; recognition means for recognizing the input voice; and output means for outputting a recognition result of the input voice, wherein, when part of the recognition result is a misrecognized word, a correction consisting of the misrecognized word, the correct word, and a phrase indicating which of those words is correct and which is wrong is input from the input means, and the recognition means recognizes the correction and outputs, as a final result, the recognition result with the misrecognized word in that part replaced by the correct word.

6. A voice input device comprising: means for inputting spoken numbers; recognition means for recognizing the input voice; and output means for outputting a recognition result of the input voice, wherein, when a number in the recognition result has been misrecognized, a phrase expressing the arithmetic difference between the misrecognized number and the correct number as a combination of an operator and a number is input from the input means as a correction, and the recognition means recognizes the correction, applies the operation specified by the correction to the misrecognized number, and outputs the resulting number as a final result.