JP2000293195A - Voice inputting device - Google Patents

Voice inputting device

Info

Publication number
JP2000293195A
Authority
JP
Japan
Prior art keywords
recognition
result
voice
correction
dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP11102162A
Other languages
Japanese (ja)
Other versions
JP2000293195A5 (en
JP3815110B2 (en
Inventor
Toshiyuki Odaka
俊之 小高
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd
Priority to JP10216299A
Publication of JP2000293195A
Publication of JP2000293195A5
Application granted
Publication of JP3815110B2
Anticipated expiration
Expired - Fee Related

Abstract

PROBLEM TO BE SOLVED: To make it easy to correct a recognition result by voice alone, particularly for long utterances that take time to produce, so that voice input can be performed efficiently. SOLUTION: When a recognition target generating means 105 receives from a result holding means 103 the recognition result of the previous utterance, which may become a correction target, it generates a correction target dictionary from a basic recognition target dictionary 106 and that recognition result. It then generates a new recognition target dictionary by combining the correction target dictionary, a predefined correction term dictionary 107, and the dictionary 106. The result holding means 103 determines whether a recognition result contains a correction term. If it does, the means 103 applies the correction indicated by the correction term to the held contents and outputs the corrected result to a recognition result outputting means 104 and to the recognition target generating means 105.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to man-machine interfaces, and in particular to speech recognition, which converts speech into character information or the like as one means of inputting data to a device.

[0002]

2. Description of the Related Art In a device that uses speech recognition, recognition errors by the device are unavoidable. A practical device must therefore provide a means of correcting the recognition result. One way to specify the part to be corrected is to point at it with a pointing device (a mouse or touch panel), but the present invention assumes that correction is performed by voice alone, or that only voice is available. In such a case, because the part to be corrected could not be specified, recognition errors were conventionally corrected by repeating the entire previous utterance.

[0003]

[Problems to be Solved by the Invention] When recognition errors are corrected by voice alone, restating the entire utterance is tedious. In particular, restating all of a long utterance, such as a long digit string or an address, because one part of it was misrecognized is inconvenient and inefficient.

[0004] An object of the present invention is to provide a voice input device, or any of various information processing devices including such a voice input device, with a man-machine interface that lets the user correct unavoidable recognition errors efficiently, with relatively little effort and by voice alone.

[0005]

[Means for Solving the Problems] The present invention solves the above problem by providing a recognition target generating means, which generates a recognition target dictionary that also accepts inputs intended as corrections, and a result holding means, which holds the previous recognition result and, when an intent to correct is detected, corrects the held recognition result and outputs it.

[0006] The voice input device of the present invention comprises: a voice input means for inputting voice; a voice recognition means for recognizing the input voice within the range of a specified recognition target dictionary; a result holding means for holding the recognition result of the voice recognition means; a recognition result output means for outputting the recognition result; and a recognition target generating means for generating the recognition target dictionary used by the voice recognition means from, among other things, a predefined basic recognition target dictionary. When the recognition target generating means receives from the result holding means the recognition result of the previous utterance, which may become a correction target, it generates a correction target dictionary from the basic recognition target dictionary and that recognition result, and creates a new recognition target dictionary by combining the correction target dictionary, a predefined correction term dictionary, and the basic recognition target dictionary. The result holding means determines whether the recognition result contains a correction term. If it does, the result holding means applies the correction indicated by the correction term to the held contents and outputs the corrected result to the recognition result output means and the recognition target generating means. If it does not, the result holding means outputs the result unchanged to the recognition result output means and the recognition target generating means.

[0007] In another configuration, the voice input device of the present invention comprises: a voice input means for inputting voice; a voice recognition means for recognizing the input voice within the range of a specified recognition target dictionary; a result holding means for holding the recognition result of the voice recognition means; a recognition result output means for outputting the recognition result; and a recognition target generating means for generating the recognition target dictionary used by the voice recognition means from, among other things, a predefined basic recognition target dictionary. The recognition target generating means creates a new recognition target dictionary by combining the basic recognition target dictionary with a predefined correction term dictionary. When the recognition result of the voice recognition means contains a correction term, the result holding means applies the correction indicated by the correction term to the held contents and outputs the result.

[0008] In another configuration, the voice input device of the present invention comprises a means for inputting voice, a recognition means for recognizing the input voice, and an output means for outputting the recognition result of the input voice. When part of the recognition result is a misrecognized word, the recognition means receives from the input means, as a correction, the misrecognized word, the correct word, and a phrase indicating which of the two is correct and which is wrong. It recognizes the correction, replaces the misrecognized word in that part of the recognition result with the correct word, and outputs the outcome as the final result.

[0009] In yet another configuration, the voice input device of the present invention comprises a means for inputting spoken numbers, a recognition means for recognizing the input voice, and an output means for outputting the recognition result of the input voice. When a number in the recognition result is misrecognized, the recognition means receives from the input means, as a correction, a phrase that expresses the arithmetic difference between the misrecognized number and the correct number as a combination of an operator and a number. It recognizes the correction, applies the operation indicated by the correction to the misrecognized number, and outputs the resulting number as the final result.

[0010]

DESCRIPTION OF THE PREFERRED EMBODIMENTS A first embodiment of the present invention will be described.

[0011] FIG. 1 shows a block diagram of the voice input device. The uttered voice is converted into a digital signal by the voice input means (corresponding to everything from the microphone through A/D conversion). The voice recognition means recognizes the digitized voice within the range of a recognition target dictionary that is either given in advance or specified externally each time, and outputs the recognition result. The result holding means holds the recognition result, which is presented to the user as appropriate, on a screen or by voice, through the recognition result output means. The recognition target generating means generates the recognition target dictionary from the basic recognition target dictionary, and also uses the correction term dictionary to reconstruct the recognition target dictionary so that user utterances expressing an intent to correct can be recognized. When the recognition target generating means receives a recognition result from the result holding means, it uses that recognition result as well when reconstructing the recognition target dictionary for this purpose.

[0012] The description takes as an example the recognition of digit strings of unlimited length. For simplicity, the basic recognition target dictionary is assumed to contain only arbitrary digit strings. In a real application system, however, various control commands and the like may be added, so the dictionary is not limited to this.

[0013] Writing U for the user's utterance and S for the device's output, an exchange between the user and the device looks like, for example:

U "010" (the user speaks)
S "080" (the device makes a substitution error, recognizing "1" as "8")
U "8 を 1" ("8 to 1"; the user speaks with the intent to correct)
S "010" (the device corrects "8" in the previous recognition result to "1")

[0014] To recognize the user's second input, "8 を 1", the recognition target generating means reconstructs the recognition target dictionary. The resulting dictionary places two alternatives in parallel: the basic recognition target dictionary, and the sequence "correction target dictionary" + "correction term" + "basic recognition target dictionary corresponding to the correction target dictionary". Here the correction term is "を" (o), but it is not limited to this; "から" (kara) or "より" (yori) may also be used. The correction target dictionary consists of every run of one or more digits in the recognition result. In this example, for "080", these are "0", "8", "08", "80", and "080". However, "0" alone cannot distinguish the first-digit "0" from the third-digit "0", so it is better to leave "0" out of the correction target dictionary.
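
The construction of the correction target dictionary for a digit string can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names `occurrences` and `correction_targets` are hypothetical, and keeping only runs of consecutive digits that occur exactly once is an assumed reading of the rule that ambiguous entries such as "0" should be left out.

```python
def occurrences(text: str, sub: str) -> int:
    """Count occurrences of sub in text, including overlapping ones."""
    return sum(1 for i in range(len(text) - len(sub) + 1)
               if text.startswith(sub, i))


def correction_targets(result: str) -> list[str]:
    """Return every run of consecutive digits that appears exactly once.

    Runs that appear more than once (such as "0" in "080") are skipped
    because they do not identify a unique position in the result.
    """
    targets = []
    for length in range(1, len(result) + 1):
        for start in range(len(result) - length + 1):
            run = result[start:start + length]
            if occurrences(result, run) == 1:
                targets.append(run)
    return targets


print(correction_targets("080"))
```

For the recognition result "080" this yields "8", "08", "80", and "080", and omits the ambiguous "0".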

[0015] Correction becomes more reliable if the "basic recognition target dictionary corresponding to the correction target dictionary" consists of arbitrary digit strings with the correction target itself removed. For example, "8 を 1", "8 を 88", and "8 を 11" may be included among the recognition targets, but "8 を 8" should not be.
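
One way to picture the combined recognition target dictionary is as a pattern that accepts either a plain digit string or "target + correction term + replacement", with the replacement required to differ from the target. The sketch below uses a regular expression over text purely for illustration; a real recognizer would build a speech grammar instead. The names `build_pattern` and `is_recognizable` are hypothetical, and the target list is assumed to be non-empty.

```python
import re

CORRECTION_TERMS = ("を", "から", "より")  # correction terms named in the text


def build_pattern(targets: list[str]) -> re.Pattern:
    """Compile a pattern for: digits | target + term + digits."""
    # Longest alternatives first, so "080" is tried before "08" or "8".
    ordered = sorted(targets, key=len, reverse=True)
    target_alt = "|".join(re.escape(t) for t in ordered)
    term_alt = "|".join(re.escape(t) for t in CORRECTION_TERMS)
    return re.compile(
        rf"(?P<plain>[0-9]+)"
        rf"|(?P<target>{target_alt})(?:{term_alt})(?P<replacement>[0-9]+)"
    )


def is_recognizable(utterance: str, pattern: re.Pattern) -> bool:
    """Accept plain digit strings and corrections that change the target."""
    match = pattern.fullmatch(utterance)
    if match is None:
        return False
    if match.group("plain") is not None:
        return True
    # Exclude corrections such as "8 を 8" that replace a target by itself.
    return match.group("replacement") != match.group("target")


pattern = build_pattern(["8", "08", "80", "080"])
for utterance in ("8を1", "8を88", "8を8", "0を1", "123"):
    print(utterance, is_recognizable(utterance, pattern))
```

Filtering "8 を 8" after matching, rather than inside the pattern, keeps the pattern small; it is a design choice of this sketch, not something the text prescribes.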

[0016] The preceding example shows a substitution error (one digit misrecognized as another) with digit strings as the recognition target, but insertion errors (a digit that was not spoken appears in the recognition result) and deletion errors (a spoken digit is missing from the recognition result) can be corrected in the same way. An insertion error, for example:

U "4100"
S "42100" (the device inserts a spurious "2")
U "421 を 41" (utterance expressing the intent to correct)
S "4100" (the device corrects the previous recognition result)

A deletion error, for example:

U "0123"
S "023" (the device drops the "1")
U "02 を 012" (utterance expressing the intent to correct)
S "0123" (the device corrects the previous recognition result)
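
The behavior of the result holding means in these three cases can be sketched as a single string operation. This is an illustrative sketch only: `apply_correction` is a hypothetical name, the utterance is assumed to arrive as already recognized text, and returning the held result unchanged when the target is missing or ambiguous is an assumed policy.

```python
CORRECTION_TERMS = ("を", "から", "より")  # correction terms named in the text


def apply_correction(held: str, utterance: str) -> str:
    """Return the new held result after one recognized utterance."""
    for term in CORRECTION_TERMS:
        target, found, replacement = utterance.partition(term)
        if not found:
            continue
        first = held.find(target)
        # Apply only when the target pins down exactly one position.
        unique = first != -1 and held.find(target, first + 1) == -1
        if target and replacement and unique:
            return held[:first] + replacement + held[first + len(target):]
        return held  # unusable correction: keep the held result unchanged
    return utterance  # no correction term: treat as a fresh input


print(apply_correction("080", "8を1"))       # substitution error
print(apply_correction("42100", "421を41"))  # insertion error
print(apply_correction("023", "02を012"))    # deletion error
```

The same replacement step covers substitution, insertion, and deletion errors because the user supplies enough surrounding digits to locate the error.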

[0017] The examples here use short strings of three or four digits for simplicity. For longer strings, the burden on the user is clearly smaller than restating the whole utterance. Corrections can also be applied repeatedly using the same method.

[0018] The present invention can also be used when the recognition targets are place names such as addresses. For example:

U "東京都国分寺市東恋ヶ窪" (Higashi-Koigakubo, Kokubunji-shi, Tokyo)
S "東京都国分寺市西恋ヶ窪" (the device makes a substitution error, recognizing "東恋ヶ窪" as "西恋ヶ窪")
U "西恋ヶ窪を東恋ヶ窪" (utterance expressing the intent to correct)
S "東京都国分寺市東恋ヶ窪" (the device corrects the previous recognition result)

In this case the basic recognition target dictionary covers, for example, addresses throughout Japan, and the correction term is "を". The correction target dictionary is derived from the basic recognition target dictionary and the recognition result; for example, it may consist of the three words "東京都", "国分寺市", and "西恋ヶ窪". Alternatively, the units that can be misrecognized may be derived by taking into account the constraints on real addresses (for example, Kokubunji-shi exists only in Tokyo), in which case the correction target dictionary becomes "東京都国分寺市西恋ヶ窪", "国分寺市西恋ヶ窪", "西恋ヶ窪", and "西".
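
The constraint-aware variant of the address correction target dictionary can be approximated by joining each trailing run of address segments. The sketch below is an assumption-laden illustration: `suffix_targets` is a hypothetical name, the segmentation of the recognition result is taken as given, and sub-word entries such as "西" are not produced because they would need knowledge of how place names differ from one another.

```python
def suffix_targets(segments: list[str]) -> list[str]:
    """Join each trailing run of address segments into one target."""
    return ["".join(segments[i:]) for i in range(len(segments))]


segments = ["東京都", "国分寺市", "西恋ヶ窪"]  # assumed segmentation of the result
for target in suffix_targets(segments):
    print(target)
```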

[0019] A second embodiment of the present invention will now be described. FIG. 2 shows a block diagram of the voice input device. It differs from the block diagram of FIG. 1 in that the recognition target generating means does not use the recognition result.

[0020] A word such as "OK" is used as the correction term.

[0021] For example:

U "010"
S "080" (the device makes a substitution error, recognizing 1 as 8)
U "OK, 1" (the first digit is left as is; the second digit becomes 1)
S "010" (the device corrects the previous recognition result)

The position to be corrected is specified by the number of "OK"s.

[0022] The same approach applies when the recognition targets are place names. For example:

U "東京都国分寺市東恋ヶ窪"
S "東京都国分寺市西恋ヶ窪" (the device makes a substitution error, recognizing "東恋ヶ窪" as "西恋ヶ窪")
U "OK, OK, 東恋ヶ窪" (utterance expressing the intent to correct)
S "東京都国分寺市東恋ヶ窪" (the device corrects the previous recognition result)
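
The position-by-count scheme of the second embodiment can be sketched as follows. This is a minimal illustration under stated assumptions: `apply_ok_correction` is a hypothetical name, the held result and the utterance are assumed to be already split into units (digits or address segments), and every unit after the leading "OK"s is treated as a replacement for the unit at the same position.

```python
def apply_ok_correction(held: list[str], tokens: list[str]) -> list[str]:
    """Skip one held unit per leading "OK", then overwrite what follows."""
    skip = 0
    while skip < len(tokens) and tokens[skip] == "OK":
        skip += 1
    replacement = tokens[skip:]
    corrected = list(held)  # copy, so the held result is not mutated
    corrected[skip:skip + len(replacement)] = replacement
    return corrected


print("".join(apply_ok_correction(list("080"), ["OK", "1"])))
print("".join(apply_ok_correction(["東京都", "国分寺市", "西恋ヶ窪"],
                                  ["OK", "OK", "東恋ヶ窪"])))
```

Because the position comes from counting "OK"s, this variant needs no knowledge of the previous recognition result when building the recognition target dictionary.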

[0023] A third embodiment of the present invention will now be described. The third embodiment restricts the recognition targets to numbers; its block diagram is the same as FIG. 2. Words such as "足す" (tasu, add), "引く" (hiku, subtract), "プラス" (plus), and "マイナス" (minus) are used as correction terms.

[0024] For example, using "足す":

U "67890"
S "67190" (the device makes a substitution error, recognizing 8 as 1)
U "足す 700 (nanahyaku)" (utterance expressing the intent to correct)
S "67890" (the device corrects the previous recognition result)

Using "引く":

U "010"
S "080" (the device makes a substitution error, recognizing 1 as 8)
U "引く 70 (nanajū)" (utterance expressing the intent to correct)
S "010" (the device corrects the previous recognition result)

The important point here is that the user's correcting utterance includes place values, such as "hyaku" (hundred) and "jū" (ten), and this place-value information specifies the position to be corrected.
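
The arithmetic correction of the third embodiment amounts to adding or subtracting the spoken amount from the held number. The sketch below is illustrative only: `apply_arithmetic_correction` is a hypothetical name, converting the spoken place-value reading (for example "nanahyaku") into the integer 700 is assumed to be done by the recognizer, and zero-padding to the original length is an assumption chosen so that "080" minus 70 gives "010" as in the example.

```python
# Correction terms from the text, mapped to the sign of the operation.
OPERATORS = {"足す": 1, "プラス": 1, "引く": -1, "マイナス": -1}


def apply_arithmetic_correction(held: str, operator: str, amount: int) -> str:
    """Apply the spoken operator and amount to the held digit string."""
    value = int(held) + OPERATORS[operator] * amount
    if value < 0:
        raise ValueError("correction would produce a negative number")
    # Keep leading zeros by padding back to the original number of digits.
    return str(value).zfill(len(held))


print(apply_arithmetic_correction("67190", "足す", 700))
print(apply_arithmetic_correction("080", "引く", 70))
```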

[0025] As the above embodiments show, in the present invention a user entering a correction does not simply re-enter the words to be corrected but always adds an auxiliary word (a correction term). That is, the correction term indicates how the correction is to be made. The correction target and the correct content may be spoken together, with a directional word such as "を", "から", or "より" indicating which is the correction target and which is the correct content. For digits, the correction term "OK" may be associated with each digit and serve as the cue to move to the next digit; the correction term is of course not limited to "OK". For number recognition, recognition can also be combined with arithmetic: an operator is used as the correction term, the spoken number is applied to the misrecognized number according to the operator attached to it, and the result of the operation is taken as the correct recognition result.

[0026] The present invention does not restrict the speech recognition algorithm in any particular way, but it can be implemented using, for example, an HMM (Hidden Markov Model). Details of HMM-based speech recognition are given in, for example, Seiichi Nakagawa, "Speech Recognition by Stochastic Models" (確率モデルによる音声認識), IEICE, 1988, and are omitted here.

[0027]

[Effects of the Invention] According to the present invention, recognition results, particularly those of long utterances, can be corrected easily by voice alone, making voice input efficient.

[0028] In addition, because a correction may be a partial input, or may use correction terms to follow an input method different from that of the preceding input, a variety of corrections become possible and a voice input device that is easy for the user to operate can be provided. A voice input device can also be provided that not only simplifies input but also improves the final recognition result.

[Brief description of the drawings]

FIG. 1 is a block diagram showing one embodiment of a voice input device according to the present invention.

FIG. 2 is a block diagram showing another embodiment of the voice input device according to the present invention.

[Explanation of symbols]

101: voice input means, 102: voice recognition means, 103: result holding means, 104: recognition result output means, 105: recognition target generating means, 106: basic recognition target dictionary, 107: correction term dictionary

Continued from the front page (51) Int.Cl.7 Identification symbol FI Theme code (reference) G10L 3/00 551P 561E

Claims (6)

[Claims]

[Claim 1] A voice input device comprising: a voice input means for inputting voice; a voice recognition means for recognizing the input voice within the range of a specified recognition target dictionary; a result holding means for holding the recognition result of the voice recognition means; a recognition result output means for outputting the recognition result; and a recognition target generating means for generating the recognition target dictionary used by the voice recognition means from, among other things, a predefined basic recognition target dictionary, wherein, when the recognition target generating means receives from the result holding means the recognition result of the previous utterance, which may become a correction target, it generates a correction target dictionary from the basic recognition target dictionary and that recognition result and creates a new recognition target dictionary by combining the correction target dictionary, a predefined correction term dictionary, and the basic recognition target dictionary, and wherein the result holding means determines whether the recognition result contains a correction term, applies the correction indicated by the correction term to the held contents and outputs the corrected result to the recognition result output means and the recognition target generating means if the recognition result contains a correction term, and outputs the result unchanged to the recognition result output means and the recognition target generating means if it does not.
[Claim 2] A voice input device comprising: a voice input means for inputting voice; a voice recognition means for recognizing the input voice within the range of a specified recognition target dictionary; a result holding means for holding the recognition result of the voice recognition means; a recognition result output means for outputting the recognition result; and a recognition target generating means for generating the recognition target dictionary used by the voice recognition means from, among other things, a predefined basic recognition target dictionary, wherein the recognition target generating means creates a new recognition target dictionary by combining the basic recognition target dictionary with a predefined correction term dictionary, and wherein, when the recognition result of the voice recognition means contains a correction term, the result holding means applies the correction indicated by the correction term to the held contents and outputs the result.
[Claim 3] The voice input device according to claim 2, wherein the basic recognition target dictionary includes a dictionary for recognizing digit-by-digit readings, which do not include place values, and a dictionary for recognizing place-value readings, which do include place values, and wherein, when the recognition result of the voice recognition means contains a correction term and a place-value reading of a number, the result holding means identifies the position to be corrected from the place-value information of that reading, applies the correction indicated by the correction term to the held contents, and outputs the corrected result.
[Claim 4] A telephone, car navigation system, personal information terminal, game machine, or personal computer of any type, comprising the voice input device according to claim 1, 2, or 3.
[Claim 5] A voice input device comprising: a means for inputting voice; a recognition means for recognizing the input voice; and an output means for outputting the recognition result of the input voice, wherein, when part of the recognition result is a misrecognized word, the recognition means receives from the input means, as a correction, the misrecognized word, the correct word, and a phrase indicating which of the two is correct and which is wrong, recognizes the correction, replaces the misrecognized word in that part of the recognition result with the correct word, and outputs the outcome as the final result.
[Claim 6] A voice input device comprising: a means for inputting spoken numbers; a recognition means for recognizing the input voice; and an output means for outputting the recognition result of the input voice, wherein, when a number in the recognition result is misrecognized, the recognition means receives from the input means, as a correction, a phrase that expresses the arithmetic difference between the misrecognized number and the correct number as a combination of an operator and a number, recognizes the correction, applies the operation indicated by the correction to the misrecognized number, and outputs the resulting number as the final result.
JP10216299A 1999-04-09 1999-04-09 Voice input device and voice input method Expired - Fee Related JP3815110B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP10216299A JP3815110B2 (en) 1999-04-09 1999-04-09 Voice input device and voice input method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP10216299A JP3815110B2 (en) 1999-04-09 1999-04-09 Voice input device and voice input method

Publications (3)

Publication Number Publication Date
JP2000293195A (en) 2000-10-20
JP2000293195A5 (en) 2004-08-26
JP3815110B2 (en) 2006-08-30

Family

ID=14320031

Family Applications (1)

Application Number Title Priority Date Filing Date
JP10216299A Expired - Fee Related JP3815110B2 (en) 1999-04-09 1999-04-09 Voice input device and voice input method

Country Status (1)

Country Link
JP (1) JP3815110B2 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004508594A (en) * 2000-09-08 2004-03-18 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ Speech recognition method with replacement command
JP2002287792A (en) * 2001-03-27 2002-10-04 Denso Corp Voice recognition device
JP4604377B2 (en) * 2001-03-27 2011-01-05 株式会社デンソー Voice recognition device
US8145487B2 (en) 2007-02-16 2012-03-27 Denso Corporation Voice recognition apparatus and navigation apparatus
KR101095864B1 (en) 2008-12-02 2011-12-21 한국전자통신연구원 Apparatus and method for generating N-best hypothesis based on confusion matrix and confidence measure in speech recognition of connected Digits
JP2012230670A (en) * 2011-04-25 2012-11-22 Honda Motor Co Ltd System, method, and computer program for correcting incorrect recognition by return

Also Published As

Publication number Publication date
JP3815110B2 (en) 2006-08-30

Similar Documents

Publication Title
KR100453021B1 (en) Oral Text Recognition Method and System
EP2466450B1 (en) Method and device for the correction of speech recognition errors
JP4942860B2 (en) Recognition dictionary creation device, speech recognition device, and speech synthesis device
US6876967B2 (en) Speech complementing apparatus, method and recording medium
US20090228273A1 (en) Handwriting-based user interface for correction of speech recognition errors
US20020123894A1 (en) Processing speech recognition errors in an embedded speech recognition system
WO2003025904A1 (en) Correcting a text recognized by speech recognition through comparison of phonetic sequences in the recognized text with a phonetic transcription of a manually input correction word
JPWO2006097975A1 (en) Speech recognition program
JP2001312296A (en) System and method for voice recognition and computer-readable recording medium
JP2003022089A (en) Voice spelling of audio-dedicated interface
JP3104661B2 (en) Japanese writing system
JP2000293195A (en) Voice inputting device
JP3718088B2 (en) Speech recognition correction method
JP2004251998A (en) Conversation understanding device
JP4736423B2 (en) Speech recognition apparatus and speech recognition method
JP2003162524A (en) Language processor
US20080256071A1 (en) Method And System For Selection Of Text For Editing
JP2000293195A5 (en)
JPH064264A (en) Voice input/output system
JP2002535728A (en) Speech recognition device including sub-word memory
JP2000056796A (en) Speech input device and method therefor
JP2995941B2 (en) Speech recognition device for unspecified speakers
JP2000181487A (en) Voice recognition device
JPH08110790A (en) Sound recognizing device
JP2001067096A (en) Voice recognition result evaluating device and record medium

Legal Events

Date Code Title Description
2006-02-20 A977 Report on retrieval (JAPANESE INTERMEDIATE CODE: A971007)
2006-03-07 A131 Notification of reasons for refusal (JAPANESE INTERMEDIATE CODE: A131)
2006-04-17 RD01 Notification of change of attorney (JAPANESE INTERMEDIATE CODE: A7421)
2006-04-25 A521 Request for written amendment filed (JAPANESE INTERMEDIATE CODE: A523)
TRDD Decision of grant or rejection written
2006-05-16 A01 Written decision to grant a patent or to grant a registration (utility model) (JAPANESE INTERMEDIATE CODE: A01)
2006-05-29 A61 First payment of annual fees (during grant procedure) (JAPANESE INTERMEDIATE CODE: A61)
FPAY Renewal fee payment (event date is renewal date of database) (PAYMENT UNTIL: 20090616; Year of fee payment: 3)
FPAY Renewal fee payment (event date is renewal date of database) (PAYMENT UNTIL: 20100616; Year of fee payment: 4)
LAPS Cancellation because of no payment of annual fees