JP2005283797A - Device and method for speech recognition

Info

Publication number: JP2005283797A
Application number: JP2004095461A
Authority: JP
Inventors: Hiroshi Saito; 浩斎藤
Original assignee: Nissan Motor Co Ltd
Current assignee: Nissan Motor Co Ltd
Priority date: 2004-03-29
Filing date: 2004-03-29
Publication date: 2005-10-13

Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition device in which content misrecognized during speech recognition processing can be corrected with a simple utterance.

SOLUTION: The result of the speech recognition processing is displayed on a display together with correction vocabularies 31 to 33. For example, to correct the speech recognition result from "0482XX8888" to "0482XX2888", the user uses correction vocabulary 31 and utters "2 for the 7th". Because the misrecognized part can be corrected by specifying the correction position, the content uttered first need not be uttered again in full, and corrections can be made easily.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a speech recognition apparatus and a speech recognition method that recognize speech and convert it into information such as characters.

For conventional speech recognition apparatuses, a technique is known in which, when input speech is misrecognized, the user can correct it by reading back the misrecognized portion and then uttering the correction (see Patent Document 1).

Patent Document 1: JP 2000-293195 A

However, with the conventional speech recognition apparatus, when, for example, an utterance of “1717” is misrecognized as “7777”, a corrective utterance such as “77 to 17” cannot be applied because “77” occurs more than once in the result. The user therefore had to utter the entire string again, as in “7777 to 1717”.

The speech recognition apparatus and speech recognition method according to the present invention are characterized in that, when speech for correcting a speech recognition result is input, the correction content, including at least the correction position designated by the input correction speech, is identified, and the speech recognition result at the identified correction position is corrected.

According to the speech recognition apparatus and speech recognition method of the present invention, the misrecognized portion of a speech recognition result can be corrected by designating its position, so corrections can be made easily.

-First embodiment-
FIG. 1 is a diagram showing the configuration of a speech recognition apparatus according to a first embodiment of the present invention. The following describes an example in which the speech recognition apparatus of the first embodiment is applied to an in-vehicle hands-free telephone system that is mounted on a vehicle and places a call to a telephone number entered by voice. The speech recognition apparatus of the first embodiment includes a speech recognition controller 1, a phone controller 2, a microphone 3, a voice input operation switch 4, a speaker 5, a display 6, and a mobile phone 10.

The microphone 3 picks up speech uttered by the user and outputs it to the speech recognition controller 1. The voice input operation switch 4 is installed, for example, on the steering wheel of the vehicle and is operated when the driver performs voice input. The speech recognition controller 1 includes a CPU 1a, a ROM 1b, and a RAM 1c. When the voice input operation switch 4 is operated, the CPU 1a reads an icon indicating the voice standby state from the ROM 1b and displays it on the display 6, and causes the speaker 5 to output a beep that signals the start of utterance.

On hearing the beep from the speaker 5, the user utters “dial input”, the command for starting hands-free dial input, toward the microphone 3. The vocabulary corresponding to this “dial input” command is registered in the ROM 1b in advance.

The CPU 1a of the speech recognition controller 1 compares the speech input via the microphone 3 with the vocabulary in the speech recognition dictionary stored in the ROM 1b to identify the input vocabulary. Because known methods can be used for this speech recognition processing, a detailed description of it is omitted.

When the CPU 1a recognizes the utterance “dial input”, it displays the recognized text “dial input” on the display 6 and outputs the voice message “Starting dial input. Please say the number.” from the speaker 5. The text to be displayed on the display 6 can be registered in the ROM 1b in advance, read out by the CPU 1a, and displayed on the display 6. The voice output from the speaker 5 can be generated, for example, with a TTS (Text-To-Speech) function. In this case as well, the messages to be output from the speaker 5 may be stored in the ROM 1b.

When the voice message “Starting dial input. Please say the number.” is output from the speaker 5, the user utters the telephone number to be called from the mobile phone 10. Here, assume that the user utters “0482XX2888”. The number uttered by the user is input to the speech recognition controller 1 via the microphone 3. Because voice input of a telephone number is limited to digits, the CPU 1a of the speech recognition controller 1 recognizes the input digits based on the digit recognition dictionary stored in the ROM 1b.

Assume here that the result of the digit recognition processing is “0482XX8888”. The CPU 1a displays the recognition result “0482XX8888” on the display 6 and counts the number of recognized digits. On determining that all digits of the telephone number have been input, it outputs a voice message from the speaker 5 that asks the user to confirm the recognition result, such as “Is 0482XX8888 correct?”.

FIG. 2 is a diagram showing the message displayed on the display 6 for confirming the recognized telephone number. As shown in FIG. 2, the display 6 shows the recognized telephone number 20 together with an icon 25 indicating that the user may speak and response examples 30 to 33 for the message that confirms the recognition result. The user can speak while the icon 25 is displayed.

Response example 30 is for the case where the recognized number is correct. When the user utters response example 30, that is, “yes”, and the speech recognition controller 1 recognizes it correctly, the CPU 1a issues a call instruction to the phone controller 2 so that a call is placed to the recognized number. The phone controller 2 includes a CPU 2a, a ROM 2b, and a RAM 2c. On receiving the call instruction, the CPU 2a of the phone controller 2 commands the mobile phone 10 to call the designated number. A call can thus be placed hands-free.

Response examples 31 to 33 are for the case where the recognized number is wrong. Response example 31 designates the wrong position by number and gives the correct digit. In the example above, “0482XX2888” was uttered but “0482XX8888” was recognized, so the seventh digit counted from the head is misrecognized. The user can therefore correct the seventh digit to “2” by uttering “7th to 2”.

Response example 32 designates the number of the wrong position and the misrecognized digit, and gives the correct digit. In the example above, uttering “7th 8 to 2” corrects the seventh digit from “8” to “2”. Response example 33 designates the misrecognized digits and gives the correct digits. For example, uttering “8888 to 2888” corrects the misrecognized portion. However, because the recognized number contains more than one “8”, the correction cannot be made by uttering “8 to 2”.
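The three correction styles of response examples 31 to 33 can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the embodiment: it assumes the correction utterance has already been recognized and parsed, and the names `Correction`, `find_all`, and `apply_correction` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Correction:
    """Parsed correction utterance (hypothetical structure, not from the source)."""
    right: str                      # replacement digits, e.g. "2"
    position: Optional[int] = None  # 1-based position counted from the head
    wrong: Optional[str] = None     # misrecognized digits named by the user


def find_all(text: str, sub: str) -> List[int]:
    """Return every start index of sub in text, including overlapping matches."""
    return [i for i in range(len(text) - len(sub) + 1) if text.startswith(sub, i)]


def apply_correction(recognized: str, c: Correction) -> str:
    if c.position is not None:
        # Response examples 31 and 32: the position is given explicitly.
        i = c.position - 1
        if not 0 <= i < len(recognized):
            raise ValueError("position is outside the recognized string")
        old = c.wrong or recognized[i]
        if not recognized.startswith(old, i):
            raise ValueError("named digits do not match the recognized string")
        return recognized[:i] + c.right + recognized[i + len(old):]
    # Response example 33: only the misrecognized digits are given,
    # so they must occur exactly once for the position to be identified.
    if not c.wrong:
        raise ValueError("either a position or the wrong digits are required")
    hits = find_all(recognized, c.wrong)
    if len(hits) != 1:
        raise ValueError("wrong digits are missing or ambiguous")
    i = hits[0]
    return recognized[:i] + c.right + recognized[i + len(c.wrong):]
```

With the recognized string "0482XX8888", `Correction(right="2", position=7)` and `Correction(right="2888", wrong="8888")` both yield "0482XX2888", whereas `Correction(right="2", wrong="8")` is rejected because "8" occurs at five positions.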

The vocabulary corresponding to response examples 30 to 33 is registered in the ROM 1b in advance. The CPU 1a compares the input speech with the vocabulary for response examples 30 to 33 registered in the ROM 1b to determine whether the input matches response example 30, which affirms the recognition result, or one of the correction response examples 31 to 33.

FIG. 3 is a flowchart showing the processing performed by the speech recognition apparatus of the first embodiment. The processing starting at step S10 is performed by the CPU 1a of the speech recognition controller 1. In step S10, it is determined whether the voice input operation switch 4 has been operated. If a signal indicating that the voice input operation switch 4 has been pressed is input, the process proceeds to step S20. If it is determined that the switch has not been pressed, the process waits in step S10.

In step S20, the utterance-ready icon (voice standby icon) is read from the ROM 1b and displayed on the display 6, and a beep is output from the speaker 5. In step S30, which follows step S20, it is determined whether the user has performed voice input. If it is determined that voice input has been performed via the microphone 3, the process proceeds to step S40. If not, the process waits in step S30.

In step S40, speech recognition processing is performed on the spoken words, and the process proceeds to step S50. In step S50, it is determined whether the words recognized in step S40 are “dial input”. If they are, the process proceeds to step S60. If they are anything other than “dial input”, the process returns to step S10.

In step S60, the recognized text “dial input” is displayed on the display 6, and the voice message “Starting dial input. Please say the number.” is output from the speaker 5. In step S70, which follows step S60, it is determined whether voice input has been performed via the microphone 3. If it has, the process proceeds to step S80. If not, the process waits in step S70.

In step S80, speech recognition processing is performed on the input speech, and the process proceeds to step S90. In step S90, a voice message asking the user whether the result of the speech recognition in step S80 is correct is output from the speaker 5. For example, if the recognized telephone number is “0482XX8888”, a message such as “Is 0482XX8888 correct?” is output from the speaker 5. At the same time, the recognized telephone number and the correction phrases are displayed on the display 6 (see FIG. 2).

In step S100, it is determined whether the user has responded to the voice message output in step S90 asking whether the speech recognition result is correct. If it is determined that the user's speech has been input via the microphone 3, the process proceeds to step S110. If not, the process waits in step S100. In step S110, speech recognition processing is performed on the input speech, and the process proceeds to step S120.

In step S120, it is determined whether the vocabulary recognized in step S110 is correction vocabulary. As described above, this determination is based on whether vocabulary corresponding to the correction response examples 31 to 33 registered in the ROM 1b has been input. If it is determined that correction vocabulary such as “7th to 2” has been input, the process proceeds to step S130. If the input vocabulary is not correction vocabulary, the process proceeds to step S140.

In step S130, the telephone number recognized in step S80 is corrected based on the input correction vocabulary. After the correction, the process returns to step S90, and a voice message asking the user whether the corrected telephone number is correct is output from the speaker 5. In this case as well, the corrected telephone number and the correction phrases are displayed on the display 6.

In step S140, it is determined whether the input vocabulary corresponds to response example 30, which affirms the recognition result. If it is determined that the affirming vocabulary, that is, “yes”, has been input, the process proceeds to step S150. If the input vocabulary does not affirm the recognition result, the process returns to step S100.

In step S150, a call command is issued to the phone controller 2 so that a call is placed to the telephone number as recognized at the time the affirming vocabulary was input. On receiving the call command, the CPU 2a of the phone controller 2 causes the mobile phone 10 to call the commanded telephone number.
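The confirmation loop of steps S90 to S150 can be sketched as follows. This is a simplified illustration, not the flowchart of FIG. 3 itself: it assumes each user response has already been recognized as either the string "yes" or a (position, digit) pair, and the function name `confirm_and_dial` and the prompt wording are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple, Union

# A recognized response: the string "yes", or a (position, digit) correction.
Response = Union[str, Tuple[int, str]]


def confirm_and_dial(recognized: str,
                     responses: Iterable[Response]) -> Tuple[Optional[str], List[str]]:
    """Return the number to dial (or None) and the confirmation prompts issued."""
    prompts = [f"Is {recognized} correct?"]                  # S90
    for response in responses:                               # S100, S110
        if isinstance(response, tuple):                      # S120: correction vocabulary
            position, digit = response
            if 1 <= position <= len(recognized):             # S130: correct in place
                recognized = (recognized[:position - 1] + digit
                              + recognized[position:])
                prompts.append(f"Is {recognized} correct?")  # back to S90
        elif response == "yes":                              # S140: affirmation
            return recognized, prompts                       # S150: number to dial
        # Any other input: keep waiting for a response (back to S100).
    return None, prompts
```

For the responses `["hello", (7, "2"), "yes"]` applied to "0482XX8888", the sketch issues two confirmation prompts and returns "0482XX2888" as the number to dial.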

With the speech recognition apparatus of the first embodiment, when the speech recognition result for the user's input is wrong, the user can correct it by designating the position to be corrected, so corrections can be made easily. In other words, the user no longer needs to utter the entire content again, including the correctly recognized parts, in order to correct the speech recognition result, which makes correction more convenient.

With a conventional speech recognition apparatus, when the same digit appears more than once in a digit string, a correction could not be made by designating that digit alone. For example, “8888” could not be corrected to “2888” by uttering “8 to 2”. With the speech recognition apparatus of the first embodiment, the correction can be made by uttering “1st to 2” or “1st 8 to 2”.

In addition, because the vocabulary for correcting the speech recognition result (example voice inputs for correction) is displayed on the display 6, the user does not need to memorize it and can easily make corrections using the displayed vocabulary.

-Second embodiment-
The first embodiment described an example in which the user utters the whole telephone number at once. The second embodiment describes an example in which the user utters the telephone number a few digits at a time instead. The configuration of the speech recognition apparatus of the second embodiment is the same as that of the first embodiment shown in FIG. 1.

For example, assume that the user utters “046”, the first three digits of the telephone number. The CPU 1a of the speech recognition controller 1 performs speech recognition processing on the input speech, displays the recognition result on the display 6, and, as in the first embodiment, also displays the correction vocabulary. At this time, the CPU 1a counts the number of recognized digits and, on determining that not all digits of the telephone number have been input, outputs a message prompting the user to speak the following digits. For example, if the recognized digits are “046”, the message “Please continue after 046” is output from the speaker 5.

Next, assume that the user utters the next three digits, “270”. The CPU 1a of the speech recognition controller 1 performs speech recognition processing on the input speech, displays the recognition result on the display 6, and displays the correction vocabulary.

FIG. 4 is a diagram showing the telephone number 20 recognized so far, the utterance-ready icon 25, and correction response examples 31 to 33 and 40, as displayed on the display 6. Correction response examples 31 to 33 are the same as those shown in FIG. 2. However, because not all digits of the telephone number have been input, response example 30 (see FIG. 2), which affirms the speech recognition result, is not displayed.

Correction response example 40 is vocabulary for adding a digit when the digit string produced by the speech recognition processing has fewer digits than the digit string that was spoken. In general, when a digit string is input by voice, the first digit is uttered with low power and may therefore go unrecognized. For example, even when “270” is spoken, the first digit “2” may be missed and the input misrecognized as “70”.

In such a case, the user can utter “add 2”, using the vocabulary shown in response example 40, to add “2” in front of “70”, the result of the immediately preceding speech recognition processing. The correction can also be made with the vocabulary shown in response example 33, by uttering “70 to 270”.
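The “add 2” correction can be sketched as follows. This is a minimal illustration that assumes the digits recognized so far are kept as a list with one entry per utterance, which the specification does not state; the name `add_leading_digits` is hypothetical.

```python
from typing import List


def add_leading_digits(chunks: List[str], digits: str) -> List[str]:
    """Prepend digits to the most recently recognized chunk ("add 2")."""
    if not chunks:
        raise ValueError("nothing has been recognized yet")
    return chunks[:-1] + [digits + chunks[-1]]


chunks = ["046", "70"]                    # "270" was recognized as "70"
chunks = add_leading_digits(chunks, "2")  # the user utters "add 2"
number = "".join(chunks)                  # digits recognized so far
```

Keeping one entry per utterance is what lets the added digit go in front of the latest result rather than in front of the whole number.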

FIG. 5 is a flowchart showing the processing performed by the speech recognition apparatus of the second embodiment; the processing is performed by the CPU 1a of the speech recognition controller 1. Steps identical to those performed by the speech recognition apparatus of the first embodiment are given the same reference numerals, and their detailed description is omitted.

The processing in steps S10 to S80 is the same as in steps S10 to S80 of the flowchart shown in FIG. 3, so a detailed description is omitted. In step S200, which follows step S80, the number resulting from the speech recognition processing in step S80 is displayed on the display 6. For example, if the recognized number has three digits, the three digits are displayed on the display 6. At this time, as described with reference to FIG. 4, the correction response examples 31 to 33 and 40 are displayed as well.

In step S210, which follows step S200, it is determined from the recognized digit string whether all digits of the telephone number have been input. If all digits have been input, the process proceeds to step S90. If not, the process proceeds to step S220. In step S220, a voice message prompting the user to speak the following digits of the telephone number is output from the speaker 5.

In step S230, which follows step S220, it is determined whether the following digits of the telephone number have been input by voice. If it is determined that the user's speech has been input via the microphone 3, the process proceeds to step S240. If not, the process waits in step S230. In step S240, speech recognition processing is performed on the input speech, and the process proceeds to step S250.

In step S250, it is determined from the result of the speech recognition processing in step S240 whether the words spoken in step S230 are digits. If they are digits, the process returns to step S200, and the number resulting from the speech recognition processing is displayed on the display 6. If they are not digits, the process proceeds to step S120.

In step S120, it is determined whether the words spoken in step S230 are vocabulary corresponding to the correction response examples 31 to 33 and 40. If they are, the process proceeds to step S130. If they are not, the process returns to step S230. In step S130, the telephone number is corrected based on the input correction vocabulary. After the correction, the process returns to step S200, and the corrected telephone number is displayed on the display 6.

In step S90, which is reached after it is determined that all digits of the telephone number have been input, a voice message asking the user whether the speech recognition result is correct, such as “Is 046270XXXX correct?”, is output from the speaker 5. After this confirmation message is output, the process proceeds to step S100. In step S100, it is determined whether the user has responded to the confirmation message. If it is determined that the user's speech has been input, the process proceeds to step S110, where speech recognition processing is performed.

In step S140, which follows step S110, it is determined whether vocabulary that affirms the recognition result has been input. If it is determined that the affirming vocabulary 30, that is, “yes”, has been input, the process proceeds to step S150. If the input vocabulary is not the affirming vocabulary 30, the process proceeds to step S120.

In step S150, a call command is issued to the phone controller 2 so that a call is placed to the telephone number as it stands when the affirming vocabulary is input. On receiving the call command, the CPU 2a of the phone controller 2 causes the mobile phone 10 to call the commanded telephone number.
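The piecewise entry loop of steps S200 to S250 can be sketched as follows. This is a simplified illustration, not the flowchart of FIG. 5 itself. It assumes that each utterance has already been recognized as either a digit string or a (position, digit) correction, and that a telephone number is complete at 10 digits; the specification does not say how completeness is judged. The name `enter_in_chunks`, the `total_digits` parameter, and the prompt wording are hypothetical.

```python
from typing import Iterable, List, Tuple, Union

# A recognized utterance: a digit string, or a (position, digit) correction.
Utterance = Union[str, Tuple[int, str]]


def enter_in_chunks(utterances: Iterable[Utterance],
                    total_digits: int = 10) -> Tuple[str, List[str]]:
    """Accumulate digits utterance by utterance and collect the prompts issued."""
    number = ""
    prompts: List[str] = []
    for u in utterances:
        if isinstance(u, str) and u.isdigit():         # S250: digits were spoken
            number += u
        elif isinstance(u, tuple):                     # S120 -> S130: correction
            position, digit = u
            if not 1 <= position <= len(number):
                continue                               # unusable: wait again (S230)
            number = number[:position - 1] + digit + number[position:]
        else:
            continue                                   # neither: wait again (S230)
        # S200 would display `number` here; S210 checks for completeness.
        if len(number) >= total_digits:
            prompts.append(f"Is {number} correct?")    # S90
            break
        prompts.append(f"Please continue after {number}")  # S220
    return number, prompts
```

As a made-up usage example, the utterances `["046", "271", (6, "0"), "1234"]` produce the prompts "Please continue after 046", "Please continue after 046271", "Please continue after 046270", and finally "Is 0462701234 correct?".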

With the speech recognition apparatus of the second embodiment, as with that of the first embodiment, a misrecognized speech recognition result can be corrected by designating the position to be corrected, so corrections can be made easily. In addition, because the vocabulary for correcting the speech recognition result is displayed on the display 6, the user does not need to memorize it and can easily make corrections using the displayed vocabulary.

With the speech recognition apparatus of the second embodiment, response example 40 for adding a digit is determined in advance, and when vocabulary corresponding to response example 40 is input, a digit is added to the result already recognized, so a digit that was not recognized can easily be added. For example, when a number with many digits, such as a telephone number, is input by voice and the leading digit is not recognized, the user does not need to input all the digits again, so the correction is easy. As described above, the first digit of an utterance is spoken with low power, so cases in which the first digit is not recognized are expected to occur frequently. Determining in advance the correction response example 40 for adding the first digit of an utterance therefore provides an efficient means of correction.

The present invention is not limited to the embodiments described above. For example, the first and second embodiments described inputting a telephone number by voice, but the speech to be input is not limited to digits; it may be letters of the alphabet or characters such as katakana, hiragana, or kanji. In that case too, if, for example, the utterance “Kanagawa” (かながわ) is misrecognized as “Anagawa” (あながわ), it can be corrected by designating the position to be corrected, as in “1st to ‘ka’ (か)”.

The vocabulary for specifying the correction position in the speech recognition result is not limited to "Nth is X"; vocabulary such as "Nth digit is X" or "Nth character is X" may also be used. In addition, the correction position need not be counted from the beginning and may instead be counted from the last digit (character). For example, the method of correcting "0482XX8888" to "0482XX2888" by uttering "7th is 2" was described above, but the correction may also be made by uttering "4th from the right is 2".
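The two ways of counting the correction position can be sketched as follows. This is an illustration only: the English phrasing of the utterances, the regular expression, and the function name `apply_correction` are assumptions, and the sketch operates on text that has already been recognized rather than on audio.

```python
import re

# Hypothetical English rendering of the correction vocabulary:
#   "<N>th is <X>"                 counts from the beginning
#   "<N>th from the right is <X>"  counts from the end
_PATTERN = re.compile(
    r"^(?P<pos>\d+)(?:st|nd|rd|th)(?P<right> from the right)? is (?P<new>\S)$"
)


def apply_correction(result: str, utterance: str) -> str:
    """Replace one character of `result` at the position named in `utterance`."""
    m = _PATTERN.match(utterance.strip())
    if m is None:
        raise ValueError(f"not a correction utterance: {utterance!r}")
    pos = int(m.group("pos"))
    if not 1 <= pos <= len(result):
        raise ValueError("position out of range")
    # Convert the spoken 1-based position into a 0-based index.
    index = len(result) - pos if m.group("right") else pos - 1
    return result[:index] + m.group("new") + result[index + 1:]


print(apply_correction("0482XX8888", "7th is 2"))                 # 0482XX2888
print(apply_correction("0482XX8888", "4th from the right is 2"))  # 0482XX2888
```

Both utterances address the same character, so either one turns "0482XX8888" into "0482XX2888" without the user repeating the other nine characters.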

The display 6 shows response examples 30 to 33 for correcting the speech recognition result (see FIG. 2), but a concrete correction example such as "2nd is 1" may also be shown. For example, when "1" and "7" tend to be confused with each other, showing a concrete correction example such as "2nd is 1" for a second digit that was recognized as "7" makes the correction even easier.
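One way such concrete correction examples could be generated is sketched below. The confusion table, the English hint wording, and the names `ordinal` and `suggest_corrections` are assumptions made for illustration; the embodiment does not specify how the examples are chosen.

```python
def ordinal(n: int) -> str:
    """Return the English ordinal for n, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def suggest_corrections(result: str, confusable: dict) -> list:
    """Build one hint per recognized character that is often misrecognized."""
    return [
        f"{ordinal(i)} is {confusable[ch]}"
        for i, ch in enumerate(result, start=1)
        if ch in confusable
    ]


# Assumed confusion table: a recognized "7" is often a spoken "1".
print(suggest_corrections("0782", {"7": "1"}))  # ['2nd is 1']
```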

After all digits of the telephone number have been input and a voice prompt confirming the recognition result, such as "Is 0482XX8888 correct?", has been output from the speaker 5, the apparatus may output a voice prompt such as "Please make your correction" from the speaker 5 when the user inputs speech that rejects the recognition result, such as "No".
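The confirmation exchange can be sketched as a pair of small functions. The prompt wording, the set of rejecting replies, and the function names are assumptions; replies other than a rejection are left to the rest of the dialog and return `None` here.

```python
from typing import Optional

# Assumption: replies treated as rejecting the recognition result.
NEGATIVE_REPLIES = {"no"}


def confirmation_prompt(recognized: str) -> str:
    """Prompt output from the speaker once all digits have been input."""
    return f"Is {recognized} correct?"


def prompt_after_reply(reply: str) -> Optional[str]:
    """Return the correction prompt if the user rejects the result."""
    normalized = reply.strip().lower().rstrip(".!")
    if normalized in NEGATIVE_REPLIES:
        return "Please make your correction."
    return None  # not a rejection; handled elsewhere in the dialog


print(confirmation_prompt("0482XX8888"))  # Is 0482XX8888 correct?
print(prompt_after_reply("No"))           # Please make your correction.
```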

The correspondence between the constituent elements of the claims and those of the first and second embodiments is as follows. The microphone 3 constitutes the voice input means; the voice recognition controller 1 constitutes the voice recognition means, the corrected voice input determination means, the correction content specifying means, the correction means, and the additional correction voice determination means; the speaker 5 and the display 6 constitute the output means; and the display 6 constitutes the display means. Each constituent element is not limited to the above configuration as long as the characteristic functions of the present invention are not impaired.

A diagram showing the configuration of the first embodiment of the speech recognition apparatus according to the present invention.
A diagram showing the message displayed on the display to confirm the recognized telephone number.
A flowchart showing the processing performed by the speech recognition apparatus of the first embodiment.
A diagram showing the telephone number recognized so far and the correction vocabulary displayed on the display.
A flowchart showing the processing performed by the speech recognition apparatus of the second embodiment.

Explanation of symbols

1 ... Voice recognition controller
2 ... Phone controller
3 ... Microphone
4 ... Voice input operation switch
5 ... Speaker
6 ... Display
10 ... Mobile phone

Claims

A speech recognition apparatus comprising:
voice input means for inputting voice;
voice recognition means for recognizing the voice input by the voice input means;
output means for outputting a speech recognition result produced by the voice recognition means;
corrected voice input determination means for determining whether or not a voice for correcting the speech recognition result output by the output means (hereinafter referred to as corrected voice) has been input;
correction content specifying means for specifying, when the corrected voice input determination means determines that the corrected voice has been input, correction content including at least the correction position designated by the corrected voice; and
correction means for correcting the speech recognition result at the specified correction position based on the correction content specified by the correction content specifying means.

The speech recognition apparatus according to claim 1,
wherein the correction position in the speech recognition result is designated by the ordinal position of the characters or the like constituting the speech recognition result.

The speech recognition apparatus according to claim 1 or 2,
further comprising additional correction instruction determination means for determining whether or not the corrected voice includes a correction instruction for adding content to the speech recognition result,
wherein, when the additional correction instruction determination means determines that a correction instruction for adding content is included, the correction means adds the correction content specified by the correction content specifying means to the speech recognition result.

The speech recognition apparatus according to claim 1 or 2,
further comprising display means for displaying at least one speech input example for correcting the speech recognition result.

A speech recognition method comprising:
performing speech recognition processing on input voice;
outputting the result of the speech recognition processing;
determining whether or not a voice for correcting the output speech recognition result (hereinafter referred to as corrected voice) has been input;
specifying, when it is determined that the corrected voice has been input, correction content including at least the correction position designated by the corrected voice; and
correcting the speech recognition result at the specified correction position based on the specified correction content.