JP2019020589A

JP2019020589A - Voice recognition system and processing stop method in the same

Info

Publication number: JP2019020589A
Application number: JP2017138891A
Authority: JP
Inventor: Yuzuru Inoue (井上 譲)
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2017-07-18
Filing date: 2017-07-18
Publication date: 2019-02-07

Abstract

To provide a voice recognition system capable of canceling a stop instruction for processing in execution even if that stop instruction has been erroneously recognized. SOLUTION: A voice recognition system 10 comprises: a voice recognition section 11 for constantly analyzing surrounding voice and recognizing a voice instruction; and a processing execution section 13 for performing processing corresponding to the voice instruction. The processing execution section 13 comprises: a stop reservation setting section 131 for reserving a stop of the processing in execution when the voice instruction is a stop instruction for that processing; a stop reservation execution section 132 for performing the reserved stop processing when a predetermined time has elapsed since the stop processing was reserved; a user intention estimation section 134 for estimating the intention of a user from information other than the voice instruction; and a stop reservation cancel section 133 for canceling the reservation of the stop processing when, before the predetermined time has elapsed since the reservation, the user is estimated to have no intention of stopping the processing in execution. SELECTED DRAWING: Figure 1

Description

The present invention relates to a voice recognition system that constantly analyzes speech.

Voice recognition systems that recognize a user's voice instructions by constantly analyzing the speech around them are known. A challenge for such systems is preventing misrecognition, in which a user's utterance is unintentionally recognized as a voice instruction.

For example, Patent Document 1 listed below proposes a voice recognition system that determines whether the user intends to give a voice instruction (voice operation) and, when it determines that no such intention exists, changes the control mode of the controlled device, compared with when the intention exists, in a direction that does not draw the user's attention (that is, so as not to disturb the user). For instance, when voice recognition fails, that system outputs the voice message "Please speak again" if it judges that the user intended a voice instruction, and does nothing if it judges that the user had no such intention.

Patent Document 1: International Publication No. 2016/051519

The voice recognition system of Patent Document 1 suppresses misrecognition by judging the user's intention, but that judgment is made at the moment the user speaks; the user's behavior after the utterance is not taken into account. Consequently, once a voice instruction has been misrecognized, it cannot be canceled. In particular, if a voice instruction to stop a process being executed (stopping includes ending or interrupting it) is misrecognized and the process stops against the user's intention, problems such as the loss of unsaved data can occur. One conceivable case is a vehicle-mounted voice recognition system that mistakes the "stop" in a passenger's utterance "stop the car" for a voice instruction to stop the process being executed.

The present invention has been made to solve the above problem, and an object thereof is to provide a voice recognition system that can cancel a stop instruction for a process being executed even when that stop instruction has been misrecognized.

A voice recognition system according to the present invention includes a voice recognition unit that recognizes voice instructions by constantly analyzing surrounding speech, and a process execution unit that executes processes according to the voice instructions recognized by the voice recognition unit. The process execution unit includes: a stop reservation setting unit that, when a voice instruction is an instruction to stop the process being executed, reserves a stop process for that process; a stop reservation execution unit that executes the reserved stop process when a predetermined time has elapsed since the reservation; a user intention estimation unit that estimates the user's intention from information other than voice instructions; and a stop reservation cancellation unit that cancels the reservation of the stop process when, before the predetermined time has elapsed since the reservation, the user intention estimation unit estimates that the user does not intend to stop the process being executed.

When the voice recognition system according to the present invention recognizes an instruction to stop the process being executed, it does not stop that process immediately; instead, it reserves a stop process so that the process stops after a predetermined time. If, however, the user is estimated not to intend to stop the process before that time has elapsed, the system cancels the reservation. Therefore, even when a stop instruction is misrecognized, it can be canceled afterward in accordance with the user's intention.
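The reserve-then-stop idea can be illustrated with a short Python sketch. It is not the publication's implementation: the class name `DeferredStop`, the use of `threading.Timer`, and the choice that a second reservation restarts the hold time are all assumptions made for illustration. A recognized stop instruction would call `reserve()`, which schedules the stop instead of performing it; anything that later concludes the user did not mean to stop would call `cancel()` within the hold period.

```python
import threading
from typing import Callable, Optional


class DeferredStop:
    """Hypothetical sketch: defer a stop by a hold time so it can still be cancelled."""

    def __init__(self, hold_seconds: float, stop_fn: Callable[[], None]) -> None:
        self.hold_seconds = hold_seconds   # plays the role of the "stop hold time"
        self._stop_fn = stop_fn            # actually stops the running process
        self._lock = threading.Lock()
        self._timer: Optional[threading.Timer] = None
        self._generation = 0               # identifies the current reservation

    def reserve(self) -> None:
        """Schedule the stop instead of stopping immediately."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()       # assumption: re-reserving restarts the hold time
            self._generation += 1
            self._timer = threading.Timer(
                self.hold_seconds, self._fire, args=(self._generation,)
            )
            self._timer.daemon = True
            self._timer.start()

    def cancel(self) -> bool:
        """Withdraw a pending reservation; returns False if none was pending."""
        with self._lock:
            if self._timer is None:
                return False
            self._timer.cancel()
            self._timer = None
            return True

    def _fire(self, generation: int) -> None:
        with self._lock:
            # Ignore callbacks belonging to reservations that were cancelled or replaced.
            if self._timer is None or generation != self._generation:
                return
            self._timer = None
        self._stop_fn()                    # run the stop outside the lock
```

`threading.Timer.cancel()` only prevents callbacks that have not started yet, so the callback re-checks under the lock whether it still belongs to the current reservation; this keeps a late `cancel()` from racing with a stop that is just about to fire.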

FIG. 1 is a functional block diagram illustrating the configuration of the voice recognition system according to Embodiment 1.
FIG. 2 is a flowchart illustrating the operation of the voice recognition system according to Embodiment 1 in response to a voice instruction.
FIG. 3 is a flowchart illustrating the operation of the voice recognition system according to Embodiment 1 in response to a manual operation.
FIG. 4 is a flowchart illustrating the operation of the voice recognition system according to Embodiment 1 relating to a reserved stop process.
FIG. 5 is a diagram showing an example of the hardware configuration of the voice recognition system.
FIG. 6 is a diagram showing another example of the hardware configuration of the voice recognition system.
FIG. 7 is a flowchart showing the stop process for a running process in the voice recognition system according to Embodiment 2.
FIG. 8 is a flowchart illustrating the operation of the voice recognition system according to Embodiment 3 in response to a voice instruction.
FIG. 9 is a functional block diagram illustrating the configuration of the voice recognition system according to Embodiment 4.
FIG. 10 is a diagram for explaining a modification of the voice recognition system according to Embodiment 4.

<Embodiment 1>
FIG. 1 is a functional block diagram showing the configuration of the voice recognition system 10 according to Embodiment 1. As shown in FIG. 1, the voice recognition system 10 is connected to a microphone 1 through which the user inputs speech to the voice recognition system 10, a manual operation input device 2 through which the user operates the voice recognition system 10 by hand, and an information output device 3 through which the voice recognition system 10 presents information to the user.

The manual operation input device 2 consists of, for example, a keyboard, a touch pad, and push buttons. The information output device 3 consists of, for example, a screen that displays images and text messages and a speaker that outputs voice messages and sound effects. Alternatively, the manual operation input device 2 and the information output device 3 may be combined into a single touch-panel monitor by placing a touch pad serving as the manual operation input device 2 over the screen of the information output device 3.

The voice recognition system 10 includes a voice recognition unit 11, a manual operation recognition unit 12, and a process execution unit 13. The voice recognition unit 11 recognizes the user's voice instructions by constantly analyzing the surrounding speech acquired by the microphone 1. The manual operation recognition unit 12 recognizes operations entered on the manual operation input device 2 (hereinafter "manual operations"). The process execution unit 13 executes processes according to the voice instructions recognized by the voice recognition unit 11 or the manual operations recognized by the manual operation recognition unit 12.

As shown in FIG. 1, the process execution unit 13 includes a stop reservation setting unit 131, a stop reservation execution unit 132, a stop reservation cancellation unit 133, and a user intention estimation unit 134.

When a voice instruction received from the voice recognition unit 11 is an instruction to stop the process being executed, the stop reservation setting unit 131 reserves the stop so that the processing that performs it is executed after a predetermined time. Hereinafter, the process being executed is called the "running process", a voice instruction to stop the running process is called a "stop instruction", the processing that stops the running process is called the "stop process", and the time from the reservation of the stop process to its execution (the "predetermined time" above) is called the "stop hold time". In this specification, "stopping" a process is a broad concept that includes "ending" it (stopping it completely, so that it cannot be resumed from the point where it stopped) and "suspending" it (stopping it temporarily, so that it can be resumed from that point).

The stop reservation execution unit 132 measures the time elapsed since the stop reservation setting unit 131 reserved the stop process, and executes the reserved stop process once the stop hold time has elapsed since the reservation.

The user intention estimation unit 134 estimates the user's intention, specifically whether the user intends to stop the running process, from information other than the voice instructions recognized by the voice recognition unit 11. In the present embodiment, when the user performs any manual operation on the manual operation input device 2, the user intention estimation unit 134 estimates that the user does not intend to stop the running process. Conversely, if no manual operation has been performed since the stop process for the running process was reserved, the user intention estimation unit 134 estimates that the user does intend to stop it.
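The Embodiment 1 estimation rule is simple enough to state as code. The sketch below is a hypothetical illustration: the class and method names are invented, timestamps are assumed to come from a single monotonic clock, and treating an operation at exactly the reservation instant as "after" it is an arbitrary choice. The lock reflects that the voice and manual-input flows run in parallel.

```python
import threading
from typing import Optional


class ManualOperationIntentEstimator:
    """Hypothetical sketch of the Embodiment 1 user intention rule.

    A manual operation at or after the reservation time is read as
    "the user does not intend to stop the running process".
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()       # manual and voice flows run concurrently
        self._last_op: Optional[float] = None

    def record_manual_operation(self, timestamp: float) -> None:
        """Called for every recognized manual operation; keeps only the latest time."""
        with self._lock:
            if self._last_op is None or timestamp > self._last_op:
                self._last_op = timestamp

    def intends_to_stop(self, reserved_at: float) -> bool:
        """True if no manual operation has been recorded since the reservation."""
        with self._lock:
            return self._last_op is None or self._last_op < reserved_at
```

Keeping only the most recent timestamp is enough for this rule, since the question is only whether any operation happened after the reservation.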

If the user intention estimation unit 134 estimates that the user does not intend to stop the running process before the stop hold time has elapsed since the reservation, the stop reservation cancellation unit 133 cancels the reservation of the stop process.

The operation of the voice recognition system 10 according to Embodiment 1 is now described with reference to the flowcharts of FIGS. 2 to 4.

First, the operation of the voice recognition system 10 in response to voice instructions is described with reference to FIG. 2. When the voice recognition system 10 starts, the voice recognition unit 11 analyzes the speech acquired by the microphone 1 (step S101). If the voice recognition unit 11 does not recognize a voice instruction from the user (NO in step S102), step S101 is repeated. In other words, the voice recognition unit 11 analyzes the surrounding speech continuously.

When the voice recognition unit 11 recognizes a voice instruction from the user (YES in step S102), the recognition result is passed to the process execution unit 13, which checks whether the voice instruction is a stop instruction for the running process (step S103).

If the recognized voice instruction is anything other than a stop instruction for the running process (NO in step S103), the process execution unit 13 executes the process corresponding to the voice instruction (step S104), and the flow returns to step S101. Depending on the process executed by the process execution unit 13, images or sound are output from the information output device 3.

If, on the other hand, the recognized voice instruction is a stop instruction for the running process (YES in step S103), the stop reservation setting unit 131 reserves the stop process so that it will be executed after the stop hold time has elapsed (step S105), and the flow returns to step S101.

Thus, when the voice recognition system 10 recognizes a voice instruction from the user, it normally executes the corresponding process immediately. As an exception, when the voice instruction is a stop instruction for the running process, the system does not stop the running process immediately but only after the stop hold time has elapsed.
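The FIG. 2 flow (steps S101 to S105) can be sketched as a small dispatcher. Recognition itself is abstracted away here: the input is assumed to be a stream of (timestamp, recognized text or `None`) pairs, `STOP_PHRASES` is a made-up vocabulary since the publication does not list stop phrases, and the single `reserved_at` field is a simplification standing in for the reservation made by the stop reservation setting unit 131.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple

# Hypothetical vocabulary; the publication does not specify the stop phrases.
STOP_PHRASES = {"stop", "cancel", "quit"}


@dataclass
class VoiceFlow:
    """Sketch of the FIG. 2 flow (steps S101 to S105)."""
    executed: List[str] = field(default_factory=list)
    reserved_at: Optional[float] = None    # time of the pending stop reservation

    def handle(self, instruction: str, now: float) -> None:
        if instruction in STOP_PHRASES:    # S103: is it a stop instruction?
            self.reserved_at = now         # S105: reserve the stop, do not stop yet
        else:
            self.executed.append(instruction)  # S104: execute immediately

    def run(self, heard: Iterable[Tuple[float, Optional[str]]]) -> None:
        for now, instruction in heard:     # S101: analyze each chunk of speech
            if instruction is None:        # S102 NO: nothing recognized, keep listening
                continue
            self.handle(instruction, now)  # S102 YES
```

For example, feeding it `[(0.0, None), (1.0, "play music"), (2.5, "stop")]` executes "play music" at once but only records a reservation for the stop.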

Next, the operation of the voice recognition system 10 in response to manual operations is described with reference to FIG. 3. When the voice recognition system 10 starts, the manual operation recognition unit 12 checks whether the user has entered a manual operation on the manual operation input device 2 (step S201). When a manual operation is entered (YES in step S201), the manual operation recognition unit 12 recognizes its content and passes the recognition result to the process execution unit 13. The process execution unit 13 executes the process corresponding to the manual operation (step S202), and the flow returns to step S201. While no manual operation is entered (NO in step S201), step S201 is repeated.

Thus, the voice recognition system 10 immediately executes the process corresponding to a manual operation entered by the user, including the stop process for a running process.
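For contrast with the voice path, the FIG. 3 loop acts on every manual operation at once, stops included. This sketch assumes manual operations arrive as strings on a `queue.Queue`; the `on_any_operation` hook is a hypothetical way for the user intention estimation to learn that a manual operation occurred, and `max_polls` exists only so the loop terminates in a test (a real system would keep polling while it runs).

```python
import queue
from typing import Callable


def manual_operation_loop(
    inputs: "queue.Queue[str]",
    execute: Callable[[str], None],
    on_any_operation: Callable[[], None] = lambda: None,
    max_polls: int = 10,
) -> None:
    """Sketch of the FIG. 3 flow (steps S201 and S202)."""
    for _ in range(max_polls):             # a real system would loop indefinitely
        try:
            op = inputs.get(timeout=0.01)  # S201: has a manual operation arrived?
        except queue.Empty:
            continue                       # S201 NO: keep checking
        on_any_operation()                 # assumption: notifies intention estimation
        execute(op)                        # S202: act immediately, stops included
```

Nothing here is deferred: a manually requested stop goes straight to `execute`, which is the asymmetry the description relies on.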

The flows of FIG. 2 and FIG. 3 run in parallel. There is no substantive difference between the processes that the process execution unit 13 executes in step S104 of FIG. 2 and those it executes in step S202 of FIG. 3. For example, a process that the process execution unit 13 started in response to a voice instruction can be stopped by a manual operation; conversely, a process started in response to a manual operation can be stopped by a voice instruction (in which case the stop process is reserved).

Next, the operation of the voice recognition system 10 relating to a reserved stop process is described with reference to FIG. 4. The flow of FIG. 4 is executed by the stop reservation execution unit 132, the stop reservation cancellation unit 133, and the user intention estimation unit 134 of the process execution unit 13.

When the voice recognition system 10 starts, the stop reservation execution unit 132 checks whether a valid stop process reservation set by the stop reservation setting unit 131 exists (step S301). Reservations that have already been canceled or executed, and reservations for processes that have already stopped, are not valid. If no valid reservation exists (NO in step S301), step S301 is repeated until a new reservation is set.

If a valid reservation exists (YES in step S301), the user intention estimation unit 134 estimates, based on the manual operations recognized by the manual operation recognition unit 12, whether the user intends to stop the running process (step S302). That is, if the user has performed any manual operation, the user intention estimation unit 134 estimates that the user does not intend to stop the running process; conversely, if no manual operation has been performed since the stop process was reserved, it estimates that the user does intend to stop it.

If the user intention estimation unit 134 estimates that the user does not intend to stop the running process (NO in step S303), the stop reservation cancellation unit 133 cancels the reservation of the stop process (step S304), and the flow returns to step S301.

If, on the other hand, the user intention estimation unit 134 estimates that the user intends to stop the running process (YES in step S303), the stop reservation execution unit 132 checks whether the stop hold time has elapsed since the stop process was reserved (step S305). If it has (YES in step S305), the stop reservation execution unit 132 executes the reserved stop process (step S306), and the flow returns to step S301. If it has not (NO in step S305), the flow returns to step S301 with the reservation still in place.
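One pass of the FIG. 4 loop (steps S301 to S306) can be written as a pure decision function, which makes the order of the checks explicit: the intention check (S302/S303) comes before the hold-time check (S305). The `StopReservation` record, the returned action strings, and passing the manual-operation result in as a boolean are assumptions for illustration; a driver would call this periodically at some polling interval the publication does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StopReservation:
    """Hypothetical record created by the stop reservation setting unit."""
    reserved_at: float   # timestamp of the reservation (same clock as `now`)
    valid: bool = True   # cleared here when the reservation is cancelled or executed


def reservation_step(
    reservation: Optional[StopReservation],
    now: float,
    hold_seconds: float,
    manual_op_since_reservation: bool,
) -> str:
    """One pass through the FIG. 4 loop; returns the action the caller should take."""
    if reservation is None or not reservation.valid:   # S301 NO: nothing to do
        return "idle"
    # S302/S303, Embodiment 1 rule: any manual operation means "no stop intention".
    if manual_op_since_reservation:
        reservation.valid = False                      # S304: cancel the reservation
        return "cancel"
    if now - reservation.reserved_at >= hold_seconds:  # S305 YES
        reservation.valid = False                      # executed reservations become invalid
        return "execute_stop"                          # S306: caller performs the stop process
    return "wait"                                      # S305 NO: keep the reservation
```

A real system would also invalidate the reservation if the target process stopped by some other route, which this sketch leaves to the caller.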

As described above, even when the voice recognition system 10 according to Embodiment 1 recognizes a stop instruction for a running process, it does not stop that process immediately but only after the stop hold time has elapsed. If, before the stop hold time elapses, the user is estimated not to intend to stop the running process, the reservation of the stop process is canceled. Therefore, even if a stop instruction is misrecognized, it can be canceled afterward in accordance with the user's intention.

The application of the voice recognition system 10 according to the present invention is not particularly limited; it can be widely applied to electronic devices that accept voice instructions, such as PCs (personal computers), mobile phones, smartphones, and navigation devices.

FIGS. 5 and 6 each show an example of the hardware configuration of the voice recognition system 10. Each element of the voice recognition system 10 shown in FIG. 1 (the voice recognition unit 11, the manual operation recognition unit 12, and the process execution unit 13) is implemented by, for example, the processing circuit 50 shown in FIG. 5. That is, the processing circuit 50 includes the voice recognition unit 11, which recognizes voice instructions by constantly analyzing surrounding speech; the manual operation recognition unit 12, which recognizes the user's manual operations; and the process execution unit 13, which executes processes according to the voice instructions recognized by the voice recognition unit 11. The process execution unit 13 in turn includes the stop reservation setting unit 131, which reserves a stop process for the running process when a voice instruction is a stop instruction for that process; the stop reservation execution unit 132, which executes the reserved stop process once the stop hold time has elapsed since the reservation; the user intention estimation unit 134, which estimates the user's intention from information other than voice instructions; and the stop reservation cancellation unit 133, which cancels the reservation of the stop process when the user intention estimation unit 134 estimates, before the stop hold time has elapsed since the reservation, that the user does not intend to stop the running process. The processing circuit 50 may be dedicated hardware, or a processor that executes a program stored in memory (also called a central processing unit (CPU), processing device, arithmetic device, microprocessor, microcomputer, or DSP (Digital Signal Processor)).

When the processing circuit 50 is dedicated hardware, it may be, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a combination of these. The functions of the elements of the voice recognition system 10 may each be implemented by separate processing circuits, or they may be implemented together by a single processing circuit.

FIG. 6 shows the hardware configuration of the voice recognition system 10 when the processing circuit 50 is implemented with a processor. In this case, the functions of the elements of the voice recognition system 10 are implemented by software or the like (software, firmware, or a combination of software and firmware). The software or the like is written as a program and stored in the memory 52. The processor 51, serving as the processing circuit 50, implements the functions of each unit by reading and executing the program stored in the memory 52. That is, the voice recognition system 10 includes the memory 52 for storing a program that, when executed by the processing circuit 50, results in the execution of: a process of recognizing the user's voice instructions by constantly analyzing surrounding speech; a process of executing processes according to the recognized voice instructions; a process of reserving a stop process for the running process when a voice instruction is a stop instruction for that process; a process of estimating the user's intention from information other than voice instructions; a process of executing the reserved stop process when a predetermined time has elapsed since the reservation; and a process of canceling the reservation of the stop process when, before the predetermined time has elapsed since the reservation, the user is estimated not to intend to stop the running process. In other words, this program causes a computer to execute the procedures and methods of the operations of the elements of the voice recognition system 10.

Here, the memory 52 may be, for example, a nonvolatile or volatile semiconductor memory such as a RAM (Random Access Memory), ROM (Read Only Memory), flash memory, EPROM (Erasable Programmable Read Only Memory), or EEPROM (Electrically Erasable Programmable Read Only Memory); an HDD (Hard Disk Drive); a magnetic disk, flexible disk, optical disc, compact disc, MiniDisc, or DVD (Digital Versatile Disc) and its drive; or any storage medium that may come into use in the future.

The configurations described above implement the functions of the elements of the voice recognition system 10 either entirely in hardware or entirely in software or the like. The invention is not limited to this: some elements of the voice recognition system 10 may be implemented in dedicated hardware and others in software or the like. For example, the functions of some elements can be implemented by the processing circuit 50 as dedicated hardware, while the functions of other elements are implemented by the processing circuit 50, acting as the processor 51, reading and executing a program stored in the memory 52.

As described above, the voice recognition system 10 can implement each of the functions described above in hardware, in software or the like, or in a combination of these.

<Embodiment 2>
Some of the processes executed by the process execution unit 13 are expected to ask the user for permission before they stop. The voice recognition system 10 of Embodiment 2 asks the user for permission when it stops a running process in response to the user's manual operation, but omits that request when it stops the process in response to a voice instruction.

FIG. 7 is a flowchart showing the stop process for a running process in the voice recognition system 10 according to Embodiment 2. The flow of FIG. 7 is performed either when the stop process for a running process is executed in response to a manual operation in step S202 of FIG. 3, or when a reserved stop process is executed in step S306 of FIG. 4.

When the process execution unit 13 begins to stop a running process, it first checks whether the process to be stopped has a flow that asks the user for permission to stop (a stop permission request flow) (step S401). If the process has no such flow (NO in step S401), the process execution unit 13 stops it immediately (step S405) and ends the flow of FIG. 7.

If the process to be stopped has a stop permission request flow (YES in step S401), the process execution unit 13 checks whether the current stop was triggered by the user's manual operation or by an instruction from the stop reservation execution unit 132 (step S402). In other words, step S402 determines whether the current stop process was initiated in step S202 of FIG. 3 or in step S306 of FIG. 4.

If the current stop was triggered by the user's manual operation (NO in step S402), the process execution unit 13 executes the stop permission request flow of the running process (step S403). In this flow, the process execution unit 13 uses the information output device 3 to ask the user for permission to stop the running process. For example, it outputs a voice message such as "Do you want to stop the XX process?" from the speaker of the information output device 3, or displays a similar text message on its screen. The user can answer, by voice instruction or by manual operation, whether to allow the running process to stop.

If the user grants permission in the stop permission request flow (YES in step S404), the process execution unit 13 stops the running process (step S405) and ends the flow of FIG. 7. If the user's permission is not obtained (NO in step S404), the flow of FIG. 7 ends without stopping the running process.

If, on the other hand, the current stop was triggered by an instruction from the stop reservation execution unit 132 (YES in step S402), the process execution unit 13 skips the stop permission request flow (step S406), stops the running process (step S405), and ends the flow of FIG. 7.

Thus, in this embodiment, the process execution unit 13 requests the user's permission when it stops a running process in response to a manual operation. It does not request permission when it stops the process on an instruction from the stop reservation execution unit 132. As a result, the user does not need to give an additional voice instruction or manual operation between instructing the process to stop and the process actually stopping, which improves the convenience of the voice recognition system 10.
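The branch structure of FIG. 7 (steps S401 to S406) can be sketched as a single function. This is a minimal illustration under assumptions, not the patented implementation: the names `StopTrigger`, `RunningProcess`, `stop_running_process`, and the `ask_user` callback are hypothetical, and `ask_user` stands in for the prompt shown through the information output device 3.

```python
from enum import Enum, auto
from typing import Callable


class StopTrigger(Enum):
    """Origin of the stop request, as checked in step S402 (hypothetical type)."""
    MANUAL = auto()        # manual operation, step S202 of FIG. 3
    RESERVATION = auto()   # stop reservation execution unit 132, step S306 of FIG. 4


class RunningProcess:
    """Hypothetical stand-in for a process run by the process execution unit 13."""

    def __init__(self, name: str, has_permission_flow: bool) -> None:
        self.name = name
        self.has_permission_flow = has_permission_flow
        self.running = True

    def stop(self) -> None:
        self.running = False


def stop_running_process(process: RunningProcess, trigger: StopTrigger,
                         ask_user: Callable[[str], bool]) -> bool:
    """Sketch of the FIG. 7 flow; returns True if the process was stopped."""
    # S401: a process without a stop permission request flow stops at once (S405).
    if not process.has_permission_flow:
        process.stop()
        return True
    # S402 YES -> S406: a reserved stop skips the permission prompt, then S405.
    if trigger is StopTrigger.RESERVATION:
        process.stop()
        return True
    # S402 NO -> S403/S404: a manual stop asks the user first.
    if ask_user(f"Do you want to stop the {process.name} process?"):
        process.stop()  # S405
        return True
    return False  # permission refused: the process keeps running
```

The same process type reaches different outcomes depending only on `trigger`. A reserved stop never calls `ask_user`, which mirrors the point that no extra user input is needed after a voice stop instruction.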

<Embodiment 3>
In the voice recognition system 10 according to Embodiments 1 and 2, once a stop process for a running process has been reserved, the user can cancel the reservation until the stop hold time elapses. If the user does not notice the reservation, however, the running process may stop against the user's intention. In Embodiment 3, the voice recognition system 10 therefore notifies the user whenever it reserves a stop process for a running process.

FIG. 8 is a diagram showing the operation of the voice recognition system 10 according to Embodiment 3 in response to a voice instruction. The flow of FIG. 8 adds one step to the flow of FIG. 2. Step S106, which notifies the user that the stop process has been reserved, follows step S105, which reserves the stop process.

In step S106, the stop reservation setting unit 131 that reserved the stop process uses the information output device 3 to notify the user of the reservation. For example, the stop reservation setting unit 131 outputs a voice message such as "Stopping of the XX process has been reserved" or "Data will be saved automatically and the process will end in XX seconds" from the speaker of the information output device 3, or displays a similar text message on its screen. When the stop process is reserved, the system may also change the screen color of the information output device 3, change the color of an icon, or make an icon blink. It may also display an indicator on the screen of the information output device 3 showing the time remaining until the reserved stop process is executed.
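One way to render the step S106 notice together with a remaining-time indicator is sketched below. The function name `reservation_notice`, the message wording, and the text progress bar are hypothetical choices for illustration. The source only states that a message and an indicator of the remaining time may be shown.

```python
def reservation_notice(process_name: str, remaining_s: float, hold_s: float,
                       width: int = 20) -> str:
    """Notice for step S106 plus a bar showing the time left before the reserved stop."""
    if hold_s <= 0:
        raise ValueError("hold_s must be positive")
    remaining_s = max(0.0, min(remaining_s, hold_s))  # clamp to [0, hold_s]
    filled = round(width * remaining_s / hold_s)       # bar shrinks as time runs out
    bar = "#" * filled + "-" * (width - filled)
    return (f"Stopping of the {process_name} process has been reserved. "
            f"Data will be saved automatically and the process will end in "
            f"{remaining_s:.0f} s [{bar}]")
```

Re-rendering the notice once per second, for example as `reservation_notice("music playback", 5, 10, width=10)`, would produce a countdown indicator whose bar empties when the reserved stop is executed.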

According to this embodiment, the user is made aware that the stop process has been reserved. This prevents the running process from stopping against the user's intention.

<Embodiment 4>
FIG. 9 is a functional block diagram showing the configuration of the voice recognition system 10 according to Embodiment 4. The voice recognition system 10 of FIG. 9 has the configuration of FIG. 1, except that a user behavior information acquisition unit 14 replaces the manual operation recognition unit 12. The user behavior information acquisition unit 14 is connected to a camera 4 that captures video of the user and to a sensor 5 that acquires the user's biometric information. Only one of the camera 4 and the sensor 5 may be provided.

The user behavior information acquisition unit 14 acquires user behavior information, that is, information indicating the user's behavior, from the camera 4 or the sensor 5. User behavior information includes, for example, the user's position, movements (gestures), face orientation, mouth movement, gaze direction, and breathing while speaking.

In this embodiment, the user intention estimation unit 134 estimates from the user behavior information acquired by the user behavior information acquisition unit 14 whether the user intends to stop the running process. The user intention estimation unit 134 estimates that the user does not intend to stop the running process in cases such as the following:
- the user's gaze or face is directed toward the screen of the information output device 3;
- the user reaches toward the manual operation input device 2;
- the user opens their mouth or inhales deeply in preparation for a voice instruction.
Even when the user behaves as described above, the user intention estimation unit 134 may estimate that the user does intend to stop the running process if the user is at least a certain distance away from the microphone 1, the manual operation input device 2, or the information output device 3, or if the user has walked away from them.
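The rules above can be sketched as a small rule-based estimator. The `UserBehavior` fields, the function name, and the 2 m threshold are hypothetical. The source only speaks of "a certain distance" and does not say how the camera 4 or sensor 5 output is converted into these flags. Returning `None` for "no evidence either way" is also an assumption; in that case the reservation would simply run its course.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserBehavior:
    """Hypothetical snapshot of user behavior information from camera 4 / sensor 5."""
    distance_m: float                 # distance to microphone 1 / devices 2 and 3
    present: bool = True              # False once the user has walked away
    looking_at_screen: bool = False   # gaze or face toward the screen of device 3
    reaching_for_input: bool = False  # hand moving toward input device 2
    preparing_to_speak: bool = False  # mouth opening or deep inhalation


MAX_DISTANCE_M = 2.0  # assumed value for "a certain distance"


def estimate_stop_intent(b: UserBehavior) -> Optional[bool]:
    """True: user seems to want the stop; False: user seems not to; None: no evidence."""
    # Distance or absence overrides the other cues, as the text allows.
    if not b.present or b.distance_m >= MAX_DISTANCE_M:
        return True
    if b.looking_at_screen or b.reaching_for_input or b.preparing_to_speak:
        return False  # grounds for cancelling the stop reservation
    return None
```

Checking distance first reflects the statement that a distant or departed user may be judged to intend the stop even when the other cues are present.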

The operation of the voice recognition system 10 according to Embodiment 4 is basically the same as in Embodiment 1 (FIGS. 2 to 4). The difference is that, in step S302 of FIG. 4, the user intention estimation unit 134 uses the user behavior information to estimate whether the user intends to stop the running process.

Embodiment 4 uses a different criterion from Embodiment 1 to judge whether the user intends to stop the running process, but it provides the same effects as Embodiment 1.

Embodiment 4 can also be combined with Embodiments 1 to 3. As shown in FIG. 10, the voice recognition system 10 may include both the manual operation recognition unit 12 and the user behavior information acquisition unit 14. In that case, the user intention estimation unit 134 estimates that the user does not intend to stop the running process in either of two cases: when a manual operation by the user is input to the manual operation input device 2, or when the user's behavior satisfies a predetermined condition.
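The reservation lifecycle with this combined estimator (a manual operation OR a qualifying behavior cancels the stop) can be sketched as a replay over timestamped events. This is a simplification under stated assumptions. A real system would use a live timer rather than a list of events, and the event labels and function names are hypothetical.

```python
from typing import Callable, Iterable, Tuple

# (seconds since the stop was reserved, event label); labels are hypothetical.
Event = Tuple[float, str]

# Hypothetical labels for behaviors that satisfy the "predetermined condition".
NO_STOP_BEHAVIORS = {"gaze_at_screen", "reach_for_input", "inhale_to_speak"}


def no_stop_intent(label: str) -> bool:
    """Combined estimator: a manual operation OR a qualifying behavior means 'do not stop'."""
    return label == "manual_operation" or label in NO_STOP_BEHAVIORS


def run_stop_reservation(hold_s: float, events: Iterable[Event],
                         estimator: Callable[[str], bool] = no_stop_intent
                         ) -> Tuple[str, float]:
    """Replay one reservation and report ("cancelled", t) or ("stopped", hold_s)."""
    for t, label in sorted(events):
        if t >= hold_s:
            break  # the reserved stop has already been executed by this time
        if estimator(label):
            return ("cancelled", t)  # stop reservation cancellation unit 133
    return ("stopped", hold_s)       # stop reservation execution unit 132
```

For example, with a 10 s stop hold time, a manual operation at 3 s cancels the reservation, while an unrelated sound at 3 s, or a manual operation at 12 s, leaves the stop to execute at 10 s. Passing the estimator as a parameter keeps Embodiment 1 (manual operation only) and Embodiment 4 (behavior only) as special cases.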

<Modification>
In the present invention, the sound that the voice recognition unit 11 judges to be an instruction to stop the running process is basically a word such as "stop it", "halt", or "end". It may, however, also be a non-verbal sound such as a child crying.

For example, if a user hears a child crying while using an Internet shopping service on a PC, the user will likely leave the PC to take care of the child. If the user forgets to save the shopping data temporarily and the shopping service's server side then times out, the in-progress shopping information is lost. The user must then start shopping over from the beginning when resuming.

Suppose that in this Internet shopping example the PC is equipped with a voice recognition system 10 that treats a child's crying as a stop instruction. When the voice recognition system 10 detects the crying, it reserves a stop process for the shopping function. When the stop hold time elapses, the PC automatically executes the stop process, which includes temporarily saving the shopping data, so the in-progress shopping information is not lost. If the user finishes caring for the child and performs a manual operation or a specific gesture before the stop hold time elapses, the reservation of the stop process for the shopping function is cancelled and the user can continue shopping.
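In this example, the key property is that the stop process itself performs the temporary save. A minimal sketch follows, assuming a hypothetical `ShoppingSession` class and an in-memory dictionary standing in for local storage on the PC. Neither detail comes from the source.

```python
import json
from typing import Dict, List


class ShoppingSession:
    """Hypothetical shopping process whose stop routine saves the cart first."""

    def __init__(self, storage: Dict[str, str]) -> None:
        self.cart: List[str] = []
        self.active = True
        self._storage = storage  # stand-in for persistent local storage

    def add(self, item: str) -> None:
        self.cart.append(item)

    def stop(self) -> None:
        # Save before stopping so a later server-side timeout cannot lose the cart.
        self._storage["cart"] = json.dumps(self.cart)
        self.active = False

    @classmethod
    def resume(cls, storage: Dict[str, str]) -> "ShoppingSession":
        """Start a new session pre-filled with any previously saved cart."""
        session = cls(storage)
        session.cart = json.loads(storage.get("cart", "[]"))
        return session
```

If the reservation expires, `stop()` runs and the cart survives in storage for `resume()`. If the user returns in time, the reservation is cancelled and the original session simply stays active.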

Besides a child crying, non-verbal sounds that may be treated as an instruction to stop a running process include a pet's cry and the rattling caused by an earthquake.
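Treating both words and non-verbal sounds as stop instructions can be sketched as a check over two recognizer outputs: a transcript for speech and a label from a sound-event classifier. Both recognizers, the label names, and the function name are assumptions for illustration. The source does not describe how the non-verbal sounds are detected.

```python
from typing import Optional

# Hypothetical vocabularies; the words follow the examples in the text.
VERBAL_STOP_WORDS = {"stop it", "halt", "end", "止めて", "停止", "終了"}
NONVERBAL_STOP_SOUNDS = {"child_crying", "pet_crying", "earthquake_rattle"}


def is_stop_instruction(transcript: Optional[str] = None,
                        sound_label: Optional[str] = None) -> bool:
    """True if the recognized words or the sound-event label count as a stop instruction."""
    if transcript is not None and transcript.strip().lower() in VERBAL_STOP_WORDS:
        return True
    return sound_label in NONVERBAL_STOP_SOUNDS
```

Either path would then lead to the same stop reservation, so the non-verbal case needs no separate cancellation logic.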

Within the scope of the invention, the embodiments can be freely combined, and each embodiment can be modified or omitted as appropriate.

1 microphone, 2 manual operation input device, 3 information output device, 4 camera, 5 sensor, 10 voice recognition system, 11 voice recognition unit, 12 manual operation recognition unit, 13 process execution unit, 14 user behavior information acquisition unit, 131 stop reservation setting unit, 132 stop reservation execution unit, 133 stop reservation cancellation unit, 134 user intention estimation unit, 50 processing circuit, 51 processor, 52 memory.

Claims

A voice recognition system comprising:
a voice recognition unit that recognizes voice instructions by constantly analyzing surrounding sound; and
a process execution unit that executes a process according to the voice instruction recognized by the voice recognition unit,
wherein the process execution unit comprises:
a stop reservation setting unit that reserves a stop process for a running process when the voice instruction is an instruction to stop the running process;
a stop reservation execution unit that executes the reserved stop process when a predetermined time has elapsed since the stop process was reserved;
a user intention estimation unit that estimates the user's intention from information other than the voice instruction; and
a stop reservation cancellation unit that cancels the reservation of the stop process when the user intention estimation unit estimates, before the predetermined time has elapsed since the stop process was reserved, that the user does not intend to stop the running process.

The voice recognition system according to claim 1, further comprising a manual operation recognition unit that recognizes a manual operation by the user,
wherein the user intention estimation unit estimates that the user does not intend to stop the running process when the manual operation is recognized before the predetermined time has elapsed since the stop process was reserved.

The voice recognition system according to claim 1 or 2, further comprising a manual operation recognition unit that recognizes a manual operation by the user,
wherein the process execution unit requests the user's permission when stopping the running process in response to the manual operation, and does not request the user's permission when stopping the running process in response to an instruction from the stop reservation execution unit.

The voice recognition system according to any one of claims 1 to 3, further comprising a user behavior information acquisition unit that acquires information indicating the user's behavior,
wherein the user intention estimation unit estimates that the user does not intend to stop the running process when the user's behavior satisfies a predetermined condition before the predetermined time has elapsed since the stop process was reserved.

The voice recognition system according to any one of claims 1 to 4, wherein the stop reservation setting unit notifies the user when the stop process is reserved.

The voice recognition system according to claim 5, wherein the notification to the user is a display or voice notification.

A processing stop method in a voice recognition system, the method comprising:
recognizing, by a voice recognition unit of the voice recognition system, voice instructions by constantly analyzing surrounding sound;
executing, by a process execution unit of the voice recognition system, a process according to the voice instruction recognized by the voice recognition unit;
reserving, by a stop reservation setting unit of the voice recognition system, a stop process for a running process when the voice instruction is an instruction to stop the running process;
estimating, by a user intention estimation unit of the voice recognition system, the user's intention from information other than the voice instruction;
executing, by a stop reservation execution unit of the voice recognition system, the reserved stop process when a predetermined time has elapsed since the stop process was reserved; and
cancelling, by a stop reservation cancellation unit of the voice recognition system, the reservation of the stop process when the user intention estimation unit estimates, before the predetermined time has elapsed since the stop process was reserved, that the user does not intend to stop the running process.