JP2009069202A

JP2009069202A - Speech processor

Info

Publication number: JP2009069202A
Application number: JP2007234443A
Authority: JP
Inventors: Mitsumasa Kubo; 充正久保
Original assignee: Teac Corp
Current assignee: Teac Corp
Priority date: 2007-09-10
Filing date: 2007-09-10
Publication date: 2009-04-02

Abstract

PROBLEM TO BE SOLVED: To allow an operator to cancel a speech recognition result easily and reliably when the speech is misrecognized. SOLUTION: The operator inputs speech through a microphone to operate a sound device 14. When the recognition result of a speech recognition unit/device control unit 12 is erroneous, the operator makes a specific movement, such as waving a hand left and right. A moving body detecting unit 18 detects the movement of the operator's hand without contact and, on detecting the specific hand movement, returns the device to the immediately preceding operation state. When the hand is waved from left to right, the operation may be cancelled, and when the hand is waved from right to left, speech recognition may be started. COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a voice processing device, and more particularly to cancellation of a voice recognition result.

A technique for controlling the operation of a device by an operator's utterance is known. However, when the device outputs audio, as a DVD drive, a car audio device, or a car navigation system does, the operator's utterance mixes with the audio from the device, so misrecognition is likely to occur. It is therefore desirable that the operator be able to cancel the recognition result easily and reliably when misrecognition occurs.

Patent Document 1 below discloses invalidating a speech recognition result when an invalidation instruction is input before a predetermined time has elapsed since the speech recognition. A voice input such as "chigau" ("wrong") and input from a keyboard, a mouse, or the like are given as examples of the invalidation instruction.

Patent Document 2 below also discloses invalidating the immediately preceding voice command by a voice input such as "chigau" ("wrong"), "torikeshi" ("cancel"), or "mukou" ("invalid").

Patent Document 1: Japanese Patent Application Laid-Open No. 7-219583
Patent Document 2: Japanese Patent Application Laid-Open No. 7-219591

However, in a situation where misrecognition is likely because the operator's utterance mixes with the audio from the device, canceling or invalidating by the operator's voice, such as "chigau" or "torikeshi", carries the risk that the cancellation utterance itself is misrecognized. On the other hand, with a method that cancels or invalidates by entering a command from a keyboard, a mouse, or the like, the operator is forced to operate these input devices. This not only loses the advantage of operating the device by voice recognition but also makes cancellation impossible when the operator cannot operate the input devices. Of course, the operator may also have a change of mind even when the voice recognition itself is correct, and in that case too it is desirable to be able to cancel quickly.

An object of the present invention is to provide a device that can easily and reliably cancel (or invalidate) a recognition result when misrecognition or the like occurs in voice recognition.

The present invention is a voice processing device that recognizes an operator's voice and executes processing including voice output processing, the device comprising: voice recognition means for recognizing the operator's voice; moving object detection means for detecting movement of the operator's hand in a non-contact manner; and control means for canceling the immediately preceding voice recognition result by the voice recognition means when the moving object detection means detects a first movement of the operator's hand after voice recognition.

In one embodiment of the present invention, the control means starts voice recognition by the voice recognition means when the moving object detection means detects a second movement of the operator's hand.

In another embodiment of the present invention, the control means confirms the immediately preceding voice recognition result by the voice recognition means when the moving object detection means detects a third movement of the operator's hand after voice recognition.

According to the present invention, when misrecognition of voice or the like occurs, it can be canceled easily and quickly.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of the voice processing device according to this embodiment. The microphone 10 receives an utterance from an operator (user), converts it into an electrical signal, and outputs it to the voice recognition unit / device control unit 12.

The voice recognition unit / device control unit 12 analyzes and recognizes the operator's utterance from the microphone 10 and outputs a control command to the acoustic device 14 according to the recognition result. A known technique can be used for voice recognition. For example, a voice database may be stored in advance, and recognition may be performed by matching the input utterance against the voice database.

The acoustic device 14 is any of various devices such as a DVD player, a car audio system, a car navigation system, a game machine, or a communication terminal device, and is a device that at least outputs audio. Control commands from the voice recognition unit / device control unit 12 are, for example, a playback command, a stop command, a fast-forward command, an audio/video recording command, a scroll command, and a tray open/close command. The audio signal from the acoustic device 14 is supplied to the speaker 16 and output as sound.

The moving object detection unit 18 detects the movement of the operator's hand in a non-contact manner and outputs different detection signals to the voice recognition unit / device control unit 12 and the operation memory unit 20 according to the movement. The moving object detection unit 18 detects the hand movement using, for example, infrared detectors. The hand movement is one the operator can perform easily, such as opening and closing the hand or waving it left and right. This embodiment illustrates the case of detecting a left-right wave. The moving object detection unit 18 distinguishes a wave from left to right from a wave from right to left. A wave from left to right is output to the voice recognition unit / device control unit 12 as a trigger signal that starts voice recognition, and a wave from right to left is output to the operation memory unit 20 as a cancel signal (or return signal) that cancels (UNDO) the voice recognition.

The operation memory unit 20 stores the operation state of the acoustic device 14 before a new operation is performed according to the voice recognition result. When a cancel signal is supplied from the moving object detection unit 18, the operation memory unit 20 outputs the immediately preceding operation state to the voice recognition unit / device control unit 12, which cancels the current command, that is, the command corresponding to the voice recognition result, and returns the device to the immediately preceding operation state. For example, when a stop command is output as a result of voice recognition while the acoustic device 14 is playing, the operation memory unit 20 supplies the playback state to the voice recognition unit / device control unit 12 as the immediately preceding operation state. In response, the voice recognition unit / device control unit 12 outputs a playback command to the acoustic device 14.
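The role of the operation memory unit 20 can be illustrated with a minimal sketch. The names `DeviceState`, `OperationMemory`, and `command_for` are hypothetical and do not appear in the specification; the sketch only assumes that a state consists of a mode and an optional playback position.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeviceState:
    """Hypothetical snapshot of the acoustic device's operation state."""
    mode: str                 # e.g. "play" or "stop"
    position_s: float = 0.0   # playback position in seconds (optional detail)


class OperationMemory:
    """Holds the state the device was in before the latest voice command."""

    def __init__(self) -> None:
        self._previous: Optional[DeviceState] = None

    def store(self, state: DeviceState) -> None:
        self._previous = state

    def recall(self) -> Optional[DeviceState]:
        """Return the stored state once, then clear it."""
        state, self._previous = self._previous, None
        return state


def command_for(state: DeviceState) -> str:
    """Map a remembered state back to the command that restores it."""
    return {"play": "PLAY", "stop": "STOP"}[state.mode]


memory = OperationMemory()
memory.store(DeviceState(mode="play", position_s=42.0))  # device was playing
# ... a misrecognized "stop" command is executed here ...
restored = memory.recall()                               # cancel signal arrives
print(command_for(restored))                             # prints "PLAY"
```

Clearing the stored state on recall is a design choice of this sketch, so that a second cancel gesture does nothing; the specification does not say how repeated cancel signals are handled.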

FIG. 2 shows the configuration of the moving object detection unit 18. The moving object detection unit 18 includes two infrared detectors 30 and 32 and a direction detector 34. The two infrared detectors 30 and 32 are arranged close to each other, separated by a predetermined distance in the left-right direction, and each detects infrared rays and outputs the result to the direction detector 34.

The direction detector 34 includes two comparators 36 and 38 and two flip-flops 40 and 42. The infrared detector 30 is connected to one input terminal (+) of the comparator 36, and a predetermined voltage (threshold voltage) is applied to the other input terminal (−). The comparator 36 compares the output of the infrared detector 30 with the threshold voltage and outputs a binary Hi or Low signal to the flip-flops 40 and 42 according to which is larger. Similarly, the infrared detector 32 is connected to one input terminal (+) of the comparator 38, and a predetermined voltage (threshold voltage) is applied to the other input terminal (−). The comparator 38 compares the output of the infrared detector 32 with the threshold voltage and outputs a binary Hi or Low signal to the flip-flops 40 and 42 according to which is larger.

The flip-flops 40 and 42 are D-type flip-flops. The output of the comparator 38 is supplied to the D terminal of the flip-flop 40, and the output of the comparator 36 is supplied to its clock (CK) terminal. The output of the comparator 36 is supplied to the D terminal of the flip-flop 42, and the output of the comparator 38 is supplied to its clock (CK) terminal. Accordingly, the flip-flop 40 outputs the Hi output of the comparator 38 at the timing of the Hi output of the comparator 36, and the flip-flop 42 outputs the Hi output of the comparator 36 at the timing of the Hi output of the comparator 38. The comparator 36 outputs Hi when the infrared detector 30 detects infrared rays emitted from the operator's hand, and the comparator 38 outputs Hi when the infrared detector 32 detects infrared rays emitted from the operator's hand. Consequently, when the operator waves a hand so that it first passes in front of the infrared detector 30 and then in front of the infrared detector 32, as indicated by A in the figure, the flip-flop 42 supplies a Hi output to the operation memory unit 20. When the operator waves a hand so that it first passes in front of the infrared detector 32 and then in front of the infrared detector 30, as indicated by B in the figure, the flip-flop 40 supplies a Hi output to the voice recognition unit / device control unit 12. If the Hi output of the flip-flop 40 is used as the voice recognition trigger signal and the Hi output of the flip-flop 42 as the cancel signal, waving a hand in direction A instructs cancellation, and waving a hand in direction B instructs the voice recognition trigger or decision. If direction A corresponds to a wave from right to left and direction B to a wave from left to right, the operator can distinguish between starting voice recognition and cancellation simply by changing the direction of the wave.
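The cross-coupled flip-flop arrangement can be checked with a small software simulation. This is only a sketch under stated assumptions: the flip-flops are taken to be rising-edge triggered, the threshold value is arbitrary, and the hand is assumed to cover both closely spaced detectors for a moment during the sweep. The function and class names are hypothetical.

```python
from typing import Iterable, List, Tuple

THRESHOLD_V = 1.0  # assumed comparator threshold voltage


def comparator(voltage: float, threshold: float = THRESHOLD_V) -> bool:
    """Hi (True) when the detector output exceeds the threshold."""
    return voltage > threshold


class DFlipFlop:
    """D flip-flop, assumed to latch D on the rising edge of CK."""

    def __init__(self) -> None:
        self.q = False
        self._last_clk = False

    def tick(self, d: bool, clk: bool) -> bool:
        if clk and not self._last_clk:   # rising edge on CK latches D
            self.q = d
        self._last_clk = clk
        return self.q


def detect_direction(samples: Iterable[Tuple[float, float]]) -> List[str]:
    """samples: (detector 30 voltage, detector 32 voltage) per time step."""
    ff40, ff42 = DFlipFlop(), DFlipFlop()
    events: List[str] = []
    for v30, v32 in samples:
        c36, c38 = comparator(v30), comparator(v32)
        prev40, prev42 = ff40.q, ff42.q
        q40 = ff40.tick(d=c38, clk=c36)  # FF 40: D = comparator 38, CK = comparator 36
        q42 = ff42.tick(d=c36, clk=c38)  # FF 42: D = comparator 36, CK = comparator 38
        if q42 and not prev42:
            events.append("cancel")      # direction A: detector 30 first, then 32
        if q40 and not prev40:
            events.append("trigger")     # direction B: detector 32 first, then 30
    return events


sweep_a = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]  # hand passes 30, then 32
sweep_b = [(0, 0), (0, 2), (2, 2), (2, 0), (0, 0)]  # hand passes 32, then 30
print(detect_direction(sweep_a), detect_direction(sweep_b))
```

The sketch reports a signal only when a flip-flop output changes from Low to Hi. How the flip-flops are reset between gestures is not described in the text, so a repeated sweep in the same direction is not modeled here.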

FIG. 3 shows the overall processing flowchart of this embodiment. When the device is started, the voice recognition unit / device control unit 12 determines whether the operator has waved a hand from left to right (S101). Specifically, it determines whether the detection signal for a wave from left to right has been received from the moving object detection unit 18. When that detection signal is received, the voice recognition unit / device control unit 12 starts a predetermined voice recognition process (S102) and analyzes and recognizes the operator's voice input from the microphone 10 (S103, S104). When that detection signal is not received, voice recognition is not started.

When voice recognition is started, the operation memory unit 20 stores the operation state before voice recognition starts or at the time voice recognition starts (collectively referred to as the immediately preceding state) (S105). For example, if the device is stopped when voice recognition starts, the stopped state is stored; if it is playing, the playback state is stored. The playback position may also be stored. Storing the current state is known as a resume function in computers and the like. After the immediately preceding operation state is stored, the voice recognition unit / device control unit 12 outputs a control command corresponding to the voice recognition result to the acoustic device 14, and the acoustic device 14 performs the operation corresponding to the command (S106).

After the acoustic device 14 performs the operation corresponding to the command, it is determined whether the operator has waved a hand from right to left (S107). Specifically, it is determined whether the detection signal for a wave from right to left has been received from the moving object detection unit 18. When that detection signal is received, the voice recognition unit / device control unit 12 cancels the command executed in S106, reads the immediately preceding operation state stored in the operation memory unit 20, and returns the acoustic device 14 to that state (S108). When that detection signal is not received, the command executed in S106 continues to execute.
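The flow from S101 to S108 can be sketched as a small event-driven controller. This is an illustration only: the class name `VoiceController`, the gesture strings, and the simplification that the device state equals the name of the last command are all assumptions, and the recognizer is passed in as a stub.

```python
from typing import Callable, List, Optional


class VoiceController:
    """Hypothetical controller following steps S101 to S108."""

    def __init__(self, recognize: Callable[[], str], initial_state: str = "stop") -> None:
        self.recognize = recognize                 # stub for the recognizer (S102-S104)
        self.state = initial_state                 # current device operation state
        self.saved_state: Optional[str] = None     # operation memory contents
        self.log: List[str] = []

    def on_gesture(self, gesture: str) -> None:
        if gesture == "left_to_right":             # S101: trigger gesture detected
            command = self.recognize()             # S102-S104: recognize the utterance
            self.saved_state = self.state          # S105: remember the preceding state
            self.state = command                   # S106: execute the recognized command
            self.log.append(f"execute:{command}")
        elif gesture == "right_to_left" and self.saved_state is not None:
            self.state = self.saved_state          # S107 YES -> S108: restore
            self.saved_state = None
            self.log.append(f"restore:{self.state}")
        # any other input: the current command simply keeps running


# The device is playing and the recognizer (wrongly) returns "stop".
controller = VoiceController(recognize=lambda: "stop", initial_state="play")
controller.on_gesture("left_to_right")   # starts recognition, device stops
controller.on_gesture("right_to_left")   # operator cancels, playback resumes
print(controller.state, controller.log)
```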

Here, the determination in S107 may be made within a predetermined time after execution of the new command resulting from voice recognition starts in S106. That is, it is determined whether the operator has waved a hand from right to left within a predetermined time after execution of the new command starts. The command is canceled only when the operator waves a hand from right to left within the predetermined time; once the predetermined time has elapsed, the operator is regarded as having no intention to cancel, and the command continues to execute. Alternatively, when the determination in S107 is NO, it may further be determined whether the operator has waved a hand from left to right. Specifically, as in the determination of S101, it is determined whether the detection signal for a wave from left to right has been received from the moving object detection unit 18. When the operator waves a hand not from right to left but from left to right, the operator is regarded as having affirmed the command executed in S106, and that command is confirmed and continues to execute. Thus, by treating a wave from right to left as cancellation of the voice recognition result and a wave from left to right as decision (or confirmation), the operator need only wave a hand. The advantage of being able to instruct cancellation or decision of the voice recognition result merely by waving a hand is pronounced when the acoustic device 14 outputs audio as a result of the command corresponding to the voice recognition result. That is, instructing cancellation or decision by voice while the acoustic device 14 is outputting audio is difficult because the possibility of misrecognition increases, but no such problem arises with a hand-waving motion.
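The two variations of S107 described above, a time-limited cancel and an explicit confirm gesture, can be combined in one decision function. The text presents them as alternatives, so combining them is a choice made for this sketch; the function name `resolve`, the gesture strings, and the 5-second window are likewise assumptions, since the text only says "a predetermined time".

```python
from typing import Optional

CANCEL_WINDOW_S = 5.0  # assumed value for the "predetermined time"


def resolve(gesture: Optional[str], elapsed_s: float,
            window_s: float = CANCEL_WINDOW_S) -> str:
    """Decide what happens to the command started in S106.

    gesture   -- "right_to_left", "left_to_right", or None if no wave was seen
    elapsed_s -- seconds since execution of the command started
    """
    if gesture == "right_to_left" and elapsed_s <= window_s:
        return "cancel"      # restore the immediately preceding state
    if gesture == "left_to_right":
        return "confirm"     # operator affirmed the recognized command
    return "continue"        # no gesture, or the cancel window has passed


print(resolve("right_to_left", 2.0))   # within the window
print(resolve("right_to_left", 9.0))   # too late to cancel
print(resolve("left_to_right", 1.0))   # explicit confirmation
```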

In this embodiment, two infrared detectors 30 and 32 arranged close to each other on the left and right are used as the moving object detection unit 18. Needless to say, the invention is not limited to this, and any detector that can detect the movement of the operator's hand in a non-contact manner may be used, such as a photodetector that detects light of wavelengths other than infrared, an ultrasonic detector that detects ultrasonic waves, or a camera that captures images.

In this embodiment, waving the operator's hand is given as an example of the hand movement, but opening and closing the hand, waving a finger, moving a finger, forming a specific shape with the fingers, and the like are also included. For example, the device may be configured so that waving the index finger instructs cancellation and raising the thumb instructs decision.

FIG. 1 is a configuration block diagram of the embodiment.
FIG. 2 is a configuration block diagram of the moving object detection unit in FIG. 1.
FIG. 3 is a processing flowchart of the embodiment.

Explanation of symbols

10 microphone, 12 voice recognition unit / device control unit, 14 acoustic device, 16 speaker, 18 moving object detection unit, 20 operation memory unit.

Claims

A voice processing device that recognizes an operator's voice and executes processing including voice output processing, the device comprising:
voice recognition means for recognizing the voice of the operator;
moving object detection means for detecting movement of the operator's hand in a non-contact manner; and
control means for canceling the immediately preceding voice recognition result by the voice recognition means when a first movement of the operator's hand is detected by the moving object detection means after voice recognition.

The device according to claim 1,
wherein the control means starts voice recognition by the voice recognition means when the moving object detection means detects a second movement of the operator's hand.

The device according to claim 1,
wherein the control means confirms the immediately preceding voice recognition result by the voice recognition means when the moving object detection means detects a third movement of the operator's hand after voice recognition.