JPH06337695A

JPH06337695A - Voice recognition device

Info

Publication number: JPH06337695A
Application number: JP5125908A
Authority: JP
Inventors: Yasuyuki Masai; 康之正井
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1993-05-27
Filing date: 1993-05-27
Publication date: 1994-12-06

Abstract

PURPOSE: To switch the recognition target vocabulary properly without being affected by timing deviations, such as delays in transmitting a recognition target vocabulary switching request. CONSTITUTION: When a recognition target vocabulary switching command is input from a command input part 5, the time of input is read from a timer part 3 and stored in a command time storage part 6. When speech is input from a sound input part 1, a sound detection part 2 detects the start point and end point of the speech, and at each detection the time is read from the timer part 3 and stored in a sound time storage part 4. When the speech input end time is stored, a recognition target vocabulary decision part 7 compares the speech input start time stored in the sound time storage part 4 with the contents of the command time storage part 6. Depending on whether the speech input start time is the later of the two, it decides whether the input speech should be recognized against the vocabulary after switching, and if so issues a recognition target vocabulary switching instruction to a voice recognition part 8.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a voice recognition device that switches the recognition target vocabulary as needed and recognizes input speech against a minimal vocabulary.

[0002]

[Description of the Related Art] In recent years voice recognition devices have spread remarkably and come into use in many fields, but they have not yet achieved performance sufficient for practical use. As one means of improving recognition performance, it has therefore been proposed to switch the recognition target vocabulary as needed so that recognition is performed against a minimal vocabulary.

[0003]

[Problems to Be Solved by the Invention] In the conventional means for switching the recognition target vocabulary, however, the recognition target vocabulary of the voice recognition means, which handles recognition processing within the voice recognition device, is switched at the point in time when a recognition target vocabulary switching request (recognition target vocabulary switching command), input from recognition target vocabulary switching request input means such as a switch, is transmitted to the switching means (recognition target vocabulary determination means). This gives rise to the following problems.

[0004] (1) If there is a time delay between the input of a recognition target vocabulary switching request from the recognition target vocabulary switching request input means and the actual transmission of that request to the recognition target vocabulary switching means (recognition target vocabulary determination means), the user may utter a word from the post-switch vocabulary while the recognition target vocabulary of the voice recognition device has not yet been changed, resulting in misrecognition.

[0005] (2) When a recognition target vocabulary switching request is input while speech is being input, it cannot be determined whether the speech currently being input should be recognized against the recognition target vocabulary in effect before the switching request, or against the recognition target vocabulary obtained after switching in accordance with the request. As a result, recognition is not performed against the recognition target vocabulary the user intends.

[0006] Accordingly, an object of the present invention is to provide a voice recognition device that determines which vocabulary a user's utterance was intended for before performing recognition processing, so that the recognition target vocabulary can be switched appropriately without being affected by timing deviations such as delays in transmitting a recognition target vocabulary switching request, and voice recognition can thereby be performed with high accuracy.

[0007]

[Means for Solving the Problems] The voice recognition device of the present invention comprises: voice recognition means for recognizing input speech; first time storage means for storing the input time of the speech; recognition target vocabulary switching request input means for inputting a request to switch the recognition target vocabulary applied by the voice recognition means; second time storage means for storing the input time of the recognition target vocabulary switching request; and recognition target vocabulary determination means for comparing the contents of the first and second time storage means and determining, according to the comparison result, whether to switch the recognition target vocabulary applied by the voice recognition means. The voice recognition means performs recognition processing using the recognition target vocabulary that follows from the determination result of the recognition target vocabulary determination means.

[0008]

[Operation] In the above configuration, the time at which speech is input is stored in the first time storage means, and the time at which a recognition target vocabulary switching request is input is stored in the second time storage means. The contents of the two time storage means are compared by the recognition target vocabulary determination means, for example immediately after the speech input time is stored in the first time storage means. This comparison checks whether a recognition target vocabulary switching request had already been input at the speech input time, and thereby determines which vocabulary the input speech was intended for, that is, whether the requested switching of the recognition target vocabulary should be carried out. In accordance with this determination result, the voice recognition means performs voice recognition processing either with the current recognition target vocabulary unchanged or after switching to the requested recognition target vocabulary.

[0009] In this way, by comparing the contents of the first time storage means, which stores the speech input time, with those of the second time storage means, which stores the input time of the recognition target vocabulary switching request, and checking whether a switching request had already been input at the speech input time, it can be determined which vocabulary the input speech was intended for. Performing voice recognition with the recognition target vocabulary that follows from this determination makes it possible to switch the recognition target vocabulary appropriately, unaffected by timing deviations such as delays in transmitting the switching request, and thus to achieve high-performance voice recognition.

[0010]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram schematically showing the configuration of a voice recognition device according to an embodiment of the present invention.

[0011] The voice recognition device shown in FIG. 1 has a voice input unit 1 that amplifies a voice signal input from a microphone or the like, a voice detection unit 2 that detects input speech from the voice signal amplified by the voice input unit 1, a timer unit 3 having a clock function, and a voice time storage unit 4 that reads from the timer unit 3 and stores the times of the input speech detected by the voice detection unit 2.

[0012] The voice recognition device shown in FIG. 1 also has a command input unit 5 for inputting a command requesting switching of the recognition target vocabulary (recognition target vocabulary switching command), a command time storage unit 6 that reads from the timer unit 3 and stores the time at which a recognition target vocabulary switching command was input through the command input unit 5, a recognition target vocabulary determination unit 7 that compares the times stored in the voice time storage unit 4 and the command time storage unit 6 to determine whether the input speech came before or after the input of the recognition target vocabulary switching command, and a voice recognition unit 8. The voice recognition unit 8 performs voice recognition on the voice signal amplified by the voice input unit 1, within the recognition target vocabulary determined by the result from the recognition target vocabulary determination unit 7.

[0013] Next, the operation of the voice recognition device configured as above is described. First, the voice input unit 1 amplifies the voice signal input from a microphone or the like to a signal level appropriate for use by the voice detection unit 2 and the voice recognition unit 8.

[0014] The voice detection unit 2 receives the voice signal amplified by the voice input unit 1 and detects the speech uttered by the user (input speech). Various means of implementing this kind of speech detection have been proposed. The present invention does not depend on the particular method, but a common approach is to compute the power of the input voice signal, take the point at which it exceeds a preset threshold as the speech input start point, and take the point at which it falls below the threshold as the speech input end point. The voice detection unit 2 is assumed here to be implemented in this way. In addition, when the duration of the input speech (the period during which the power of the voice signal continuously exceeds the threshold) exceeds a predetermined time, set to correspond to the word with the longest duration among the vocabulary recognizable by the device, the voice detection unit 2 also detects that point as the speech input end point.
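The threshold-based detection described in this paragraph can be sketched as follows. This is a minimal illustration only, assuming the signal has already been reduced to one power value per frame; the function name `detect_endpoints` and the parameters `threshold` and `max_frames` are hypothetical and do not appear in the source.

```python
def detect_endpoints(frame_powers, threshold, max_frames):
    """Return (start, end) frame indices of the first utterance, or None.

    start: first frame whose power exceeds the threshold.
    end:   first later frame whose power falls below the threshold, or the
           frame at which the utterance has lasted max_frames frames
           (the forced end point for over-long input).
    """
    start = None
    for i, power in enumerate(frame_powers):
        if start is None:
            if power > threshold:
                start = i  # speech input start point
        elif power < threshold or i - start >= max_frames:
            return start, i  # speech input end point (normal or forced)
    return None  # no complete utterance in the data seen so far
```

Here `max_frames` stands in for the predetermined time tied to the longest word in the vocabulary; how that time is chosen is not specified beyond the description above.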

[0015] The detection results from the voice detection unit 2 are sent successively to the voice time storage unit 4. Each time the voice detection unit 2 detects a speech input start point or a speech input end point, the voice time storage unit 4 reads the corresponding time from the timer unit 3, namely the speech input start time Ts or the speech input end time Te, and stores the times Ts and Te internally. The stored time of the input speech need not be limited to the speech input start and end times; it can be varied to suit the application, for example by storing the time at the center of the input speech.

[0016] The recognition target vocabulary switching command, which requests switching of the recognition target vocabulary (set) applied by the voice recognition device of FIG. 1 (specifically by the voice recognition unit 8 within it), is input from the command input unit 5. The command input unit 5 can be implemented by command input through user operation of a switch, keyboard, or the like, or by command input issued in step with a processing flow that follows a flowchart. The recognition target vocabulary switching command used in this embodiment simply instructs that the recognition target vocabulary (set) be switched; it does not specify which recognition target vocabulary (set) to switch to.

[0017] When a recognition target vocabulary switching command is input from the command input unit 5, the command time storage unit 6 reads the time of that input (command input time) Tc from the timer unit 3 and stores it internally.

[0018] The recognition target vocabulary determination unit 7 determines whether to switch the recognition target vocabulary applied by the voice recognition unit 8, based on the command input time Tc stored in the command time storage unit 6 and the speech input start time Ts and speech input end time Te stored in the voice time storage unit 4. The details of this determination are described later. The recognition target vocabulary determination unit 7 has a current recognition target vocabulary identifier holding unit 71, which holds the identifier of the current recognition target vocabulary (set).

[0019] The voice recognition unit 8 performs recognition processing on the input speech, based on the voice signal amplified by the voice input unit 1, within the recognition target vocabulary (set) that follows from the determination result of the recognition target vocabulary determination unit 7. Various means of implementing the voice recognition unit 8 have been proposed, and the present invention does not depend on the particular method. Here, as an example, a voice recognition unit 8 employing a speaker-independent voice recognition method, which does not require the user to register speech in advance, is described with reference to FIGS. 2 and 3.

[0020] First, the voice signal amplified by the voice input unit 1 is input to the acoustic analysis unit 81 in the voice recognition unit 8 shown in FIG. 2. The acoustic analysis unit 81 performs acoustic analysis on the input voice signal to obtain feature parameters; here it applies LPC (Linear Predictive Coding) mel-cepstrum analysis to the input voice signal. The acoustic analysis in the acoustic analysis unit 81 is not limited to LPC mel-cepstrum analysis; BPF (Band Pass Filter) analysis or the like may also be used.

[0021] As shown in detail in FIG. 3, the acoustic analysis unit 81 comprises an A/D (analog/digital) converter 811, a power calculation unit 812, and an LPC analysis unit 813.

[0022] The voice signal input to the acoustic analysis unit 81 is quantized by the A/D converter 811, for example at a sampling frequency of 12 kHz with 12 bits, and then input to the power calculation unit 812, where its speech power is calculated. It is further input to the LPC analysis unit 813, where LPC mel-cepstrum analysis (LPC analysis) is performed. This LPC analysis is carried out, for example, with a frame length of 16 msec and a frame period of 8 msec, using a 16th-order LPC mel-cepstrum as the analysis parameter.
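With the example figures given here, a 16 msec frame at 12 kHz holds 192 samples and an 8 msec frame period advances 96 samples, so consecutive frames overlap by half. The following sketch shows that framing together with a per-frame power value. It assumes, purely for illustration, that "power" means the mean squared amplitude of a frame; the source does not define the calculation, and the name `frame_powers` is hypothetical.

```python
SAMPLE_RATE_HZ = 12_000   # sampling frequency from the example
FRAME_LENGTH_MS = 16      # frame length from the example
FRAME_PERIOD_MS = 8       # frame period (hop) from the example


def frame_powers(samples):
    """Split samples into overlapping frames and return one power per frame.

    Power is taken here as the mean squared amplitude (an assumption).
    """
    frame_len = SAMPLE_RATE_HZ * FRAME_LENGTH_MS // 1000  # 192 samples
    hop = SAMPLE_RATE_HZ * FRAME_PERIOD_MS // 1000        # 96 samples
    powers = []
    # Only complete frames are analyzed; a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    return powers
```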

[0023] In addition to the acoustic analysis unit 81 described above, the voice recognition unit 8 shown in FIG. 2 is provided with a voice quantization unit 82 that obtains a label for each frame from the feature parameters produced by the acoustic analysis unit 81, a PS dictionary storage unit 83 that stores recognition dictionaries in predetermined PS (speech segment) units, an HMM recognition unit 84 that recognizes the label sequence obtained by the voice quantization unit 82 using an HMM (Hidden Markov Model), an HMM parameter storage unit 85 that stores the parameters of the HMM models M, and a recognition result output unit 86 that outputs the recognition result.

[0024] The voice quantization unit 82 matches the feature parameters analyzed by the acoustic analysis unit 81 against the PS-unit recognition dictionaries registered in the PS dictionary storage unit 83, continuously along the time axis, and outputs to the HMM recognition unit 84, as the quantization result, the PS with the highest similarity for each frame. The continuous PS matching in the voice quantization unit 82 uses the composite LPC mel-cepstrum similarity measure given by the following equation (1).

[0025]

[Equation 1]

In equation (1), C is the LPC mel-cepstrum, and W_m^(Ki) and φ_m^(Ki) are, respectively, the weight obtained from the eigenvalue and the eigenvector for the PS named Ki. (·) denotes the inner product and ‖ ‖ denotes the norm.

[0026] The PS used in this embodiment include, for example, the following.

(1) Continuant segments:
    (1-1) steady vowel parts
    (1-2) fricative consonant parts
(2) Consonant segments: parts that include the transition into a vowel [demisyllables]
(3) Syllable boundary segments:
    (3-1) vowel boundaries
    (3-2) vowel-consonant boundaries
    (3-3) vowel-silence boundaries
(4) Other segments: devoiced vowels, etc.

Of these, (1), (2), and part of (4) are also often adopted when syllables are used as the recognition segments. The advantage of the PS in this embodiment, however, lies in adopting the syllable boundary segments of (3) in addition to the segments listed under (1), (2), and (4).

[0027] The HMM recognition unit 84 receives the PS with the highest similarity for each frame output from the voice quantization unit 82, that is, the PS sequence (label sequence), and performs word matching for the corresponding input speech. The word matching in the HMM recognition unit 84 is described below.

[0028] Word matching in this embodiment is performed by obtaining the PS sequence as a label sequence, as described above, and passing it through an HMM for each word (category). The general formulation of an HMM is as follows. An HMM has N states S_1, S_2, ..., S_N, and the initial state is distributed probabilistically over these N states. For speech, a model is used in which a state transition occurs with a certain probability (transition probability) at each fixed frame period. On a transition, a label is output with a certain probability (output probability), although null transitions, which change state without outputting a label, may also be introduced. Even when the output label sequence is given, the state transition sequence is not uniquely determined. Because only the label sequence can be observed, the model is called a hidden Markov model (HMM). An HMM model M is defined by the following six parameters.

[0029]
N: number of states (states S_1, S_2, ..., S_N)
K: number of labels (labels R = 1, 2, ..., K)
p_ij: transition probability, the probability of a transition from S_i to S_j
q_ij(k): the probability of outputting label k on the transition from S_i to S_j
m_i: initial state probability, the probability that the initial state is S_i
F: the set of final states

Next, restrictions on transitions that reflect the characteristics of speech are imposed on the model M. For speech, loop transitions that return from a state S_i to a previously visited state (S_(i-1), S_(i-2), ...) are generally not allowed, because they would disturb the temporal order.

[0030] Evaluating an HMM means finding the probability Pr(O/M) that the model M outputs the first-ranked label sequence O_1 = o_11, o_21, ..., o_T1. During recognition, the HMM recognition unit 84 hypothesizes each model in turn and searches the HMM parameter storage unit 85 for the model M that maximizes Pr(O/M). The category corresponding to the model with the maximum probability Pr(O/M) is then output from the recognition result output unit 86 as the recognition result for the input speech.
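One standard way to evaluate Pr(O/M) for a model of this kind, where labels are output on transitions, is the forward algorithm. The source does not say which evaluation procedure is used, so the sketch below is an assumption offered for illustration. It ignores null transitions, represents each model as a tuple (m, p, q, final_states) following the six parameters listed above, and uses the hypothetical names `forward_probability` and `recognize`.

```python
def forward_probability(labels, m, p, q, final_states):
    """Pr(O/M) for a discrete HMM that outputs labels on transitions.

    m[i]       : probability that the initial state is S_i
    p[i][j]    : probability of a transition from S_i to S_j
    q[i][j][k] : probability of outputting label k on that transition
    Null transitions are not modeled in this sketch.
    """
    n = len(m)
    alpha = list(m)  # alpha[i]: probability of being in S_i so far
    for k in labels:
        alpha = [
            sum(alpha[i] * p[i][j] * q[i][j][k] for i in range(n))
            for j in range(n)
        ]
    # Only paths that finish in a final state count.
    return sum(alpha[j] for j in final_states)


def recognize(labels, models):
    """Return the category whose model maximizes Pr(O/M)."""
    return max(
        models,
        key=lambda name: forward_probability(labels, *models[name]),
    )
```

Restricting `models` to the entries of one vocabulary set before calling `recognize` corresponds to searching only within the currently selected recognition target vocabulary.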

[0031] The HMM parameter storage unit 85 stores the models M (their parameters) for the three words "a", "b", and "c", letters of the alphabet that make up recognition target vocabulary set A, and the models M (their parameters) for the three words "1", "2", and "3", digits that make up recognition target vocabulary set B. During recognition, the HMM recognition unit 84 searches for the model M that maximizes Pr(O/M) within either recognition target vocabulary set A or recognition target vocabulary set B. Which recognition target vocabulary set is used is specified by the recognition target vocabulary determination unit 7.

[0032] To train an HMM, label sequences of training data are supplied to the HMM in advance and the parameters of the model M that maximize Pr(O/M) are estimated. The estimated parameters are then registered in the HMM parameter storage unit 85.

[0033] Next, the recognition target vocabulary switching operation, which is directly related to the present invention, is described in detail, taking as an example the case of switching between recognition target vocabulary set A ("a", "b", "c") and recognition target vocabulary set B ("1", "2", "3").

[0034] First, assume that at present an identifier indicating recognition target vocabulary set A is set in the current recognition target vocabulary identifier holding unit 71 of the recognition target vocabulary determination unit 7, and that the recognition target vocabulary in the voice recognition unit 8 is set to set A. In this state, a recognition target vocabulary switching command is input from the command input unit 5 in order to switch from the current recognition target vocabulary set A to recognition target vocabulary set B.

[0035] Let the input speech be S1. FIG. 4 shows the time relationship between this input speech S1 and the recognition target vocabulary switching command. In FIG. 4, Ts is the speech input start time of the input speech S1 and Te is its speech input end time; both are stored in the voice time storage unit 4 as described above. Tc is the input time of the command (recognition target vocabulary switching command) and is stored in the command time storage unit 6 as described above. Td is the time at which the recognition target vocabulary switching command input from the command input unit 5 would actually be transmitted to the recognition target vocabulary determination unit 7 if the conventional method were applied, and Dt is the time delay in that case. These times are assumed to satisfy Tc < Ts < Td < Te.

[0036] The example of FIG. 4 thus shows the time relationship in which input of the speech S1 begins at time Ts, after the recognition target vocabulary switching command is input at time Tc, and in which, under the conventional method, the command would be transmitted to the recognition target vocabulary determination unit 7 at time Td, before input of the speech S1 ends at time Te.

[0037] In this embodiment, when the speech input end time Te is stored in the voice time storage unit 4, the recognition target vocabulary determination unit 7 takes the speech input start time Ts already stored in the storage unit 4 as the input time of the speech and compares it with the time stored in the command time storage unit 6 (command input time Tc). Depending on whether the speech input start time Ts is the later of the two, it determines whether the input speech should be recognized against the vocabulary after switching, that is, whether to switch the recognition target vocabulary.

[0038] When it determines that the recognition target vocabulary is to be switched, the recognition target vocabulary determination unit 7 changes the contents of the current recognition target vocabulary identifier holding unit 71 from the identifier of the recognition target vocabulary set currently applied to voice recognition to the identifier of the other recognition target vocabulary set, and notifies the voice recognition unit 8 (specifically the HMM recognition unit 84 within it) of the change.

[0039] In the example of FIG. 4, by the time the voice input start time Ts of the input voice S1 is stored in the voice time storage unit 4, the recognition target vocabulary switching command has already been input and its command input time Tc has been stored in the command time storage unit 6. In this case, because the voice input start time Ts is later than the command input time Tc, the recognition target vocabulary determination unit 7 determines that the recognition target vocabulary is to be switched. It changes the contents of the current recognition target vocabulary identifier holding unit 71 from the identifier indicating the current recognition target vocabulary set A to the identifier indicating the other recognition target vocabulary set B, and notifies the voice recognition unit 8 (the HMM recognition unit 84 therein) of the switch to the recognition target vocabulary set B. After issuing this switching notification, the recognition target vocabulary determination unit 7 sets the contents of the command time storage unit 6 to the invalid state. If the voice input start time Ts is earlier than the command input time Tc, or if the contents of the command time storage unit 6 are in the invalid state, no switching notification is sent to the voice recognition unit 8 (the HMM recognition unit 84 therein).
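The decision rule in this paragraph can be sketched in a few lines of Python. This is only an illustrative sketch: the class name `VocabularyDecider`, its method names, and the use of `None` for the invalid state are assumptions, not part of the patent. The text does not say what happens when Ts equals Tc, so the sketch arbitrarily treats that case as "no switch".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VocabularyDecider:
    """Hypothetical stand-in for the recognition target vocabulary determination unit 7."""

    current_set: str = "A"                # contents of identifier holding unit 71
    command_time: Optional[float] = None  # command time storage unit 6 (None = invalid state)

    def on_command(self, tc: float) -> None:
        # Store the command input time Tc read from the timer.
        self.command_time = tc

    def on_voice(self, ts: float) -> bool:
        """Return True if a switching notification is issued for this utterance."""
        # No notification when the stored command time is invalid or Ts is not later than Tc.
        # (Ts == Tc is not covered by the text; treating it as "no switch" is an assumption.)
        if self.command_time is None or ts <= self.command_time:
            return False
        # Toggle between the two sets and invalidate the stored command time.
        self.current_set = "B" if self.current_set == "A" else "A"
        self.command_time = None
        return True
```

With a command stored at Tc = 1.0, an utterance starting at Ts = 1.5 triggers the switch from set A to set B, and a second utterance without a new command triggers nothing because the stored command time has been invalidated.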

[0040] The HMM recognition unit 84 in the voice recognition unit 8 starts recognition processing of the input voice S1 when input of the voice S1 ends (or, if the voice input is still continuing after a predetermined time has elapsed, when that predetermined time is exceeded). If the switching notification described above has been received from the recognition target vocabulary determination unit 7, the recognition processing of the input voice S1, that is, the search for the model M that maximizes Pr(O/M), is performed on the recognition target vocabulary set B rather than on the current recognition target vocabulary set A, as shown in FIG. 4. Consequently, when the recognition target vocabulary switching command is input in order to switch from the current recognition target vocabulary set A to the recognition target vocabulary set B and then, for example, the digit "1" is uttered, the input voice (S1) of the digit "1" is recognized against the recognition target vocabulary set B ("1", "2", "3") and is therefore recognized correctly. The recognition result is output from the recognition result output unit 86 of the voice recognition unit 8.
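As a rough illustration of restricting the search for the model M that maximizes Pr(O/M) to the active set, the following Python sketch takes precomputed scores as input. The function name `recognize`, the table `VOCABULARY_SETS`, and the score values used in the example are all hypothetical. A real HMM recognizer would compute the likelihoods from the observation sequence O.

```python
from typing import Dict, Mapping, Sequence

# Vocabulary sets from the embodiment: letters in set A, digits in set B.
VOCABULARY_SETS: Dict[str, Sequence[str]] = {
    "A": ("a", "b", "c"),
    "B": ("1", "2", "3"),
}


def recognize(scores: Mapping[str, float], active_set: str) -> str:
    """Return the word in the active set whose model has the highest score.

    `scores` maps each word to a stand-in value for Pr(O/M); the values are
    supplied by the caller and are not computed here.
    """
    candidates = VOCABULARY_SETS[active_set]
    return max(candidates, key=lambda word: scores[word])
```

For an utterance of "1", made-up scores such as `{"a": 0.20, "b": 0.05, "c": 0.10, "1": 0.60, "2": 0.03, "3": 0.02}` yield "1" when set B is active. With set A active the best available answer is "a", which is the misrecognition the timing comparison is meant to avoid.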

[0041] In contrast, the conventional method switches the recognition target vocabulary at the time Td at which the recognition target vocabulary switching command is delivered to the recognition target vocabulary determination unit 7, so the input voice S1 is recognized against the recognition target vocabulary set A ("a", "b", "c"), as shown in FIG. 4. Therefore, even if, as in the example above, the recognition target vocabulary switching command is input to switch from the current recognition target vocabulary set A to the recognition target vocabulary set B and the digit "1" is then uttered, the conventional method processes the input voice (S1) of the digit "1" against the recognition target vocabulary set A ("a", "b", "c"). The recognition result can then only be one of the letters "a", "b", or "c", and the utterance is never recognized correctly.

[0042] The present invention has been described above with reference to an embodiment, but it is not limited to that embodiment. The gist of the present invention is that, in order to switch the recognition target vocabulary appropriately and obtain high recognition performance, the input time of the voice and the input time of the recognition target vocabulary switching command (recognition target vocabulary switching request) are stored, and these times are used to determine which vocabulary the input voice should be recognized against before voice recognition is performed. Accordingly, the voice recognition method, the voice detection method, the method of inputting the recognition target vocabulary switching command, the recognition target vocabulary determination rule, and so on are not limited to those shown in the embodiment.

[0043] For example, in the embodiment described above, the recognition target vocabulary switching notification from the recognition target vocabulary determination unit 7 to the voice recognition unit 8 is routed to the HMM recognition unit 84 in the voice recognition unit 8, and recognition processing is performed within the range of the notified recognition target vocabulary. Alternatively, the switching notification may be routed to the recognition result output unit 86. In that case, the HMM recognition unit 84 performs recognition processing on both the recognition target vocabulary set A and the recognition target vocabulary set B, and the recognition result output unit 86 outputs a recognition result restricted to the post-switch recognition target vocabulary set designated by the switching notification. Accordingly, "performing recognition processing using the recognition target vocabulary" as recited in the claims also covers this output processing of the recognition result.
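This output-side variant can be sketched as two steps: score every word in both sets, then let the output stage keep only candidates from the designated set. The function names `rank_all` and `output_result` and the scores in the example are hypothetical and serve only to illustrate the idea.

```python
from typing import List, Mapping, Optional, Sequence, Tuple


def rank_all(scores: Mapping[str, float]) -> List[Tuple[str, float]]:
    """Rank every word of both vocabulary sets by score, best first.

    Plays the role of the HMM recognition unit 84 when it is not told which set is active.
    """
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def output_result(ranked: Sequence[Tuple[str, float]],
                  allowed: Sequence[str]) -> Optional[str]:
    """Return the best-ranked word that belongs to the designated set.

    Plays the role of the recognition result output unit 86 after it receives the
    switching notification. Returns None if no candidate is in the set.
    """
    for word, _score in ranked:
        if word in allowed:
            return word
    return None
```

Even when a word from the other set scores highest overall, the output stage still reports the best candidate within the designated set. The trade-off is that the recognizer scores both sets on every utterance.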

[0044] In the embodiment described above, the voice input start time Ts is used as the input time of the voice and compared with the contents of the command time storage unit 6 (command input time Tc). Instead of the voice input start time Ts, however, the midpoint between the voice input start time Ts and the voice input end time Te, (Ts + Te)/2, or the voice input end time Te may be used as the input time of the voice.
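The three candidate definitions of the voice input time (start, midpoint of Ts and Te, end) can be compared with a short Python sketch. The enum `VoiceTimeRule` and the function names are hypothetical. The comparison follows the "later than Tc" rule of the embodiment.

```python
from enum import Enum


class VoiceTimeRule(Enum):
    START = "start"        # use Ts
    MIDPOINT = "midpoint"  # use (Ts + Te) / 2
    END = "end"            # use Te


def voice_input_time(ts: float, te: float, rule: VoiceTimeRule) -> float:
    """Return the time that represents the utterance under the chosen rule."""
    if rule is VoiceTimeRule.START:
        return ts
    if rule is VoiceTimeRule.MIDPOINT:
        return (ts + te) / 2
    return te


def should_switch(ts: float, te: float, tc: float, rule: VoiceTimeRule) -> bool:
    """Switch only if the representative voice time is later than the command time Tc."""
    return voice_input_time(ts, te, rule) > tc
```

The choice matters when the command arrives during the utterance: with Ts = 1.0, Te = 3.0 and Tc = 1.5, the start rule does not switch, while the midpoint and end rules do.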

[0045] In the embodiment described above, the switching of the recognition target vocabulary is determined when the voice input end time Te is stored in the voice time storage unit 4, by taking the voice input start time Ts already stored in the voice time storage unit 4 as the input time of the voice and comparing it with the contents of the command time storage unit 6 (command input time Tc). When, as in this example, the voice input start time Ts is used as the input time of the voice, the switching of the recognition target vocabulary may instead be determined at the point when the voice input start time Ts is stored in the voice time storage unit 4.

[0046] In the embodiment described above, the recognition target vocabulary switching command from the command input unit 5 merely instructs that the recognition target vocabulary (set) be switched, but the command may also designate the recognition target vocabulary (set) to be used after switching. In that case, the recognition target vocabulary is switched to the designated vocabulary rather than in a fixed order. The present invention can also be implemented with various other modifications without departing from its gist.
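The difference between a plain switching command and one that designates the target set can be sketched as follows. The function `next_set`, its parameters, and the fixed order `SET_ORDER` are assumptions made for illustration. The patent does not specify how a designated set would be encoded in the command.

```python
from typing import Optional, Sequence

# Assumed fixed switching order for commands that carry no designation.
SET_ORDER: Sequence[str] = ("A", "B")


def next_set(current: str,
             designated: Optional[str] = None,
             order: Sequence[str] = SET_ORDER) -> str:
    """Return the vocabulary set to use after a switching command.

    Without a designation, cycle to the next set in `order`; with one, jump
    directly to the designated set.
    """
    if designated is not None:
        if designated not in order:
            raise ValueError(f"unknown vocabulary set: {designated!r}")
        return designated
    index = order.index(current)
    return order[(index + 1) % len(order)]
```

With only two sets the two command styles behave almost identically. Designation becomes useful once there are three or more sets, since the speaker can reach any set with a single command.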

【００４７】[0047]

[Effects of the Invention] As described above, according to the present invention, the input time of the voice and the input time of the recognition target vocabulary switching request are stored, and the stored times are used to determine which vocabulary the input voice was uttered against as the recognition target vocabulary. The recognition target vocabulary can therefore be switched appropriately without being affected by timing discrepancies caused by, for example, delay in transmitting the switching request. This provides significant practical benefits, such as high-performance voice recognition.

[Brief description of drawings]

FIG. 1 is a block diagram showing the basic configuration of a voice recognition device according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of the voice recognition unit 8 in FIG. 1.

FIG. 3 is a diagram showing details of the acoustic analysis unit 81 in FIG. 2.

FIG. 4 is a diagram showing an example of the chronological relationship of the times used for recognition target vocabulary determination in the embodiment.

[Explanation of symbols]

1 ... voice input unit, 2 ... voice detection unit, 3 ... timer unit, 4 ... voice time storage unit (first time storage means), 5 ... command input unit (recognition target vocabulary switching request input means), 6 ... command time storage unit (second time storage means), 7 ... recognition target vocabulary determination unit, 8 ... voice recognition unit, 71 ... current recognition target vocabulary identifier holding unit, A, B ... recognition target vocabulary sets.

Claims

[Claims]

1. A speech recognition device comprising: voice recognition means for recognizing an input voice; first time storage means for storing an input time of the voice; recognition target vocabulary switching request input means for inputting a request to switch the recognition target vocabulary applied by the voice recognition means; second time storage means for storing an input time of the recognition target vocabulary switching request; and a recognition target vocabulary determination unit that compares the stored contents of the first and second time storage means and determines, according to the comparison result, switching of the recognition target vocabulary applied by the voice recognition means, wherein the voice recognition means performs recognition processing using the recognition target vocabulary that accords with the determination result of the recognition target vocabulary determination unit.