JPH0627992A

JPH0627992A - Speech recognizing device

Info

Publication number: JPH0627992A
Application number: JP4184988A
Authority: JP
Inventors: Akira Nakayama; 昭中山
Original assignee: Toshiba Corp; Toshiba Computer Engineering Corp
Current assignee: Toshiba Corp; Toshiba Computer Engineering Corp
Priority date: 1992-07-13
Filing date: 1992-07-13
Publication date: 1994-02-04

Abstract

PURPOSE: To perform highly accurate speech recognition at high speed by easily detecting silent sections of speech and efficiently avoiding meaningless and unnecessary calculation in those sections. CONSTITUTION: A similarity calculation control part 4 receives, from a similarity calculation part 3, the norm ‖x‖ of a speech pattern vector x of a recognition unit composed of feature parameters extracted from input speech by a feature extraction part 1, and compares ‖x‖ with a threshold value THLVL that serves as the reference for silent-section detection. When ‖x‖ <= THLVL, the corresponding recognition unit is detected as lying in a silent section. When the similarity calculation control part 4 detects a silent section, the similarity calculation by the similarity calculation part 3 is omitted and the similarity value is automatically set to zero. In addition, a noise level detection part 5 detects the noise level around the device, and a threshold value setting part 6 sets the threshold value THLVL according to the noise level n.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an improvement of a speech recognition device.

[0002]

[Prior Art] Speech recognition technology plays an important role in realizing a good man-machine interface. Recently, as computer performance has improved, research and development on continuous speech recognition have also become active.

[0003] Because such a device performs recognition processing in units such as phonemes, an enormous amount of calculation is required, and the hardware needed for real-time processing is still large. Much of this calculation is wasted, however. In particular, meaningless calculation is performed for the silent sections of the speech. Moreover, this meaningless calculation degrades recognition accuracy.

[0004]

[Problems to Be Solved by the Invention] As described above, the conventional speech recognition device performs meaningless and wasteful processing for silent sections, which degrades recognition accuracy and lowers processing speed.

[0005] The present invention has been made in view of these circumstances. Its object is to provide a speech recognition device that can perform speech recognition with high accuracy and at high speed by easily detecting silent sections of the speech and efficiently avoiding meaningless and wasteful calculation in those sections.

[0006]

[Means for Solving the Problems] The present invention is a speech recognition device comprising feature extraction means for extracting feature parameters from input speech, and similarity calculation means for calculating a similarity between a speech pattern vector x of a recognition unit, composed of the extracted feature parameters, and a standard pattern set y_k (k = 1, 2, ...) prepared in advance as recognition units, the device outputting a recognition result based on the calculated similarity value. The device is characterized in that similarity calculation control means is provided which detects silent sections of the speech in recognition units according to the magnitude of the norm ‖x‖ of the speech pattern vector x and controls the similarity value calculation of the similarity calculation means according to the detection result, and in that, when the similarity calculation control means detects a silent section, the corresponding similarity calculation by the similarity calculation means is omitted.

[0007] The present invention is further characterized in that noise level detection means for detecting the noise level around the device is additionally provided, and the similarity calculation control means compares a threshold value corresponding to the detected noise level with the norm ‖x‖ of the speech pattern vector x and detects a silent section from the result of the comparison.

[0008]

[Operation] In the above configuration, the similarity calculation control means checks the magnitude of the norm ‖x‖ of the speech pattern vector x of each recognition unit. When the magnitude is at or below a certain threshold value, the section of that recognition unit is detected as a silent section of the speech, and control is performed so that the corresponding similarity calculation by the similarity calculation means is omitted. Because the magnitude of the norm ‖x‖ of the speech pattern vector x is affected by the noise level around the device, setting the threshold value according to the noise level makes it possible to detect silent sections more accurately.

[0009] Thus, in the above configuration, silent sections of the speech can be detected easily by comparing the threshold value with the norm ‖x‖ of the speech pattern vector x of each recognition unit. By controlling the similarity calculation according to the result of this comparison, the calculation of meaningless similarity values and wasteful computation can be efficiently avoided in silent sections. Therefore, according to the present invention, recognition can be performed with high accuracy and at high speed, and the performance of the device is markedly improved compared with conventional devices.

[0010]

[Embodiment] An embodiment in which the present invention is applied to word speech recognition will be described below with reference to the drawings.

[0011] FIG. 1 is a block diagram showing the schematic configuration of a speech recognition device according to an embodiment of the present invention. In FIG. 1, reference numeral 1 denotes a feature extraction unit for extracting feature amounts from the input speech. The feature extraction unit 1 extracts the feature amounts of the input speech as a speech pattern vector x for each recognition unit (a temporally short unit such as a phoneme).

[0012] Reference numeral 2 denotes a standard pattern storage unit that stores a standard pattern set y_k (k = 1, 2, ..., where k corresponds to a recognition category) prepared in advance for each recognition category, and reference numeral 3 denotes a similarity calculation unit. The similarity calculation unit 3 calculates the similarity between the speech pattern vector x extracted by the feature extraction unit 1 and the standard pattern set y_k (k = 1, 2, ...) stored in advance in the standard pattern storage unit 2.

[0013] Reference numeral 4 denotes a similarity calculation control unit that controls the similarity value calculation of the similarity calculation unit 3 according to the magnitude of the norm ‖x‖ of the speech pattern vector x extracted by the feature extraction unit 1. The similarity calculation control unit 4 compares the magnitude of the norm ‖x‖ of the speech pattern vector x with a threshold value THLVL set by a threshold value setting unit 6, described later, to detect silent sections of the speech in recognition units, and controls the similarity calculation unit 3 according to the detection result.

[0014] Reference numeral 5 denotes a noise level detection unit that detects the ambient noise level where the device is used, and reference numeral 6 denotes the threshold value setting unit. The threshold value setting unit 6 sets the threshold value TH(n), determined by the noise level n detected by the noise level detection unit 5, as the threshold value THLVL that serves as the reference for comparison in the similarity calculation control unit 4.

[0015] Reference numeral 7 denotes an identification determination unit that determines the recognition result from the similarity values calculated by the similarity calculation unit 3. When the recognition unit is, for example, a phoneme, the identification determination unit 7 outputs a phoneme label sequence (phoneme lattice).

[0016] Next, the operation of the configuration of FIG. 1 will be described with reference to the flowchart of FIG. 2 as appropriate. First, the feature extraction unit 1 extracts the feature amounts of the input speech to be recognized as a speech pattern vector (input pattern vector) x for each temporally short recognition unit, such as a phoneme. The similarity calculation unit 3 then calculates the norm ‖x‖ of the speech pattern vector x extracted by the feature extraction unit 1 (step S1).

[0017] In parallel with this, the noise level detection unit 5 detects the ambient noise level n while the feature extraction unit 1 is extracting features from the input speech, for example once per recognition unit, and notifies the threshold value setting unit 6 of it.

[0018] The threshold value setting unit 6 sets the threshold value TH(n), which is predetermined according to the noise level n notified by the noise level detection unit 5 (a threshold value specific to the noise level n), as the threshold value THLVL used in the similarity calculation control unit 4 (step S2). FIG. 3 shows an example of the relationship between the noise level n and the threshold value THLVL (= TH(n)). In the example of FIG. 3, the threshold value THLVL is set to the minimum value THmin in the range where the noise level n is smaller than n1, to the maximum value THmax in the range where n is larger than n2 (where n2 > n1), and, in the range between n1 and n2, to a value that increases from THmin to THmax in proportion to the noise level n.
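The relationship described for FIG. 3 can be sketched as a small function. This is only an illustration under stated assumptions: the function name `threshold_from_noise` and its parameter names are hypothetical, and the ramp between n1 and n2 is assumed to be a straight line joining THmin and THmax, which is one reading of "increases in proportion to the noise level n".

```python
def threshold_from_noise(n, n1, n2, th_min, th_max):
    """Return a threshold TH(n) with the shape described for FIG. 3.

    Hypothetical helper: below n1 the threshold is th_min, above n2 it is
    th_max, and in between it is assumed to rise linearly.
    """
    if n2 <= n1:
        raise ValueError("n2 must be greater than n1")
    if n <= n1:
        return th_min
    if n >= n2:
        return th_max
    # Assumed linear ramp from (n1, th_min) to (n2, th_max).
    return th_min + (th_max - th_min) * (n - n1) / (n2 - n1)


# The value returned here would play the role of THLVL in step S2.
thlvl = threshold_from_noise(15.0, n1=10.0, n2=20.0, th_min=1.0, th_max=3.0)
```

Clamping at both ends keeps the threshold bounded even when the measured noise level falls outside the range for which the ramp was chosen.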

[0019] The similarity calculation control unit 4 compares the norm ‖x‖ of the speech pattern vector x calculated by the similarity calculation unit 3 with the threshold value THLVL set by the threshold value setting unit 6 (step S3).

[0020] If the comparison in step S3 shows that ‖x‖ > THLVL, the similarity calculation control unit 4 judges that the period of the corresponding recognition unit lies outside the silent sections of the speech and causes the similarity calculation unit 3 to perform the normal similarity calculation. Under this control, the similarity between the speech pattern vector x from the feature extraction unit 1 and the standard pattern set y_k (k = 1, 2, ...) stored in advance in the standard pattern storage unit 2 is calculated (step S4), defined for example in the form

S_k = SIM(x, y_k) / ‖x‖ ... (1)

[0021] In equation (1), SIM(x, y_k) is a function that defines the similarity between the vectors x and y_k, and ‖x‖ is the norm of x.

[0022] The similarity S_k is normalized by the norm ‖x‖ of the speech pattern vector x, as in equation (1), because defining the similarity by the direction of the vector x rather than by its length yields a recognition result that is not affected by the loudness of the input speech. To emphasize this point, the division by the norm ‖x‖ is deliberately written outside the similarity definition SIM(x, y_k) here. As the concrete similarity definition in this embodiment, the following equation (2) is applied, based on the subspace method, which has excellent pattern matching performance.

[0023]

[Equation 1]

In equation (2), y^(k)_L is the standard pattern of category k orthogonally expanded over L axes, and (x, y^(k)_L) denotes the inner product of x and y^(k)_L.
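The role of the division by ‖x‖ in equation (1) can be checked with a short derivation. It rests on an assumption that the text does not state: that SIM is positively homogeneous of degree 1 in x, as the plain inner product SIM(x, y_k) = (x, y_k) would be. Whether the embodiment's equation (2) has this property cannot be confirmed from the text here. Under that assumption, scaling the input by any factor α > 0 (louder or quieter speech) leaves S_k unchanged:

```latex
% Assumption: SIM(\alpha x, y_k) = \alpha\,\mathrm{SIM}(x, y_k) for \alpha > 0,
% e.g. SIM(x, y_k) = (x, y_k), the inner product.
\begin{align*}
S_k(\alpha x)
  = \frac{\mathrm{SIM}(\alpha x,\, y_k)}{\lVert \alpha x \rVert}
  = \frac{\alpha\,\mathrm{SIM}(x,\, y_k)}{\alpha\,\lVert x \rVert}
  = S_k(x), \qquad \alpha > 0 .
\end{align*}
```

In other words, only the direction of x enters S_k, which matches the stated reason for the normalization.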

[0024] In contrast, if the comparison in step S3 shows that ‖x‖ ≤ THLVL, the similarity calculation control unit 4 judges that the period of the corresponding recognition unit lies within a silent section of the speech, and controls the similarity calculation unit 3 so that the corresponding similarity calculation according to equation (1) is omitted and the similarity (similarity value) is set to S_k = 0. As a result, the similarity calculation according to equation (1) is not performed, and S_k = 0 is set (step S5).
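Steps S1 and S3 to S5 can be sketched together as follows. This is a minimal illustration, not the device's implementation: the names `norm`, `inner` and `gated_similarities` are hypothetical, and the inner product is used as a stand-in for SIM purely as an assumption, since the concrete similarity function of the embodiment is given by equation (2).

```python
import math


def norm(x):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(v * v for v in x))


def inner(x, y):
    """Inner product, used here only as an assumed stand-in for SIM."""
    return sum(a * b for a, b in zip(x, y))


def gated_similarities(x, standard_patterns, thlvl, sim=inner):
    """Return (similarity values S_k, silent flag) for one recognition unit."""
    x_norm = norm(x)                       # step S1: compute the norm
    if x_norm <= thlvl:                    # step S3: compare with THLVL
        # Step S5: silent section, skip the calculation and set S_k = 0.
        return [0.0] * len(standard_patterns), True
    # Step S4: normal similarity calculation, normalized by the norm.
    return [sim(x, y) / x_norm for y in standard_patterns], False


patterns = [[1.0, 0.0], [0.0, 1.0]]
voiced, silent_flag = gated_similarities([3.0, 4.0], patterns, thlvl=1.0)
muted, muted_flag = gated_similarities([0.3, 0.4], patterns, thlvl=1.0)
```

In this sketch the second call returns all-zero similarity values without evaluating `sim` at all, which is where the saving in computation comes from.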

[0025] The identification determination unit 7 determines the recognition result from the similarity values obtained by the similarity calculation unit 3 and, when the recognition unit is, for example, a phoneme, outputs a phoneme label sequence.

[0026] Next, the comparison in step S3 by the similarity calculation control unit 4 will be described in detail with reference to the operation diagram of FIG. 4. In FIG. 4, reference numeral 11 denotes the pattern of the input speech, with the vertical axis corresponding to the feature amount and the horizontal axis to time (t). Reference numeral 12 denotes a window for cutting the speech pattern vector x out of the speech pattern 11. Its width is the size of a speech recognition unit (for example, a phoneme). Reference numeral 13 shows the norm ‖x‖ of the speech pattern vector x within the window 12 as the window 12 is shifted along the horizontal (time) axis.
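The curve labeled 13 in FIG. 4 can be approximated with a sliding window over the input pattern. The sketch below assumes a representation the text does not specify: the speech pattern 11 is taken to be a list of per-frame feature vectors, and the speech pattern vector x for one window position is formed by concatenating the frames inside the window. The name `window_norms` and its parameters are hypothetical.

```python
import math


def window_norms(frames, window_len, hop=1):
    """Norm of the windowed pattern vector x at each window position.

    frames: list of per-frame feature vectors (assumed representation).
    window_len: number of frames in one recognition unit (window 12).
    hop: number of frames the window advances per step.
    """
    norms = []
    for start in range(0, len(frames) - window_len + 1, hop):
        # Concatenate the frames inside the window into one vector x.
        x = [v for frame in frames[start:start + window_len] for v in frame]
        norms.append(math.sqrt(sum(v * v for v in x)))
    return norms


frames = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]
per_frame = window_norms(frames, window_len=1)
per_pair = window_norms(frames, window_len=2)
```

Comparing each element of the returned list with THLVL then marks the window positions that would be treated as silent.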

[0027] As the example of FIG. 4 shows, the value of the norm ‖x‖ naturally becomes small in the silent sections of the speech. Phoneme similarity calculation is unnecessary in these sections. Conventionally, however, silent sections could not be detected, and the similarity calculation was carried out in them as well.

[0028] If the similarity calculation defined by equation (1) is performed even in a silent section, some value is obtained as the similarity S_k. Moreover, because the norm ‖x‖ in the denominator is relatively small, a meaninglessly large similarity value is calculated. This degrades recognition accuracy and, in addition, wastes computation.

[0029] In this embodiment, however, as already described, the similarity calculation control unit 4 compares the norm ‖x‖ with the threshold value THLVL to detect silent sections efficiently and controls the similarity calculation unit 3 accordingly, so that the calculation of meaningless similarity values and wasteful computation can be avoided in silent sections. The value TH(n) corresponding to the noise level n is used as the threshold value THLVL, as described above, because noise increases the norm in silent sections, so the threshold value THLVL must also follow the noise level. If a fixed value were used as the threshold value THLVL, there would be no problem when the device is used in a place where the noise level is relatively constant. Where the noise level fluctuates, however, the detection of silent sections would be affected by the fluctuation and could no longer be performed stably.

[0030]

[Effects of the Invention] As described above, according to the present invention, silent sections of the speech are detected in recognition units based on the magnitude of the norm ‖x‖ of the speech pattern vector x extracted from the input speech in recognition units, and the similarity value calculation is controlled according to the detection result. Silent sections can therefore be detected easily, and meaningless recognition results and wasteful similarity calculation in those sections can be efficiently avoided.

[0031] In particular, when silent sections are detected by comparing the norm ‖x‖ of the speech pattern vector x with a threshold value corresponding to the noise level around the device, the detection is stable and unaffected by fluctuations in the noise level. Meaningless recognition results and wasteful similarity calculation are avoided still more efficiently, and recognition accuracy and recognition processing speed can be improved, which are significant practical effects.

[Brief description of drawings]

FIG. 1 is a block diagram showing the schematic configuration of a speech recognition device according to an embodiment of the present invention.

FIG. 2 is a flowchart for explaining the operation of the embodiment.

FIG. 3 is a diagram showing an example of the relationship between the noise level n and the threshold value THLVL (= TH(n)).

FIG. 4 is an operation diagram for specifically explaining the relationship among the input speech pattern, the norm ‖x‖, and the threshold value THLVL.

[Explanation of symbols]

1: feature extraction unit, 2: standard pattern storage unit, 3: similarity calculation unit, 4: similarity calculation control unit, 5: noise level detection unit, 6: threshold value setting unit, 7: identification determination unit.

Claims

[Claims]

1. A speech recognition device comprising feature extraction means for extracting feature parameters from input speech, and similarity calculation means for calculating a similarity between a speech pattern vector x of a recognition unit, composed of the feature parameters extracted by the feature extraction means, and a standard pattern set y_k (k = 1, 2, ...) prepared in advance as recognition units, the device outputting a recognition result based on the similarity value calculated by the similarity calculation means, characterized by further comprising similarity calculation control means for detecting a silent section of the speech in recognition units according to the magnitude of the norm ‖x‖ of the speech pattern vector x and controlling the similarity value calculation of the similarity calculation means according to the detection result, wherein, when the similarity calculation control means detects a silent section, the corresponding similarity calculation by the similarity calculation means is omitted.

2. The speech recognition device according to claim 1, further comprising noise level detection means for detecting a noise level around the device, wherein the similarity calculation control means compares a threshold value corresponding to the noise level detected by the noise level detection means with the norm ‖x‖ of the speech pattern vector x and detects a silent section based on the result of the comparison.