JP3083855B2

JP3083855B2 - Voice recognition method and apparatus

Info

Publication number: JP3083855B2
Application number: JP02413766A
Authority: JP
Inventors: 哲也室井
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1990-12-25
Filing date: 1990-12-25
Publication date: 2000-09-04
Anticipated expiration: 2015-09-04
Also published as: JPH04223499A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a speech recognition method and apparatus.

[0002]

[Prior Art] As a conventional example of a speech recognition method and apparatus, there is the speech recognition system disclosed by the present applicant in Japanese Unexamined Patent Publication No. Sho 64-23299. In that system, standard patterns corresponding to partial patterns of various speech patterns, each speech pattern being a time series of feature vectors obtained by converting a speech signal, are set in advance in a pattern storage means together with durations and the like. An input speech signal is converted into a speech pattern by a speech conversion means comprising a microphone or the like, and the duration of each partial pattern is detected. The inter-pattern distance, which corresponds to the similarity between the partial pattern and the standard pattern, is calculated according to the duration of the partial pattern and the duration of the standard pattern, and the phoneme of the speech signal is recognized on the basis of this calculation result.

[0003] That is, in the above speech recognition system, where X is the duration of the partial pattern of the speech pattern corresponding to a given state of the standard pattern, Y is the duration of the standard pattern, and w is a preset constant, the inter-pattern distance D is calculated as D = w(Lx − Lj)².

[0004]

[Problems to Be Solved by the Invention] The speech recognition system disclosed in the above publication recognizes the phoneme of a speech signal on the basis of the duration of a standard pattern preset in the device and the duration of a partial pattern of the speech pattern input to the device.

[0005] However, in the above speech recognition system the same arithmetic processing is always performed regardless of how the duration of the partial pattern of the speech pattern is stretched, so erroneous recognition tends to occur. A word-final vowel, for example, may become extremely long but almost never becomes extremely short. Yet in the above system the duration of the standard pattern is preset to a value such as 50 ms, so a good calculation result cannot be obtained when the duration of the speech pattern varies to 10 ms, 100 ms, and so on. In other words, as illustrated in FIG. 8, for a standard pattern whose duration distribution is asymmetric about the mean value l, the above system performs the same arithmetic processing whether the duration of the speech pattern is a, which is short and lies outside the distribution range, or b, which is long but lies inside the distribution range, so erroneous recognition tends to occur.
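The problem described above can be illustrated with a minimal sketch. It assumes the prior-art penalty has the form D = w(x − y)² with a single constant w; the function name and the numeric values (a 50 ms standard duration, durations of 20 ms and 80 ms, w = 0.01) are hypothetical and chosen only so that the two deviations are equal in magnitude.

```python
def symmetric_duration_penalty(x_ms: float, y_ms: float, w: float) -> float:
    """Prior-art style penalty: one constant w, whatever the sign of (x - y)."""
    return w * (x_ms - y_ms) ** 2


# Hypothetical values, not taken from the patent.
standard_ms = 50.0
w = 0.01

shorter = symmetric_duration_penalty(20.0, standard_ms, w)  # 30 ms too short
longer = symmetric_duration_penalty(80.0, standard_ms, w)   # 30 ms too long

# Both deviations receive the same penalty, although for a word-final vowel
# the long case is plausible and the short case is not.
print(shorter == longer)
```

A single weight cannot express that lengthening is common and shortening is rare, which is the gap the invention addresses.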

[0006]

[Means for Solving the Problems] In the invention of claim 1, a speech conversion means converts an input speech signal into a speech pattern that is a time series of feature vectors; standard patterns corresponding to partial patterns of various speech patterns are set in advance in a pattern storage means, each with at least a duration and a plurality of weighting coefficients for that duration; a time detection means detects the duration of a partial pattern of the speech pattern; a time comparison means compares the detected duration of the partial pattern with the duration of the standard pattern; a coefficient selection means selects a predetermined one of the weighting coefficients of the standard pattern according to the sign of the comparison result; a similarity calculation means calculates the inter-pattern distance corresponding to the similarity between the partial pattern and the standard pattern, this distance being the selected weighting coefficient multiplied by the square of the difference between the partial pattern and the standard pattern; and a recognition means recognizes the phoneme of the speech signal on the basis of this calculation result.
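The sign-based coefficient selection can be sketched as follows. This is an illustrative reading, not the patent's implementation: the function name is hypothetical, and assigning w1 to the case x ≥ y and w2 to the case x < y follows the convention stated later in the description of the embodiments.

```python
def asymmetric_duration_penalty(x: float, y: float, w1: float, w2: float) -> float:
    """Duration penalty with the weight chosen by the sign of (x - y).

    x  : detected duration of the partial pattern
    y  : duration stored with the standard pattern
    w1 : weight used when x >= y (segment at least as long as the standard)
    w2 : weight used when x <  y (segment shorter than the standard)
    """
    diff = x - y
    w = w1 if diff >= 0 else w2
    return w * diff ** 2


# Hypothetical weights for a word-final vowel: lengthening is cheap,
# shortening is expensive.
print(asymmetric_duration_penalty(100.0, 50.0, w1=0.001, w2=0.05))
print(asymmetric_duration_penalty(10.0, 50.0, w1=0.001, w2=0.05))
```

With weights like these, a 100 ms segment stays close to a 50 ms standard pattern while a 10 ms segment is pushed far away from it.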

[0007] In the invention of claim 2, a speech conversion means converts an input speech signal into a speech pattern that is a time series of feature vectors; standard patterns corresponding to partial patterns of various speech patterns are set in advance in a pattern storage means for each phoneme, each with at least a duration and a plurality of weighting coefficients for that duration; a time detection means detects the duration of a partial pattern of the speech pattern; a time comparison means compares the detected duration of the partial pattern with the duration of the standard pattern; a coefficient selection means selects a predetermined one of the weighting coefficients of the standard pattern according to the comparison result; a similarity calculation means calculates, according to the phoneme type and the selected weighting coefficient, the inter-pattern distance corresponding to the similarity between the partial pattern and the standard pattern; and a recognition means recognizes the phoneme of the speech signal on the basis of this calculation result. In addition, a distance comparison means compares the calculated inter-pattern distance with a preset threshold, and a coefficient updating means updates the weighting coefficients of the standard pattern according to the comparison result.

[0008] In the invention of claim 3, a phoneme detection means detects any phoneme for which the calculated inter-pattern distance is smaller than the inter-pattern distance that the similarity calculation means calculated according to the phoneme type of the partial pattern of the speech pattern, and a pattern updating means updates at least the weighting coefficients of the standard pattern of the detected phoneme.

[0009] The invention of claim 4 comprises: a speech conversion means that converts an input speech signal into a speech pattern that is a time series of feature vectors; a pattern storage means in which standard patterns corresponding to partial patterns of various speech patterns are set, each with at least a duration and a plurality of weighting coefficients for that duration; a time detection means that detects the duration of a partial pattern of the speech pattern converted by the speech conversion means; a time comparison means that compares the duration of the partial pattern detected by the time detection means with the duration of the standard pattern stored in the pattern storage means; a coefficient selection means that selects a predetermined one of the weighting coefficients of the standard pattern according to the sign of the comparison result of the time comparison means; a similarity calculation means that calculates the inter-pattern distance corresponding to the similarity between the partial pattern and the standard pattern, this distance being the weighting coefficient selected by the coefficient selection means multiplied by the square of the difference between the partial pattern and the standard pattern; and a recognition means that recognizes the phoneme of the speech signal on the basis of the calculation result of the similarity calculation means.

[0010] The invention of claim 5 comprises: a speech conversion means that converts an input speech signal into a speech pattern that is a time series of feature vectors; a pattern storage means in which standard patterns corresponding to partial patterns of various speech patterns are set for each phoneme, each with at least a duration and a plurality of weighting coefficients for that duration; a time detection means that detects the duration of a partial pattern of the speech pattern converted by the speech conversion means; a time comparison means that compares the duration of the partial pattern detected by the time detection means with the duration of the standard pattern stored in the pattern storage means; a coefficient selection means that selects a predetermined one of the weighting coefficients of the standard pattern according to the comparison result of the time comparison means; a similarity calculation means that calculates, according to the phoneme type and the weighting coefficient selected by the coefficient selection means, the inter-pattern distance corresponding to the similarity between the partial pattern and the standard pattern; a recognition means that recognizes the phoneme of the speech signal on the basis of the calculation result of the similarity calculation means; a distance comparison means that compares the calculated inter-pattern distance with a preset threshold; and a coefficient updating means that updates the weighting coefficients of the standard pattern according to this comparison result.

[0011] The invention of claim 6 comprises: a phoneme detection means that detects any phoneme for which the calculated inter-pattern distance is smaller than the inter-pattern distance that the similarity calculation means calculated according to the phoneme type of the partial pattern of the speech pattern; and a pattern updating means that updates at least the weighting coefficients of the standard pattern of the phoneme detected by the phoneme detection means.

[0012]

[Operation] In the inventions of claims 1 and 4, standard patterns corresponding to partial patterns of various speech patterns are set in advance, each with at least a duration and a plurality of weighting coefficients for that duration; a predetermined one of the weighting coefficients of the standard pattern is selected according to the result of comparing the duration of a partial pattern of the speech pattern with the duration of the standard pattern; and the inter-pattern distance corresponding to the similarity between the partial pattern and the standard pattern is calculated according to the selected weighting coefficient in order to recognize the phoneme of the speech signal. Because one of the plurality of weighting coefficients is used selectively in the arithmetic processing according to whether the duration of the partial pattern is longer or shorter than the duration of the standard pattern, even a phoneme whose duration distribution is non-uniform, such as a word-final vowel, can be recognized well.

[0013] In the inventions of claims 2 and 5, the inter-pattern distance between a standard pattern preset for each phoneme and a partial pattern of the speech pattern converted by the speech conversion means is calculated according to the phoneme type, and the weighting coefficients of the standard pattern are updated according to the result of comparing the calculated inter-pattern distance with a preset threshold. As a result, for a standard pattern whose phoneme is the same as that of the speech pattern, the weighting coefficients are updated so that the inter-pattern distance is reduced.

[0014] In the inventions of claims 3 and 6, any phoneme is detected for which the calculated inter-pattern distance is smaller than the inter-pattern distance calculated according to the phoneme type of the partial pattern of the speech pattern, and the pattern updating means updates at least the weighting coefficients of the standard pattern of the detected phoneme. As a result, for a standard pattern whose phoneme is the same as that of the speech pattern, updating the weighting coefficients reduces the inter-pattern distance, and for a standard pattern whose inter-pattern distance is small even though its phoneme is different, updating the weighting coefficients increases the inter-pattern distance.

[0015]

[Embodiments] A speech recognition apparatus according to the inventions of claims 1 and 4 will be described with reference to FIGS. 1 to 3. As illustrated in FIG. 1, in this speech recognition apparatus 1, a feature sequence conversion unit 3, which is the speech conversion means that converts an input speech signal into a speech pattern that is a time series of feature vectors, is connected to a speech input unit 2 comprising a microphone or the like. A standard pattern storage unit 4, which is the pattern storage means in which the various standard patterns are set in advance, is formed of a RAM (Random Access Memory) or the like. A matching unit 5, to which the standard pattern storage unit 4 and the feature sequence conversion unit 3 are connected, comprises a CPU (Central Processing Unit) or the like in which the following means (none of them shown) are implemented in firmware or the like: a time detection means that detects the duration of a partial pattern, a time comparison means that compares the duration of the partial pattern with the duration of the standard pattern, a coefficient selection means that selects a predetermined one of the weighting coefficients of the standard pattern, a similarity calculation means that calculates the inter-pattern distance, and a recognition means that recognizes the phoneme of the speech signal. Here, each standard pattern set in the standard pattern storage unit 4 consists of a mean vector Y, a duration l, and weighting coefficients w₁ and w₂.

[0016] With this configuration, in the speech recognition apparatus 1, the speech signal input from the speech input unit 2 is converted by the feature sequence conversion unit 3 into a speech pattern X, which is a time series of feature vectors. Various methods are conceivable for extracting feature vectors useful for such a speech recognition apparatus 1. For example, the outputs of a bank of 15-channel band-pass filters (not shown) may be sampled every 10 ms; a 20-dimensional vector may be extracted by dividing an FFT (Fast Fourier Transform) spectrum, computed with a frame period of 5.0 ms, into equal bands on a logarithmic frequency axis and averaging within each band; or a 14th-order LPC (Linear Predictive Coding) cepstrum may be extracted every 10 ms. The feature vector x_i of the i-th frame of the speech pattern X = x₁, x₂ … x_i … x_I (I is the number of frames) obtained in this way is, for example, 15-dimensional, corresponding to the number of channels.

[0017] In the matching unit 5, the inter-pattern distance Dist(X, Y_n), which corresponds to the similarity between the input speech pattern X and the n-th standard pattern Y_n of the N standard patterns Y₁ to Y_N stored in the standard pattern storage unit 4, is calculated for each n in turn. The category (here, a word), that is, the phoneme to which the standard pattern with the smallest calculated distance belongs, is output as the recognition result.
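The matching step described above amounts to a nearest-pattern search. The following minimal sketch assumes a distance function is supplied by the caller; `recognize`, `toy_dist`, and the toy data are hypothetical names and values used only for illustration.

```python
def recognize(speech_pattern, standard_patterns, dist):
    """Return the category of the standard pattern with the smallest distance.

    standard_patterns : list of (category, pattern) pairs
    dist              : callable giving the inter-pattern distance Dist(X, Y_n)
    """
    best_category, _ = min(
        ((category, dist(speech_pattern, pattern))
         for category, pattern in standard_patterns),
        key=lambda pair: pair[1],
    )
    return best_category


def toy_dist(x, y):
    """Stand-in distance: sum of squared differences of equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


# Hypothetical toy data: two "words" with two-frame scalar patterns.
patterns = [("word_a", [0.0, 0.0]), ("word_b", [1.0, 2.5])]
print(recognize([1.0, 2.0], patterns, toy_dist))
```

In the apparatus itself the distance would be the duration-weighted Dist(X, Y_n) rather than this stand-in.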

[0018] Each standard pattern Y_n is formed as a time series of J states, and for the j-th state a feature vector y_nj representing the j-th state of the standard pattern Y_n, a duration l_nj, and two weighting coefficients w_nj1 and w_nj2 are registered. When the speech pattern X is matched against the standard pattern Y_n and, for example, the partial pattern x_{m+1}, x_{m+2} … x_i of the speech pattern X corresponds to the j-th state of the standard pattern Y_n, the distance sd(m, i, n, j) for the cases (i − m ≥ l_nj) and (i − m < l_nj) is

(Equation 1)

Here, dist₁ in the above expression is an operation that calculates the distance between two vectors, and dist₂ is an operation that calculates the distance between two scalars. If the Euclidean distance is used for these operations, then, as illustrated in the flowcharts of FIGS. 2 and 3, the distance sd(m, i, n, j) for the cases (i − m ≥ l_nj) and (i − m < l_nj) is

(Equation 2)

The inter-pattern distance Dist(X, Y_n) in this case is

(Equation 3)

In the above expression, 0 = S(0) ≤ S(1) … ≤ S(N) = I, and DT used in the flowchart of FIG. 3 is a temporary variable.
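The expressions themselves are not reproduced in this text, so the following is only one plausible reading of the segment distance sd(m, i, n, j), offered as a sketch. It assumes that the spectral part is the sum of squared Euclidean distances between each frame and the state's representative vector, and that the duration part is the selected weight times the squared difference between the segment length and l_nj, in line with the D = w(x − y)² form used elsewhere in the description. The `State` class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class State:
    y: Sequence[float]  # representative feature vector y_nj
    l: float            # duration l_nj, in frames
    w1: float           # weight used when the segment length is >= l
    w2: float           # weight used when the segment length is <  l


def sq_euclid(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors (assumed form of dist1)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def sd(frames: List[Sequence[float]], m: int, i: int, state: State) -> float:
    """Assumed distance between frames x_{m+1}..x_i (1-based) and one state.

    With a 0-based list, those frames are frames[m:i].
    """
    spectral = sum(sq_euclid(f, state.y) for f in frames[m:i])
    length = i - m
    w = state.w1 if length >= state.l else state.w2
    return spectral + w * (length - state.l) ** 2


# Hypothetical one-dimensional frames and a single state.
frames = [[0.0], [0.0], [1.0]]
state = State(y=[0.0], l=2, w1=0.5, w2=2.0)
print(sd(frames, 0, 3, state), sd(frames, 0, 1, state))
```

Whether the patent squares the vector distance or weights the two terms differently cannot be determined from the text alone; the structure of the weight selection is the part grounded in the description.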

[0019] That is, in this speech recognition apparatus 1, one of the two weighting coefficients is used in the arithmetic processing according to whether the duration of the partial pattern of the speech pattern is longer or shorter than the duration of the standard pattern, so even a phoneme whose duration distribution is non-uniform, such as a word-final vowel, can be recognized well.

[0020] If dynamic programming is used to solve the above expressions, an array D(i, j) (1 ≤ i ≤ I, 1 ≤ j ≤ J) for storing cumulative distances can be prepared and the following steps carried out:

Step 1: D(i, 1) = sd(0, i, n, 1) (1 ≤ i ≤ I)

Step 2: (Equation 4)

Step 3: Dist(X, Y_n) = D(I, J)

In this way, the inter-pattern distance Dist can be obtained simply.
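The three steps can be sketched as a small dynamic program. Because Equation 4 is not reproduced in this text, the recurrence used here, D(i, j) = min over m of D(m, j − 1) + sd(m, i, n, j), is an assumption, as is the segment distance in `seg_dist`. For brevity the sketch uses scalar frames and represents each state as a tuple (y, l, w1, w2); all names are hypothetical.

```python
def seg_dist(frames, m, i, state):
    """Assumed sd: squared spectral error plus sign-selected duration penalty."""
    y, l, w1, w2 = state
    spectral = sum((f - y) ** 2 for f in frames[m:i])
    n = i - m
    return spectral + (w1 if n >= l else w2) * (n - l) ** 2


def pattern_distance(frames, states):
    """Cumulative distance D(I, J) over all segmentations into J non-empty states."""
    I, J = len(frames), len(states)
    INF = float("inf")
    # D[i][j]: best cumulative distance for the first i frames and first j states.
    D = [[INF] * (J + 1) for _ in range(I + 1)]

    # Step 1: the first state covers frames 1..i.
    for i in range(1, I + 1):
        D[i][1] = seg_dist(frames, 0, i, states[0])

    # Step 2 (assumed recurrence): choose the best boundary m for state j.
    for j in range(2, J + 1):
        for i in range(j, I + 1):
            D[i][j] = min(
                D[m][j - 1] + seg_dist(frames, m, i, states[j - 1])
                for m in range(j - 1, i)
            )

    # Step 3: the inter-pattern distance is D(I, J).
    return D[I][J]


# Hypothetical two-state pattern: two frames near 0.0, then two near 1.0.
states = [(0.0, 2, 1.0, 1.0), (1.0, 2, 1.0, 1.0)]
print(pattern_distance([0.0, 0.0, 1.0, 1.0], states))
```

The table has I × J entries and each entry scans up to I boundaries, so this direct form costs on the order of I² · J segment evaluations; caching partial sums would reduce the per-segment cost.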

[0021] An embodiment of the inventions of claims 2 and 5 will be described with reference to FIG. 4. In this speech recognition apparatus 6, standard patterns are set for each phoneme in a standard pattern storage unit 7, which is a pattern storage means such as a RAM, and a matching unit 8 comprising a CPU or the like has, implemented in firmware or the like, a distance comparison means and so on that compares the inter-pattern distance between a partial pattern and a standard pattern with a preset threshold. A weighting coefficient updating unit 9, which is the coefficient updating means that updates the weighting coefficients of the standard patterns in the standard pattern storage unit 7, is connected to the matching unit 8. The rest of the structure is the same as in the speech recognition apparatus 1 described above.

[0022] With this configuration, the speech recognition apparatus 6 calculates the inter-pattern distance between a partial pattern of the speech pattern and a standard pattern in substantially the same way as the speech recognition apparatus 1 described above. That is, the inter-pattern distance Dist between a partial pattern a_{i+1}, a_{i+2} … a_{i+x} of x frames and a standard pattern y whose feature vector is b is calculated as

(Equation 5)

The first term on the right-hand side of this expression is the Euclidean distance for the spectrum, and the second term concerns the duration of the partial pattern. That is, this duration term D is the distance between the x frames and the duration y of the standard pattern; using the two weighting coefficients w₁ and w₂ of the standard pattern, D = w₁(x − y)² when x ≥ y and D = w₂(x − y)² when x < y.

[0023] In this speech recognition apparatus 6, when, for example, the phoneme type of the partial pattern x of the speech pattern is known, the matching unit 8 reads the standard pattern y of that phoneme k from the standard pattern storage unit 7 and calculates the distance D again. This distance D is then compared with a threshold D₀. If D > D₀, an update constant w₀ is subtracted from a predetermined one of the two weighting coefficients w₁ and w₂ of the standard pattern; if D < D₀, the update constant w₀ is added to a predetermined one of the two weighting coefficients w₁ and w₂. This relaxes, for each phoneme, the constraint on lengthening of the speech pattern, so the inter-pattern distance between the standard pattern and the speech pattern is reduced. By repeating this operation, the speech recognition apparatus 6 successively updates the weighting coefficients w of the standard patterns to appropriate values, which improves recognition accuracy.
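The threshold-based update can be sketched as below. The text does not say which of the two coefficients is the "predetermined one", so this sketch assumes it is the coefficient that applies to the current comparison (w1 when x ≥ y, w2 otherwise). It also assumes that nothing changes when D equals D₀ and applies no lower bound to the weights; both are choices made here, not statements from the patent. The function name is hypothetical.

```python
def update_weights(w1, w2, x, y, d, d0, w0):
    """Return updated (w1, w2) after comparing distance d with threshold d0.

    x, y : segment duration and standard-pattern duration
    d    : distance D computed for the known phoneme's standard pattern
    d0   : preset threshold D0
    w0   : update constant
    """
    if d > d0:
        delta = -w0   # distance too large: relax the penalty
    elif d < d0:
        delta = w0    # distance small: tighten the penalty
    else:
        delta = 0.0   # assumption: leave the weights unchanged when D == D0

    # Assumption: update the coefficient selected for this comparison.
    if x >= y:
        return w1 + delta, w2
    return w1, w2 + delta


# Hypothetical values: a segment longer than the standard, with D above D0.
print(update_weights(0.5, 0.5, x=9.0, y=5.0, d=8.0, d0=4.0, w0=0.25))
```

A practical implementation would probably keep the weights non-negative, but the description does not specify any such bound.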

[0024] An embodiment of the inventions of claims 3 and 6 will be described with reference to FIGS. 5 to 7. In this speech recognition apparatus 10, a phoneme detection means, which detects any phoneme for which the calculated inter-pattern distance is smaller than the inter-pattern distance calculated according to the phoneme type of the partial pattern, is implemented in firmware or the like in a matching unit 11 comprising a CPU or the like. A standard pattern updating unit 12, which is the pattern updating means that updates the weighting coefficients and durations of the standard patterns, is connected to the matching unit 11. The rest of the structure is the same as in the speech recognition apparatus 6 described above.

[0025] With this configuration, the speech recognition apparatus 10 calculates the inter-pattern distance between a partial pattern of the speech pattern and a standard pattern in the same way as the speech recognition apparatus 6 described above. That is, for the j-th of J phonemes, the inter-pattern distance Dist between the standard pattern y, which has feature vector b_j, duration y_j, and weighting coefficients w_j1 and w_j2, and a partial pattern a_{i+1}, a_{i+2} … a_{i+x} of x frames is calculated as

(Equation 6)

The first term on the right-hand side of this expression is the Euclidean distance for the feature vectors, and the second term concerns the duration of the partial pattern. That is, this duration term D is the distance between the x frames and the duration y of the standard pattern; using the two weighting coefficients w_j1 and w_j2 of the standard pattern, D = w_j1(x − y_j)² when x ≥ y_j and D = w_j2(x − y_j)² when x < y_j.

[0026] In this speech recognition apparatus 10, when, for example, the partial pattern of the speech pattern is known to be phoneme k, a search is made for other phonemes m whose weighting coefficients should be updated. The update condition may be, for example, all elements of the set of phonemes m for which D(x, y_m) < D(x, y_k), or all elements of the set of phonemes m for which D(x, y_m) < D(x, y_k) and Dist(m) < Dist(k); a further condition may be added that limits these to the M phonemes with the smallest D(x, y_m). When the phonemes m₁, m₂ … m_M to be updated have been detected in this way, then, as illustrated in FIG. 7, the duration y_k of the standard pattern Y_k is updated as y_k = y_k + α(x − y_k) (α is a positive constant), and the two weighting coefficients w_k1 and w_k2 of the standard pattern Y_k are updated as w_k1 = w_k1 − a and w_k2 = w_k2 − b (a and b are positive constants). Similarly, for m = m₁, m₂ … m_M, the weighting coefficients w_m1 and w_m2 are updated as w_m1 = w_m1 + c and w_m2 = w_m2 + d (c and d are positive constants).
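The update rules above can be sketched as follows. This sketch uses the first update condition, D(x, y_m) < D(x, y_k), with an optional limit to the M closest rivals, and takes D to be the duration term only. It also assumes that the correct pattern k is updated only when at least one rival is found, which is how the text reads but cannot be checked against FIG. 7. The `Pattern` class, function names, and values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Pattern:
    y: float   # standard duration
    w1: float  # weight for x >= y
    w2: float  # weight for x <  y


def duration_distance(x: float, p: Pattern) -> float:
    """Duration term D(x, y) with the sign-selected weight."""
    diff = x - p.y
    return (p.w1 if diff >= 0 else p.w2) * diff ** 2


def update_patterns(patterns: Dict[str, Pattern], k: str, x: float,
                    alpha: float, a: float, b: float, c: float, d: float,
                    max_rivals: Optional[int] = None) -> List[str]:
    """Update the known phoneme k and its closer rivals in place; return the rivals."""
    d_k = duration_distance(x, patterns[k])
    rivals = sorted(
        (name for name, p in patterns.items()
         if name != k and duration_distance(x, p) < d_k),
        key=lambda name: duration_distance(x, patterns[name]),
    )
    if max_rivals is not None:
        rivals = rivals[:max_rivals]
    if not rivals:
        return rivals  # assumption: nothing is updated when no rival is detected

    pk = patterns[k]
    pk.y += alpha * (x - pk.y)  # move the standard duration toward x
    pk.w1 -= a                  # relax the penalties of the correct pattern
    pk.w2 -= b
    for name in rivals:         # tighten the penalties of each rival
        patterns[name].w1 += c
        patterns[name].w2 += d
    return rivals


# Hypothetical durations in frames; the segment is known to be phoneme "k".
pats = {"k": Pattern(4.0, 1.0, 1.0), "m": Pattern(8.0, 1.0, 1.0), "n": Pattern(20.0, 1.0, 1.0)}
print(update_patterns(pats, "k", x=7.0, alpha=0.5, a=0.25, b=0.25, c=0.25, d=0.25))
```

After this call the correct pattern sits closer to the observed duration and penalizes it less, while the rival penalizes it more, which is the pull-together and push-apart behavior the paragraph describes.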

[0027] As a result, as illustrated in FIG. 6, for the standard pattern Y_k, whose phoneme k is the same as that of the speech pattern, the duration y_k and the weighting coefficients w_k1 and w_k2 are updated so that the inter-pattern distance D(x, y_k) is reduced. For a standard pattern Y_m that has a small inter-pattern distance D(x, y_m) even though it belongs to a different category, phoneme m, the weighting coefficients w_m1 and w_m2 are updated so that the inter-pattern distance D(x, y_m) is increased. By repeating this operation, the speech recognition apparatus 10 updates each standard pattern to appropriate content, which further improves recognition accuracy.

[0028]

[Effects of the Invention] In the inventions of claims 1 and 4, a speech conversion means converts an input speech signal into a speech pattern that is a time series of feature vectors; standard patterns corresponding to partial patterns of various speech patterns are set in advance in a pattern storage means, each with at least a duration and a plurality of weighting coefficients for that duration; a time detection means detects the duration of a partial pattern of the speech pattern; a time comparison means compares the detected duration of the partial pattern with the duration of the standard pattern; a coefficient selection means selects a predetermined one of the weighting coefficients of the standard pattern according to the comparison result; a similarity calculation means calculates, according to the selected weighting coefficient, the inter-pattern distance corresponding to the similarity between the partial pattern and the standard pattern; and a recognition means recognizes the phoneme of the speech signal on the basis of this calculation result. Because one of the plurality of weighting coefficients is thus used selectively in the arithmetic processing according to whether the duration of the partial pattern is longer or shorter than the duration of the standard pattern, even a phoneme whose duration distribution is non-uniform, such as a word-final vowel, can be recognized well, and high-performance speech recognition can be realized simply.

[0029] In the invention according to claims 2 and 5, standard patterns are set in the pattern storage means in advance for each type of phoneme. The similarity calculation means calculates, according to the type of phoneme, the inter-pattern distance between a standard pattern stored in the pattern storage means and a partial pattern of the voice pattern converted by the voice conversion means. The distance comparison means compares the calculated inter-pattern distance with a preset threshold, and the coefficient update means updates the weighting coefficient of the standard pattern according to the comparison result. Because updating the weighting coefficient shortens the inter-pattern distance to a standard pattern whose phoneme is the same as that of the voice pattern, a speech recognition apparatus is obtained whose recognition accuracy improves each time the recognition operation is repeated.
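The threshold-driven update described above can be sketched as follows. This passage does not give the update rule itself, so everything specific here is an assumption made for illustration: the input is assumed to be known to have the same phoneme as the standard pattern, an update is assumed to be triggered when the distance exceeds the threshold, the update is assumed to be a multiplicative shrink by a hypothetical factor `shrink`, and the distance is reduced to a duration term W * (X - Y)^2. The function names are hypothetical as well.

```python
from typing import Dict


def selected_key(x: float, y: float) -> str:
    """Name of the weight chosen from the sign of X - Y."""
    return "shorter" if x - y < 0 else "longer"


def weighted_distance(weights: Dict[str, float], x: float, y: float) -> float:
    """W * (X - Y)^2, with W chosen by selected_key (duration term only)."""
    return weights[selected_key(x, y)] * (x - y) ** 2


def update_if_far(weights: Dict[str, float], x: float, y: float,
                  threshold: float, shrink: float = 0.9) -> bool:
    """Shrink the selected weight when the distance exceeds the threshold.

    Assumed rule, not taken from the source. Returns True when an update
    was applied, so a caller can count how often the pattern was adjusted.
    """
    if weighted_distance(weights, x, y) > threshold:
        weights[selected_key(x, y)] *= shrink
        return True
    return False
```

Only the weight that was actually selected for the input is changed, so repeated short inputs adjust the "shorter" weight without disturbing the "longer" one.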

[0030] In the invention according to claims 3 and 6, the phoneme detection means detects a phoneme for which an inter-pattern distance is calculated that is smaller than the inter-pattern distance calculated by the similarity calculation means according to the phoneme type of the partial pattern of the voice pattern, and the pattern update means updates the weighting coefficient and the duration of the standard pattern of the detected phoneme. Updating the weighting coefficients thus shortens the inter-pattern distance to a standard pattern whose phoneme is the same as that of the voice pattern, and enlarges the inter-pattern distance to a standard pattern whose phoneme differs but whose inter-pattern distance is nevertheless small. A speech recognition apparatus is therefore obtained whose recognition accuracy improves markedly each time the recognition operation is repeated.
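A rough sketch of the detection and enlargement step described above is given below. It is illustrative only: the dictionary layout, the names `selected_key`, `distance`, and `push_away_confusable`, and the multiplicative factor `grow` are all hypothetical, the distance is reduced to a duration term W * (X - Y)^2, the stored durations are left unchanged, and the shortening step for the matching phoneme is omitted.

```python
from typing import Dict, List


def selected_key(x: float, y: float) -> str:
    """Name of the weight chosen from the sign of X - Y."""
    return "shorter" if x - y < 0 else "longer"


def distance(x: float, pattern: Dict) -> float:
    """W * (X - Y)^2 for one stored pattern (duration term only)."""
    y = pattern["duration"]
    return pattern["weights"][selected_key(x, y)] * (x - y) ** 2


def push_away_confusable(x: float, correct: str, patterns: Dict[str, Dict],
                         grow: float = 1.1) -> List[str]:
    """Detect wrong phonemes that score closer than the correct one.

    For each such phoneme, the weight that was selected for this input is
    enlarged by the assumed factor `grow`. Returns the detected phonemes.
    """
    d_correct = distance(x, patterns[correct])
    confusable = [name for name, p in patterns.items()
                  if name != correct and distance(x, p) < d_correct]
    for name in confusable:
        p = patterns[name]
        p["weights"][selected_key(x, p["duration"])] *= grow
    return confusable
```

For example, with an input of duration 8 labelled "a", a stored "i" pattern of duration 7 would be detected as confusable if its weighted distance is smaller than that of the "a" pattern, and its selected weight would then be increased.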

[Brief description of the drawings]

FIG. 1 is a block diagram showing an embodiment of the invention according to claims 1 and 4.

FIG. 2 is a flowchart.

FIG. 3 is a flowchart.

FIG. 4 is a block diagram showing an embodiment of the invention according to claims 2 and 5.

FIG. 5 is a block diagram showing an embodiment of the invention according to claims 3 and 6.

FIG. 6 is a characteristic diagram.

FIG. 7 is a flowchart.

FIG. 8 is a characteristic diagram showing a conventional example.

[Explanation of symbols]

1, 6, 10: speech recognition apparatus
2: voice conversion means
4, 7: pattern storage means
5: time detection means, time comparison means, coefficient selection means, and similarity calculation means
8: time detection means, time comparison means, coefficient selection means, similarity calculation means, and distance comparison means
9: coefficient update means
11: time detection means, time comparison means, coefficient selection means, similarity calculation means, distance comparison means, and phoneme detection means
12: coefficient update means and pattern update means

Continuation of the front page (58) Fields searched (Int. Cl.⁷, DB name): G10L 15/10; JICST file (JOIS)

Claims

(57) [Claims]

1. A speech recognition method characterized in that: voice conversion means converts an input voice signal into a voice pattern which is a time series of feature vectors; standard patterns corresponding to partial patterns of various voice patterns are set in advance in pattern storage means, each with at least a duration and a plurality of weighting coefficients for the duration; time detection means detects the duration of a partial pattern of the voice pattern; time comparison means compares the detected duration of the partial pattern with the duration of the standard pattern; coefficient selection means selects a predetermined one of the weighting coefficients of the standard pattern according to whether the comparison result is positive or negative; similarity calculation means calculates an inter-pattern distance corresponding to the similarity between the partial pattern and the standard pattern by multiplying the square of the difference between the partial pattern and the standard pattern by the selected weighting coefficient; and recognition means recognizes the phoneme of the voice signal based on the calculation result.

2. A speech recognition method characterized in that: voice conversion means converts an input voice signal into a voice pattern which is a time series of feature vectors; standard patterns corresponding to partial patterns of various voice patterns are set in advance in pattern storage means for each type of phoneme, each with at least a duration and a plurality of weighting coefficients for the duration; time detection means detects the duration of a partial pattern of the voice pattern; time comparison means compares the detected duration of the partial pattern with the duration of the standard pattern; coefficient selection means selects a predetermined one of the weighting coefficients of the standard pattern according to the comparison result; similarity calculation means calculates, according to the type of phoneme and the selected weighting coefficient, an inter-pattern distance corresponding to the similarity between the partial pattern and the standard pattern; recognition means recognizes the phoneme of the voice signal based on the calculation result; distance comparison means compares the calculated inter-pattern distance with a preset threshold; and coefficient update means updates the weighting coefficient of the standard pattern according to the comparison result.

3. The speech recognition method according to claim 2, wherein phoneme detection means detects a phoneme for which an inter-pattern distance is calculated that is smaller than the inter-pattern distance calculated by the similarity calculation means according to the phoneme type of the partial pattern of the voice pattern, and pattern update means updates at least the weighting coefficient of the standard pattern of the detected phoneme.

4. A speech recognition apparatus comprising: voice conversion means for converting an input voice signal into a voice pattern which is a time series of feature vectors; pattern storage means in which standard patterns corresponding to partial patterns of various voice patterns are set, each with at least a duration and a plurality of weighting coefficients for the duration; time detection means for detecting the duration of a partial pattern of the voice pattern converted by the voice conversion means; time comparison means for comparing the duration of the partial pattern detected by the time detection means with the duration of the standard pattern stored in the pattern storage means; coefficient selection means for selecting a predetermined one of the weighting coefficients of the standard pattern according to whether the comparison result of the time comparison means is positive or negative; similarity calculation means for calculating an inter-pattern distance corresponding to the similarity between the partial pattern and the standard pattern by multiplying the square of the difference between the partial pattern and the standard pattern by the weighting coefficient selected by the coefficient selection means; and recognition means for recognizing the phoneme of the voice signal based on the calculation result of the similarity calculation means.

5. A speech recognition apparatus comprising: voice conversion means for converting an input voice signal into a voice pattern which is a time series of feature vectors; pattern storage means in which standard patterns corresponding to partial patterns of various voice patterns are set for each type of phoneme, each with at least a duration and a plurality of weighting coefficients for the duration; time detection means for detecting the duration of a partial pattern of the voice pattern converted by the voice conversion means; time comparison means for comparing the duration of the partial pattern detected by the time detection means with the duration of the standard pattern stored in the pattern storage means; coefficient selection means for selecting a predetermined one of the weighting coefficients of the standard pattern according to the comparison result of the time comparison means; similarity calculation means for calculating, according to the type of phoneme and the weighting coefficient selected by the coefficient selection means, an inter-pattern distance corresponding to the similarity between the partial pattern and the standard pattern; recognition means for recognizing the phoneme of the voice signal based on the calculation result of the similarity calculation means; distance comparison means for comparing the calculated inter-pattern distance with a preset threshold; and coefficient update means for updating the weighting coefficient of the standard pattern according to the comparison result.

6. The speech recognition apparatus according to claim 5, further comprising: phoneme detection means for detecting a phoneme for which an inter-pattern distance is calculated that is smaller than the inter-pattern distance calculated by the similarity calculation means according to the phoneme type of the partial pattern of the voice pattern; and pattern update means for updating at least the weighting coefficient of the standard pattern of the phoneme detected by the phoneme detection means.