JPS628800B2

Info

Publication number: JPS628800B2
Application number: JP54165578A
Authority: JP
Inventors: Nobuo Hataoka; Hiroshi Ichikawa; Yoshiaki Kitatsume; Eiji Oohira
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1979-12-21
Filing date: 1979-12-21
Publication date: 1987-02-24
Also published as: JPS5688199A; DE3048107A1

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a pattern preprocessing device for speech recognition apparatus. The device globally normalizes or corrects the speaker-dependent variation in a feature pattern, that is, in a speech pattern expressed as a time series of feature vectors.

Various quantities can be used as the feature parameters that represent the magnitude of each component of a speech pattern's feature vector, for example:
(i) the output values of a filter bank, one per channel, when the speech frequency band is divided into several channels;
(ii) autocorrelation coefficients;
(iii) partial autocorrelation coefficients (hereinafter, PARCOR coefficients).

However, whichever quantity is used, individual differences in the speech cause various fluctuations in the feature parameters. The errors in the extracted feature parameters grow as a result, which makes correct recognition of the speech pattern difficult.

For speech patterns, typical examples of variation due to individual differences are:
(1) variation in the absolute level of the feature parameters caused by variation in input level;
(2) variation in formant frequency when the filter bank output values are used as the feature parameters.
The following methods have been considered to prevent these variations from degrading recognition accuracy.

For (1), when the filter bank output values are used as the feature parameters, for example, normalization in the frequency direction is performed at each time instant (the filter outputs are converted to relative values).

For (2), because the variation in formant frequency stems from differences in vocal tract length between individuals, the vocal tract length is estimated from the speech and then normalized.

However, the conventional normalization method for (1) destroys, for example, the structure of the absolute magnitude relationships among the feature parameters in the time direction, so part of the essential information carried by the speech is lost.
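The loss of time-direction structure under frequency-direction normalization can be illustrated with a small sketch. The source does not specify how the filter outputs are converted to relative values; dividing each frame by its own largest channel value is an assumption made here for illustration, and the sample values are hypothetical.

```python
# Hypothetical filter-bank outputs: rows are time frames i, columns are channels j.
pattern = [
    [1.0, 2.0],   # quiet frame
    [4.0, 8.0],   # loud frame with the same spectral shape
    [2.0, 1.0],   # frame with a different spectral shape
]

def normalize_per_frame(frames):
    """Assumed form of frequency-direction normalization:
    divide every frame by its own largest channel value."""
    return [[value / max(frame) for value in frame] for frame in frames]

relative = normalize_per_frame(pattern)

# Channel 0 traced over time, before and after the normalization.
before = [frame[0] for frame in pattern]    # [1.0, 4.0, 2.0]
after = [frame[0] for frame in relative]    # [0.5, 0.5, 1.0]
```

Before normalization the second frame holds the largest value of channel 0; afterwards the first two frames are indistinguishable and the third frame is largest, so the ordering of magnitudes along the time axis is no longer that of the original data.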

The conventional method for (2) has the drawback that the vocal tract length is difficult to estimate accurately, and the resulting estimation error makes correct normalization impossible.

Accordingly, a first object of the present invention is to provide a pattern preprocessing method that absorbs the speaker-dependent variation in the feature parameters extracted from a speech pattern without losing the essential information that the speech pattern carries.

A second object of the present invention is to provide a pattern preprocessing method that preserves the structure of the feature parameters in the time direction and ultimately also absorbs variation in formant frequency, which is a speaker-dependent variation in frequency characteristics.

To achieve these objects, the present invention takes one axis of a two-dimensional plane as the time axis and the other as the frequency axis. For the feature parameters at the series of points along the time axis that correspond to a single point on the frequency axis, it applies preprocessing that preserves the magnitude relationships among the feature parameter values in the time direction, such as:
(a) dividing each feature parameter in the series by the maximum value of the series;
(b) applying a nonlinear logarithmic correction;
(c) dividing each feature parameter in a series that has undergone the nonlinear logarithmic correction by the maximum value of that series.

The principle of the present invention is explained below with reference to FIG. 1.

FIG. 1 shows how the filter bank output values in one particular channel differ when different speakers utter the same speech content, with the filter bank output values used as the feature parameters.

FIG. 1 shows that the positions on the time axis of the local maxima and minima are similar, but the amplitude values differ greatly. If the similarity between speech patterns A and B of the two speakers is measured with the ordinary Euclidean distance, the variation due to the difference between speakers becomes larger than the variation due to differences between the speech patterns themselves, so differences between speech patterns cannot be detected correctly.
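The effect described for FIG. 1 can be reproduced with a minimal numerical sketch. The three single-channel sequences below are hypothetical values chosen for illustration, not data from the patent.

```python
import math

def euclidean(p, q):
    """Ordinary Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

# Hypothetical outputs of one filter-bank channel over five time frames.
speaker_a_word1 = [1.0, 4.0, 2.0, 6.0, 1.0]
speaker_b_word1 = [3.0, 12.0, 6.0, 18.0, 3.0]   # same shape, three times the amplitude
speaker_a_word2 = [4.0, 1.0, 6.0, 2.0, 1.0]     # different shape, same speaker

same_word = euclidean(speaker_a_word1, speaker_b_word1)
other_word = euclidean(speaker_a_word1, speaker_a_word2)
```

With these values `same_word` is larger than `other_word`: the two utterances of the same word lie farther apart than two different words from one speaker, which is the failure the amplitude normalization is meant to remove.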

It is therefore necessary to normalize or correct the amplitude values so as to reduce the variation due to speaker differences. To do so, the present invention uses either one of the two steps described below, or a combination of both.

The following describes, as an example, a preprocessing method that combines the two steps. A preprocessing method that uses Step 1 alone, or one that omits a step, is also possible.

The feature vector $\mathbf{a}_i$ extracted from the speech pattern at time $i$ ($i = 1, 2, \ldots, I$) and the feature pattern $A$, expressed as the time series of the $\mathbf{a}_i$, are defined as follows:

$$\mathbf{a}_i = (a_{i1}, a_{i2}, \ldots, a_{iJ})$$

$$A = \mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_I$$

Here $a_{ij}$ ($j = 1, 2, \ldots, J$) is the feature parameter corresponding to the filter bank output value of the $j$-th channel at time $i$.

Step 1: Logarithmic correction

To correct the amplitude in a way that approximates human auditory characteristics, a nonlinear correction is applied to $a_{ij}$ using a logarithm with a base such as 10 or $e$.

The corrected feature parameter $a'_{ij}$ is given by equation (1):

$$a'_{ij} = \log\left(1 + \frac{a_{ij}}{A_0}\right), \quad A_0: \text{constant} \tag{1}$$

The 1 in equation (1) is added to prevent steep changes in $a'_{ij}$ when $a_{ij}/A_0$ is close to 0.
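Equation (1) can be sketched in a few lines. The function name `log_correct`, the default `a0 = 1.0`, and the choice of base 10 are assumptions made for illustration; the source states only that $A_0$ is a constant and that the base may be 10, $e$, or similar.

```python
import math

def log_correct(a_ij, a0=1.0):
    """Equation (1): a'_ij = log(1 + a_ij / A0), here with base 10.

    a0 stands for the constant A0; its value is not given in the source.
    """
    return math.log10(1.0 + a_ij / a0)

# The added 1 keeps the result finite and equal to zero for a zero input,
# where a bare log(a_ij / A0) would be undefined.
zero_input = log_correct(0.0)
small_input = log_correct(0.001)
large_input = log_correct(99.0)
```

Inputs are assumed non-negative, as filter bank output magnitudes would be, so the corrected values are non-negative as well.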

Step 2: Normalization in the time direction

A global normalization in the time direction is performed separately for each channel along the frequency axis.

Let $M_j$ be the maximum of $a'_{1j}, a'_{2j}, \ldots, a'_{Ij}$ in the $j$-th channel along the frequency axis. The normalized feature parameter $a''_{ij}$ corresponding to the feature parameter $a_{ij}$ is then given by equation (2).

$$a''_{ij} = \frac{a'_{ij}}{M_j} \tag{2}$$

where

$$M_j = \max(a'_{1j}, a'_{2j}, \ldots, a'_{Ij}) \tag{3}$$

As stated above, this two-step normalization or correction method preserves the magnitude relationships among the feature parameters in the time direction and yields new feature parameters $a''_{ij}$ that clearly express the essential features of the original speech.
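Equations (2) and (3) for a single channel can be sketched as follows. The function name `normalize_channel` and the handling of an all-zero channel are assumptions; the source does not say what happens when $M_j = 0$.

```python
def normalize_channel(block):
    """block holds a'_1j ... a'_Ij for one channel j (non-negative values).

    Returns a''_1j ... a''_Ij according to equations (2) and (3).
    """
    m_j = max(block)                      # equation (3)
    if m_j == 0:
        # Assumption: a channel that is zero throughout is left at zero.
        return [0.0] * len(block)
    return [value / m_j for value in block]   # equation (2)

normalized = normalize_channel([1.0, 2.0, 0.5])
```

Because every value in the channel is divided by the same positive constant, the ordering of the values along the time axis is unchanged and the largest value becomes 1.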

These $a''_{ij}$ are better feature parameters than the original $a_{ij}$ in the following respects.

(α) For each channel along the frequency axis, the mapping converts the maximum value of the feature parameter over time to the same level (for example, level 1, the maximum, when normalizing to the range 0 to 1). It therefore absorbs the variation in the absolute level of the feature parameters, which has been a problem in the past.

In particular, normalization by Step 2 alone, that is, $a''_{ij} = a_{ij} / \max(a_{1j}, a_{2j}, \ldots, a_{Ij})$, can have the disadvantage that where the amplitude of $a_{ij}$ is small the variation in absolute level is small, yet the variation appears magnified in the relative level after normalization. The correction of Step 1 is effective in removing this disadvantage.

(β) The normalization of Step 2 is a mapping that extracts the feature parameters of speech with the same content more globally and more clearly than before, even when the speakers differ.

For example, it emphasizes the changes in the feature parameter within each channel along the frequency axis. For a channel whose feature parameter is small over the entire time span, the smallness is itself treated as a feature, and the transformation makes the values of $a''_{ij}$ larger.

(γ) In particular, when the filter bank output values are used as the feature parameters, the normalization of Step 2 has the effect of absorbing speaker-dependent variation in the frequency direction (for example, a shift of a formant frequency into an adjacent channel between speakers).

The reason is that at the time $i$ at which the feature parameter of the $j$-th channel reaches its maximum, the feature parameters of the adjacent $(j-1)$-th and $(j+1)$-th channels are also likely to be at their maxima, so the feature parameters obtained in Step 2 correspond to lowering the Q (resonance sharpness) of the frequency analysis. This amounts to reducing the variation in the feature parameters due to speaker differences, and provides a solution to problem (2) above.

The description so far has mainly concerned applying the preprocessing of the present invention to the filter bank output values of a speech pattern, but it can also be applied to autocorrelation coefficients or PARCOR coefficients. When it is applied to PARCOR coefficients, the coefficients may, for example, be subjected to adaptive inverse filtering and converted into quantities corresponding to reflection coefficients so that linearity with respect to absolute level variation holds.

The present invention is described in detail below with reference to embodiments.

FIG. 2 is a block diagram of one embodiment of a circuit that implements the preprocessing method of the present invention. For each channel $j$, the feature parameters $a_{ij}$ obtained from the input speech are read out of input buffer 21 as one block of data $(a_{1j}, a_{2j}, \ldots, a_{Ij})$ and fed to logarithmic conversion unit 221 of logarithmic conversion and normalization section 22. Logarithmic conversion unit 221 performs the calculation of equation (1), and the resulting block data $a'_{1j}, a'_{2j}, \ldots, a'_{Ij}$ are fed through output line 222 to maximum value detection unit 223 and normalization unit 224. Maximum value detection unit 223 performs the calculation of equation (3), and the resulting $M_j$ is also fed to normalization unit 224. Normalization unit 224 performs the calculation of equation (2), and the resulting block data $a''_{1j}, a''_{2j}, \ldots, a''_{Ij}$ are stored in output buffer 23.

These operations are carried out, for example, in the order $j = 1, 2, \ldots, J$, under control signals output from control unit 24.
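The block-by-block data flow of FIG. 2 can be mirrored in software roughly as follows. This is a sketch under stated assumptions, not the circuit itself: the function names, the list-of-lists buffers, base 10, `a0 = 1.0`, and the guard for a channel whose maximum is zero are all introduced here for illustration.

```python
import math

def log_unit(block, a0=1.0):
    """Counterpart of logarithmic conversion unit 221: equation (1)."""
    return [math.log10(1.0 + a / a0) for a in block]

def max_unit(block):
    """Counterpart of maximum value detection unit 223: equation (3)."""
    return max(block)

def normalize_unit(block, m_j):
    """Counterpart of normalization unit 224: equation (2).
    Assumption: a channel whose maximum is zero is passed through as zeros."""
    return [v / m_j if m_j > 0 else 0.0 for v in block]

def run_preprocessor(input_buffer, a0=1.0):
    """input_buffer[i][j] holds a_ij; returns an output buffer of a''_ij."""
    n_frames, n_channels = len(input_buffer), len(input_buffer[0])
    output_buffer = [[0.0] * n_channels for _ in range(n_frames)]
    for j in range(n_channels):                  # channel order j = 1 .. J
        block = [input_buffer[i][j] for i in range(n_frames)]
        corrected = log_unit(block, a0)
        m_j = max_unit(corrected)
        normalized = normalize_unit(corrected, m_j)
        for i in range(n_frames):
            output_buffer[i][j] = normalized[i]
    return output_buffer

result = run_preprocessor([[9.0, 0.0], [99.0, 9.0], [0.0, 99.0]])
```

Each channel is processed as one block, so the maximum is taken over the whole utterance before any value of that channel is divided, which matches the order in which the units are described.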

Logarithmic conversion unit 221 in FIG. 2 can be implemented, for example, with a read-only memory (ROM).

In this case, the output signal of input buffer 21 serves as the address signal of the ROM. The data $a'_{1j}, a'_{2j}, \ldots, a'_{Ij}$, which are the results of equation (1), are written in advance at addresses $a_{1j}, a_{2j}, \ldots, a_{Ij}$, and are read out in response to a read signal supplied by control unit 24.
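The ROM arrangement amounts to a precomputed lookup table indexed by the quantized input value. The sketch below assumes an 8-bit address, an 8-bit data word, and $A_0 = 16$; none of these figures appears in the source, and the scaling of the stored values to the full data word is likewise an illustrative choice.

```python
import math

ADDRESS_BITS = 8    # assumption: inputs quantized to 0..255
A0 = 16.0           # assumption: value of the constant A0
OUTPUT_MAX = 255    # assumption: 8-bit data word

# Largest value equation (1) can take for this address range.
full_scale = math.log10(1.0 + (2 ** ADDRESS_BITS - 1) / A0)

# Table contents: equation (1) evaluated at every address, scaled to the data word.
ROM = [
    round(OUTPUT_MAX * math.log10(1.0 + address / A0) / full_scale)
    for address in range(2 ** ADDRESS_BITS)
]

def read_rom(address):
    """Return the stored value of equation (1) for a quantized input a_ij."""
    return ROM[address]
```

A constant scale factor applied to the stored values cancels in the subsequent division by $M_j$, so apart from rounding the scaling does not change the normalized result.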

Maximum value detection unit 223 consists of an arithmetic circuit and a register that stores the result. For example, the data $a'_{ij}$ are fed to a subtraction circuit in the order $a'_{1j}, a'_{2j}, \ldots, a'_{Ij}$ and the data $R$ held in the register is subtracted from each. Only when $a'_{ij} - R > 0$ is the register updated by storing the new $a'_{ij}$ in it. This is carried out for $i = 1, 2, \ldots, I$.
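The register-and-subtractor procedure is a running maximum, which can be sketched as follows. Clearing the register to zero before each block is an assumption (the source does not state the initial value); it is safe here because the corrected values of equation (1) are non-negative for non-negative inputs.

```python
def detect_maximum(block):
    """Running maximum over a'_1j ... a'_Ij using one register.

    Mirrors the described procedure: subtract the register content R from
    each incoming value and overwrite R only when the difference is positive.
    """
    register = 0.0   # assumption: register cleared before each block
    for value in block:
        if value - register > 0:
            register = value
    return register

block = [0.2, 0.9, 0.4, 0.9, 0.1]
m_j = detect_maximum(block)
```

Because the update condition is a strict inequality, a repeated maximum leaves the register unchanged, and the final content equals the maximum of the block.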

Normalization unit 224 can be implemented with an ordinary divider.

The above operations of logarithmic conversion and normalization section 22 can also be carried out in software.

FIG. 3 is a block diagram of one embodiment of a speech recognition system that includes the circuit of the present invention shown in FIG. 2. Parts that appear in FIG. 2 carry the same reference numerals.

The input speech is frequency-analyzed in feature extraction unit 31, and the extracted feature vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_I$ are stored sequentially, in time order, in input buffer 21.

The normalized data stored in output buffer 23 by the process described for FIG. 2 are fed to recognition unit 34. Meanwhile, normalized standard patterns of speech are read out one after another from standard pattern memory 32, and one of them at a time is fed to recognition unit 34 through standard pattern buffer 33.

Recognition unit 34 computes the similarity between the normalized data for the input speech pattern and the normalized standard pattern, performs recognition, and outputs the recognition result to terminal 35.

FIG. 4 shows experimental data on the difference in the separation of speech recognition results between the preprocessing method of the present invention and conventional preprocessing.

In FIG. 4, the horizontal axis is the weight applied during recognition, and the vertical axis is the separation, defined as the ratio $S_1/S_2$ of the largest similarity $S_1$ to the second-largest similarity $S_2$ when the correct recognition result is obtained. Curves ( ) to ( ) show the results of the preprocessing listed below, and ( ) shows the result of the conventional method.

( ): $a''_{ij} = \log(1 + a_{ij}/A_0) / M_j$, where $M_j = \max\{\log(1 + a_{1j}/A_0), \log(1 + a_{2j}/A_0), \ldots, \log(1 + a_{Ij}/A_0)\}$
( ): $a''_{ij} = \log(1 + a_{ij}/A_0)$
( ): $a''_{ij} = a_{ij} / \max(a_{1j}, a_{2j}, \ldots, a_{Ij})$

The experimental results in FIG. 4 show that, with the weight in the similarity calculation set appropriately, the preprocessing method of the present invention gives a larger separation than the other methods.
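The separation plotted in FIG. 4 is a simple ratio and can be computed as in the sketch below. The function name and the sample similarity scores are hypothetical; the source does not give the similarity measure itself or any numerical results.

```python
def separation(similarities):
    """Ratio S1 / S2 of the largest to the second-largest similarity.

    similarities: one score per standard pattern for a single input utterance.
    The measure is meaningful when the top-ranked pattern is the correct one.
    """
    ranked = sorted(similarities, reverse=True)
    return ranked[0] / ranked[1]

# Hypothetical similarity scores against three standard patterns.
score = separation([0.5, 0.75, 0.25])
```

A value well above 1 means the best match stands clear of the runner-up; a value close to 1 means the two candidates are nearly indistinguishable.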

As described above, the preprocessing method of the present invention converts a feature pattern into a new feature pattern in which its features are clearly extracted. The feature parameters obtained by this preprocessing separate well (the within-class features are distinct), which improves the reliability of recognition. This effect is also borne out by the improved recognition rate of a speech recognition apparatus that incorporates the preprocessing normalization method.

[Brief Description of the Drawings]

FIG. 1 shows the differences between speakers uttering the same speech, in terms of filter bank output values. FIG. 2 shows one embodiment of a circuit that implements the preprocessing method of the present invention. FIG. 3 shows an example configuration of a speech recognition apparatus that uses the preprocessing method of the present invention. FIG. 4 shows experimental data on the separation of input speech patterns.

21: input buffer; 22: logarithmic conversion and normalization section; 23: output buffer; 24: control unit.

Claims

1. A preprocessing device for speech recognition, comprising: means for holding feature parameters $a_{ij}$ ($i = 1, 2, \ldots, I$; $j = 1, 2, \ldots, J$); correction means for receiving $(a_{1j}, a_{2j}, \ldots, a_{Ij})$ for each channel $j$ from said means as one block of data and applying a predetermined logarithmic conversion correction to each parameter value in the block; maximum value detection means for detecting the maximum value of the parameters within the one block of data corrected by the correction means; and normalizing means for normalizing the corrected block of data by the maximum value of the parameters for each block.