JPH0219477B2

Info

Publication number: JPH0219477B2
Application number: JP58069474A
Authority: JP
Inventors: Masao Watari
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1983-04-20
Filing date: 1983-04-20
Publication date: 1990-05-01
Also published as: JPS59195286A

Description

[Detailed description of the invention]

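The present invention relates to an improvement of a simplified speech analysis device. A speech recognition device ordinarily analyzes the speech waveform and obtains a recognition result by performing a discrimination calculation between the time series of feature parameters output by the analysis and patterns stored in advance. Speech analysis methods conventionally used in such speech recognition devices include band-pass filter analysis, cepstrum analysis, and modified cepstrum analysis.

A speech wave can be regarded as the radiation output of the vocal tract excited by the vibration of the vocal cords, and the speech signal G(t) is expressed as the convolution of the vocal tract impulse response R(t) and the sound source waveform S(t), as in equation (1):

$$G(t) = R(t) * S(t) \tag{1}$$

where * denotes convolution. Taking the Fourier transform of equation (1) gives

$$G_f(w) = R_f(w) \times S_f(w) \tag{2}$$

The sound source characteristic $S_f(w)$ is a periodic line spectrum, and the vocal tract characteristic $R_f(w)$ is the envelope of the speech spectrum $G_f(w)$. One method of obtaining this envelope is band-pass filter analysis, in which a plurality of band-pass filters, each having at least a certain bandwidth, are arranged across the speech band. Giving each filter at least a certain bandwidth weakens the influence of the line spectrum, which is the sound source characteristic, and arranging a plurality of filters yields the overall shape of the envelope, that is, the vocal tract characteristic.

To obtain a more precise vocal tract characteristic, the bandwidth of the band-pass filters must be narrowed, but narrowing it makes the influence of the line spectrum of the sound source more pronounced. The bandwidth therefore cannot be narrowed beyond a certain point, and band-pass filter analysis cannot yield a more precise vocal tract characteristic.
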
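Cepstrum analysis, on the other hand, is a method that separates the vocal tract characteristic from the sound source characteristic and obtains a more precise vocal tract characteristic. In cepstrum analysis, equation (2) is further log-transformed,

$$\log|G_f(w)| = \log|R_f(w)| + \log|S_f(w)| \tag{3}$$

and the cepstrum is then obtained by an inverse Fourier transform:

$$G_c(q) = R_c(q) + S_c(q) \tag{4}$$

As equation (4) shows, a product in the spectrum domain becomes a sum in the cepstrum domain. The cepstrum $S_c(q)$ of the periodic line spectrum $S_f(w)$ of the sound source appears only in the vicinity of the source period $T_p$. The vocal tract spectrum $R_f(w)$, in contrast, appears as the envelope of $G_f(w)$, and its cepstrum $R_c(q)$ appears in the low-quefrency region. In other words, applying cepstrum analysis to the speech signal yields the vocal tract characteristic, separated from the sound source characteristic, in the low-quefrency components.
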
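The relations in equations (1) to (4) can be checked numerically. The following is a minimal sketch, not part of the patent: it assumes a discrete, circular version of the convolution in equation (1), uses a naive DFT from the Python standard library, and the toy signals `r` and `s` are hypothetical stand-ins for the vocal tract response and the source waveform.

```python
import cmath
import math

def dft(x, sign=-1):
    """Naive DFT; sign=+1 gives the unnormalized inverse transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def cepstrum(x):
    """Inverse DFT of the log magnitude spectrum, as in equations (3) and (4)."""
    n = len(x)
    log_mag = [math.log(abs(v)) for v in dft(x)]
    return [v.real / n for v in dft(log_mag, sign=+1)]

def circular_convolve(r, s):
    """Discrete, circular form of the convolution in equation (1)."""
    n = len(r)
    return [sum(r[m] * s[(t - m) % n] for m in range(n)) for t in range(n)]

# Hypothetical toy signals (assumptions for illustration only).
r = [1.0, 0.5, 0.25, 0.125, 0.0, 0.0, 0.0, 0.0]  # short decaying response
s = [1.0, 0.0, 0.0, 0.0, 0.6, 0.0, 0.0, 0.0]     # source with period 4

g = circular_convolve(r, s)
gc, rc, sc = cepstrum(g), cepstrum(r), cepstrum(s)
```

In this toy case `gc[q]` equals `rc[q] + sc[q]` up to rounding, and apart from `q = 0` the source cepstrum `sc` is non-zero only at `q = 4`, the period of `s`.
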
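Furthermore, as described in the specification of Japanese Patent Application No. 56-069031 (Japanese Unexamined Patent Publication No. 57-185098), providing an extraction unit that cuts out only the in-band frequency components of the speech spectrum and shifts them down to zero frequency removes the influence of the out-of-band characteristics of the transmission line. In addition, providing a scale conversion unit that converts the scale of the frequency axis with a mapping function that compresses the high-frequency region, for example a log scale conversion or a Mel scale conversion, yields a modified cepstrum in which more weight is placed on the low band than on the high band, that is, a cepstrum whose characteristics are close to those of human hearing. This scale conversion rearranges only the data within the band of the transmission line onto a log scale or Mel scale by means of a mapping function Sm = M(Sl) such as that shown in Fig. 1. That is, the Sl-th spectrum component becomes the Sm-th component of the modified spectrum.
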
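The mapping function Sm = M(Sl) can be pictured as a lookup table. The sketch below is an illustration under stated assumptions, not the patent's table: it uses the common Mel approximation 2595·log10(1 + f/700), and the function names, table sizes, the band limit `f_max_hz`, and the "last write wins" rule for bins that share a destination are all hypothetical choices.

```python
import math

def mel(f_hz):
    """Common Mel-scale approximation (an assumption, not given in the patent)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def build_mapping_table(n_in, n_out, f_max_hz):
    """Return M as a list, so that source index Sl maps to destination M[Sl]."""
    top = mel(f_max_hz)
    table = []
    for s_l in range(n_in):
        f = f_max_hz * s_l / (n_in - 1)          # centre frequency of bin Sl
        table.append(round((n_out - 1) * mel(f) / top))
    return table

def rearrange(spectrum, table, n_out):
    """Place the Sl-th value at position M[Sl]; later bins overwrite earlier ones."""
    out = [0.0] * n_out
    for s_l, value in enumerate(spectrum):
        out[table[s_l]] = value
    return out

table = build_mapping_table(n_in=16, n_out=8, f_max_hz=4000.0)
warped = rearrange([float(i) for i in range(16)], table, n_out=8)
```

Because the mapping compresses the high band, several high-frequency bins land on the same destination index; how a real device would combine them is not stated in the text, so the overwrite rule here is only one possible choice.
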
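However, the band-pass filter analysis, cepstrum analysis, and modified cepstrum analysis described above are all based on the Fourier transform and require multiplication by a system of trigonometric functions, which has the drawback of making the device large.

The Walsh transform, on the other hand, is an approximation of the Fourier transform that uses a system of two-valued (±1) orthogonal functions, so the Walsh spectrum can be obtained by additions and subtractions alone. It is known that a compact pseudo band-pass filter can be realized with this Walsh transform, as described in Japanese Unexamined Patent Publication No. 57-700. A pseudo band-pass filter, however, has the drawback that it cannot provide a finer, more precise vocal tract characteristic.

The object of the present invention is to provide a compact device that obtains an approximation of the cepstrum by replacing the Fourier transform and inverse Fourier transform of cepstrum analysis with multi-valued Walsh transforms, that is, a compact speech analysis device that provides a finer, more precise vocal tract characteristic.

The speech analysis device according to the present invention comprises: a first multi-valued Walsh transform unit that performs a multi-valued Walsh transform of an input signal; a log conversion unit that obtains the Walsh power spectrum from the output of the first multi-valued Walsh transform unit and log-transforms it; a scale conversion unit that maps the output of the log conversion unit onto a modified sequency axis by means of a mapping function Sm = M(Sl) that performs scale conversion of the Walsh sequency axis; and a second multi-valued Walsh transform unit that performs a multi-valued Walsh transform of the output of the scale conversion unit.
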
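The remark above that an ordinary (two-valued) Walsh spectrum needs only additions and subtractions can be illustrated with a fast Walsh-Hadamard transform. This is a generic textbook sketch, not the circuit of any cited publication; it produces coefficients in natural (Hadamard) order rather than sequency order, and the function name is hypothetical.

```python
def fwht(x):
    """Fast Walsh-Hadamard transform in natural order, using only + and -."""
    a = list(x)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                u, v = a[i], a[i + h]
                a[i], a[i + h] = u + v, u - v   # one butterfly: a sum and a difference
        h *= 2
    return a

spectrum = fwht([1, 2, 3, 4])
```

Applying `fwht` twice returns the input scaled by its length, which is a quick way to check the routine.
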
Next, the multi-valued Walsh transform used in the present invention is described. The Walsh transform replaces the trigonometric functions, which form the orthogonal function system of the Fourier transform, with Walsh functions, which take only the two values ±1, so that an approximation of the Fourier transform is obtained by additions and subtractions alone. However, because the trigonometric functions are approximated by two-valued ±1 functions, the approximation is poor. A multi-valued Walsh transform, which makes the Walsh functions multi-valued and complex and thereby gives a better approximation of the Fourier transform with simple operations, is described in the specification of Japanese Patent Application No. 58-63186, "Multi-valued Walsh transform device", filed by the same applicant on April 11, 1983.

The principle of the multi-valued Walsh transform is as follows. As already stated, the Walsh functions are trigonometric functions quantized to ±1, so introducing multi-valued Walsh functions with finer quantization brings the result closer to the Fourier spectrum. For example, a multi-valued Walsh function having the eight elements shown in Fig. 10 can be considered.

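However, because this approach has elements such as [formula], the transform requires multiplication.

In the specification of Japanese Patent Application No. 58-63186, the 8-valued Walsh transform instead uses the eight elements (1, 1+j, j, -1+j, -1, -1-j, -j, 1-j), as shown in Fig. 11. Since the 8-valued Walsh transform with this function system involves only products with ±1 and ±j, it can be executed with additions and subtractions alone. By the same reasoning, the 16-valued Walsh transform uses the sixteen elements (1, 1+(1/2)j, 1+j, 1/2+j, j, -1/2+j, -1+j, -1+(1/2)j, -1, -1-(1/2)j, -1-j, -1/2-j, -j, 1/2-j, 1-j, 1-(1/2)j), as shown in Fig. 12. The 16-valued Walsh transform using this function system involves only products with ±1, ±1/2, ±j, and ±(1/2)j, so it can be executed with halving by a shifter and additions and subtractions alone; multiplication is effectively unnecessary.

Let X be the column vector of the input time series arranged in bit-reversed order, W the multi-valued Walsh spectrum, and C the transform matrix. Then

$$W = C \cdot X = G_n \cdot G_{n-1} \cdots G_1 \cdot X \tag{6}$$

so the transform can be expressed as a product of n matrices. Each $G_i$ is determined by equations (7), (8), and (9):

$$G_i = E_i \otimes I_{n-i} \tag{7}$$

where $\otimes$ is the Kronecker product,

$$\text{[formula]} \tag{8}$$

$$L_i = \mathrm{diag}\left(1, [a_i], [a_i^2], \ldots, [a_i^{2^i-1}]\right) \tag{9}$$

Here $I_i$ is the identity matrix with $2^i$ rows and $2^i$ columns, and diag() is the diagonal matrix whose diagonal elements are listed in the parentheses.

The bracket $[a_i]$ is determined by the number of levels. For 8 values, with $a_i = \exp(j\pi/2^i)$ and $a_i^k = \exp(j\theta)$,

$$[\exp(j\theta)] = \begin{cases} 1, & 0 \le \theta < \pi/4 \\ 1+j, & \pi/4 \le \theta < \pi/2 \\ j, & \pi/2 \le \theta < 3\pi/4 \\ -1+j, & 3\pi/4 \le \theta < \pi \end{cases}$$

For 16 values,

$$[\exp(j\theta)] = \begin{cases} 1, & 0 \le \theta < \pi/8 \\ 1+\tfrac{1}{2}j, & \pi/8 \le \theta < \pi/4 \\ 1+j, & \pi/4 \le \theta < 3\pi/8 \\ \tfrac{1}{2}+j, & 3\pi/8 \le \theta < \pi/2 \\ j, & \pi/2 \le \theta < 5\pi/8 \\ -\tfrac{1}{2}+j, & 5\pi/8 \le \theta < 3\pi/4 \\ -1+j, & 3\pi/4 \le \theta < 7\pi/8 \\ -1+\tfrac{1}{2}j, & 7\pi/8 \le \theta < \pi \end{cases}$$

Bit-reversed order means writing each natural number in binary, reversing the order of its digits, and arranging the data in the order of the resulting numbers. For n = 3, $X = (X_0, X_4, X_2, X_6, X_1, X_5, X_3, X_7)$.
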
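The quantization rules and the bit-reversed ordering described above can be sketched in a few lines. This is an illustration only: the names `LEVELS_8`, `LEVELS_16`, `quantized_twiddle`, and `bit_reversed_indices` are hypothetical, and the angle is expressed as θ = πk/span so that the bin can be chosen with integer arithmetic, which is an implementation choice of this sketch and not something stated in the patent.

```python
# Quantized values of exp(j*theta) for theta in [0, pi), copied from the rules above.
# Only the upper half-plane elements are listed; their negatives arise from the
# subtraction in the butterfly operation.
LEVELS_8 = [1, 1 + 1j, 1j, -1 + 1j]
LEVELS_16 = [1, 1 + 0.5j, 1 + 1j, 0.5 + 1j, 1j, -0.5 + 1j, -1 + 1j, -1 + 0.5j]

def quantized_twiddle(k, span, levels):
    """Return [exp(j*pi*k/span)] for 0 <= k < span, using equal-width angle bins."""
    if not 0 <= k < span:
        raise ValueError("k must satisfy 0 <= k < span")
    return complex(levels[(k * len(levels)) // span])

def bit_reversed_indices(n_bits):
    """Indices 0 .. 2**n_bits - 1 with their binary digits reversed."""
    return [int(format(i, "0{}b".format(n_bits))[::-1], 2) for i in range(2 ** n_bits)]

order = bit_reversed_indices(3)
last_stage_8 = [quantized_twiddle(k, 8, LEVELS_8) for k in range(8)]
```

For n = 3, `order` reproduces the sequence 0, 4, 2, 6, 1, 5, 3, 7 given in the text.
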
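Furthermore, for the 8-valued Walsh transform each $G_i$ takes the form [formula]. Each row of these $G_i$ has only two non-zero elements, which shows that the transform can be computed with operations of the same form as the butterfly operations used in the fast Fourier transform. Because the non-zero elements are ±1 and ±j, the computation can be carried out with complex additions and subtractions alone. For the 16-valued Walsh transform the non-zero elements are formed from ±1, ±1/2, ±j, and ±(1/2)j, so the computation can be carried out with shift operations and complex additions and subtractions alone. In addition, the components $W_i$ and $W_{N/2-i}$ ($N = 2^n$) of the multi-valued Walsh spectrum obtained in this way are complex conjugates.

Because the speech analysis device of the present invention replaces the Fourier transform and inverse Fourier transform of cepstrum analysis with multi-valued Walsh transforms, it has the advantage that it can be built from simple arithmetic units such as adder/subtractors. Compared with a pseudo band-pass filter analysis device using the Walsh transform, it also has the advantage of providing a finer, more precise vocal tract characteristic.
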
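Next, the specific configuration of the device of the present invention is described with reference to the drawings.

As shown in Fig. 2, the embodiment of the present invention comprises a first buffer memory unit 1, a first multi-valued Walsh arithmetic unit 2, a first multi-valued Walsh transform control unit 3, a log conversion unit 4, a scale conversion unit 5, a second buffer memory unit 6, a second multi-valued Walsh arithmetic unit 7, and a second multi-valued Walsh transform control unit 8. First, the input time-series data are input to the first buffer memory unit 1 and stored temporarily. Once they are stored, the computation proceeds from the first stage to the n-th stage under control signals from the first multi-valued Walsh transform control unit 3, which follow the computation flow diagram for n = 4 shown in Fig. 4. The processing of the i-th stage consists of executing the $2^{n-1}$ butterfly operations of the i-th stage shown in Fig. 4, which corresponds to multiplying by the matrix $G_i$ of equation (7). The butterfly operation is

$$Y_a = X_a + X_b \cdot a_k, \qquad Y_b = X_a - X_b \cdot a_k \tag{10}$$

and is computed by the first multi-valued Walsh arithmetic unit 2 shown in Fig. 3.

In the butterfly operation, $X_a$ and $X_b$ are first read from the first buffer memory unit 1; the real and imaginary parts of $X_a$ are stored temporarily in registers 201 and 202, and the real and imaginary parts of $X_b$ in registers 203 and 204. For the 8-valued Walsh transform, the complex multiplication $X_b \cdot a_k$ is executed by one of the following four additions and subtractions. With $(z_R + jz_I) = (X_{bR} + jX_{bI}) \cdot a_k$,

$$\begin{aligned} a_0 = 1:&\quad z_R = X_{bR}, & z_I &= X_{bI} \\ a_2 = 1+j:&\quad z_R = X_{bR} - X_{bI}, & z_I &= X_{bR} + X_{bI} \\ a_4 = j:&\quad z_R = -X_{bI}, & z_I &= X_{bR} \\ a_6 = -1+j:&\quad z_R = -X_{bR} - X_{bI}, & z_I &= X_{bR} - X_{bI} \end{aligned} \tag{11}$$

In the computation flow diagram of Fig. 4, $a_1, a_3, a_5, a_7$ have the same values as $a_0, a_2, a_4, a_6$ respectively. The operation of equation (11) is performed by switch 211 and adder/subtractors 221 and 222 under control signals from the first multi-valued Walsh transform control unit 3. That is, switch 211 selects $X_{bR}$, $X_{bI}$, or zero as the inputs of adder/subtractors 221 and 222, and the adder/subtractors perform addition, subtraction, or addition with sign inversion to carry out equation (11). The addition and subtraction of equation (10) are then carried out separately for the real and imaginary parts by adders 231 and 232 and subtractors 233 and 234. The results $Y_a$ and $Y_b$ are written to the locations in the first buffer memory unit 1 where $X_a$ and $X_b$ were stored. When this processing has been completed up to the n-th, final stage, the multi-valued Walsh spectrum is obtained in the first buffer memory unit 1.
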
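The staged butterfly computation described above can be sketched in software. This is a minimal illustration, not the patented circuit: it assumes the standard radix-2 decimation-in-time layout (bit-reversed input, butterfly span doubling at each stage) and assumes that the element for the k-th butterfly of a stage with span `span` is the quantized value of exp(jπk/span). The function names are hypothetical. The complex multiplication follows equation (11) and uses only additions, subtractions, and sign changes.

```python
def times_a(xr, xi, index):
    """Multiply xr + j*xi by the 8-valued element chosen by index, per equation (11)."""
    if index == 0:                # a = 1
        return xr, xi
    if index == 1:                # a = 1 + j
        return xr - xi, xr + xi
    if index == 2:                # a = j
        return -xi, xr
    return -xr - xi, xr - xi      # a = -1 + j

def multivalued_walsh_8(samples):
    """8-valued Walsh transform of a real sequence whose length is a power of two."""
    n = len(samples)
    bits = n.bit_length() - 1
    if n == 0 or 2 ** bits != n:
        raise ValueError("length must be a power of two")
    # Load the buffer in bit-reversed order (real parts; imaginary parts start at zero).
    re = [float(samples[int(format(i, "0{}b".format(bits))[::-1], 2)]) for i in range(n)]
    im = [0.0] * n
    span = 1
    while span < n:                                   # one pass per stage
        for start in range(0, n, 2 * span):
            for k in range(span):
                a, b = start + k, start + k + span
                zr, zi = times_a(re[b], im[b], (4 * k) // span)
                # Butterfly of equation (10), written back in place.
                re[a], re[b] = re[a] + zr, re[a] - zr
                im[a], im[b] = im[a] + zi, im[b] - zi
        span *= 2
    return [complex(r, i) for r, i in zip(re, im)]

w4 = multivalued_walsh_8([1, 2, 3, 4])
```

For four samples every element is 1 or j, so under these assumptions the result coincides with the unnormalized DFT taken with a positive exponent; from eight samples on, the quantized elements make the output an approximation.
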
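After the multi-valued Walsh transform is complete, the log conversion unit 4 and scale conversion unit 5 shown in Fig. 5 obtain the log power multi-valued Walsh spectrum and map it onto the modified sequency axis by means of the mapping function Sm = M(Sl) that performs the scale conversion. That is, the scale conversion control unit 51 issues control signals according to the time chart shown in Fig. 6. First, the even term $W_{2i}$ and odd term $W_{2i+1}$ of the multi-valued Walsh spectrum are read in sequence from the first buffer memory unit 1 according to signal a1, squared by multiplier 41 of the log conversion unit 4, and combined using adder 42 and accumulator 43 to give the power multi-valued Walsh spectrum ($P_i = W_{2i}^2 + W_{2i+1}^2$). This is then log-transformed in the log conversion unit 4. The mapping function value M(i) addressed by signal a2 is read from the mapping function table memory unit 52, and this output M(i) is used as the address signal a3 of the second buffer memory unit 6, so that the log power multi-valued Walsh spectrum ($\log P_i$) is stored at address M(i) of the second buffer memory unit 6. The second multi-valued Walsh transform operates in the same way as the first and is executed by the second buffer memory unit 6, the second multi-valued Walsh arithmetic unit 7, and the second multi-valued Walsh transform control unit 8.

In the description above, a multi-valued Walsh transform is performed again after the scale conversion. After this scale conversion (more precisely, immediately after the absolute value is taken), the signal is a real function symmetric about the vertical axis. In general, the Fourier transform and the inverse Fourier transform of a real function symmetric about the vertical axis give the same result, and the same holds for the multi-valued Walsh transform. Therefore, in the embodiment already described, the result is the same even if the second multi-valued Walsh transform unit 7 is made to perform an inverse multi-valued Walsh transform.
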
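The log-power and scale-conversion step described above (square, add, take the log, store at address M(i)) can be sketched as follows. This is an illustration under assumptions: the first buffer is modeled as a flat list holding the even and odd terms interleaved, the `floor` parameter that guards against log(0) is not in the patent, and the function name is hypothetical.

```python
import math

def log_power_mapped(buffer1, table, n_out, floor=1e-12):
    """Fill the second buffer with log P_i stored at address M(i).

    buffer1 holds W[2i] and W[2i+1] interleaved; table[i] plays the role of M(i).
    """
    buffer2 = [0.0] * n_out
    for i in range(len(buffer1) // 2):
        w_even, w_odd = buffer1[2 * i], buffer1[2 * i + 1]
        power = w_even * w_even + w_odd * w_odd          # multiplier 41 and adder 42
        buffer2[table[i]] = math.log(max(power, floor))  # log conversion, then store
    return buffer2

stored = log_power_mapped([3.0, 4.0, 1.0, 0.0], table=[0, 1], n_out=2)
```
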
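In speech recognition, only the low-order terms of the cepstrum are normally used, so the second multi-valued Walsh transform need compute only the low-order terms. It is therefore sufficient to compute only the low-order terms of the transform matrix of the second multi-valued Walsh transform of equation (6), that is,

$$W_k = \sum_{l=0}^{N-1} H_{kl} \cdot X_l \tag{12}$$

for small k only. Moreover, since $X_l$ is an even function, the real part $H'_{kl}$ of $H_{kl}$ can be used:

$$W_k = \sum_{l=0}^{N-1} H'_{kl} \cdot X_l \tag{13}$$

The second embodiment of the present invention is a device that obtains the second Walsh transform by equation (13); the second multi-valued Walsh transform control unit 8 and the second multi-valued Walsh arithmetic unit 7 of the first embodiment are changed to the configuration shown in Fig. 7. The second multi-valued Walsh transform control unit 8 issues control signals according to the time chart shown in Fig. 8. It clears accumulator 72 with signal cl7 and reads the modified log power multi-valued Walsh spectrum $X_l = \log P_{M(i)}$ from the second buffer memory unit 6 according to signal k1; adder/subtractor 71 then performs addition or subtraction with accumulator 72 according to signal b2, which is +1 or -1 in accordance with the real part $H'_{kl}$ of the multi-valued Walsh transform matrix. That is, when signal b2 is +1, ACC + $X_l$ → ACC is performed, and when signal b2 is -1, ACC - $X_l$ → ACC is performed. When signal b1 reaches N-1, the Walsh transform value $W_k$, that is, the pseudo cepstrum, is obtained in accumulator 72.
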
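The accumulation of equation (13) with a ±1 control signal can be sketched as below. This is only an illustration: the actual entries of H' are not listed in the text, so the sign rows used here are rows of an ordinary two-valued Walsh-Hadamard matrix chosen as a hypothetical stand-in, and the function names are assumptions.

```python
def pseudo_cepstrum_term(signs, x):
    """Accumulate ACC + X_l or ACC - X_l according to a +1/-1 control sequence."""
    acc = 0.0                         # accumulator cleared at the start
    for sign, value in zip(signs, x):
        if sign > 0:
            acc += value              # control signal +1: ACC + X_l -> ACC
        elif sign < 0:
            acc -= value              # control signal -1: ACC - X_l -> ACC
    return acc

def low_order_terms(sign_rows, x, count):
    """Compute only the first `count` terms W_k, as suggested for recognition."""
    return [pseudo_cepstrum_term(row, x) for row in sign_rows[:count]]

# Hypothetical stand-in rows for H' (not taken from the patent).
rows = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, -1, 1], [1, -1, 1, -1]]
terms = low_order_terms(rows, [1.0, 2.0, 3.0, 4.0], count=2)
```

Restricting the computation to the first few rows is what makes this direct form attractive: each retained term costs N additions or subtractions and no multiplications.
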
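The third embodiment of the present invention is a device that adopts the 16-valued Walsh transform as the multi-valued Walsh transform; the first multi-valued Walsh arithmetic unit of the first embodiment is changed to the configuration shown in Fig. 9. The computation proceeds as in the first embodiment. The difference from the first embodiment is that the multiplication element $a_k$ in the butterfly operation takes eight different values. The complex multiplication in equation (10) is executed by one of the following eight operations. With $(z_R + jz_I) = (X_{bR} + jX_{bI}) \cdot a_k$,

$$\begin{aligned} a_0 = 1:&\quad z_R = X_{bR}, & z_I &= X_{bI} \\ a_1 = 1+\tfrac{1}{2}j:&\quad z_R = X_{bR} - \tfrac{1}{2}X_{bI}, & z_I &= \tfrac{1}{2}X_{bR} + X_{bI} \\ a_2 = 1+j:&\quad z_R = X_{bR} - X_{bI}, & z_I &= X_{bR} + X_{bI} \\ a_3 = \tfrac{1}{2}+j:&\quad z_R = \tfrac{1}{2}X_{bR} - X_{bI}, & z_I &= X_{bR} + \tfrac{1}{2}X_{bI} \\ a_4 = j:&\quad z_R = -X_{bI}, & z_I &= X_{bR} \\ a_5 = -\tfrac{1}{2}+j:&\quad z_R = -\tfrac{1}{2}X_{bR} - X_{bI}, & z_I &= X_{bR} - \tfrac{1}{2}X_{bI} \\ a_6 = -1+j:&\quad z_R = -X_{bR} - X_{bI}, & z_I &= X_{bR} - X_{bI} \\ a_7 = -1+\tfrac{1}{2}j:&\quad z_R = -X_{bR} - \tfrac{1}{2}X_{bI}, & z_I &= \tfrac{1}{2}X_{bR} - X_{bI} \end{aligned} \tag{14}$$

Shifters 241 and 242 obtain $\tfrac{1}{2}X_{bR}$ and $\tfrac{1}{2}X_{bI}$ by a 1-bit right shift. Switch 212 selects $X_{bR}$, $X_{bI}$, $\tfrac{1}{2}X_{bR}$, $\tfrac{1}{2}X_{bI}$, or zero as the inputs of adder/subtractors 221 and 222, and the adder/subtractors perform addition or subtraction and sign inversion to execute the complex multiplication of equation (14).
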
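The eight shift-and-add cases of the 16-valued multiplication can be written out for integer data as a sketch. The function name `times_a16` is hypothetical, and the use of Python's arithmetic right shift (which rounds toward negative infinity for odd negative values) is an assumption about how the halving behaves; the patent only states that a 1-bit right shift is used.

```python
def times_a16(xr, xi, k):
    """Multiply the integer pair (xr, xi) by the k-th 16-valued element.

    Uses only 1-bit right shifts, additions, subtractions, and sign changes.
    """
    hr, hi = xr >> 1, xi >> 1        # halved real and imaginary parts (shifters)
    cases = [
        (xr, xi),                    # a0 = 1
        (xr - hi, hr + xi),          # a1 = 1 + j/2
        (xr - xi, xr + xi),          # a2 = 1 + j
        (hr - xi, xr + hi),          # a3 = 1/2 + j
        (-xi, xr),                   # a4 = j
        (-hr - xi, xr - hi),         # a5 = -1/2 + j
        (-xr - xi, xr - xi),         # a6 = -1 + j
        (-xr - hi, hr - xi),         # a7 = -1 + j/2
    ]
    return cases[k]

products = [times_a16(8, 4, k) for k in range(8)]
```

With even inputs such as (8, 4) the shifts are exact, so each result equals the ordinary complex product with the corresponding element.
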
The present invention has been described above on the basis of embodiments, but these descriptions do not limit the scope of the invention. In particular, in the embodiments the FWT algorithm arranges the input time series in bit-reversed order and takes the products successively from $G_1$ to $G_n$, as in equation (6). It is clear, however, that a method can also be adopted in which the time series X' in natural order is multiplied successively by $G_n^T$ through $G_1^T$, as in equation (15), giving the Walsh spectrum W' in bit-reversed order:

$$W' = G_1^T \cdot G_2^T \cdots G_n^T \cdot X' \tag{15}$$

Also, the power spectrum is obtained as $P_i = W_{2i}^2 + W_{2i+1}^2$, which requires a multiplier. It is clear that a method of obtaining the power spectrum approximately as a sum of absolute values, $P_i = |W_{2i}| + |W_{2i+1}|$, can also be adopted.

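The two ways of forming the spectrum value mentioned above can be compared directly. This is a small sketch with hypothetical function names; it only illustrates the arithmetic and makes no claim about how the approximation affects recognition.

```python
import math

def power_exact(w_even, w_odd):
    """P_i as a sum of squares (needs a multiplier)."""
    return w_even * w_even + w_odd * w_odd

def power_abs_sum(w_even, w_odd):
    """Approximation as a sum of absolute values (no multiplier)."""
    return abs(w_even) + abs(w_odd)

exact = power_exact(3.0, -4.0)
approx = power_abs_sum(3.0, -4.0)
```

The sum of absolute values tracks the magnitude (the square root of the sum of squares) rather than the power itself: it is never smaller than the magnitude and never larger than √2 times it, so after the log conversion the two forms differ mainly by a scale factor and a bounded offset.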

[Brief explanation of drawings]

Fig. 1 is a diagram showing the scale conversion; Fig. 2 is a block diagram of the first embodiment of the present invention; Fig. 3 is a block diagram of the first multi-valued Walsh arithmetic unit 2; Fig. 4 is a diagram of the computation flow of the first multi-valued Walsh transform; Fig. 5 is a block diagram of the log conversion unit 4 and the scale conversion unit 5; Fig. 6 is a time chart of the scale conversion; Fig. 7 is a block diagram of the second multi-valued Walsh arithmetic unit 7 in the second embodiment of the present invention; Fig. 8 is a time chart of the second multi-valued Walsh transform; Fig. 9 is a block diagram of the first multi-valued Walsh arithmetic unit 2 in the third embodiment of the present invention; and Figs. 10, 11, and 12 are diagrams for explaining the multi-valued Walsh transform used in the present invention.

In the figures, 1 is the first buffer memory unit, 2 is the first multi-valued Walsh arithmetic unit, 3 is the first multi-valued Walsh transform control unit, 4 is the log conversion unit, 6 is the second buffer memory unit, 7 is the second multi-valued Walsh arithmetic unit, 8 is the second multi-valued Walsh transform control unit, 201, 202, 203, and 204 are registers, 211 and 212 are switches, 221 and 222 are adder/subtractors, 231 and 232 are adders, 233 and 234 are subtractors, and 241 and 242 are shifters. In Fig. 5, 41 is a multiplier, 42 is an adder, 43 is an accumulator, 44 is a log converter, 51 is the scale conversion control unit, 52 is the mapping function table memory, 71 is an adder, and 72 is an accumulator.

Claims

[Claims]

1. A speech analysis device comprising: a first multi-valued Walsh transform unit that performs a multi-valued Walsh transform of an input signal; a log conversion unit that obtains a Walsh power spectrum from the output of the first multi-valued Walsh transform unit and performs a log conversion of it; a scale conversion unit that maps the output of the log conversion unit onto a modified sequency axis by means of a mapping function Sm = M(Sl) that performs scale conversion of the Walsh sequency axis; and a second multi-valued Walsh transform unit that performs a multi-valued Walsh transform of the output of the scale conversion unit.