JPH0646359B2 - Word speech recognizer

Info

Publication number: JPH0646359B2
Application number: JP59023282A
Authority: JP
Inventors: 光生下谷; 昌弘日比野
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1984-02-10
Filing date: 1984-02-10
Publication date: 1994-06-15
Anticipated expiration: 2009-06-15
Also published as: JPS60166993A

Description

[Technical Field of the Invention] The present invention relates to a word speech recognition device, and more particularly to an improved method of extracting feature parameters in a pitch-adaptive word speech recognition device, in which speech features are extracted by a digital filter whose resonance frequency is a constant multiple of the pitch.

[Prior Art] FIG. 1 is a block diagram showing the electrical configuration of a conventional pitch-adaptive word speech recognition device. The configuration of the conventional device is first described with reference to FIG. 1. In FIG. 1, a speech signal input from a microphone 11 is amplified by a microphone amplifier 12 and then supplied to an AGC circuit 13. The AGC circuit 13 automatically controls the gain of its internal amplifier so that a constant output is obtained even when the magnitude of the input signal varies. The output of the AGC circuit 13 is supplied to an A/D conversion circuit 14 and converted into a digital signal. The output of the A/D conversion circuit 14 is supplied to a waveform memory 15, which temporarily stores one frame of input waveform data. The output of the waveform memory 15 is supplied to a feature extraction unit 2.

The feature extraction unit 2 includes a pitch period extraction circuit 21, a filter coefficient setting circuit 22, a digital filter 23, a level calculation circuit 25, and a start/end detection circuit 6. The pitch period extraction circuit 21 extracts the pitch frequency of the one frame of input speech waveform temporarily stored in the waveform memory 15. Based on the pitch frequency extracted by the pitch period extraction circuit 21, the filter coefficient setting circuit 22 sets the filter coefficients so that the resonance frequency of the filter is an integer multiple of the pitch frequency. The digital filter 23 operates with the filter coefficients set by the filter coefficient setting circuit 22. The level calculation circuit 25 calculates the level of the input speech waveform temporarily stored in the waveform memory 15; its output is supplied to a recognition processing unit 5 and to the start/end detection circuit 6. The start/end detection circuit 6 uses the level calculated by the level calculation circuit 25 to detect the start and end of the input speech signal.

An input pattern memory 3, provided in association with the recognition processing unit 5, temporarily stores the speech feature parameters analyzed by the feature extraction unit 2. A registered pattern memory 4 stores the feature parameters of registered words analyzed and extracted at registration time, or the feature parameters of standard speech, and supplies them to the recognition processing unit 5. The recognition processing unit 5 performs recognition processing using the feature parameters stored in the input pattern memory 3 and those registered in advance in the registered pattern memory 4. The recognition processing unit 5 is built around, for example, a microprocessor.

In the word speech recognition device of FIG. 1, the speech waveform is divided into frames of fixed duration, and for each frame the frequency spectrum at integer multiples of the pitch frequency is extracted as the feature parameters. During recognition, the distance between the frames being compared is calculated from the feature parameters as a numerical measure of the phonetic difference between a frame of the registered word and a frame of the input word, and matching is performed using this value.
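The framing step and the conventional choice of analysis frequencies can be sketched as follows. This is a minimal illustration, not the patent's circuit: the function names are hypothetical, and dropping a trailing partial frame is an assumption, since the source does not say how the final samples are handled.

```python
def split_into_frames(samples, frame_len):
    """Split a sample sequence into consecutive fixed-length frames.

    Assumption: a trailing partial frame is dropped.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def harmonic_frequencies(pitch_hz, n_filters):
    """Conventional scheme: channel m is analyzed at m times the pitch frequency."""
    return [m * pitch_hz for m in range(1, n_filters + 1)]
```

For a frame whose pitch is 120 Hz and three filters, the conventional analysis frequencies would be 120, 240 and 360 Hz.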

FIG. 2 is a block diagram showing the configuration of the distance calculation unit included in the recognition processing unit 5 of FIG. 1. In FIG. 2, a distance calculation unit 51 calculates the inter-frame distance between the feature parameters of the analyzed speech stored in the input pattern memory 3 and the feature parameters of the standard speech stored in the registered pattern memory 4. The distance calculation unit 51 includes parameter temporary storage memories 511 and 512 and a Chebyshev distance calculation circuit 513. The parameter temporary storage memory 511 temporarily stores one frame of feature parameters from the input pattern memory 3, and the parameter temporary storage memory 512 temporarily stores one frame of feature parameters from the registered pattern memory 4. The Chebyshev distance calculation circuit 513 calculates the Chebyshev distance between the contents of the two memories. The calculated distance is supplied to a matching processing unit 52, which matches the feature parameters of the analyzed speech stored in the input pattern memory 3 against those of the standard speech.
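A frame-to-frame distance of the kind computed by circuit 513 might look like the sketch below. The exact formula the patent uses is not recoverable from this text, so the function offers two candidate forms: the textbook Chebyshev distance (maximum absolute difference) and a sum of absolute differences. The function name and the `mode` parameter are hypothetical.

```python
def frame_distance(a, b, mode="max"):
    """Distance between two spectrum vectors of equal length.

    mode="max": textbook Chebyshev distance, max_m |a_m - b_m|.
    mode="sum": sum of absolute differences, sum_m |a_m - b_m|.
    Which form the patent intends is an open assumption.
    """
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if mode == "max":
        return max(diffs)
    if mode == "sum":
        return sum(diffs)
    raise ValueError("mode must be 'max' or 'sum'")
```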

Next, the operation of the conventional word speech recognition device is described with reference to FIGS. 1 and 2. The speech signal picked up by the microphone 11 is amplified by the microphone amplifier 12 and supplied to the AGC circuit 13. The AGC circuit 13 adjusts the signal so that the peak value of the input waveform stays at a constant level and supplies it to the A/D conversion circuit 14. The A/D conversion circuit 14 encodes the input waveform into a digital signal at each predetermined sampling point. One frame of sampled data is supplied to the waveform memory 15 and temporarily stored. The waveform data stored in the waveform memory 15 is input to the level calculation circuit 25 and the pitch period extraction circuit 21. The level calculation circuit 25 calculates the level of the waveform data and supplies the result to the recognition processing unit 5 and the start/end detection circuit 6.
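The source does not state how the level is computed or how the start/end detection circuit 6 uses it. The sketch below assumes a mean absolute amplitude per frame and a simple fixed threshold; both choices, and the function names, are illustrative assumptions rather than the patent's method.

```python
def frame_level(frame):
    """Level of one frame, taken here as the mean absolute amplitude (assumption)."""
    if not frame:
        raise ValueError("frame must not be empty")
    return sum(abs(s) for s in frame) / len(frame)


def detect_endpoints(levels, threshold):
    """Return (first, last) indices of frames whose level reaches the threshold.

    Returns None when no frame reaches it. A fixed threshold is an assumption.
    """
    active = [i for i, level in enumerate(levels) if level >= threshold]
    if not active:
        return None
    return active[0], active[-1]
```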

Based on the result from the level calculation circuit 25, the start/end detection circuit 6 detects the start and end of the input speech waveform, determines the speech interval, and supplies the result to the recognition processing unit 5. The pitch period extraction circuit 21 extracts the pitch period from the waveform data supplied by the waveform memory 15 and supplies it to the recognition processing unit 5 and the filter coefficient setting circuit 22. Based on the pitch period calculated by the pitch period extraction circuit 21, the filter coefficient setting circuit 22 sets filter coefficients in the digital filter 23 so that the digital filter 23 has resonance frequencies at integer multiples of the pitch frequency. Using the filter coefficients set by the filter coefficient setting circuit 22, the digital filter 23 calculates the frequency spectrum of one frame of the waveform data supplied from the waveform memory 15.
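The structure of the digital filter 23 is not given in the source. As a stand-in, the sketch below uses the Goertzel recursion, a second-order resonator whose single coefficient fixes its resonance frequency, to measure the power of one frame at each integer multiple of the pitch frequency. The function names and the `sample_rate_hz` parameter are assumptions introduced for illustration.

```python
import math


def goertzel_power(frame, freq_hz, sample_rate_hz):
    """Power of `frame` at `freq_hz`, computed with the Goertzel recursion."""
    w = 2.0 * math.pi * freq_hz / sample_rate_hz
    coeff = 2.0 * math.cos(w)  # the one coefficient that tunes the resonator
    s1 = 0.0  # state s[n-1]
    s2 = 0.0  # state s[n-2]
    for x in frame:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def harmonic_spectrum(frame, pitch_hz, n_filters, sample_rate_hz):
    """Conventional feature vector: power at m * pitch for m = 1..n_filters."""
    return [goertzel_power(frame, m * pitch_hz, sample_rate_hz)
            for m in range(1, n_filters + 1)]
```

Changing the pitch only changes `coeff` for each channel, which mirrors the role the text gives to the filter coefficient setting circuit 22.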

Through the above series of operations, the feature extraction unit 2 obtains, as the feature parameters for one word, a pitch frequency time series $[f_{pi}]$ and a spectrum time-series pattern $[C_{im}]$ $(i = 1, 2, \ldots, I)$, $(m = 1, 2, \ldots, M)$, where $I$ is the number of frames in the analyzed word and $M$ is the number of filters used for spectrum analysis. The feature parameters for one word obtained in this way are stored in the registered pattern memory 4 in registration mode. In recognition mode they are stored in the input pattern memory 3, after which the recognition processing unit 5 performs recognition by pattern matching.

As shown in FIG. 2, the recognition processing unit 5 has the distance calculation unit 51, which calculates the inter-frame distance between the input pattern and a registered pattern. Let the feature parameters stored in the input pattern memory 3 be $[f_{pi}]$ and $[a_{im}]$, and let the feature parameters of the matching template stored in the registered pattern memory 4 be the pitch frequency time series $[f_{qj}]$ and the spectrum time-series pattern $[b_{jm}]$ $(j = 1, 2, \ldots, J)$. The distance $d(i, j)$ between frame $i$ of the input pattern and frame $j$ of the registered pattern is then given by equation (1), which is not reproduced in this text. To perform this calculation, the matching processing unit 52 supplies control signals to the input pattern memory 3 and the registered pattern memory 4. The input pattern memory 3 then supplies $a_i = (a_{i1}, a_{i2}, \ldots, a_{iM})$ to the parameter temporary storage memory 511, and the registered pattern memory 4 supplies $b_j = (b_{j1}, b_{j2}, \ldots, b_{jM})$ to the parameter temporary storage memory 512, after which the Chebyshev distance calculation circuit 513 evaluates equation (1) to obtain $d(i, j)$. Using the $d(i, j)$ calculated by the distance calculation unit 51, the matching processing unit 52 performs matching by a well-known pattern matching method. The matching distance between the input pattern and each registered pattern is thus obtained, and the registered pattern with the smallest matching distance is selected as the recognition result.
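The source only says that a well-known pattern matching method is used. One common choice for word templates of different lengths is dynamic time warping (DTW), which is assumed here purely for illustration; the function names, the symmetric step pattern, and the dictionary of templates are all hypothetical.

```python
def dtw_distance(input_frames, template_frames, frame_dist):
    """Accumulated DTW distance between two frame sequences.

    `frame_dist(a, b)` supplies the inter-frame distance d(i, j).
    Assumption: basic symmetric steps (i-1, j), (i, j-1), (i-1, j-1).
    """
    n_in, n_tp = len(input_frames), len(template_frames)
    inf = float("inf")
    g = [[inf] * (n_tp + 1) for _ in range(n_in + 1)]
    g[0][0] = 0.0
    for i in range(1, n_in + 1):
        for j in range(1, n_tp + 1):
            d = frame_dist(input_frames[i - 1], template_frames[j - 1])
            g[i][j] = d + min(g[i - 1][j], g[i][j - 1], g[i - 1][j - 1])
    return g[n_in][n_tp]


def recognize(input_frames, templates, frame_dist):
    """Return the label of the template with the smallest matching distance."""
    return min(templates,
               key=lambda word: dtw_distance(input_frames, templates[word], frame_dist))
```

`recognize` corresponds to the final step in the text: every registered pattern is scored and the one with the minimum matching distance is chosen.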

As described above, the feature extraction method of the conventional word speech recognition device uses the difference between $a_{im}$ and $b_{jm}$ as the principal quantity in the distance calculation. However, $a_{im}$ is the spectrum at frequency $f_{pi} \times m$, whereas $b_{jm}$ is the spectrum at frequency $f_{qj} \times m$. Since $f_{pi}$ and $f_{qj}$ are generally not equal, spectra are compared at different frequencies. When $f_{pi}$ and $f_{qj}$ differ greatly, these values are unsuitable as feature parameters and recognition performance deteriorates.
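The drawback can be made concrete with a small calculation. The sketch below lists, for each channel m, how far apart the two analysis frequencies are when the input frame and the template frame have different pitches. The function name and the example pitch values are hypothetical.

```python
def channel_mismatch(f_pi, f_qj, n_filters):
    """Per-channel gap |m * f_pi - m * f_qj| in the conventional scheme."""
    return [abs(m * f_pi - m * f_qj) for m in range(1, n_filters + 1)]
```

With pitches of 100 Hz and 150 Hz, channel 4 compares the spectrum at 400 Hz with the spectrum at 600 Hz. The gap grows in proportion to the channel number.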

[Summary of the Invention] The main object of the present invention is therefore to provide a word speech recognition device with excellent recognition performance that extracts the frequency spectrum effectively and can shorten recognition processing time, by setting each resonance frequency of the digital filter used for feature extraction to the integer multiple of the pitch frequency that is closest to a predetermined frequency.

The above and other objects and features of the present invention will become more apparent from the following detailed description given with reference to the drawings.

[Embodiment of the Invention] FIG. 3 is a schematic block diagram showing the electrical configuration of an embodiment of the present invention. The embodiment of FIG. 3 is identical to FIG. 1 except that a constant comparison frequency memory 24 is provided in association with the filter coefficient setting circuit 22. The constant comparison frequency memory 24 stores the predetermined frequencies used for frequency spectrum extraction and supplies them to the filter coefficient setting circuit 22.

FIG. 4 is a diagram showing the spectrum patterns extracted by the conventional word speech recognition device of FIG. 1 and by an embodiment of the present invention.

Next, the operation of the embodiment is described with reference to FIGS. 3 and 4. Because the embodiment of FIG. 3 is identical to the conventional device of FIGS. 1 and 2 except for the setting of the filter coefficients, detailed description of anything other than the filter coefficient setting operation is omitted. When speech is input and the pitch frequency $f_{pi}$ output by the pitch period extraction circuit 21 is supplied to the filter coefficient setting circuit 22, the filter coefficient setting circuit 22 receives from the constant comparison frequency memory 24 the predetermined frequencies for frequency spectrum extraction (the constant comparison frequencies) $f_c = (f_{c1}, f_{c2}, \ldots, f_{cN})$, where $N$ is the number of filter channels. It then calculates the positive integer $k$ that satisfies

$$\min_{k} \left| k \cdot f_{pi} - f_{cn} \right| \quad (k = 1, 2, \ldots)$$

and sets the filter coefficients so that the resonance frequency $f_0$ of the digital filter 23 becomes $k \cdot f_{pi}$. This setting is repeated for $n = 1$ to $N$; the digital filter 23 receives the filter coefficients and extracts the spectrum pattern $a_{ci} = (a_{ci1}, a_{ci2}, \ldots, a_{ciN})$. The relationship between $a_i$ and $a_{ci}$ is shown in FIG. 4.
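The selection of the multiplier for each channel can be sketched as follows. Only the rule itself (the positive integer k that minimizes |k · f_pi − f_cn|) comes from the source; the function names are hypothetical, the comparison frequencies in the usage note are made-up values, and breaking an exact tie toward the smaller k is an assumption.

```python
def nearest_harmonic_index(pitch_hz, target_hz):
    """Positive integer k minimizing |k * pitch_hz - target_hz|.

    Assumption: on an exact tie the smaller k is chosen.
    """
    if pitch_hz <= 0 or target_hz <= 0:
        raise ValueError("frequencies must be positive")
    k_low = max(1, int(target_hz // pitch_hz))  # k must be at least 1
    k_high = k_low + 1
    if abs(k_low * pitch_hz - target_hz) <= abs(k_high * pitch_hz - target_hz):
        return k_low
    return k_high


def resonance_frequencies(pitch_hz, comparison_hz):
    """Resonance frequency k_n * pitch_hz for each comparison frequency f_cn."""
    return [nearest_harmonic_index(pitch_hz, f_c) * pitch_hz for f_c in comparison_hz]
```

For a pitch of 120 Hz and hypothetical comparison frequencies of 250, 500 and 1000 Hz, the selected multipliers are 2, 4 and 8, giving resonance frequencies of 240, 480 and 960 Hz.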

By performing the above calculation for every frame, the feature extraction unit 2 extracts $a_{ci}$ $(i = 1, 2, \ldots, I)$ and $f_{pi}$ as the feature parameters for one word. During recognition, the distance $d(i, j)$ between frame $i$ of the input pattern $a_{ci}$ $(i = 1, 2, \ldots, I)$ and frame $j$ of the registered pattern $b_{cj}$ $(j = 1, 2, \ldots, J)$ is obtained from an equation that is not reproduced in this text.

[Effects of the Invention] As described above, according to the present invention the filter coefficient setting means sets each resonance frequency to the frequency closest to a predetermined fixed frequency (fixed comparison frequency). Even when the pitch frequency changes, the resonance frequencies that are set therefore remain relatively close, and spectra with the same channel number do not differ greatly in frequency. This works to advantage in the inter-frame distance calculation of the matching process and improves speech recognition performance.
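How close the resonance frequencies stay can be estimated with a short derivation. The bounds below follow from the nearest-multiple selection rule alone and are not stated in the source; $k_n$ and $k'_n$ denote the multipliers selected for the input frame and the registered frame respectively, and the condition on $f_{cn}$ is an added assumption.

```latex
% Consecutive multiples of the pitch are spaced f_{pi} apart, so the nearest
% positive multiple lies within half a pitch of the comparison frequency:
\[
  \left| k_n f_{pi} - f_{cn} \right| \le \tfrac{1}{2} f_{pi}
  \qquad \text{for } f_{cn} \ge \tfrac{1}{2} f_{pi} .
\]
% By the triangle inequality, channel n of the input frame and channel n of
% the registered frame are analyzed at frequencies that differ by at most
\[
  \left| k_n f_{pi} - k'_n f_{qj} \right| \le \tfrac{1}{2} \left( f_{pi} + f_{qj} \right),
\]
% independent of n, whereas in the conventional scheme the gap grows with
% the channel number m:
\[
  \left| m f_{pi} - m f_{qj} \right| = m \left| f_{pi} - f_{qj} \right| .
\]
```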

[Brief description of drawings]

FIG. 1 is a block diagram showing the electrical configuration of a conventional pitch-adaptive word speech recognition device. FIG. 2 is a block diagram showing the configuration of the distance calculation unit in the word speech recognition device of FIG. 1. FIG. 3 is a block diagram showing the electrical configuration of an embodiment of the present invention. FIG. 4 is a diagram showing the spectrum patterns extracted by the conventional word speech recognition device and by an embodiment of the present invention. In the figures, 1 denotes a speech input unit, 2 a feature extraction unit, 3 an input pattern memory, 4 a registered pattern memory, 5 a recognition processing unit, 6 a start/end detection circuit, 11 a microphone, 12 a microphone amplifier, 13 an AGC circuit, 14 an A/D conversion circuit, 15 a waveform memory, 21 a pitch period extraction circuit, 22 a filter coefficient setting circuit, 23 a digital filter, 24 a constant comparison frequency memory, 25 a level calculation circuit, 51 a distance calculation unit, 52 a matching processing unit, 511 and 512 parameter temporary storage memories, and 513 a Chebyshev distance calculation circuit.

Claims

[Claims]

1. A word speech recognition device comprising: speech signal input means for converting speech into an electric signal and inputting it; feature extraction means for extracting feature parameters of the speech signal waveform input from the speech signal input means; input pattern storage means for storing the feature parameters, extracted by the feature extraction means, of the word speech to be recognized; registered pattern storage means for storing in advance the feature parameters of a plurality of word speeches extracted by the feature extraction means; and speech recognition processing means for performing speech recognition processing by calculating the degree of similarity between the feature parameters of the input speech stored in the input pattern storage means and the feature parameters of the plurality of word speeches stored in the registered pattern storage means, wherein the feature extraction means comprises: detection means for detecting the pitch frequency $f_{p1}$ of the speech signal input from the speech signal input means; a digital filter whose resonance frequency changes according to the filter coefficients that are set, and which extracts a plurality (N) of spectrum data of the speech signal as the feature parameters; fixed comparison frequency storage means for storing a plurality (N) of predetermined fixed comparison frequencies $f_{c1}, f_{c2}, \ldots, f_{cN}$ for frequency spectrum extraction; and filter coefficient setting means for, in response to detection of the pitch frequency $f_{p1}$ by the detection means, calculating the positive integer $k_n$ that satisfies $\min \left| k \cdot f_{p1} - f_{cn} \right|$ $(k = 1, 2, \ldots)$ and setting the filter coefficients so that the resonance frequency of the digital filter becomes $k_n \cdot f_{p1}$ $(n = 1 \text{ to } N)$.