JPS6075898A

JPS6075898A - Word voice recognition equipment

Info

Publication number: JPS6075898A
Application number: JP58183842A
Authority: JP
Inventors: 光生下谷; 日比野　昌弘; 憲司嶋
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1983-09-30
Filing date: 1983-09-30
Publication date: 1985-04-30
Also published as: JPH0461359B2

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION [Technical Field of the Invention] The present invention relates to a word speech recognition device, and in particular to a word speech recognition device for, for example, controlling various devices or entering data by voice. More specifically, it relates to a word speech recognition device that performs recognition using the frequency spectrum of voiced sounds as one of the features of speech.

[Prior Art] FIG. 1 is a schematic block diagram showing an example of a conventional word speech recognition device (hereinafter simply referred to as a recognition device). In the figure, the audio input section 1 includes a microphone, an amplifier, a low-pass filter, and the like (not shown), and converts speech into an electrical signal for input. The output of the audio input section 1 is supplied to the feature extraction section 2 and to the start/end detection circuit 6. The feature extraction section 2 analyzes the input speech signal and extracts speech feature parameters, which are supplied to the recognition processing section 5. The start/end detection circuit 6 detects the start and end points of a spoken word.

The detection result of the start/end detection circuit 6 is supplied to the recognition processing section 5. The recognition processing section 5 is built from a microprocessor, a microcomputer, or the like, and performs the speech recognition processing. An input pattern memory 3 and a registered pattern memory 4 are connected to the recognition processing section 5.

In a recognition device of this kind, the speech waveform is divided into frames of fixed duration, and the frequency spectrum of each frame is extracted as the feature parameters. In registration mode, the recognition processing section 5 writes the extracted feature parameters of the words to be registered, or of standard speech, into the registered pattern memory 4. The registered pattern memory 4 thus holds, in advance, the speech feature parameters of a plurality of words. In recognition mode, the recognition processing section 5 writes the feature parameters extracted from the spoken word into the input pattern memory 3. It then computes, one after another, the similarity between the feature parameters stored in the input pattern memory 3 and those of the plurality of words stored in the registered pattern memory 4, and recognizes the spoken word on the basis of the results.

FIG. 2 is a circuit diagram showing details of the feature extraction section 2 shown in FIG. 1. In the figure, the audio signal from the audio input section 1 is supplied to bandpass filters 201-1, 201-2, ..., 201-N. Each of these bandpass filters passes a specific frequency component of the audio signal waveform.

The outputs of the bandpass filters 201-1 to 201-N are supplied to smoothing circuits 202-1 to 202-N, respectively. The outputs of the smoothing circuits 202-1 to 202-N are supplied to an analog multiplexer 203, which passes them on in a time-division manner. The output of the analog multiplexer 203 is supplied to the A/D conversion circuit 203, where it is converted into digital data and output. FIG. 3 shows the frequency characteristics of the bandpass filters 201-1 to 201-N shown in FIG. 2.

As FIG. 3 shows, the N filters are set so that, together, they extract all frequency components of the speech waveform almost uniformly. The features of the speech are then represented by the pattern of magnitudes of the N frequency-component values extracted by the N filters. N is usually 8 to 16, and when no noise is mixed into the speech waveform, comparatively good speech feature parameters are obtained, so recognition performance was also fully satisfactory. However, when noise such as factory noise or other people's voices is mixed into the speech, the frequency components of the noise pass through the bandpass filters together with the speech and affect the values of the feature parameters.

If the extraction accuracy of the feature parameters is evaluated in terms of spectral distortion, then in the conventional recognition device the spectral distortion that noise causes in the input waveform appears unchanged in the feature parameters. The conventional recognition device therefore had the drawback that its recognition performance degrades markedly when it is used in a noisy environment.

[Summary of the Invention] The present invention was made to eliminate the above drawbacks of the conventional recognition device. Its object is to provide a speech recognition device with excellent recognition performance even in noisy environments, by building the feature extraction section around a digital filter that adapts to the pitch of the speech.

The invention is described more specifically below with reference to the embodiments shown in the drawings.

[Embodiment of the Invention] FIG. 4 is a schematic block diagram showing an embodiment of the invention. In the figure, the audio input section 10 includes a microphone 11, a microphone amplifier 12, an AGC circuit 13, an A/D conversion circuit 14, and a waveform memory 15.

The output of the audio input section 10 is supplied to the level calculation circuit 7 and to the feature extraction section 20. The output of the level calculation circuit 7 is supplied to the start/end detection circuit 6 and to the recognition processing section 50. The output of the start/end detection circuit 6 is supplied to the recognition processing section 50. The feature extraction section 20 includes a pitch period extraction circuit 21, a filter coefficient setting circuit 22, and a digital filter 23. The pitch period extraction circuit 21 and the digital filter 23 both receive the output of the audio input section 10. The output of the pitch period extraction circuit 21 is supplied to the recognition processing section 50 and to the filter coefficient setting circuit 22. The output of the filter coefficient setting circuit 22 is supplied to the digital filter 23, and the output of the digital filter 23 is supplied to the recognition processing section 50. As in the circuit of FIG. 1, an input pattern memory 3 and a registered pattern memory 4 are connected to the recognition processing section 50.

Next, the operation of the embodiment of FIG. 4 is described. The speech waveform picked up by the microphone 11 is amplified by the microphone amplifier 12, adjusted by the AGC circuit 13 so that the peak value of the waveform stays at a fixed level, and converted into a digital value at each sampling point by the A/D conversion circuit 14. The sampled data for one frame is temporarily stored in the waveform memory 15.

The level calculation circuit 7 and the feature extraction section 20 use the data $x(i)$ ($i = 1, 2, \ldots, F$) in the waveform memory 15 to perform the processing described below.

First, the level calculation circuit 7 computes the sum of squares $P$ of the sampled data, as in equation (1), to obtain a value corresponding to the power of the frame.

$$P = \sum_{i=1}^{F} x(i)^2 \tag{1}$$

This value $P$ is supplied to the recognition processing section 50 and is used to decide whether the input waveform is a voiced sound.
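The level calculation of equation (1) can be sketched in a few lines of Python. This is a minimal illustration, not the circuit itself: the function names and the `threshold` parameter are hypothetical, and the source does not state what threshold the recognition processing section applies when deciding whether a frame is voiced.

```python
def frame_power(frame):
    """Sum of squared samples of one frame, P = sum_i x(i)^2 (equation (1))."""
    return sum(x * x for x in frame)


def is_voiced(frame, threshold):
    """Treat a frame as voiced when its power reaches the threshold.

    `threshold` is an assumed tuning value; the source gives no number.
    """
    return frame_power(frame) >= threshold
```

For example, `frame_power([1, -2, 3])` evaluates to 14.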

Next, the pitch period extraction circuit 21 computes the autocorrelation function value $COR(\tau)$ of the waveform data in the waveform memory 15, as in equation (2), and takes as the pitch period the value of $\tau$ that gives the largest autocorrelation value within the pitch period search range.

... (2)

The filter coefficient setting circuit 22 generates filter coefficients such that the resonance frequency of the digital filter 23 is an integer multiple of the pitch frequency (the reciprocal of the pitch period), and sets them in the digital filter 23. The filter coefficient setting circuit 22 can be realized by holding a filter coefficient table in a ROM or the like and looking up the ROM contents for the pitch frequency and its integer multiples.
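The autocorrelation search performed by the pitch period extraction circuit 21 might look like the following sketch. The body of equation (2) is not legible in this text, so the plain (unnormalized) autocorrelation sum used here is an assumption, as are the function names and the `min_lag` / `max_lag` parameters that stand in for the pitch period search range.

```python
def autocorrelation(frame, lag):
    """Unnormalized autocorrelation of one frame at the given lag (assumed form)."""
    return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))


def pitch_period(frame, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation.

    The result is a pitch period in samples; the pitch frequency is the
    sampling frequency divided by this value.
    """
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: autocorrelation(frame, lag))
```

The search range should exclude lag 0 and very short lags, where the autocorrelation of any signal is large regardless of its pitch.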

FIG. 5 is a block diagram showing one configuration example of the digital filter 23 shown in FIG. 4. In the figure, the output $x(i)$ of the waveform memory 15 of FIG. 4 is supplied to a first-order difference circuit 231. The first-order difference circuit 231 is built from, for example, a subtractor, and serves to emphasize the high-frequency range. Its output is supplied to a two-stage lattice filter 232, which comprises three adder-subtractors 2321 to 2323, three multipliers 2324 to 2326, and two delay circuits 2327 and 2328. The output of the two-stage lattice filter 232 is supplied to a squaring circuit 233.

The output of the squaring circuit 233 is supplied to an accumulation circuit 234, whose output $S(n)$ is supplied to the recognition processing section 50 as the filter output.

Next, the operation of the digital filter 23 is described. The sampled data $x(i)$ stored in the waveform memory 15 of FIG. 4 is input to the first-order difference circuit 231 of the digital filter, which performs the calculation of equation (3).

$$\Delta x(i) = x(i) - x(i-1) \tag{3}$$

The output $\Delta x(i)$ of the first-order difference circuit 231 is supplied to the two-stage lattice filter 232, which carries out the recursive calculations of equations (4) to (7).

$$y_2(i) = \Delta x(i) + K_2(n) \cdot b_2(i-1) \tag{4}$$

$$y_1(i) = y_2(i) + K_1(n) \cdot b_1(i-1) \tag{5}$$

$$b_2(i) = b_1(i-1) - K_1(n) \cdot y_1(i) \tag{6}$$

$$b_1(i) = y_1(i) \tag{7}$$

The output $y_1(i)$ of the two-stage lattice filter 232 is then processed by the squaring circuit 233 and the accumulation circuit 234 according to equation (8).

$$S(n) = \sum_{i=1}^{F} y_1(i) \cdot y_1(i) \tag{8}$$

The filter output $S(n)$ is derived in this way. The initial values of $b_2$ and $b_1$ are 0. Here $n$ denotes the $n$-th coefficient value set by the filter coefficient setting circuit 22, which is also the coefficient value corresponding to the $n$-th harmonic of the pitch frequency.
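The chain of equations (3) to (8) can be followed step by step in the sketch below. It is an illustration under stated assumptions rather than the patented circuit: the recursion is read as y2 = Δx + K2·b2(i-1), y1 = y2 + K1·b1(i-1), b2(i) = b1(i-1) - K1·y1(i), b1(i) = y1(i); the sample before the first one is taken to be 0; and the function names and the `tone` helper are hypothetical.

```python
import math


def harmonic_energy(frame, k1, k2):
    """Pass one frame through the difference circuit and the two-stage
    lattice filter, and return the accumulated squared output S(n)."""
    b1 = 0.0      # b1(i-1), initially 0
    b2 = 0.0      # b2(i-1), initially 0
    prev_x = 0.0  # x(i-1); assumed to be 0 before the first sample
    s = 0.0
    for x in frame:
        dx = x - prev_x      # (3) first-order difference
        prev_x = x
        y2 = dx + k2 * b2    # (4)
        y1 = y2 + k1 * b1    # (5)
        b2 = b1 - k1 * y1    # (6) needs b1(i-1), so b2 is updated before b1
        b1 = y1              # (7)
        s += y1 * y1         # (8) square and accumulate
    return s


def tone(freq_hz, sample_rate_hz, length):
    """Hypothetical test signal: a unit-amplitude sine wave."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz)
            for i in range(length)]
```

With coefficients tuned to resonate near 800 Hz at an 8 kHz sampling rate (for instance `k1 = 0.809017`, `k2 = -0.9615`), an 800 Hz tone yields a far larger `harmonic_energy` than a 1200 Hz tone of the same amplitude, which is the selectivity the description relies on.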

The two-stage lattice filter 232 has a resonance characteristic with resonance frequency $f_0$ and bandwidth $B_0$, and the filter coefficients $K_1(n)$ and $K_2(n)$ are related to this characteristic by equations (9) and (10).

$$K_1(n) \approx \cos\left(2\pi f_0 / f_s\right) \tag{9}$$

$$K_2(n) = -\exp\left(-2\pi B_0 / f_s\right) \tag{10}$$

where $f_s$ is the sampling frequency.
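As a rough illustration of equations (9) and (10), the sketch below computes the coefficient pair for each harmonic of a given pitch period. The description has the filter coefficient setting circuit 22 read these values from a ROM table; computing them directly, as done here, is a substitution made for clarity. The function names and the choice of one common bandwidth for every harmonic are assumptions.

```python
import math


def lattice_coefficients(resonance_hz, bandwidth_hz, sample_rate_hz):
    """Return (K1, K2) for one resonance, following equations (9) and (10)."""
    k1 = math.cos(2.0 * math.pi * resonance_hz / sample_rate_hz)    # (9)
    k2 = -math.exp(-2.0 * math.pi * bandwidth_hz / sample_rate_hz)  # (10)
    return k1, k2


def harmonic_coefficients(pitch_period_samples, num_harmonics,
                          bandwidth_hz, sample_rate_hz):
    """Coefficient pairs for harmonics n = 1..num_harmonics of the pitch."""
    pitch_hz = sample_rate_hz / pitch_period_samples
    return [lattice_coefficients(n * pitch_hz, bandwidth_hz, sample_rate_hz)
            for n in range(1, num_harmonics + 1)]
```

A narrower bandwidth pushes K2 toward -1 and sharpens the resonance, at the cost of a longer settling time within each frame.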

When $K_2(n) \to -1$, that is, when the bandwidth $B_0$ is extremely small, the filter has a sharply peaked, high-Q characteristic as shown in FIG. 6. The calculation of the digital filter 23 is executed N times on the same waveform data $x(i)$, once for each set of filter coefficient values up to a predetermined order N, giving N outputs $S(n)$ ($n = 1, 2, \ldots, N$). As already stated, the filter coefficients $K_1(n)$ and $K_2(n)$ are set by the filter coefficient setting circuit 22 so that the harmonic components of the pitch frequency coincide with the resonance frequency of the filter. The filter output $S(n)$ therefore corresponds to a value obtained by extracting only the pitch-frequency harmonic components contained in the waveform data $x(i)$. This filter output $S(n)$ is supplied to the recognition processing section 50 shown in FIG. 4 and is used as the main data for recognition processing.

The recognition processing section 50 normalizes the amplitude and the time axis of the feature parameters $S(n)$ supplied from the digital filter 23. In registration mode it then writes the normalized spectral time-series pattern into the registered pattern memory 4, and in recognition mode it writes the pattern into the input pattern memory 3. In recognition mode, the recognition processing section 50 further computes, by pattern matching, the similarity between the contents of the registered pattern memory 4 and the contents of the input pattern memory 3 to obtain the recognition result. The start and end points of the speech signal are detected by the start/end detection circuit 6 on the basis of the power computed by the level calculation circuit 7. These operations are substantially the same as those of the circuit of FIG. 1.
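The description does not say how the normalization or the pattern matching is carried out, so the following is only one possible sketch. It assumes amplitude normalization by dividing each frame's spectrum by its sum, time normalization by nearest-frame resampling to a fixed number of frames, and a squared Euclidean distance as the (inverse) similarity measure. All names are hypothetical.

```python
def normalize_amplitude(spectrum):
    """Scale one frame's spectrum S(1..N) so that its values sum to 1."""
    total = sum(spectrum)
    if total == 0:
        return list(spectrum)
    return [value / total for value in spectrum]


def normalize_time(pattern, num_frames):
    """Resample a sequence of frames to num_frames by nearest-frame picking."""
    n = len(pattern)
    return [pattern[min(n - 1, (j * n) // num_frames)]
            for j in range(num_frames)]


def distance(a, b):
    """Squared Euclidean distance between two equally sized patterns."""
    return sum((x - y) ** 2
               for frame_a, frame_b in zip(a, b)
               for x, y in zip(frame_a, frame_b))


def recognize(input_pattern, registered, num_frames=8):
    """Return the registered word whose pattern is closest to the input."""
    def prepare(pattern):
        frames = [normalize_amplitude(frame) for frame in pattern]
        return normalize_time(frames, num_frames)

    query = prepare(input_pattern)
    return min(registered,
               key=lambda word: distance(query, prepare(registered[word])))
```

Here `registered` plays the role of the registered pattern memory 4 (a mapping from word to spectral time series) and `input_pattern` that of the input pattern memory 3.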

Next, the functions that characterize the embodiment of FIG. 4 are described. One feature of this embodiment is that the waveform level is computed frame by frame through digital processing, so that the significant parts of the speech waveform, namely the vowel frames, are detected. Another feature is that the pitch period of those vowel frames is obtained by a method such as autocorrelation. A further feature is that a resonant digital filter is provided so that only the harmonic components of the pitch frequency are extracted. As a general property of speech waveforms, voiced sounds such as vowels have high power, so only a small proportion of their information is masked when loud noise is mixed in. With a pitch extraction method such as autocorrelation, the pitch period can be extracted accurately even when white noise is mixed in. Voiced sounds such as vowels also have a discrete spectral structure with components only at integer multiples of the pitch frequency, and the pattern of this spectrum is the information that is effective for identifying vowels. The output of the digital filter 23, which resonates at integer multiples of the pitch frequency extracted by the feature extraction section 20 of the recognition device of FIG. 4, is therefore a set of feature parameters that directly expresses the features of the vowel. Moreover, even when loud noise is mixed in, most of the frequency components of that noise are blocked by the digital filter 23 and do not reach the output. Consequently, even if the input speech waveform has large spectral distortion caused by noise, the feature parameters are only slightly distorted and remain effective for recognition. The recognition device of FIG. 4 can therefore keep the degradation of recognition performance caused by noise extremely small and improve recognition performance.

FIG. 7 is a block diagram showing another configuration example of the digital filter 23 shown in FIG. 4. Except for the points below, the configuration of FIG. 7 is the same as that of the digital filter shown in FIG. 5; corresponding parts carry the same reference numerals and their description is omitted. This embodiment is characterized by a multiplier 235 placed before the first-order difference circuit 231, which allows the speech waveform to be scaled up or down. With the multiplier 235 inserted, equation (3) becomes equation (11).

$$\Delta x(i) = \alpha \left( x(i) - x(i-1) \right) \tag{11}$$

Here $\alpha$ is a waveform multiplication coefficient whose value can be set freely. If the power of the speech waveform is too large, the result $S(n)$ of equation (8) may overflow. Making $\alpha$ small when the waveform power is large and large when the power is small therefore improves the dynamic range of the filter calculation. The power value computed by the level calculation circuit 7 can be used for this purpose.

Let $P_H$ be the power up to which the filter calculation is guaranteed not to diverge when $\alpha$ is held constant at $\alpha = \alpha_0$. When the power is $P_H$ or less,

$$\alpha = \alpha_0 \tag{12}$$

and when the power exceeds $P_H$,

$$\alpha = \alpha_0 \sqrt{P_H / P} \tag{13}$$

which prevents the spectrum calculation from diverging. In equation (13), $P$ denotes the power. FIG. 8 shows this relationship between $\alpha$ and the power. When the recognition processing section 50 receives the power from the level calculation circuit 7, it calculates $\alpha$ and sets it in the multiplier 235.
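The rule of equations (12) and (13) can be sketched as below. Equation (13) is badly garbled in this text; the square-root form is an assumption, chosen because scaling the waveform by α scales its power by α², so this choice holds the scaled power at the limit. The function and parameter names are hypothetical.

```python
import math


def waveform_gain(power, alpha0, power_limit):
    """Waveform multiplication coefficient for one frame.

    Below the limit the coefficient stays at alpha0 (equation (12)).
    Above it, the coefficient shrinks so that the scaled power stays at
    the limit (assumed reading of equation (13)).
    """
    if power <= power_limit:
        return alpha0
    return alpha0 * math.sqrt(power_limit / power)
```

For instance, with `alpha0 = 2.0` and `power_limit = 4.0`, a frame of power 16.0 receives a coefficient of 1.0, while any frame of power 4.0 or less keeps the coefficient 2.0.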

FIG. 9 is a block diagram showing yet another configuration example of the digital filter 23 shown in FIG. 4. Parts identical to those of the digital filter shown in FIG. 5 carry the same reference numerals, and their description is omitted. This embodiment is characterized by a multiplier 236 inserted after the accumulation circuit 234, which allows the frequency characteristic of the filter to be adjusted. The gain $G_N(n)$ of the two-stage lattice filter 232 is expressed by equation (14).

In equation (14), $B_1$ and $B_2$ are given by equations (15) and (16).

$$B_1 = K_1(n) - K_1(n) \cdot K_2(n) \tag{15}$$

$$B_2 = K_2(n) \tag{16}$$

The filter coefficient setting circuit 22 sets not only $K_1(n)$ and $K_2(n)$ but also the gain correction coefficient given by equation (17), which it sets in the multiplier 236.

$$G(n) = 1 / G_N^2(n) \tag{17}$$

By providing the multiplier 236 in this way and multiplying the spectrum $S(n)$ obtained by the filter calculation by $G(n)$, a filter result with constant gain is obtained, which in turn improves recognition performance.
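Equation (14) is not reproduced in this text, so the sketch below substitutes a standard calculation for it: it assumes the lattice filter is equivalent to the all-pole transfer function 1 / (1 - B1·z⁻¹ - B2·z⁻²), with B1 and B2 from equations (15) and (16), and evaluates its magnitude at the resonance frequency. The correction then follows equation (17). Function names are hypothetical.

```python
import cmath
import math


def resonator_gain(k1, k2, resonance_hz, sample_rate_hz):
    """Magnitude of the assumed all-pole response at the resonance frequency."""
    b1 = k1 - k1 * k2  # (15)
    b2 = k2            # (16)
    z_inv = cmath.exp(-2j * math.pi * resonance_hz / sample_rate_hz)
    return 1.0 / abs(1.0 - b1 * z_inv - b2 * z_inv * z_inv)


def gain_correction(k1, k2, resonance_hz, sample_rate_hz):
    """Gain correction coefficient G(n) = 1 / G_N(n)^2 (equation (17))."""
    gain = resonator_gain(k1, k2, resonance_hz, sample_rate_hz)
    return 1.0 / (gain * gain)
```

Because S(n) is an accumulated squared output, the correction uses the square of the amplitude gain, which matches the form of equation (17).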

For convenience of explanation, the embodiments above describe the recognition device as a speaker-dependent type in which a specific speaker registers the words. It goes without saying that the invention can also be realized as a speaker-independent speech recognition device in which the feature parameters of the word speech are stored in a ROM in advance.

The embodiments above center on a two-stage lattice filter as the digital filter, but the filter need not be a two-stage lattice filter: any high-Q digital filter whose characteristics can be changed from frame to frame may be used.

In the embodiments above, a single digital filter is used with its filter coefficients set in a time-division manner. Alternatively, a plurality of digital filters may be provided in parallel and a different set of filter coefficients set in each at the same time, so that the resonance frequency of each digital filter is an integer multiple of the pitch frequency.

In the embodiment of FIG. 7, the waveform multiplication coefficient $\alpha$ is determined from the power of the speech signal. Instead of the power, another quantity corresponding to the magnitude of the speech waveform (for example, its level) may be calculated and used to determine $\alpha$.

In the embodiment of FIG. 9, the gain correction coefficient is chosen so that the gain of the filter output is exactly constant. The gain need not be exactly constant, however; some variation is acceptable.

[Effects of the Invention] As described above, according to the present invention only the harmonic components of the pitch frequency of the speech signal are extracted as feature parameters. Even if the input speech signal has large spectral distortion caused by noise, the feature parameters are only slightly distorted, so an excellent recognition device whose recognition performance degrades very little under noise is obtained.

[Brief explanation of drawings]

FIG. 1 is a schematic block diagram showing an example of a conventional recognition device. FIG. 2 is a block diagram showing details of the feature extraction section 2 shown in FIG. 1. FIG. 3 shows the frequency characteristics of the bandpass filters 201-1 to 201-N shown in FIG. 2. FIG. 4 is a schematic block diagram showing an embodiment of the present invention. FIG. 5 is a block diagram showing one configuration example of the digital filter 23 shown in FIG. 4. FIG. 6 shows the frequency characteristic of the two-stage lattice filter 232 shown in FIG. 5. FIG. 7 is a block diagram showing another configuration example of the digital filter 23 shown in FIG. 4. FIG. 8 shows the relationship between the power of the speech signal and the waveform multiplication coefficient α set in the multiplier 235 shown in FIG. 7. FIG. 9 is a block diagram showing yet another configuration example of the digital filter 23 shown in FIG. 4.

In the figures, 3 denotes the input pattern memory, 4 the registered pattern memory, 7 the level calculation circuit, 10 the audio input section, 20 the feature extraction section, 21 the pitch period extraction circuit, 22 the filter coefficient setting circuit, 23 the digital filter, 50 the recognition processing section, and 232 the two-stage lattice filter.

Agent: Masuo Oiwa

Procedural amendment (voluntary)
1. Indication of the case: Japanese Patent Application No. Sho 58-183842
2. Title of the invention: Word speech recognition device
3. Person making the amendment: representative Nihachiro Katayama
4. Agent
5. Subject of the amendment: the "Detailed Description of the Invention" section of the specification
6. Contents of the amendment
(1) On page 12, line 13 of the specification, "higher-order type" is corrected to "lattice type".
(2) On page 16, line 15 of the specification, "する。。" is corrected to "する。" (a doubled full stop is removed).
(3) On page 18, line 16 of the specification, "B1 = K1(n) = K1(n)·K2(n)" is corrected to "B1 = K1(n) − K1(n)·K2(n)".

Procedural amendment (voluntary), addressed to the Commissioner of the Patent Office
1. Indication of the case: Japanese Patent Application No. Sho 58-183842
2. Title of the invention: Word speech recognition device
3. Person making the amendment
5. Subject of the amendment: FIG. 6 of the drawings
6. Contents of the amendment: FIG. 6 of the drawings is amended as shown in the attached sheet.

Claims

[Scope of Claims]
(1) A word speech recognition device comprising: audio signal input means for converting speech into an electrical signal and inputting it; feature extraction means for extracting feature parameters of the audio signal waveform input from the audio signal input means; input pattern storage means for storing the feature parameters, extracted by the feature extraction means, of the word speech to be recognized; registered pattern storage means for storing in advance the feature parameters, extracted by the feature extraction means, of a plurality of word utterances; and recognition processing means for calculating the similarity between the feature parameters of the input speech stored in the input pattern storage means and the feature parameters of the plurality of word utterances stored in the registered pattern storage means, and performing speech recognition processing; wherein the feature extraction means includes: means for detecting the pitch frequency of the audio signal input from the audio signal input means; a digital filter whose resonance frequency changes according to the filter coefficients set in it, and which extracts spectrum data of the audio signal as the feature parameters; and filter coefficient setting means for setting the filter coefficients of the digital filter so that the resonance frequency of the digital filter is an integer multiple of the pitch frequency.
(2) The word speech recognition device according to claim 1, wherein a single digital filter is provided, and the filter coefficient setting means sets the filter coefficients in the digital filter in a time-division manner.
(3) The word speech recognition device according to claim 1, wherein a plurality of the digital filters are arranged in parallel, and the filter coefficient setting means sets different filter coefficients in each of the digital filters arranged in parallel.
(4) The word speech recognition device according to any one of claims 1 to 3, wherein the digital filter further includes means for adjusting the level of the input audio signal according to the level of that input audio signal.
(5) The word speech recognition device according to any one of claims 1 to 3, wherein the digital filter further includes means for adjusting the level of the output signal according to the resonance frequency so that the level of the output signal is constant at each resonance frequency.