JPS63121100A

JPS63121100A - Feature pattern extraction for voice recognition equipment

Info

Publication number: JPS63121100A
Application number: JP26723386A
Authority: JP
Inventors: 納田　重利
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1986-11-10
Filing date: 1986-11-10
Publication date: 1988-05-25

Abstract

(57) [Abstract] This publication contains application data filed before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Industrial field of application]

The present invention relates to a feature pattern extraction method in a speech recognition device, applied, for example, to recognizing the speech of unspecified speakers word by word.

[Summary of the invention]

In a feature pattern extraction method for the feature extraction unit of a speech recognition device, the present invention identifies, in the binary spectral pattern of speech, the regions of consecutive "1"s and the regions of consecutive "0"s along the channel direction, and extracts the number of channels in each region as a feature value. This makes it easy to normalize translational (parallel-shift) variation along the frequency axis, so that recognition is not affected by the speaker's manner of speaking, differences between speakers, and the like, thereby improving the recognition rate.

[Conventional technology]

The speech recognition device described in the specification of Japanese Patent Application No. Sho 59-106177, previously proposed by the present applicant, comprises a microphone serving as a speech input unit, a preprocessing circuit, an acoustic analyzer, a feature data extractor, a registered pattern memory, a pattern matching determiner, and so on.

In this speech recognition device, the preprocessing circuit limits the speech signal input from the microphone to the band required for speech recognition, an A/D converter converts it into a digital speech signal, and this digital speech signal is supplied to the acoustic analyzer.

The acoustic analyzer converts the speech signal into a frequency spectrum and normalizes the frequency spectrum using N frequencies as representative values, spaced, for example, at regular intervals on a logarithmic axis. In every frame period it supplies frame data consisting of N channels of spectral data to the feature data extractor.

The feature data extractor calculates the distance between adjacent frame data and, from the sum of these inter-frame distances, obtains the trajectory length of the N-dimensional vector from the start frame to the end frame of the speech signal. It divides the trajectory length equally by a predetermined number of divisions, namely the number needed to extract features from the longest utterance with the most words, and extracts only the frame data corresponding to the division points as feature data. The time axis is thereby normalized so that the output is not affected by variations in the speaker's speaking rate.

At registration, this feature data is supplied to the registered pattern memory and stored as a registered feature data block (standard pattern). At recognition, the input speech signal is converted by the processing described above into an input feature data block (input pattern) and supplied to the pattern matching determiner, which performs pattern matching between the input feature data block and the registered feature data blocks.

The pattern matching determiner calculates the inter-frame distances between the frame data of a registered feature data block and the frame data of the input feature data block, and takes the sum of these inter-frame distances as the matching distance. It calculates the matching distance for the other registered feature data blocks in the same way, and outputs as the recognition result the word corresponding to the registered feature data block whose matching distance is the smallest and is judged to be sufficiently small.

In the conventional speech recognition device, however, the frame data output from the acoustic analyzer passes through the feature data extractor and is stored as is in the registered pattern memory as a registered feature data block, so the registered pattern memory requires an enormous capacity. In addition, the computation time for pattern matching grows with the amount of data. For this reason, the present applicant has proposed a speech recognition device (specification of Japanese Patent Application No. Sho 60-166191) that binarizes each item of spectral data constituting the frame data, reducing the capacity of the registered pattern memory and shortening the matching time.

This speech recognition device calculates a trend value for correcting the spectral trend, which varies for a variety of reasons, and flattens the spectral trend based on this trend value, normalizing the frame data so that it is not affected by individual differences between speakers, ambient noise, and the like. It then binarizes each item of spectral data constituting the frame data and performs pattern matching based on the resulting binary pattern.

[Problem that the invention seeks to solve]

However, the time-series frequency spectrum of a speech signal is known to vary not only along the time axis but also considerably along the frequency axis, depending on the speaker's manner of speaking, differences between speakers, and so on. Speech recognition devices using the conventional feature pattern extraction method, which normalizes only the time axis, have therefore dealt with variation along the frequency axis either by prescribing that speakers speak in a uniform manner or by adopting a multi-template scheme in which several standard patterns are prepared. Prescribing the manner of speaking at speech input is not practical, however, and the multi-template scheme has the drawback that the registered pattern memory becomes enormous and the processing time increases accordingly.

Variation along the frequency axis due to differences between speakers, such as men, women, children, and adults, is for the most part a translational shift on a logarithmic scale. If such variation can be normalized, the recognition rate for speech from unspecified speakers can be improved. A feature pattern extraction method that can easily normalize translational variation along the frequency axis is therefore desired.

Accordingly, an object of the present invention is to provide a feature pattern extraction method in a speech recognition device that can easily normalize translational variation along the frequency axis and can improve the recognition rate for unspecified speakers.

[Means for solving problems]

The present invention is a feature pattern extraction method in a speech recognition device, comprising: a step of comparing the spectral pattern of an input speech signal with a predetermined threshold to obtain a binary spectral pattern (binarizer 12); a step of calculating, in the binary spectral pattern, the number of consecutive "1"s or "0"s along the channel direction of each frame to obtain a pattern matrix (region distance feature extractor 13); and a step of normalizing the values contained in a predetermined column of the pattern matrix corresponding to an end of the binary spectral pattern (frequency axis normalizer 15).

[Operation]

The binary spectral pattern output from the binarizer 12, expressed as a matrix X_MN of M rows and N columns, is supplied to the region distance feature extractor 13. There the regions of consecutive "1"s and consecutive "0"s along the channel direction are identified, and the number of channels in each region i (1 ≤ i ≤ I, i an integer) is extracted as a feature value. This extraction forms an initial feature pattern expressed as a matrix X_MI of M rows and I columns, which is supplied to the time axis normalizer 14.

The time axis normalizer 14 performs normalization along the time-series trajectory, turning the initial feature pattern into a feature pattern that is unaffected by variation along the time axis and is expressed as a matrix X_JI of J rows and I columns. This feature pattern is supplied to the frequency axis normalizer 15. The frequency axis normalizer 15 finds the smallest of the feature values in column (i = 1) of the feature pattern and subtracts it from each feature value in column (i = 1). The resulting values become the new feature values for column (i = 1), forming a feature pattern that is unaffected by variation along the frequency axis, and pattern matching is performed based on this feature pattern.

[Embodiment]

a. Overall configuration and operation of the speech recognition device

An embodiment of the present invention is described below with reference to the drawings. FIG. 1 shows an outline of an example of a speech recognition device used to implement the present invention. The invention concerns the processing of the feature extraction unit denoted by 6 in FIG. 1.

In FIG. 1, reference numeral 1 denotes a microphone serving as the speech input unit. The analog speech signal from the microphone 1 is supplied to a filter 2. The filter 2 is, for example, a low-pass filter with a cutoff frequency of 7.5 kHz; it limits the speech signal to the band of 7.5 kHz and below that is required for speech recognition. The band-limited speech signal is supplied through an amplifier 3 to an A/D converter 4.

The A/D converter 4 is, for example, an 8-bit A/D converter with a sampling frequency of 16.5 kHz. It converts the analog speech signal into an 8-bit digital signal, which is supplied to the acoustic analyzer 5.

The acoustic analyzer 5 converts the speech signal into a frequency spectrum and generates, for example, a spectral data string of N channels. In the acoustic analyzer 5 the speech signal is converted into a frequency spectrum by arithmetic processing, yielding a spectral data string whose representative values are N frequencies spaced, for example, at regular intervals on a logarithmic axis. The speech signal is thus expressed by the magnitudes of a discrete frequency spectrum of N channels. In every unit time (frame period), the N-channel spectral data string is output as one item of frame data. That is, in every frame period the speech signal is cut out as a parameter expressed by an N-dimensional vector and supplied to the feature extraction unit 6.
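As an illustration of "N frequencies at regular intervals on a logarithmic axis", the following minimal sketch computes geometrically spaced channel frequencies. The function name and the band edges (250 Hz and 4000 Hz) are assumptions chosen for the example; the source gives neither N nor the actual channel frequencies.

```python
def log_spaced_frequencies(f_low, f_high, num_channels):
    """Return num_channels frequencies equally spaced on a log axis.

    f_low, f_high and num_channels are hypothetical parameters; the
    source only says the N channels are evenly spaced on a log axis.
    """
    if num_channels < 2:
        raise ValueError("need at least two channels")
    # Equal steps on a log axis mean a constant ratio between neighbours.
    ratio = (f_high / f_low) ** (1.0 / (num_channels - 1))
    return [f_low * ratio ** k for k in range(num_channels)]


freqs = log_spaced_frequencies(250.0, 4000.0, 5)
print([round(f) for f in freqs])
```

On such an axis, scaling every frequency by a constant factor moves the spectrum by a fixed number of channels, which is the translational variation that the frequency axis normalization targets.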

For example, if the frame corresponding to the end of the speech interval is frame M, then, as shown in FIG. 2, frame data each consisting of the spectral data of channels 1 to N are supplied to the feature extraction unit 6 in time series from frame 1 to frame M.

As described in detail later, the feature extraction unit 6 forms a feature pattern by performing spectral trend normalization, binarization, region distance feature extraction, time axis normalization, frequency axis normalization, and other processing. The feature pattern formed in the feature extraction unit 6 is supplied to the mode switching circuit 7.

At registration, this feature pattern is supplied through the mode switching circuit 7 to the registered pattern memory 8 and stored as a standard pattern. At recognition, the input speech signal is converted into a feature pattern by the processing described above, and this feature pattern is supplied to the pattern matching determiner 9 as the input pattern. The standard patterns in the registered pattern memory 8 that are to be compared are also supplied to the pattern matching determiner 9, and pattern matching is performed between the input pattern and every standard pattern to be compared.

In the pattern matching determiner 9, the inter-frame distances between the frames of the input pattern and the frames of the standard pattern being compared are obtained, and their sum is taken as the matching distance. The word corresponding to the registered pattern whose matching distance is the smallest among those obtained for all registered patterns being compared, and is judged to be sufficiently small, is output as the recognition result.
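The matching rule above can be sketched as follows. This is an illustration under stated assumptions, not the device's actual implementation: the source does not specify the inter-frame distance used for matching, so a city-block (sum of absolute differences) distance is assumed, and `reject_threshold` is a hypothetical stand-in for "judged to be sufficiently small". All names and values are made up for the example.

```python
def frame_distance(frame_a, frame_b):
    """City-block distance between two frames (an assumed metric)."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b))


def matching_distance(pattern_a, pattern_b):
    """Sum of inter-frame distances between corresponding frames.

    Both patterns are assumed to have the same number of frames,
    as they would after time axis normalization.
    """
    return sum(frame_distance(fa, fb) for fa, fb in zip(pattern_a, pattern_b))


def recognize(input_pattern, standard_patterns, reject_threshold):
    """Return the word with the smallest matching distance, or None
    if even the best distance exceeds the (hypothetical) threshold."""
    best_word, best_distance = None, None
    for word, standard in standard_patterns.items():
        distance = matching_distance(input_pattern, standard)
        if best_distance is None or distance < best_distance:
            best_word, best_distance = word, distance
    if best_distance is not None and best_distance <= reject_threshold:
        return best_word
    return None


# Made-up standard patterns: two frames of two feature values each.
standards = {"yes": [[0, 2], [1, 3]], "no": [[4, 1], [5, 0]]}
print(recognize([[0, 2], [1, 2]], standards, reject_threshold=3))
```

Here the input is at distance 1 from "yes" and 11 from "no", so "yes" is returned; with a threshold of 0 the same input would be rejected.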

b. Feature pattern extraction method in the feature extraction unit

FIG. 3A shows an example of the configuration of the feature extraction unit 6 in the speech recognition device described above. As shown in FIG. 3A, the feature extraction unit 6 consists of a spectral trend normalizer 11, a binarizer 12, a region distance feature extractor 13, a time axis normalizer 14, and a frequency axis normalizer 15.

The time-series frame data from the acoustic analyzer 5 is supplied to the spectral trend normalizer 11, which performs trend normalization on the spectral data of each frame as it arrives. For example, for the spectral data constituting each frame, a trend value for correcting trend variation is obtained by taking the mean of the spectral data from channel 1 to a predetermined channel n (1 ≤ n ≤ N, n an integer) and the mean of the spectral data from that channel n to the highest channel N, averaging these two means, and multiplying the result by a suitable coefficient. A subtraction is performed between the trend value obtained for the spectral data of each channel and the corresponding spectral data, flattening the spectral trend and normalizing the trend of the spectral data so that it is not affected by individual differences between speakers, ambient noise, and the like. The same trend normalization is applied to every frame, and the trend-normalized frame data is supplied to the binarizer 12.
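The trend-value arithmetic described above can be sketched as follows. This follows one literal reading of the text and rests on several assumptions: a single trend value is computed per frame, the subtraction is taken as spectral value minus trend value, and the split channel and coefficient are hypothetical parameters. The source does not give their values, and the actual normalizer may compute a different trend value for each channel.

```python
def trend_value(frame, split_channel, coefficient):
    """Mean of the two half-band means, times a coefficient.

    frame: list of N spectral values (channel 1 is frame[0]).
    split_channel: the 1-based channel n; it belongs to both halves,
    as in the description (channels 1..n and n..N).
    """
    low_band = frame[:split_channel]        # channels 1..n
    high_band = frame[split_channel - 1:]   # channels n..N
    mean_low = sum(low_band) / len(low_band)
    mean_high = sum(high_band) / len(high_band)
    return coefficient * (mean_low + mean_high) / 2


def normalize_trend(frame, split_channel, coefficient=1.0):
    """Subtract the frame's trend value from every channel (assumed
    direction of subtraction: spectral value minus trend value)."""
    trend = trend_value(frame, split_channel, coefficient)
    return [value - trend for value in frame]


print(normalize_trend([10, 20, 30, 40], split_channel=2))
```

For the made-up frame above, the half-band means are 15 and 30, so the trend value is 22.5 with a coefficient of 1.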

In the binarizer 12, each 8-bit item of spectral data constituting the frame data is binarized using a threshold set to a suitable value such that the formant portions of the spectral envelope represented by one frame of data become "1". For example, each item of spectral data is compared with the threshold; spectral data greater than the threshold is set to "1" and spectral data smaller than the threshold is set to "0". The binary spectral pattern formed in the binarizer 12 is supplied to the region distance feature extractor 13.
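A minimal sketch of this thresholding step is shown below. The function name and the sample values are made up for illustration. The text does not say how a value exactly equal to the threshold is treated; mapping it to "0" is an assumption.

```python
def binarize(frames, threshold):
    """Map each spectral value to 1 if it exceeds the threshold, else 0.

    frames: list of frames, each a list of N spectral values.
    Values equal to the threshold become 0 (an assumption; the
    description covers only 'greater' and 'smaller').
    """
    return [[1 if value > threshold else 0 for value in frame]
            for frame in frames]


spectrum = [[5, 12, 12, 3],
            [9, 10, 11, 2]]
print(binarize(spectrum, threshold=10))
```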

The region distance feature extractor 13 identifies, along the frequency axis of each frame, that is, along the channel direction, the regions of consecutive "1"s and the regions of consecutive "0"s, and extracts the number of channels in each region as a feature value. This extraction forms the initial feature pattern.

That is, let n (1 ≤ n ≤ N, n an integer) be the channel number on the frequency axis and m (1 ≤ m ≤ M, m an integer) the frame number on the time axis, and let the binary spectral pattern be expressed as a matrix X_MN of M rows and N columns. For each frame (each row m), the regions of consecutive "1"s and consecutive "0"s are identified along the channel direction (for example, along the columns n in order of increasing channel number), and region numbers i (1 ≤ i ≤ I, i an integer) are assigned from 1 for the first region to I for the last region. The number of channels in each region is then extracted as a feature value, and arranging these feature values in order of region number forms the initial feature pattern, expressed as a matrix X_MI of M rows and I columns.
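The extraction described above amounts to run-length encoding each row of the binary pattern. The sketch below illustrates it with made-up 15-channel frames. The source does not say how frames with an unusual number of regions are fitted into I columns, so the zero padding and truncation used here are assumptions, as are all names.

```python
from itertools import groupby


def run_lengths(frame_bits):
    """Lengths of the consecutive runs of equal bits, in channel order."""
    return [len(list(group)) for _, group in groupby(frame_bits)]


def region_distance_features(binary_pattern, num_regions):
    """Build the M x I initial feature pattern from an M x N binary pattern.

    Rows with fewer than num_regions runs are padded with zeros and
    longer rows are truncated (both are assumptions of this sketch).
    """
    features = []
    for frame_bits in binary_pattern:
        runs = run_lengths(frame_bits)[:num_regions]
        runs += [0] * (num_regions - len(runs))
        features.append(runs)
    return features


binary_pattern = [
    [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],  # only three regions
]
for row in region_distance_features(binary_pattern, num_regions=6):
    print(row)
```

The second frame is the first frame moved up by one channel: the interior run lengths (2, 2, 3, 3) are identical and only the runs touching the ends of the channel range change.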

For example, suppose that, as shown in FIG. 4A, there is a binary spectral pattern of 8 frames, each frame consisting of the 15 channels from channel 1 to channel 15. For each frame, the regions of consecutive "1"s (hatched in the figure) and the regions of consecutive "0"s are identified, and each frame, apart from exceptional frames, is divided into 6 regions. The number of channels in each region is extracted as a feature value, in the correspondence shown by the arrows in the figure. The binary spectral pattern of FIG. 4A, expressed as a matrix of 8 rows and 15 columns, thus becomes the initial feature pattern expressed as a matrix of 8 rows and 6 columns, as shown in FIG. 4B. The initial feature pattern formed in this way in the region distance feature extractor 13 is supplied to the time axis normalizer 14.

The time axis normalizer 14 compresses (or expands) the initial feature pattern along the time axis by performing normalization along, for example, the time-series trajectory in feature vector space (the space of I-dimensional vectors, where I is the number of regions of consecutive "1"s and consecutive "0"s).

For example, in the time axis normalizer 14, the absolute differences between the feature values of corresponding regions in adjacent frames of the initial feature pattern are obtained, and their sum is taken as the inter-frame distance for those adjacent frames. The inter-frame distances are then summed to obtain the trajectory length of the I-dimensional vector from the start frame 1 to the end frame M. The trajectory length is divided equally by the predetermined number of divisions needed to extract features (for example, J). Only the frames lying nearest each of the J division points are extracted, and the output is normalized along the time axis so that it is not affected by variations in the speaker's speaking rate. The initial feature pattern, expressed as a matrix X_MI of M rows and I columns, is thus compressed (or expanded) into a feature pattern expressed as a matrix X_JI of J rows and I columns. The feature pattern formed in the time axis normalizer 14 is supplied to the frequency axis normalizer 15.
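The trajectory-based resampling can be sketched as follows. The exact placement of the J division points is not given in the text; this sketch assumes J points spaced evenly from the start to the end of the trajectory, both ends included, and picks the first frame nearest each point. The function names and sample values are made up for illustration.

```python
def frame_distance(frame_a, frame_b):
    """Sum of absolute differences between corresponding feature values."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b))


def normalize_time_axis(pattern, num_points):
    """Resample an M x I pattern to num_points rows along its trajectory.

    Division points are placed at equal trajectory-length intervals,
    end points included (an assumption of this sketch).
    """
    if not pattern or num_points < 2:
        raise ValueError("need a non-empty pattern and at least two points")
    # Cumulative trajectory length reached at each frame.
    cumulative = [0]
    for previous, current in zip(pattern, pattern[1:]):
        cumulative.append(cumulative[-1] + frame_distance(previous, current))
    total_length = cumulative[-1]
    selected = []
    for k in range(num_points):
        target = total_length * k / (num_points - 1)
        nearest = min(range(len(pattern)),
                      key=lambda idx: abs(cumulative[idx] - target))
        selected.append(list(pattern[nearest]))
    return selected


# Three identical frames (a sustained sound) followed by steady change.
pattern = [[3, 2], [3, 2], [3, 2], [5, 2], [7, 2], [7, 4]]
print(normalize_time_axis(pattern, num_points=4))
```

The three identical leading frames contribute no trajectory length, so they collapse into a single output frame, which illustrates how sampling along the trajectory discounts stretches where the sound is merely held longer.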

The frequency axis normalizer 15 normalizes translational variation along the frequency axis to form the final feature pattern.

The feature values of the feature pattern from the time axis normalizer 14, other than those in column (i = 1), have already been made invariant to translational variation along the frequency axis by the processing in the region distance feature extractor 13. The frequency axis normalizer 15 therefore performs computation only on the feature values in column (i = 1) of the feature pattern. For example, the smallest of the feature values in column (i = 1) is found and subtracted from each feature value in column (i = 1). The values obtained by this subtraction become the new feature values for column (i = 1).

For example, when the feature pattern shown in FIG. 4B is supplied to the frequency axis normalizer 15 (in practice it will have been compressed or expanded on passing through the time axis normalizer 14), processing is performed only on column (i = 1), the leftmost column in FIG. 4B. The smallest feature value in that column is found to be "3", and "3" is subtracted from each feature value in the column. The values obtained by this subtraction become the new feature values for column (i = 1), as shown by the arrows in the figure, and the feature pattern of FIG. 4B becomes the feature pattern of FIG. 4C.
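The column-minimum subtraction can be sketched as follows. The values are made up and are not those of FIG. 4B; only the minimum of "3" in the first column mirrors the example in the text. The function name is an assumption.

```python
def normalize_frequency_axis(pattern):
    """Subtract the smallest first-column value from the first column.

    pattern: J x I feature pattern (list of rows). Only column i = 1
    (index 0) is modified; all other columns are copied unchanged.
    """
    smallest = min(row[0] for row in pattern)
    return [[row[0] - smallest] + list(row[1:]) for row in pattern]


pattern = [[3, 2, 2, 3, 3, 2],
           [4, 2, 3, 2, 2, 2],
           [5, 1, 3, 2, 2, 2]]

# Hypothetical copy of the same pattern whose first region is two
# channels wider in every frame, as after a shift along the axis.
shifted = [[row[0] + 2] + row[1:] for row in pattern]

print(normalize_frequency_axis(pattern))
print(normalize_frequency_axis(pattern) == normalize_frequency_axis(shifted))
```

Both patterns normalize to the same result, with first-column values 0, 1 and 2, so a uniform shift of the first region no longer affects matching.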

The feature pattern formed in this way in the frequency axis normalizer 15 is, as described above, supplied to the registered pattern memory 8 at registration and to the pattern matching determiner 9 at recognition.

In this embodiment, region distance feature extraction has been described for the case where the regions of consecutive "1"s or "0"s are identified in order from channel 1 in the direction of increasing channel number. The regions of consecutive "1"s or "0"s may instead be identified in order from channel N in the direction of decreasing channel number, and the initial feature pattern formed from those regions.

Also, the feature extraction unit 6 of this embodiment has been described for the case where processing is performed in the order spectral trend normalization, binarization, region distance feature extraction, time axis normalization, and frequency axis normalization. The feature extraction unit 6 may instead be configured as shown in FIG. 3B, with the time axis normalizer 14 placed before the binarizer 12: time axis normalization is performed along, for example, the time-series trajectory in feature vector space (the space of N-dimensional vectors, where N is the number of channels of the acoustic analyzer 5), followed by binarization, region distance feature extraction to form the initial feature pattern, and then frequency axis normalization. Alternatively, the feature extraction unit 6 may be configured as shown in FIG. 3C, with time axis normalization performed after binarization, followed by region distance feature extraction to form the initial feature pattern, and then frequency axis normalization.

[Effect of the invention]

In the present invention, the binary spectral pattern supplied from the binarizer, expressed as a matrix X_MN of M rows and N columns, is supplied to the region distance feature extractor, where the regions of consecutive "1"s and consecutive "0"s along the channel direction are identified and the number of channels in each region i (1 ≤ i ≤ I, i an integer) is extracted as a feature value. This extraction forms an initial feature pattern expressed as a matrix X_MI of M rows and I columns, which is supplied to the time axis normalizer.

The time axis normalizer performs normalization along the time-series trajectory, turning the initial feature pattern into a feature pattern that is unaffected by variation along the time axis and is expressed as a matrix X_JI of J rows and I columns. This feature pattern is supplied to the frequency axis normalizer. The frequency axis normalizer finds the smallest of the feature values in column (i = 1) of the feature pattern and subtracts it from each feature value in column (i = 1). The resulting values become the new feature values for column (i = 1), forming a feature pattern that is unaffected by variation along the frequency axis, and pattern matching is performed based on this feature pattern.

According to the present invention, therefore, translational variation along the frequency axis can be normalized easily, variation along the time axis can also be normalized easily, and the recognition rate for unspecified speakers can be improved.

Furthermore, according to the present invention, there is no need to prescribe the manner of speaking at speech input, as was the case with conventional speech recognition devices, and no need for a multi-template scheme. The capacity of the registered pattern memory can therefore be reduced and recognition processing made faster, making it possible to provide a speech recognition device that is small, inexpensive, and practical.

[Brief explanation of the drawing]

FIG. 1 is a block diagram of an embodiment of the present invention. FIG. 2 is a schematic diagram showing the structure of the time-series frame data output from the acoustic analyzer in the embodiment. FIG. 3A is a block diagram showing the configuration of the feature extraction unit in the embodiment. FIG. 3B and FIG. 3C are block diagrams of another example and yet another example of the configuration of the feature extraction unit in the embodiment. FIG. 4 is a schematic diagram used to explain the operation of the feature extraction unit in the embodiment.

Explanation of the main reference numerals in the drawings: 1: microphone, 5: acoustic analyzer, 6: feature extraction unit, 8: registered pattern memory, 9: pattern matching determiner, 11: spectral trend normalizer, 12: binarizer, 13: region distance feature extractor, 14: time axis normalizer, 15: frequency axis normalizer.

Agent: Patent Attorney 杉浦 正知

Claims

[Scope of Claims] A feature pattern extraction method in a speech recognition device, comprising: a step of comparing the spectral pattern of an input speech signal with a predetermined threshold to obtain a binary spectral pattern; a step of calculating, in the binary spectral pattern, the number of consecutive "1"s or "0"s along the channel direction of each frame to obtain a pattern matrix; and a step of normalizing the values contained in a predetermined column of the pattern matrix corresponding to an end of the binary spectral pattern.