JPS6225798A

JPS6225798A - Voice recognition equipment

Info

Publication number: JPS6225798A
Application number: JP16508085A
Authority: JP
Inventors: 納田　重利
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1985-07-26
Filing date: 1985-07-26
Publication date: 1987-02-03

Abstract

(57) [Abstract] This publication contains application data that predates electronic filing, so no abstract data is recorded.

Description

[Detailed description of the invention]

[Industrial field of application]

The present invention relates to a speech recognition device applied, for example, to recognizing the speech of a specific speaker word by word.

[Summary of the invention]

In a speech recognition device, the present invention calculates a trend value for correcting the spectral trend, which fluctuates for various reasons, and flattens the spectral trend on the basis of this trend value. Recognition is then no longer affected by individual differences between speakers, ambient noise, and the like, and the recognition rate is improved.

[Conventional technology]

A known conventional speech recognition device comprises, for example, a microphone serving as the speech input section, a preprocessing circuit, an acoustic analyzer, a feature data extractor, a registered pattern memory, and a pattern matching determiner.

In this speech recognition device, the preprocessing circuit limits the speech signal input from the microphone to the band required for speech recognition, an A/D converter converts it into a digital speech signal, and the digital speech signal is supplied to the acoustic analyzer.

The acoustic analyzer converts the speech signal into a frequency spectrum and normalizes the spectrum by taking as representative values N frequencies spaced, for example, at equal intervals on a logarithmic axis. For every frame period it supplies frame data consisting of N channels of spectral data to the feature data extractor.
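As a rough illustration of what "N frequencies at equal intervals on a logarithmic axis" could look like, the following Python sketch computes such representative frequencies. The function name `log_spaced_frequencies` and the band edges `f_low` and `f_high` are assumptions made for illustration; the source gives neither the band edges nor the value of N.

```python
import math

def log_spaced_frequencies(n_channels, f_low, f_high):
    """Return n_channels frequencies equally spaced on a logarithmic axis.

    f_low and f_high are hypothetical band edges in Hz (not given in the source).
    """
    if n_channels < 2:
        raise ValueError("need at least two channels")
    if not 0 < f_low < f_high:
        raise ValueError("require 0 < f_low < f_high")
    step = (math.log(f_high) - math.log(f_low)) / (n_channels - 1)
    return [math.exp(math.log(f_low) + k * step) for k in range(n_channels)]

# Hypothetical example: 16 channels between 100 Hz and 6400 Hz.
centers = log_spaced_frequencies(16, 100.0, 6400.0)
# Equal spacing on a log axis means a constant ratio between neighbours.
ratios = [centers[k + 1] / centers[k] for k in range(len(centers) - 1)]
```

With this spacing, neighbouring channels differ by a constant frequency ratio rather than a constant number of hertz.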

The feature data extractor calculates the distance between adjacent frames and sums these interframe distances to obtain the length of the trajectory traced by the N-dimensional vector from the start frame to the end frame of the speech signal. It divides this trajectory length equally by a predetermined number of divisions, namely the number needed to extract the features of the longest utterance with the largest number of words, and extracts as feature data only the frame data corresponding to the division points. The time axis is thereby normalized, so that the output is unaffected by variations in the speaker's speaking rate.

At registration time, this feature data is supplied to the registered pattern memory and stored as a registered feature data block (reference pattern). At recognition time, the input speech signal is processed as described above into an input feature data block (input pattern) and supplied to the pattern matching determiner, which performs pattern matching between the input feature data block and the registered feature data blocks.

The pattern matching determiner calculates the interframe distances between the frames making up a registered feature data block and the frames making up the input feature data block, and takes the sum of these interframe distances as the matching distance. It calculates the matching distance for the other registered feature data blocks in the same way, and outputs as the recognition result the word corresponding to the registered feature data block whose matching distance is the smallest and is judged to be sufficiently close.

[Problem that the invention seeks to solve]

However, the trend of the frequency spectrum of a speech signal changes greatly with individual differences between speakers and with the intrusion of ambient noise and the like, and unless this trend is normalized the recognition rate falls sharply.

For example, suppose that the frame data shown in FIG. 5A is deformed by noise having the spectral trend shown in FIG. 5B, resulting in the frame data shown in FIG. 5C. When the pattern matching determiner computes the distance between the frame shown in FIG. 5A and the frame shown in FIG. 5C, the interframe distance comes out large, the matching distance contains a large error, and the likelihood of misrecognition increases. For this reason, it has been proposed to correct the fluctuation of the spectral trend and flatten (normalize) the trend so that recognition is not affected by individual differences between speakers or by ambient noise.

Two such methods have been proposed: one estimates the spectral trend as a linear function, by the least squares method or the like, and normalizes with it; the other normalizes with a correction function obtained by partial averaging over a predetermined channel width. The former not only requires complicated calculation but also cannot be applied when the trend follows a curve, and the latter has the drawback that it cannot be applied when the spectral envelope is smooth.
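To make the first prior-art approach concrete, here is a minimal Python sketch that fits a straight line to one frame by ordinary least squares and subtracts it. It is an illustrative assumption about how such a method might be implemented, not the procedure of any particular earlier device, and `remove_linear_trend` is a hypothetical name.

```python
def remove_linear_trend(frame):
    """Subtract the least-squares straight line a + b*x from one frame.

    frame: list of spectral values for channels 1..N (x is the channel number).
    """
    n = len(frame)
    if n < 2:
        raise ValueError("need at least two channels")
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(frame) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, frame))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, frame)]

# A purely linear frame is flattened to zero ...
flat = remove_linear_trend([2.0, 4.0, 6.0, 8.0])
# ... but a curved trend (here the square of the channel number) is not.
curved = remove_linear_trend([1.0, 4.0, 9.0, 16.0])
```

In the second call the residual is (1, -1, -1, 1): the curvature survives the straight-line fit, which is the limitation the text attributes to this approach.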

Accordingly, an object of the present invention is to provide a speech recognition device having means that can accurately normalize any spectral trend, simply and at high speed.

[Means for solving problems]

The present invention is a speech recognition device in which an input speech signal is converted into an N-channel frequency spectrum and time-series data of the N-channel frequency spectrum is input, characterized in that, for the spectral data of each frame of the time-series data, a first average value of the spectral data of all channels lower than a predetermined channel is calculated, a second average value of the spectral data of all channels higher than the predetermined channel is calculated, the average of the first average value and the second average value is calculated as the trend value corresponding to the predetermined channel, the trend value is cancelled from the spectral data to obtain trend-normalized spectral data, and the input speech signal is recognized.

[Operation]

A spectral trend normalizer 6 is provided as the means for normalizing the spectral trend. For each frame of the time-series frame data, the spectral trend normalizer 6 obtains the average of the spectral data from channel 1 to a predetermined channel n (1 ≤ n ≤ N) and the average of the spectral data from the predetermined channel n to the highest channel N. The average of these two averages is taken as the trend value for the predetermined channel n. Subtracting the corresponding trend value from the spectral data of each channel flattens the spectral trend.
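The operation just described can be sketched directly in Python. This is a minimal illustration, not the circuit of the embodiment: the names `trend_value` and `normalize_frame` are hypothetical, and the sketch assumes that channel n itself is included in both averages, as the channel ranges "1 to n" and "n to N" suggest.

```python
def trend_value(frame, n):
    """Trend value for channel n (1-based) of one frame.

    Mean of two averages: channels 1..n and channels n..N, both including n.
    """
    big_n = len(frame)
    if not 1 <= n <= big_n:
        raise ValueError("n must satisfy 1 <= n <= N")
    low_avg = sum(frame[:n]) / n                      # channels 1..n
    high_avg = sum(frame[n - 1:]) / (big_n + 1 - n)   # channels n..N
    return (low_avg + high_avg) / 2

def normalize_frame(frame):
    """Subtract each channel's trend value from that channel's data."""
    return [s - trend_value(frame, n) for n, s in enumerate(frame, start=1)]

# A frame with no spectral structure: every trend value equals the level.
print(normalize_frame([5.0, 5.0, 5.0, 5.0]))  # [0.0, 0.0, 0.0, 0.0]
```

Because each trend value is recomputed from slices, this version costs on the order of N squared operations per frame; it favours readability over speed.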

[Embodiment]

An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 shows an embodiment of the present invention. In FIG. 1, reference numeral 1 denotes a microphone serving as the speech input section.

The analog speech signal from the microphone 1 is supplied to a filter 2. The filter 2 is, for example, a low-pass filter with a cutoff frequency of 7.5 kHz. The filter 2 limits the speech signal to the band of 7.5 kHz and below that is required for speech recognition, and the band-limited signal is supplied to an A/D converter 4 via an amplifier 3.

The A/D converter 4 is, for example, an 8-bit A/D converter with a sampling frequency of 12.5 kHz. It converts the analog speech signal into an 8-bit digital signal, which is supplied to an acoustic analyzer 5.

The acoustic analyzer 5 converts the speech signal into a frequency spectrum and generates, for example, an N-channel spectral data sequence. In the acoustic analyzer 5, the speech signal is converted into a frequency spectrum by arithmetic processing, and a spectral data sequence is obtained whose representative values are, for example, N frequencies at equal intervals on a logarithmic axis. The speech signal is thus represented by the magnitudes of a discrete N-channel frequency spectrum. In every unit time (frame period), the N-channel spectral data sequence is output as one frame of data. That is, in each frame period the speech signal is cut out as a parameter represented by an N-dimensional vector and supplied to the spectral trend normalizer 6.

For example, if the frame corresponding to the end of the speech interval is denoted I, then, as shown in FIG. 2, frame data each consisting of the data of channels 1 to N is supplied to the spectral trend normalizer 6 for frames 1 through I.

In the spectral trend normalizer 6, trend normalization is applied to the spectral data of each successively supplied frame. That is, for the spectral data $S_n$ of each channel n making up the frame data, the trend value $F_n$ that corrects the trend fluctuation is calculated by the following formula:

$$F_n = \frac{1}{2n}\sum_{i=1}^{n} S_i + \frac{1}{2(N+1-n)}\sum_{i=n}^{N} S_i$$

In other words, the average of the spectral data from channel 1 to the predetermined channel n (1 ≤ n ≤ N) is obtained, together with the average of the spectral data from the predetermined channel n to the highest channel N, and the average of these two averages is taken as the trend value $F_n$. Subtracting the corresponding trend value $F_n$ from the spectral data of each channel flattens the spectral trend, so that the trend is normalized and is unaffected by individual differences between speakers, ambient noise, and the like. Trend normalization is performed in the same way on every frame from frame 1 to frame I, and the trend-normalized frame data is supplied to a feature data extractor 7.
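One consequence of this definition, worked out here as a check rather than taken from the source, is that a level shift common to all channels cancels exactly. Writing $F_n$ for the mean of the average over channels 1 to n and the average over channels n to N, and offsetting every channel of a frame by the same constant $c$:

```latex
\begin{aligned}
F_n' &= \frac{1}{2n}\sum_{i=1}^{n}(S_i + c) + \frac{1}{2(N+1-n)}\sum_{i=n}^{N}(S_i + c) \\
     &= F_n + \frac{c}{2} + \frac{c}{2} = F_n + c, \\
(S_n + c) - F_n' &= S_n - F_n .
\end{aligned}
```

A trend that varies from channel to channel is removed only to the extent that the two-sided average follows it, so this identity covers the constant-offset case only.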

In the feature data extractor 7, the distance between adjacent frames is calculated. For example, the absolute value of the difference in spectral data is obtained for each channel, and the sum of these absolute values is taken as the interframe distance.
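The interframe distance described here is a sum of absolute channel differences, which can be sketched in a few lines of Python. The function name `frame_distance` is a hypothetical choice for illustration.

```python
def frame_distance(frame_a, frame_b):
    """Sum over channels of the absolute difference of the spectral data."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of channels")
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b))

print(frame_distance([1, 4, 2], [3, 1, 2]))  # 5
```

This measure needs only subtraction and addition, with no multiplication or square root.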

Next, the sum of the interframe distances is obtained, giving the length of the trajectory traced by the N-dimensional vector from the start frame to the end frame of the speech signal. This trajectory length is divided equally by a predetermined number of divisions, namely the number needed to extract the features of the longest utterance with the largest number of words, and only the frame data corresponding to the division points is extracted as feature data. The time axis is thereby normalized, and the output is unaffected by variations in the speaker's speaking rate.
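The time-axis normalization described above might be sketched as follows. The source does not say how a division point is mapped to a frame or whether both end points are kept, so this sketch assumes the first frame whose cumulative trajectory length reaches the division point, and it returns both ends. The names `city_block` and `resample_by_trajectory` are hypothetical.

```python
def city_block(frame_a, frame_b):
    """Interframe distance: sum of absolute channel differences."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b))

def resample_by_trajectory(frames, divisions):
    """Pick frames at equal steps of trajectory length.

    frames: list of equal-length frames, ordered from start to end.
    divisions: number of equal parts the trajectory is cut into.
    Returns divisions + 1 frames (both end points included; an assumption).
    """
    if len(frames) < 2 or divisions < 1:
        raise ValueError("need at least two frames and one division")
    # cumulative[k] is the trajectory length from the first frame to frame k.
    cumulative = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        cumulative.append(cumulative[-1] + city_block(prev, cur))
    total = cumulative[-1]
    picked = []
    k = 0
    for d in range(divisions + 1):
        target = total * d / divisions
        # Assumption: take the first frame whose cumulative length reaches the point.
        while k < len(frames) - 1 and cumulative[k] < target:
            k += 1
        picked.append(frames[k])
    return picked

# Five slowly changing frames followed by one large jump.
frames = [[0], [1], [2], [3], [4], [8]]
print(resample_by_trajectory(frames, 2))  # [[0], [4], [8]]
```

Frames that change little add little trajectory length, so stretches of slow speech contribute fewer selected frames than stretches where the spectrum moves quickly.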

At registration time, this feature data is supplied to a registered pattern memory 8 and stored as a registered feature data block. At recognition time, the input speech signal passes through the processing described above to become an input feature data block, which is supplied to a pattern matching determiner 9, and pattern matching is performed between the input feature data block and all of the registered feature data blocks.

In the pattern matching determiner 9, the interframe distances are obtained between the frames making up the input feature data block and the frames making up the registered feature data block under comparison, and their sum is taken as the matching distance. Of the matching distances obtained for all of the registered feature data blocks, the one that is smallest and is judged to be sufficiently close identifies the registered feature data block whose corresponding word is output as the recognition result.
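The decision rule above can be sketched in Python as a nearest-pattern search with a rejection limit. The source does not say how "sufficiently close" is judged, so the `threshold` parameter is an assumption, as are the names `matching_distance` and `recognize` and the use of a dictionary for the registered patterns.

```python
def matching_distance(block_a, block_b):
    """Sum of per-frame absolute-difference distances between two blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must contain the same number of frames")
    return sum(
        sum(abs(a - b) for a, b in zip(frame_a, frame_b))
        for frame_a, frame_b in zip(block_a, block_b)
    )

def recognize(input_block, registered, threshold):
    """Return the word of the nearest registered block, or None if too far.

    registered: dict mapping word -> registered feature data block.
    threshold: hypothetical rejection limit standing in for "sufficiently close".
    """
    best_word, best_dist = None, None
    for word, block in registered.items():
        dist = matching_distance(input_block, block)
        if best_dist is None or dist < best_dist:
            best_word, best_dist = word, dist
    if best_dist is None or best_dist > threshold:
        return None
    return best_word

registered = {"yes": [[1, 2], [3, 4]], "no": [[9, 9], [9, 9]]}
print(recognize([[1, 3], [3, 4]], registered, threshold=5))  # yes
print(recognize([[5, 5], [5, 5]], registered, threshold=5))  # None
```

Both blocks must contain the same number of frames, which the equal division of the trajectory length guarantees for every utterance.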

The operation of the spectral trend normalizer 6 in the embodiment described above is explained with reference to the flowchart shown in FIG. 3.

Frame data is supplied successively from the acoustic analyzer 5 to the spectral trend normalizer 6, and the steps of the flowchart are carried out for each frame.

First, the variable n indicating the channel number is initialized to 1. Next, the correction function for channel 1 is calculated, and the trend value $F_1$ that corrects the trend fluctuation is obtained as

$$F_1 = \frac{1}{2 \cdot 1}\sum_{i=1}^{1} S_i + \frac{1}{2(N+1-1)}\sum_{i=1}^{N} S_i$$

Normalization is then performed: the trend value $F_1$ is subtracted from the spectral data $S_1$ of channel 1, and the result of the subtraction becomes the spectral data $S_1$ of channel 1.

In the following step, the variable n indicating the channel number (here n = 1) is compared with the maximum channel number N; n is then incremented (n = 2), and processing moves on to the calculation for channel 2.

The trend value $F_2$ that corrects the trend fluctuation is obtained as

$$F_2 = \frac{1}{2 \cdot 2}\sum_{i=1}^{2} S_i + \frac{1}{2(N+1-2)}\sum_{i=2}^{N} S_i$$

The trend value $F_2$ is then subtracted from the spectral data $S_2$ of channel 2, and the result of the subtraction becomes the spectral data $S_2$ of channel 2.

As n continues to be incremented, the steps described above are repeated. For the predetermined channel n, the trend value $F_n$ is obtained as

$$F_n = \frac{1}{2n}\sum_{i=1}^{n} S_i + \frac{1}{2(N+1-n)}\sum_{i=n}^{N} S_i$$

The trend value $F_n$ is subtracted from the spectral data $S_n$ of the predetermined channel, and the result of the subtraction becomes the spectral data $S_n$ of that channel. When the variable n indicating the channel number reaches the maximum channel number N and trend normalization has been performed for the highest channel N, the calculation for one frame ends.
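The per-frame loop can be sketched in Python as a single pass over the channels. Two points are assumptions of this sketch rather than statements of the source: the two sums are maintained as running totals instead of being recomputed for every channel, and the normalized values go into a separate list so that the averages are always taken over the original spectral data. The name `normalize_frame_running` is hypothetical.

```python
def normalize_frame_running(frame):
    """Trend-normalize one frame in a single pass using running sums."""
    big_n = len(frame)
    if big_n == 0:
        return []
    low_sum = 0.0                    # will hold the sum of channels 1..n
    high_sum = float(sum(frame))     # holds the sum of channels n..N
    out = []
    n = 1                            # initialize the channel number
    while True:
        s_n = frame[n - 1]
        low_sum += s_n               # now covers channels 1..n
        # Trend value: mean of the two averages (1..n and n..N).
        f_n = low_sum / (2 * n) + high_sum / (2 * (big_n + 1 - n))
        out.append(s_n - f_n)        # normalized value, stored separately
        if n == big_n:               # highest channel done: frame finished
            break
        high_sum -= s_n              # drop channel n before moving on
        n += 1
    return out
```

Overwriting the input in place would let already-normalized channels leak into the averages for later channels, which is why this sketch keeps the input untouched. With running sums, one frame costs a number of operations proportional to N.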

For example, suppose there is a frame, as shown in FIG. 5A, consisting of 16 items of spectral data for channels 1 to 16, in which the magnitudes of the spectral data of the channels are (8, 12, 16, 17, 12, 14, 18, 16, 12, 10, 6, 12, 9, 8, 6, 5). The trend values $F_n$ for channels 1 to 16 then trace the curve shown in FIG. 4B, and the trend-normalized spectral data, flattened with the trend values $F_n$ as the baseline, is as shown in FIG. 4C. Trend normalization is performed in this way on all frames, and the spectral trend is flattened while the distinctive features of the spectral data are preserved.
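For this frame N = 16 and the sum of all 16 channels is 181. Taking the trend value as the mean of the average over channels 1 to n and the average over channels n to N, the two end channels work out as follows (computed here as a check; these figures do not appear in the source):

```latex
\begin{aligned}
F_1 &= \tfrac{1}{2}\left(\frac{8}{1} + \frac{181}{16}\right) = \tfrac{1}{2}(8 + 11.3125) = 9.65625,
  & S_1 - F_1 &= 8 - 9.65625 = -1.65625, \\
F_{16} &= \tfrac{1}{2}\left(\frac{181}{16} + \frac{5}{1}\right) = \tfrac{1}{2}(11.3125 + 5) = 8.15625,
  & S_{16} - F_{16} &= 5 - 8.15625 = -3.15625 .
\end{aligned}
```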

In the embodiment described above, the spectral trend normalizer 6 is placed before the feature data extractor 7, but it may instead be placed after the feature data extractor 7. Moreover, the present invention is not limited to a hard-wired configuration; the processing may be performed in software using a microcomputer or a microprogram scheme.

[Effect of the invention]

In the present invention, a spectral trend normalizer is provided as the means for normalizing the spectral trend. For each frame of the time-series frame data, the spectral trend normalizer obtains the average of the spectral data from channel 1 to a predetermined channel n (1 ≤ n ≤ N) and the average of the spectral data from the predetermined channel to the highest channel N. The average of these two averages is taken as the trend value for the predetermined channel n, and the spectral trend is flattened by subtracting the corresponding trend value from the spectral data of each channel.

Therefore, according to the present invention, any spectral trend can be normalized accurately, simply, and at high speed; the calculation time is shortened and the recognition rate is improved.

Furthermore, conventional speech recognition devices were restricted in that the speaker had to be specified and extraneous sounds such as ambient noise had to be kept out; according to the present invention, these restrictions are no longer necessary.

[Brief explanation of drawings]

FIG. 1 is a block diagram of the overall configuration of an embodiment of the present invention. FIG. 2 is a schematic diagram used to explain the data structure of the time-series frame data in the embodiment. FIG. 3 is a flowchart used to explain the operation of the spectral trend normalizer in the embodiment. FIGS. 4A, 4B, and 4C are schematic diagrams used to explain the operation of the spectral trend normalizer in the embodiment. FIG. 5 is a schematic diagram used to explain the conventional technology.

Explanation of the main reference numerals in the drawings: 1: microphone; 5: acoustic analyzer; 6: spectral trend normalizer; 7: feature data extractor; 8: registered pattern memory; 9: pattern matching determiner.

Agent: Patent Attorney 杉浦 正知

Claims

[Scope of Claims] A speech recognition device in which an input speech signal is converted into an N-channel frequency spectrum and to which time-series data of the N-channel frequency spectrum is input, characterized in that, for the spectral data of each frame of the time-series data, a first average value of the spectral data of all channels lower than a predetermined channel is calculated, a second average value of the spectral data of all channels higher than the predetermined channel is calculated, the average of the first average value and the second average value is calculated as the trend value for the predetermined channel, the trend value is cancelled from the spectral data to obtain trend-normalized spectral data, and the input speech signal is recognized.