JPS63259694A

JPS63259694A - Sound source normalization

Info

Publication number: JPS63259694A
Application number: JP62094761A
Authority: JP
Inventors: 正照赤羽; 幸田中; 雅男渡
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1987-04-17
Filing date: 1987-04-17
Publication date: 1988-10-26
Anticipated expiration: 2012-10-27
Also published as: JP2668877B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of industrial application]

The present invention relates to a sound source normalization method used, for example, in the recognition processing of a speech recognition device intended for unspecified speakers.

[Summary of the invention]

In a sound source normalization method used, for example, in the recognition processing of a speech recognition device intended for unspecified speakers, this invention subtracts the average power from the output value of each channel obtained by frequency analysis. This translates the spectral envelope so that the output value at the approximately central channel is approximately 0. The slope of the straight line approximating the spectral envelope is then calculated by addition and subtraction alone, and sound source normalization is performed easily and efficiently using only that slope.

[Conventional technology]

Conventional speech recognition devices intended for unspecified speakers are designed to recognize adequately the speech of unspecified speakers other than those who registered the standard patterns.

In these speech recognition devices, improving the recognition rate requires normalizing, by some method, the overall tendency and variation of the frequency spectrum caused by individual differences among speakers and the like. A commonly known normalization technique estimates the envelope of the frequency spectrum with a linear function, for example by the least squares method, and normalizes with it.

For example, FIG. 3 shows a spectral envelope for one frame of the time-series data obtained by frequency analysis of the input speech. In FIG. 3 the horizontal axis is frequency (channel) and the vertical axis is level; the broken line indicated by 32 is the envelope, and each point on the broken line 32 is an output value from the frequency analysis section.

To normalize such a spectral envelope, a straight line that minimizes the sum of squared errors with respect to the output values on the broken line 32 is first estimated. That is, if the number of channels of the frequency analysis section is n and the output value of each channel is $x_i$ (i = 1, ..., n), the least-squares approximation line indicated by 31 in FIG. 3 is

$$y = a\,i + b \quad (i = 1, \dots, n) \tag{1}$$

and the sum of squared errors $f(x_i)$ between each output value $x_i$ and the corresponding point on the straight line 31 is

$$f(x_i) = \sum_{i=1}^{n} \{x_i - (a\,i + b)\}^2 \tag{2}$$

The least-squares approximation line 31 is determined by the slope a and the intercept b that minimize this sum of squared errors $f(x_i)$.

To actually calculate the slope a and the intercept b, the relations

$$\frac{\partial f}{\partial a} = 0, \qquad \frac{\partial f}{\partial b} = 0 \tag{3}$$

hold, so the slope a and the intercept b are obtained as

$$a = \frac{n\sum_{i=1}^{n} i\,x_i - \sum_{i=1}^{n} i \sum_{i=1}^{n} x_i}{n\sum_{i=1}^{n} i^2 - \left(\sum_{i=1}^{n} i\right)^2}, \qquad b = \frac{1}{n}\left(\sum_{i=1}^{n} x_i - a\sum_{i=1}^{n} i\right) \tag{4}$$

Each output value $x_i$ is normalized on the basis of the least-squares approximation line 31 thus obtained. That is, the normalized value $\hat{x}_i$ is calculated by

$$\hat{x}_i = x_i - (a\,i + b) \tag{5}$$

and this subtraction flattens the overall tendency and variation of the spectrum caused by individual differences among speakers and the like.
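The conventional procedure described above can be sketched in a few lines of Python. This is a minimal illustration of ordinary least-squares line fitting followed by the subtraction of equation (5); the function name `least_squares_normalize` and the plain-list representation of one frame are assumptions made for this example, not part of the patent.

```python
def least_squares_normalize(x):
    """Flatten one frame by subtracting its least-squares line a*i + b.

    x holds the n channel outputs of a single frame (hypothetical layout).
    """
    n = len(x)
    idx = range(1, n + 1)                  # channel numbers i = 1..n
    sum_i = sum(idx)
    sum_ii = sum(i * i for i in idx)
    sum_x = sum(x)
    sum_ix = sum(i * xi for i, xi in zip(idx, x))   # one multiplication per channel
    # Closed-form least-squares slope and intercept.
    a = (n * sum_ix - sum_i * sum_x) / (n * sum_ii - sum_i ** 2)
    b = (sum_x - a * sum_i) / n
    # Subtract the fitted line from every channel output.
    normalized = [xi - (a * i + b) for i, xi in zip(idx, x)]
    return normalized, a, b
```

Both the slope and the intercept have to be computed here, and the products `i * xi` and `a * i` each cost one multiplication per channel, which is the overhead the invention sets out to avoid.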

[Problem that the invention seeks to solve]

However, the conventional normalization method using the least squares method described above requires both the slope a and the intercept b of the least-squares approximation line, and several multiplications must be executed in the course of calculating them. As a result, the number of software steps in the calculation increases and the processing time becomes long.

Accordingly, an object of the present invention is to provide a sound source normalization method that can normalize the tendency of the frequency spectrum easily and efficiently on the basis of an approximate straight line.

[Means for solving problems]

In this invention, normalization is performed by the following steps: a step of obtaining the average power m from the sum $P_1$ of the powers from channel 1 to the central channel, the sum $P_2$ of the powers from the channels above the center to channel n, and the total power ($P_1 + P_2$); a step of subtracting the average power m from the output value of each channel to obtain $\bar{x}_i$; a step of obtaining the slope a of an approximate straight line whose power sum equals the power sum of the output values $\bar{x}_i$ from which the average power has been subtracted; and a step of obtaining the normalized values $\hat{x}_i$ for the n channels by

$$\hat{x}_i = \bar{x}_i - a\{i - (n+1)/2\} \tag{9}$$

[Operation]

The average power m is obtained from the output values $x_i$ of the channels produced by frequency analysis, and m is subtracted from each output value $x_i$ to give the initial normalized value $\bar{x}_i$. This subtraction translates the spectral envelope of one frame so that the output value at the approximately central channel is approximately 0, and the straight line approximating the envelope drawn by the initial normalized values $\bar{x}_i$ becomes

$$y = a\{i - (n+1)/2\} \quad (n:\ \text{number of channels},\ i:\ \text{channel number},\ 1 \le i \le n) \tag{8}$$

The slope a of this line is calculated by addition and subtraction alone, using the sum $P_1$ of the powers from channel 1 to the approximately central channel and the sum $P_2$ of the powers from the approximately central channel to channel n. The final normalization is performed with the slope a thus obtained, and the normalized value $\hat{x}_i$ is calculated by

$$\hat{x}_i = \bar{x}_i - a\{i - (n+1)/2\} = x_i - m - a\{i - (n+1)/2\} \quad (i = 1, \dots, n) \tag{9}$$
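As a reading aid, the steps just described can be sketched in Python. The slope formula used here is an assumed reconstruction of the area-matching idea (the signed sum of the mean-subtracted outputs over the two halves is set equal to the same signed sum for the line of equation (8)); the function name `slope_normalize` and the exact form of the numerator and denominator are assumptions for illustration, not the patent's own equations.

```python
def slope_normalize(x):
    """Normalize one frame using only a slope through the centre channel.

    x holds the n channel outputs of a single frame (hypothetical layout).
    """
    n = len(x)
    half = (n + 1) // 2            # channels 1..half form the first half
    p1 = sum(x[:half])             # P1: sum over the first half
    p2 = sum(x[half:])             # P2: sum over the second half
    m = (p1 + p2) / n              # average power
    centred = [xi - m for xi in x]         # equation (7)
    centre = (n + 1) / 2
    # Assumed area matching: signed sum of the data over the two halves ...
    numerator = sum(centred[half:]) - sum(centred[:half])
    # ... divided by the same signed sum for a unit-slope line; this value
    # depends only on n, so it is a constant that can be precomputed.
    denominator = (sum(i - centre for i in range(half + 1, n + 1))
                   - sum(i - centre for i in range(1, half + 1)))
    a = numerator / denominator
    # Equation (9): subtract the line through the centre channel.
    normalized = [v - a * (i - centre) for i, v in enumerate(centred, start=1)]
    return normalized, a, m
```

No intercept is computed: once the mean has been removed, the line is taken to pass through zero at the centre channel, so only the slope remains to be found.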

[Example]

a. Configuration of the speech recognition device and its processing flow

An embodiment of the present invention is described below with reference to the drawings. FIG. 2 conceptually shows an example of the flow of processing in a speech recognition device; the present invention concerns the sound source normalization processing indicated by 15 in FIG. 2.

In FIG. 2, the part indicated by 11 is the input system. The input system 11 performs the preprocessing required for speech recognition. For example, the input speech is supplied through a microphone to an amplifier, and the amplified speech signal is supplied to a low-pass filter, which limits the signal to the band required for the recognition processing.

The input speech is then converted from analog to digital at a predetermined sampling frequency, and the digital speech signal from the input system 11 is supplied to the analysis system 12.

The analysis system 12 consists of, for example, a digital band-pass filter bank made up of n band-pass filters, and performs frequency analysis of the input speech signal. For example, the center frequencies of the passbands of the filter bank are allocated at equal intervals on a logarithmic axis; each of the n channel outputs obtained by supplying the input speech signal to the filter bank is squared and then averaged to give a power spectrum.
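The squaring and averaging step can be illustrated with a short Python sketch. It assumes the filter bank has already produced, for one frame, a list of output samples per channel; the function name `band_powers` and this data layout are hypothetical, and the filter bank itself is not modelled.

```python
def band_powers(band_outputs):
    """Return one mean-square power value per channel for a single frame.

    band_outputs is a list of n lists, each holding the output samples of one
    band-pass filter over the frame (assumed layout).
    """
    return [sum(s * s for s in channel) / len(channel) for channel in band_outputs]
```

For instance, `band_powers([[1, -1], [2, 0]])` gives one power value for each of the two channels.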

The speech signal is therefore represented by the magnitudes of an n-channel power spectrum whose channels are equally spaced on a logarithmic axis. A data string representing the n-channel power spectrum is output as one frame every unit time (frame period). That is, in each frame period the speech signal is cut out as a parameter expressed by an n-dimensional vector and supplied to the recognition processing system 13.

The recognition processing system 13 consists of, for example, a feature extractor, a speech interval determiner, a sound source normalizer, a reject determiner, a NAT processing section, a matching determination section, and the like. Within the recognition processing system 13, the processing enclosed by the broken line in FIG. 2 is performed frame by frame.

First, parameter conversion processing 14 is applied to the n-channel power spectrum of each frame from the analysis system 12; for example, the power spectrum is logarithmically transformed into a logarithmic power spectrum.

Then sound source normalization processing 15, speech interval determination processing 16, and feature extraction processing 17 are performed.

In the sound source normalizer, a straight line approximating the spectral envelope is estimated for each frame, and sound source normalization processing 15 is performed with this line. In the sound source normalization method of the present invention, the slope of the approximate straight line is calculated by addition and subtraction alone, and sound source normalization processing 15 is performed without calculating the intercept of the line.

The feature extractor performs feature extraction processing 17. For example, the number of zero crossings of the input speech signal is counted frame by frame, and the power of the input speech signal in each frame, that is, the sum of squares, is obtained.
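The two frame-level features named above are simple to compute, as the following Python sketch suggests. The function name `zero_crossings_and_power` is hypothetical, and treating a sample of exactly zero as non-negative is an assumption of this example; the patent does not specify that detail.

```python
def zero_crossings_and_power(samples):
    """Return (zero-crossing count, sum of squares) for one frame of samples."""
    # A crossing is counted whenever consecutive samples differ in sign;
    # a sample equal to zero is treated as non-negative (assumption).
    crossings = sum(
        1 for prev, cur in zip(samples, samples[1:])
        if (prev < 0) != (cur < 0)
    )
    power = sum(s * s for s in samples)
    return crossings, power
```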

At the same time, the phonemic character of each frame, that is, features such as the shape of the spectral envelope, is detected.

Data representing the features obtained by these processes is added to each frame as new parameters.

Further, the speech interval determiner performs composite speech interval determination processing 16 on the basis of the zero-crossing count, the power of each frame, and the sound source normalization information; for example, silence, unvoiced sound, and voiced sound are distinguished and the speech interval is determined.

At this time, the reject determiner performs reject processing 21 to distinguish the input speech from ambient noise and the like. For example, the power level of each frame is compared with a predetermined threshold. When it is greater than the threshold, speech is taken to have been input, and speech interval determination processing 16 and feature extraction processing 17 are performed; when it is smaller than the threshold, the frame is judged to be ambient noise or the like, rejected, and treated as invalid input.

Only the frames corresponding to the speech interval determined by speech interval determination processing 16 are treated as valid and form a feature pattern, to which NAT (Normalization Along Trajectory) processing 18 is applied. That is, normalization is performed along the time-series trajectory in the feature vector space (whose dimension corresponds to the number of parameters: an N-dimensional vector when there are N parameters), and the feature pattern is compressed (or expanded) along the time axis. For example, the inter-frame distance between adjacent frames of the feature pattern is calculated, and the inter-frame distances are summed to give the trajectory length from the first frame to the last frame of the feature pattern. The trajectory length is then divided equally by a predetermined number of divisions needed to extract the features of the feature pattern, and only the frames lying close to the division points are extracted, so that the time axis is normalized and is unaffected by variations in the speaker's speaking rate.
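The NAT step can be sketched as follows. This is a minimal illustration under stated assumptions: the inter-frame distance is taken to be Euclidean (the patent does not name the metric), the frame nearest to each division point in cumulative trajectory length is chosen, and the name `nat_resample` is hypothetical.

```python
import math

def nat_resample(frames, divisions):
    """Pick the frames nearest to equally spaced points along the trajectory.

    frames is a list of feature vectors (tuples); divisions is the number of
    equal parts the trajectory length is split into (assumed parameter).
    """
    # Cumulative trajectory length at each frame, using Euclidean distance.
    cumulative = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        cumulative.append(cumulative[-1] + math.dist(prev, cur))
    total = cumulative[-1]
    picked = []
    for k in range(divisions + 1):
        target = total * k / divisions      # k-th division point along the path
        nearest = min(range(len(frames)), key=lambda j: abs(cumulative[j] - target))
        picked.append(frames[nearest])
    return picked
```

Frames that are bunched together in feature space (slowly changing speech) contribute little trajectory length, so few of them are picked; this is how the selection becomes insensitive to speaking rate.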

2-bit conversion processing 19 is applied to the feature pattern that has undergone NAT processing 18; for example, each data item of each frame of the feature pattern is reduced to 2 bits, compressing the amount of data.

Matching processing 20 is performed between the standard patterns registered in advance and the feature pattern of the input speech; for example, pattern matching is performed against every standard pattern selected for comparison.

For example, the inter-frame distances between the frames of the feature pattern and the frames of a standard pattern are obtained, and their sum is taken as the matching distance.

At this time, the reject determiner performs reject processing 21. For example, the matching distance calculated against each standard pattern is compared with a predetermined threshold, and any distance greater than the threshold is rejected as not applicable. Determination processing 22 then finds the smallest of the matching distances that are below the threshold, and the word corresponding to the standard pattern with that smallest matching distance is taken as the recognition result.
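Matching, rejection, and the final decision can be pictured with the following Python sketch. The city-block frame distance, the dictionary of templates, and the name `recognize` are assumptions made for illustration; the patent only states that frame distances are summed, thresholded, and minimized.

```python
def recognize(feature, templates, threshold):
    """Return the word whose template is closest to feature, or None if all are rejected.

    feature is a list of frames (tuples); templates maps each word to a list of
    frames of the same length (hypothetical data layout).
    """
    best_word, best_distance = None, None
    for word, pattern in templates.items():
        # Matching distance: sum of frame-to-frame distances (city-block, assumed).
        distance = sum(
            sum(abs(p - q) for p, q in zip(frame, ref))
            for frame, ref in zip(feature, pattern)
        )
        if distance > threshold:
            continue                        # reject processing: too far from this template
        if best_distance is None or distance < best_distance:
            best_word, best_distance = word, distance
    return best_word
```

Returning `None` corresponds to the case in which every standard pattern is rejected and no word is recognized.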

b. Description of the sound source normalization processing

The sound source normalization processing mentioned above is described with reference to FIGS. 1A to 1C. In each of FIGS. 1A to 1C, the horizontal axis is frequency (channel) and the vertical axis is the level of the power spectrum.

For example, FIG. 1A shows a spectral envelope for one frame of the time-series data obtained by frequency analysis of the input speech. In FIG. 1A, the solid line indicated by 1 is the spectral envelope, and each point on it is the output value of one channel obtained by frequency analysis (for example, n = 16, where n is the number of channels). The solid line indicated by 1a in FIG. 1A is the straight line approximating the spectral envelope 1.

The average m of the powers at the points on the spectral envelope 1 in FIG. 1A is, with $x_i$ denoting the output value of each channel, calculated by

$$m = \frac{1}{n}\sum_{i=1}^{n} x_i \tag{6}$$

Using this average m, the processing of equation (7) below is performed. That is, the average power m is subtracted from the output value $x_i$ of each channel to give the initial normalized value $\bar{x}_i$.

$$\bar{x}_i = x_i - m \quad (i = 1, \dots, n) \tag{7}$$

By the processing of equation (7), the spectral envelope 1 and the approximate straight line 1a are translated as shown in FIG. 1A and become the solid lines 2 and 2a. Because the approximate straight line 2a now crosses the horizontal (X) axis at the approximately central channel, it can be assumed to be

$$y = a\{i - (n+1)/2\} \quad (n:\ \text{number of channels},\ i:\ \text{channel number},\ 1 \le i \le n) \tag{8}$$

The final normalized value $\hat{x}_i$ can therefore be calculated by

$$\hat{x}_i = \bar{x}_i - a\{i - (n+1)/2\} = x_i - m - a\{i - (n+1)/2\} \quad (i = 1, \dots, n) \tag{9}$$

For example, suppose the output values $\bar{x}_i$ of the channels normalized by equation (7) draw an envelope such as that shown by the broken line 3 in FIG. 1B. Let $S_1$ be the area of the hatched region 4 formed by this envelope 3; the area $S_1$ is calculated by equation (10), in which [ ] denotes the Gauss symbol. Likewise, let $S_2$ be the area of the hatched region 6 formed by the approximate straight line shown by the solid line 5 in FIG. 1C, which satisfies equation (8); the area $S_2$ is calculated by equation (11). These areas $S_1$ and $S_2$ can be assumed to be equal, and the slope a of the approximate straight line 5 is then calculated by equation (12).
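Equations (10) to (12) are not legible in this text. As a reading aid only, the area argument can be sketched for the even case n = 2m, where m here denotes half the channel count. The sketch takes each hatched area to be the signed sum over the upper half minus the signed sum over the lower half, applied to the mean-subtracted outputs $\bar{x}_i$ and to the line of equation (8). This is an assumed reconstruction, chosen because it reproduces the denominator $m^2$ quoted later in the description; it is not the patent's own notation.

```latex
\begin{aligned}
S_1 &= \sum_{i=m+1}^{2m} \bar{x}_i - \sum_{i=1}^{m} \bar{x}_i = P_2 - P_1,\\
S_2 &= a\left[\sum_{i=m+1}^{2m}\Bigl(i-\tfrac{2m+1}{2}\Bigr)
      - \sum_{i=1}^{m}\Bigl(i-\tfrac{2m+1}{2}\Bigr)\right] = a\,m^2,\\
S_1 &= S_2 \;\Longrightarrow\; a = \frac{P_2 - P_1}{m^2}.
\end{aligned}
```

The first line uses the fact that both halves contain m channels, so the subtracted mean cancels and only the raw half sums $P_1$ and $P_2$ remain.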

The denominator on the right side of equation (12) is a constant, because the number of channels n is fixed. The slope a of the approximate straight line is therefore calculated as a constant multiple of the difference between the sum $P_1$ of the output values in the first half, from channel 1 to the approximately central channel, and the sum $P_2$ of the output values in the second half, from the approximately central channel to channel n.
Calculated by multiplying the difference between the sum PL of the output values in the first half from channel 1 to the channel located approximately in the center and the sum P2 of the output values in the second half from the channel located approximately in the center to channel n. be done.

That is, the slope a of the approximate straight line can be obtained by addition and subtraction alone, without multiplying the output of each channel, and the final normalized value $\hat{x}_i$ is calculated by equation (9)' below.

$$\hat{x}_i = \bar{x}_i - a \times \{i - (n+1)/2\} \tag{9'}$$

In the actual calculation, the processing differs slightly depending on whether the number of channels of the analysis system 12 is even (n = 2m) or odd (n = 2m + 1), where m here denotes half the channel count rather than the average power. Each case is described below.

i) When the number of channels n is even (n = 2m, m = 1, 2, ...)

The denominator of the slope a in the second term on the right side of equation (9)' becomes

$$\sum_{i=m+1}^{2m}\{i-(2m+1)/2\} - \sum_{i=1}^{m}\{i-(2m+1)/2\} = m^2$$

so that, writing k for the numerator of a (the difference between the output sums of the second and first halves),

$$\hat{x}_i = \bar{x}_i - \frac{k}{m^2} \times \{i-(2m+1)/2\}$$

$$\therefore\ 2m^2\,\hat{x}_i = 2m^2\,\bar{x}_i - 2k\,i + (2m+1)k$$

$$\therefore\ S_1\hat{x}_i = S_1\bar{x}_i - S_2\,i + S_3$$

where $S_1 = 2m^2$, $S_2 = 2k$, and $S_3 = (2m+1)k$.

ii) When the number of channels n is odd (n = 2m + 1, m = 1, 2, ...)

The denominator of the slope a in the second term on the right side of equation (9)' becomes m(m + 1), so that, writing k' for the numerator of a,

$$\hat{x}_i = \bar{x}_i - \frac{k'}{m(m+1)} \times \{i-(m+1)\}$$

$$\therefore\ m(m+1)\,\hat{x}_i = m(m+1)\,\bar{x}_i - k'\,i + k'(m+1)$$

$$\therefore\ S_1'\hat{x}_i = S_1'\bar{x}_i - S_2'\,i + S_3'$$

where $S_1' = m(m+1)$, $S_2' = k'$, and $S_3' = (m+1)k'$.

Note that in both the even case (n = 2m) and the odd case (n = 2m + 1) the result takes the form $S_1\hat{x}_i$ (or $S_1'\hat{x}_i$), so the normalized value is multiplied by a constant. Because the recognition processing relies on relative comparison, this has no effect on the recognition rate. $S_1\hat{x}_i$ (or $S_1'\hat{x}_i$) is calculated by taking the sum of a constant multiple of $\bar{x}_i$ and $S_3$ (or $S_3'$) and successively subtracting $S_2$ (or $S_2'$) from it.
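The successive-subtraction scheme for the even case can be sketched in Python as follows. The sketch assumes the scaled form with constants 2m², 2k, and (2m+1)k for an even channel count n = 2m; the function name `scaled_normalize_even` and its argument layout are hypothetical.

```python
def scaled_normalize_even(x_bar, k):
    """Return the normalized values scaled by S1 = 2*m*m for an even channel count.

    x_bar holds the mean-subtracted outputs of channels 1..n (n = 2m) and k is
    the difference between the second-half and first-half output sums
    (assumed inputs).
    """
    n = len(x_bar)
    m = n // 2
    s1 = 2 * m * m                 # constant scale factor, fixed once n is fixed
    s2 = 2 * k
    s3 = (2 * m + 1) * k
    scaled = []
    offset = s3                    # running value of S3 - S2 * i
    for xb in x_bar:
        offset -= s2               # successive subtraction replaces the product S2 * i
        scaled.append(s1 * xb + offset)
    return scaled
```

The only per-channel product left is the multiplication by the fixed constant `s1`; the term that depends on the channel number is built up by repeated subtraction.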

The description above covers the case in which, when the number of channels is odd (n = 2m + 1), the output sum $P_1$ is calculated with channels 1 to (m + 1) as the first half and the output sum $P_2$ is calculated with channels (m + 2) to n as the second half. When the number of channels is odd, however, the output sums $P_1$ and $P_2$ may instead be obtained by using the output value of the central (m + 1)th channel in both calculations, or by ignoring the output value of the central (m + 1)th channel altogether.
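The three ways of handling the centre channel for an odd channel count can be compared with a small Python sketch. The function name `half_sums` and the `middle` option values are hypothetical labels introduced for this example.

```python
def half_sums(x, middle="first"):
    """Return (P1, P2) for one frame, with selectable handling of the centre channel.

    middle = "first":  centre channel counted in the first half only
    middle = "both":   centre channel counted in both halves
    middle = "ignore": centre channel left out of both halves
    For an even number of channels the option has no effect.
    """
    n = len(x)
    if n % 2 == 0:
        h = n // 2
        return sum(x[:h]), sum(x[h:])
    m = n // 2                     # n = 2m + 1; x[m] is the centre channel (m + 1)
    lower, upper = sum(x[:m]), sum(x[m + 1:])
    if middle == "first":
        return lower + x[m], upper
    if middle == "both":
        return lower + x[m], upper + x[m]
    if middle == "ignore":
        return lower, upper
    raise ValueError("middle must be 'first', 'both', or 'ignore'")
```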

[Effect of the invention]

In this invention, the average power m is obtained from the output values $x_i$ of the channels produced by frequency analysis, and m is subtracted from each output value $x_i$ as an initial normalization. This initial normalization translates the spectral envelope of one frame so that the output value at the approximately central channel is approximately 0, and the slope a of the approximate straight line is calculated by an expression involving only addition and subtraction, using the sum $P_1$ of the powers from channel 1 to the approximately central channel and the sum $P_2$ of the powers from the approximately central channel to channel n.

The final normalization is performed with the slope a of the approximate straight line thus obtained.

Therefore, according to the present invention, the tendency of the frequency spectrum can be normalized easily and efficiently with the slope a alone, without calculating the intercept b that the conventional normalization using the least squares method required.

Further, according to the present invention, the slope a of the approximate straight line used for normalization can be calculated by addition and subtraction alone, so the tendency of the frequency spectrum can be normalized still more efficiently.

For reference, the amount of computation for one frame is compared below between the conventional normalization using the least squares method and the normalization of the present invention.

When normalization is performed by the least squares method, as equation (4) shows, the multiplication

$$i \times x_i \quad (i = 1, \dots, n) \tag{14}$$

is executed n times and the multiplication

$$\sum i \times \sum x_i \tag{15}$$

is executed once to determine the slope a and the intercept b. Then, at the stage of calculating the normalized values, the multiplication

$$a \times i \tag{16}$$

must be executed n times.

By contrast, in the normalization of the present invention, normalization uses only the slope a, which is calculated by addition and subtraction alone. The multiplications corresponding to equations (14) and (15) are therefore unnecessary, and the normalized value is obtained by executing only once the multiplication $a \times \{i - (n+1)/2\}$, which corresponds to equation (16), so the processing is extremely efficient.

[Brief explanation of drawings]

FIGS. 1A to 1C are schematic diagrams used to explain an embodiment of the present invention, FIG. 2 is a conceptual diagram of an example used to explain the speech recognition device, and FIG. 3 is a schematic diagram used to explain the conventional sound source normalization method.

Agent: Patent Attorney 杉浦 正知

Claims

[Claims] A sound source normalization method characterized by comprising: a step of obtaining the average power m from the sum $P_1$ of the powers from channel 1 to the central channel, the sum $P_2$ of the powers from the channels above the center to channel n, and the total power ($P_1 + P_2$); a step of subtracting the average power m from the output value of each channel to obtain $\bar{x}_i$; a step of obtaining the slope a of an approximate straight line whose power sum equals the power sum of the output values $\bar{x}_i$ from which the average power has been subtracted; and a step of obtaining the normalized values $\hat{x}_i$ for the n channels by equation (9) below.

$$\hat{x}_i = \bar{x}_i - a\{i - (n+1)/2\} \tag{9}$$