JPS5995600A

JPS5995600A - Voice recognition equipment

Info

Publication number: JPS5995600A
Application number: JP57207178A
Authority: JP
Inventors: 徹上田; 厚夫田中
Original assignee: Computer Basic Technology Research Association Corp
Current assignee: Computer Basic Technology Research Association Corp
Priority date: 1982-11-25
Filing date: 1982-11-25
Publication date: 1984-06-01

Abstract

(57) [Summary] This publication consists of application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] <Technical Field> The present invention relates to an improvement of a speech recognition device, and more particularly to an improvement in power normalization for the case where parameters that contain power information, such as the outputs of a filter bank, are used as feature parameters.

<Prior Art> In a speech recognition device, when signals that contain power information, such as filter bank outputs, are used as feature parameters, power normalization is usually performed to remove the influence of gain.

Conventional power normalization methods include one that uses the arithmetic mean and one that uses the geometric mean, expressed by equations (1) and (2), respectively.

Here, k is the order of the feature parameters, Pi is the i-th parameter before normalization, and Wi is the i-th parameter after normalization. When power normalization is carried out according to these equations, the right-hand side of equation (1) can be expanded so that no division is required, and the logarithm log Pi can be computed at high speed with a read-only memory (ROM). However, the expansion leaves a term whose range becomes wide, so it is not easy to handle with a ROM.

In the method of equation (2), the logarithm log Pi can easily be obtained with a ROM, but the method includes one division, which slows down the calculation.
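Equations (1) and (2) themselves are not legible in this text. One reconstruction that agrees with the description (arithmetic mean for (1), geometric mean for (2), and an expansion of (1) that needs no division but leaves the logarithm of a sum) is sketched below. The printed form of the originals is an assumption; P_i, W_i and k correspond to Pi, Wi and k as defined in the text.

```latex
% (1) normalization by the arithmetic mean (assumed form)
W_i = \log\frac{P_i}{\frac{1}{k}\sum_{j=1}^{k} P_j}
    = \log P_i - \log\sum_{j=1}^{k} P_j + \log k \tag{1}

% (2) normalization by the geometric mean (assumed form)
W_i = \log\frac{P_i}{\left(\prod_{j=1}^{k} P_j\right)^{1/k}}
    = \log P_i - \frac{1}{k}\sum_{j=1}^{k} \log P_j \tag{2}
```

Under this reading, the term in (1) that is hard to put in a ROM is the logarithm of the sum of all P_j, whose argument spans a much wider range than a single P_i, and the single division in (2) is the factor 1/k.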

Furthermore, if no normalization is performed at all, the influence of gain is too large and a practical device cannot be obtained.

<Object> The present invention has been made to provide a speech recognition device that eliminates the above conventional problems. According to the present invention, the power normalization of equation (2) is performed not by division but approximately, by means of a shift register. Because power information is retained, the recognition rate is improved, and the calculation can be performed at high speed.

<Embodiment> An embodiment of the present invention is described in detail below.

According to an embodiment of the present invention, the device comprises an extraction unit that extracts feature parameters, arithmetic means that calculates the sum over the channels of the logarithmic outputs from the extraction unit, a shift register into which the calculated sum is introduced, and arithmetic means that subtracts the output of the shift register from the logarithmic power of each channel. The sum over the channels of the logarithmic outputs from the feature parameter extraction unit is calculated and multiplied by 2^-n (n = 1, 2, ...) with the shift register to obtain an approximate average logarithmic power. This average is then subtracted from the logarithmic power of each channel, so that power normalization is performed approximately. This configuration improves the recognition rate and allows the calculation to be performed at high speed.

Next, an embodiment of the present invention is described in detail with reference to the drawings.

FIG. 1 is a block diagram showing an example of a speech recognition device embodying the present invention.

In FIG. 1, uttered input speech is converted into an electrical signal by a detector 1 such as a microphone.

Detector 1 can detect frequencies in the human audible range, from twenty-odd Hz to about 7.5 kHz, and is preferably one that can detect the frequencies of the human conversational voice range in particular without distortion. Amplifier 2 is connected to the output side of detector 1 and is preferably one that can amplify these audio frequencies without distortion. Feature parameter extraction unit 3 is connected to the output side of amplifier 2 and extracts the feature parameters of the input speech.

Feature parameter extraction unit 3 consists of a plurality of band-pass filters with mutually different passbands, a sample-and-hold circuit that holds the output of each band-pass filter, an analog switch that samples the outputs of the sample-and-hold circuit in turn at intervals of about 10 milliseconds, an A/D converter that converts the output of the analog switch into a digital signal of, for example, 7.2 bits and outputs the feature parameters Pi, and a logarithmization ROM 31 that logarithmically converts the feature parameters Pi.
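The role of the logarithmization ROM can be sketched in software as a precomputed lookup table indexed by the raw A/D value. This is only an illustration: the 8-bit input width, the base-2 logarithm, the fixed-point scale of 16, and the handling of a zero input are all assumptions made here, not values taken from the specification.

```python
import math

ADC_BITS = 8     # hypothetical input width, chosen only to keep the table small
LOG_SCALE = 16   # hypothetical fixed-point scale: stored value = round(16 * log2(P))


def build_log_rom(bits=ADC_BITS, scale=LOG_SCALE):
    """Precompute a fixed-point log2 table indexed by the raw A/D value."""
    rom = [0] * (1 << bits)
    for p in range(1, 1 << bits):
        rom[p] = round(scale * math.log2(p))
    # log(0) is undefined; entry 0 is left at 0 here as an assumption.
    return rom


LOG_ROM = build_log_rom()


def log_power(p):
    """A table lookup replaces a run-time logarithm."""
    return LOG_ROM[p]
```

The table size grows as 2 to the power of the input width, which is why a ROM suits log Pi for a single channel but becomes awkward for a quantity with a much wider range.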

The feature parameters Pi output from the A/D converter in feature parameter extraction unit 3 are logarithmically converted by logarithmization ROM 31. The converted values are normalized by passing through power normalization unit 4 and pattern compression unit 5 of the next stage. Matching unit 7 then compares the normalized feature parameters with the standard feature parameters stored in standard pattern memory 6 to recognize the input speech, and the result is output to output unit 8.

As shown in FIG. 2, power normalization unit 4 consists of an adder 41 that calculates the sum over the channels of the logarithmic values log Pi of the A/D-converted outputs Pi in feature parameter extraction unit 3, a shift register 42 that receives the output of adder 41 (the sum Σ log Pi), and a subtractor 43 that subtracts the output of shift register 42 from the logarithmic power log Pi of each channel.

In this configuration, the parameters Pi output from the A/D converter in feature parameter extraction unit 3 are each converted to logarithms by logarithmization ROM 31, and adder 41 calculates the sum of these outputs over all channels. The sum is input to shift register 42, which shifts it n bits toward the low-order side. Subtractor 43 then subtracts the shifted sum from the output log Pi of each channel, producing the power-normalized feature parameters Wi.
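The adder, shift register, and subtractor path described above can be sketched as follows. This is a minimal software illustration, assuming the log-power values are integers; the function name, the sample frame, and the choice of 8 channels with n = 3 are hypothetical.

```python
def normalize_power(log_p, n):
    """Approximate power normalization without division.

    log_p : list of integer log-power values, one per channel
    n     : number of bits to shift right, so the sum is scaled by 2**-n
    """
    total = sum(log_p)                        # adder: sum over all channels
    approx_mean = total >> n                  # shift register: n-bit shift to the low-order side
    return [v - approx_mean for v in log_p]   # subtractor, applied per channel


# Hypothetical frame with 8 channels; n = 3 makes 2**-n equal to 1/8 exactly.
frame = [40, 44, 52, 60, 56, 48, 42, 42]
w = normalize_power(frame, 3)

# A gain change adds the same constant to every log value.
w_louder = normalize_power([v + 10 for v in frame], 3)
```

When the number of channels equals 2**n, as in this example, a gain change leaves the result unchanged. When it does not, only part of the gain offset is removed, which is one way to read the statement that power information is retained.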

That is, according to the present invention, power normalization unit 4 performs normalization according to equation (3), where k' = 2^n (n = 1, 2, ...). Because multiplication by 1/k' is a power-of-two operation, it can be carried out quickly and simply by the shift operation of shift register 42.
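Equation (3) is not legible in this text. From the operation described (sum the logarithms over the channels, scale by a power of two, subtract from each channel), it can be reconstructed as below; the exact printed form is an assumption. The second line compares it with exact geometric-mean normalization, W_i = log P_i - (1/k) Σ log P_j, which is how the text characterizes equation (2).

```latex
% (3) shift-based approximation (reconstructed form)
W_i = \log P_i - \frac{1}{k'}\sum_{j=1}^{k}\log P_j,
\qquad k' = 2^{n}\ (n = 1, 2, \dots) \tag{3}

% Difference from the exact geometric-mean form: identical for every channel i
W_i^{(3)} - W_i^{(2)} = \left(\frac{1}{k} - \frac{1}{k'}\right)\sum_{j=1}^{k}\log P_j
```

The difference does not depend on i, so the approximation shifts all channels by a common offset proportional to the total log power and leaves the spectral shape across channels untouched.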

The 1/k' term in equation (3) can also be replaced by an expression that combines 1/k'1, 1/k'2 and 1/k'3, where each k'i = 2^n (n = 1, 2, ...), in order to raise the accuracy.

For example, with k'1, k'2 and k'3 each set to a power of two, the output of adder 41 is input in parallel to a 3-bit shift register and a Z-bit shift register, their outputs are applied to an adder/subtractor, and the result is input to a 2-bit shift register. In this way the correction term for 1/k can be calculated.
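The arrangement of two parallel shifts, an adder/subtractor, and a final shift can be sketched as below. The combining expression is not legible in this text, so the form 2**-n3 * (2**-n1 ± 2**-n2) is an assumption inferred from the described wiring, and the shift amounts in the example (3, 5 and 2 bits) are hypothetical.

```python
def scaled_sum(total, n1, n2, n3, sign=+1):
    """Scale `total` by 2**-n3 * (2**-n1 + sign * 2**-n2) using shifts only."""
    a = total >> n1                            # first shift register
    b = total >> n2                            # second shift register, fed in parallel
    combined = a + b if sign > 0 else a - b    # adder/subtractor
    return combined >> n3                      # final shift register


# Hypothetical sum of log powers; effective factor is (1/8 + 1/32) / 4 = 5/128.
approx = scaled_sum(3840, 3, 5, 2)
```

With these values the effective factor 5/128 is about 0.039, closer to 1/24 (about 0.042) than either neighbouring single power of two, 1/16 or 1/32, which illustrates how combining shifts can approach 1/k when k is not a power of two.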

<Effect> As described above, according to the present invention, the sum over the channels of the logarithmic outputs from the feature parameter extraction unit is calculated and multiplied by 2^-n (n = 1, 2, ...) with a shift register to obtain an approximate average logarithmic power, which is then subtracted from the logarithmic power of each channel. Power normalization can thus be performed approximately and at high speed without any division, and as a result recognition that takes power information into account can be carried out at high speed and relatively simply.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of a speech recognition device embodying the present invention, and FIG. 2 is a block diagram showing the configuration of its main part.

3: feature parameter extraction unit
4: power normalization unit
31: logarithmic conversion ROM
41: adder
42: shift register
43: subtractor
Pi: i-th feature parameter before normalization
Wi: i-th feature parameter after normalization

Agent: Patent attorney Aihiko Fukushi (and 2 others)

1. Indication of the case: Japanese Patent Application No. Sho 57-207178
2. Title of the invention: Speech recognition device
3. Person making the amendment. Relationship to the case: patent applicant. Address: 1-4-28 Mita, Minato-ku, Tokyo 108. Name: 電子計算機基本技術研究組合 (Computer Basic Technology Research Association), Chairman Tadahiro Sekimoto
4. Agent. Address: 22-22 Nagaike-cho, Abeno-ku, Osaka 545
The amendment is made voluntarily.
6. Subject of amendment
7. Contents of amendment
(1) In line 20 of page 1 of the specification, "加良" is corrected to "改良" (improvement).
(2) In line 18 of page 7 of the specification, "factorial of 2" is corrected to "power of 2".
End

Claims

[Claims] 1. A speech recognition device comprising: an extraction unit that extracts feature parameters; arithmetic means that calculates the sum over the channels of the logarithmic outputs from the extraction unit; a shift register into which the sum calculated by the arithmetic means is introduced; and arithmetic means that subtracts the output of the shift register from the logarithmic power of each of the channels, the device being configured to perform power normalization of the feature parameters.