JPH04362700A - Voice recognition device

Info

Publication number: JPH04362700A
Application number: JP3165028A
Authority: JP
Inventors: Takashi Ariyoshi; 有吉　敬
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1991-06-10
Filing date: 1991-06-10
Publication date: 1992-12-15

Abstract

PURPOSE: To realize a logarithmic conversion method that handles even small inputs without increasing the error and avoids abrupt changes in the character of the output value, and to perform fast logarithmic conversion without using a large memory. CONSTITUTION: The speech recognition device consists of an acoustoelectric transducer 10, an acoustic analyzing means 20, a registered voice storage means 30, and a recognizing process means 40, which performs recognition using the feature quantity of the input voice obtained by the acoustic analyzing means 20 and the feature quantity of a registered voice stored in the registered voice storage means 30. The acoustic analyzing means 20 has a logarithmic converting means 24, which converts the input signal according to a logarithmic curve when the input signal is larger than a predetermined threshold value, and according to a straight line that joins the logarithmic curve at the threshold value and passes through the origin when the input signal is smaller than the threshold value.

Description

[Detailed description of the invention]

[0001]

[Technical Field] The present invention relates to signal processing technology, and more specifically to logarithmic conversion of speech signals. It is suitable for speech processing devices used where speech levels vary widely or where noise is high, for example in offices, automobiles, factories, and homes.

[0002]

[Prior Art] In speech processing, logarithmic conversion is often used when handling feature quantities such as power and power spectra. The reasons include accommodating the wide dynamic range of speech, matching the characteristics of human hearing, and treating the feature quantities of speech at different levels equivalently. If a plain logarithm is applied to the input (≥ 0), the output becomes negative infinity when the input is 0, which is not practical. Therefore μ-log conversion is generally used, for example y = A·log(x + 1) (A: constant). However, the logarithmic conversion itself already gives large errors for small inputs, and adding 1 makes the error for small inputs even larger. Japanese Examined Patent Publication No. 63-34477 is prior art that addresses these problems. The invention described in that publication uses the following conversion:

[0003]

[Math 1]

[0004] With this conversion, every input at or below a certain value is given the same output, which eliminates the error that arises when the difference (or derivative) of the output is taken. On the other hand, it introduces two new drawbacks. First, small signals cannot be represented at all. Second, because the curve of the above equation is not smooth, the difference (or derivative) of the output is discontinuous for inputs near that value, which adversely affects later processing. In addition, logarithmic conversion is generally carried out by referring to a ROM table, but this approach requires a large memory area and is not suited to processing on a DSP (digital signal processor) or similar device with little memory.
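The drawbacks described above can be checked numerically. The following Python sketch is illustrative only: the constants `A` and `FLOOR`, the helper names, and the clamped form `log(max(x, FLOOR))` are assumptions, since [Math 1] is not reproduced in this text.

```python
import math

A = 1.0      # scale constant of the mu-log form (value assumed)
FLOOR = 8.0  # clamp level of the floor-style variant (value assumed)

def mu_log(x):
    """y = A * log(x + 1): finite at x = 0, but distorted for small x."""
    return A * math.log(x + 1.0)

def clamped_log(x):
    """Assumed floor-style conversion: every input below FLOOR gives log(FLOOR)."""
    return math.log(max(x, FLOOR))

def slope(f, x, h=1e-3):
    """One-sided forward difference, used to expose the kink at FLOOR."""
    return (f(x + h) - f(x)) / h

# Doubling the input adds log(2) to a true logarithm, whatever the level.
step_small = mu_log(2.0) - mu_log(1.0)        # log(3/2), well short of log(2)
step_large = mu_log(2000.0) - mu_log(1000.0)  # close to log(2)

# The clamped form is flat below FLOOR and has slope 1/FLOOR just above it.
slope_below = slope(clamped_log, FLOOR - 1.0)
slope_above = slope(clamped_log, FLOOR)
```

Under these assumptions, the μ-log form compresses level changes at small inputs, and the clamped form shows a jump in slope at the clamp level, which is the discontinuity in the output difference that the text objects to.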

[0005]

[Purpose] The present invention has been made in view of the circumstances described above. Its objects are: to realize, for speech signal processing, a logarithmic conversion method that handles even small inputs without increasing the error and whose output does not change character abruptly; further, to execute the logarithmic conversion at high speed without using a large amount of memory; and further, to detect speech sections accurately and extract speech feature quantities accurately even for small inputs.

[0006]

[Structure] To achieve the above objects, the present invention provides (1) a speech recognition device comprising acoustoelectric conversion means for inputting speech, acoustic analysis means for acoustically analyzing the input speech obtained by the acoustoelectric conversion means, registered speech storage means for storing feature quantities of speech registered in advance, and recognition processing means for performing recognition processing using the feature quantity of the input speech obtained by the acoustic analysis means and the feature quantity of the registered speech stored in the registered speech storage means, wherein the acoustic analysis means has logarithmic conversion means, and the logarithmic conversion means converts the input signal based on a logarithmic curve if the input signal is at or above a predetermined threshold and, if the input signal is at or below the threshold, converts it based on a straight line that connects to the logarithmic curve at the threshold and passes through the origin; or (2) the speech recognition device of (1), wherein the logarithmic conversion means has comparison means for comparing digital signals, shift means for shifting digital signals, and addition means for adding digital signals, and performs an approximate calculation of the logarithmic conversion described in (1); or the speech recognition device of (1) or (2), wherein (3) the input signal supplied to the logarithmic conversion means is the power or amplitude of the speech input, or (4) it is the power spectrum or amplitude spectrum of the speech input. The invention is described below on the basis of embodiments.

[0007] The present invention is described in detail below. Fig. 1 is a block diagram for explaining one embodiment of the speech recognition device according to the present invention. In the figure, 10 is a microphone that converts input speech into an electrical signal. 20 is an acoustic analysis unit that performs acoustic analysis on the signal obtained by the microphone 10 and extracts the feature quantity of the input speech. 21 is an A/D conversion unit that converts the analog signal into a digital signal. 22 is a power calculation unit, made up of a squarer and a smoother (not shown), that obtains the power of the input speech. 23 is a power spectrum calculation unit, made up of a bandpass filter bank, squarers, and smoothers (not shown), that obtains the power spectrum of the input speech. 24 is a logarithmic conversion unit that logarithmically converts the power obtained by the power calculation unit 22 and the power spectrum obtained by the power spectrum calculation unit 23. 25 is a speech section detection unit that detects the speech section of the input speech from the power obtained by the logarithmic conversion unit 24, using the known two-threshold method. 26 is an LSFL correction calculation unit that applies the known correction by a least-squares-error approximation line (LSFL correction) to the power spectrum of the input speech obtained by the logarithmic conversion unit 24. 27 is a BTSP calculation unit that computes the known binary time-spectrum pattern (BTSP) from the speech section information obtained by the speech section detection unit 25 and the LSFL-corrected power spectrum obtained by the LSFL correction calculation unit 26, and uses it as the feature quantity of the input speech. 30 is a registered speech storage unit that stores the binary time-spectrum patterns (BTSP) of speech registered in advance. 40 is a recognition processing unit that performs recognition processing using the feature quantity of the input speech obtained by the acoustic analysis unit 20 and the feature quantity of the registered speech stored in the registered speech storage unit 30, and outputs the result as the recognition result for the input speech.
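The text names LSFL correction and BTSP only as known techniques and gives no formulas. As a rough illustration of what units 26 and 27 might compute for one frame, the Python sketch below fits a least-squares line across the channels of a log power spectrum and subtracts it, then marks the channels that lie above the line. The subtraction, the binarization rule, the function names `lsfl_correct` and `binarize`, and the sample frame are all assumptions made for this sketch, not details taken from the patent.

```python
def lsfl_correct(log_spectrum):
    """Subtract the least-squares straight line fitted across channel index."""
    n = len(log_spectrum)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(log_spectrum) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, log_spectrum))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (slope * x + intercept) for x, y in zip(xs, log_spectrum)]

def binarize(corrected):
    """Assumed binarization rule: 1 where the channel lies above the fitted line."""
    return [1 if v > 0 else 0 for v in corrected]

frame = [10.0, 14.0, 12.0, 18.0, 14.0, 16.0]  # made-up log power spectrum, one frame
pattern = binarize(lsfl_correct(frame))
```

Because the line is fitted to the log-converted spectrum, any error the logarithmic conversion introduces in weak channels feeds directly into the fit, which is one reason the accuracy of the conversion at small inputs matters for the later stages.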

[0008] The operation of the logarithmic conversion unit 24 in the acoustic analysis unit 20 is described in detail below. With input x (≥ 0) and output y (≥ 0), the natural logarithm is expressed as y = log x (1), and at x = e it is tangent to the straight line y = x/e (2). We therefore consider the following function.

[0009]

[Math 2]
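The equation image for [Math 2] is not reproduced in this text. Judging from equations (1) and (2), the tangency at x = e, and the later references to equation (3), it presumably has the piecewise form below. This is a reconstruction from context, not the published equation, and the assignment of the boundary point x = e to the logarithmic branch is an arbitrary choice.

```latex
y =
\begin{cases}
  \dfrac{x}{e}, & 0 \le x < e \\[1ex]
  \log x,       & x \ge e
\end{cases}
\tag{3}
```

At x = e both branches give y = 1, and both have slope 1/e, since the derivative of log x is 1/x. The two pieces therefore join with matching value and matching slope, and the line reaches y = 0 at x = 0 instead of diverging.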

[0010] The function of this equation is shown in Fig. 2. Because this function is continuous and differentiable at every point, the converted output changes smoothly when the input changes smoothly. Fig. 3 shows an embodiment in which this conversion is implemented with a ROM table 50. Another embodiment of the invention evaluates an approximate expression without using a ROM table. Consider adapting the function of equation (3) to the case where a 12-bit digital input value is logarithmically converted into an 8-bit digital output value. For ease of handling, 2 is used in place of the base e of the natural logarithm, and equation (3) becomes the following.

[0011]

[Math 3]

[0012] This function is shown as curve a in Fig. 4. Approximating the logarithm by a polyline further gives the following.

[0013]

[Math 4]

[0014] This function is shown as curve b in Fig. 4. Although this function uses a polyline, its error relative to the logarithmic function is small, and the difference in slope between the two segments at each junction is small, so there is no practical problem. Fig. 5 shows a flowchart of this processing, and Fig. 6 shows a program that implements it in the C language. The program is very simple and executes quickly. The same approach applies when the numbers of input and output bits differ from this example.
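Neither [Math 4] nor the C program of Fig. 6 is reproduced in this text, so the following Python sketch is not the patent's program. It only illustrates how a base-2 polyline approximation of this kind can be computed with comparisons, shifts, and additions. The scaling of 16 output counts per octave, the threshold of 2, and the function name `approx_log2_q4` are assumptions chosen so that a 12-bit input maps into an 8-bit output.

```python
import math

def approx_log2_q4(x):
    """Approximate 16*log2(x) for a 12-bit x using compares, shifts and adds.

    Assumed scaling (not from the patent): 16 output counts per octave,
    so the 12-bit range 0..4095 maps into 0..191 and fits in 8 bits.
    """
    if not 0 <= x <= 0xFFF:
        raise ValueError("expected a 12-bit input")
    if x < 2:                  # below the threshold: straight line through the origin
        return x << 3          # y = 8x, which reaches 16 = 16*log2(2) at x = 2
    k = 1                      # find k with 2**k <= x < 2**(k+1) by compare and shift
    while (x >> (k + 1)) != 0:
        k += 1
    frac = x - (1 << k)        # distance into the octave, 0 <= frac < 2**k
    # chord of the logarithm across the octave: k + frac / 2**k, scaled by 16
    return (k << 4) + ((frac << 4) >> k)

# Largest shortfall against the exact scaled logarithm over the log branch.
worst = max(16 * math.log2(x) - approx_log2_q4(x) for x in range(2, 4096))
```

In this sketch each octave is replaced by its chord, which lies at most about 0.086 below log2, or about 1.38 counts at this scaling. Integer truncation adds less than one count, so `worst` stays under 2.4 counts. No table is stored, which is the memory saving the text describes.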

[0015] In this embodiment, logarithmic conversion is applied only to the power and the power spectrum, but it can be applied in the same way to other quantities in acoustic analysis that require logarithmic conversion. The section detection method of the section detection unit 25 may also be another method, such as finding the rise and fall of the speech from the time-axis difference of the logarithmically converted power. Furthermore, the feature quantity of the input speech obtained by the acoustic analysis unit 20, the feature quantity of the registered speech stored in the registered speech storage unit 30, the recognition method used in the recognition processing unit 40, and so on may be different. As is clear from the above description, the basic technical idea of the present invention lies in logarithmically converting an electrical signal, and the speech recognition device described above is one application of this logarithmic conversion technique.
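The remark about time-axis differences can be illustrated with a short Python sketch. It assumes the natural-log form of the conversion described in [0008] (a line through the origin below e, the logarithm from e upward). The function names and the frame powers are made up for the illustration.

```python
import math

def smooth_log(x):
    """Line through the origin below e, natural log from e upward (assumed form)."""
    return x / math.e if x < math.e else math.log(x)

def frame_differences(powers):
    """First difference over time of the log-converted frame power."""
    logs = [smooth_log(p) for p in powers]
    return [b - a for a, b in zip(logs, logs[1:])]

quiet = [4.0, 5.0, 40.0, 400.0, 380.0, 6.0]  # made-up frame powers
loud = [100.0 * p for p in quiet]            # same shape at 100 times the power

d_quiet = frame_differences(quiet)
d_loud = frame_differences(loud)
```

For frames on the logarithmic branch, a gain change only adds a constant to every converted value, so `d_quiet` and `d_loud` agree up to rounding, and a rise or fall detector based on these differences sees quiet and loud speech alike. Below the threshold the output falls linearly to 0 rather than diverging, so the differences stay bounded during silence.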

[0016]

[Effect] According to the invention of claim 1, small inputs are converted using a linear function that passes through the origin, so even small inputs can be handled without increasing the error. Large inputs are converted with the ordinary logarithmic function, and because the linear function and the logarithmic function are joined at a point where the result is continuous and differentiable, a logarithmic conversion method is obtained whose output does not change character abruptly. According to the invention of claim 2, the conversion uses a simple processing scheme consisting of comparison, shift, and addition operations on digital signals, so the logarithmic conversion can be executed at high speed without using a large amount of memory. According to the invention of claim 3, speech section detection can be performed using power that has undergone the logarithmic conversion of claim 1 or 2, so sections of low-level speech can be detected accurately. In particular, if the speech section detection also uses, for example, differences of the logarithmically converted power, low-level speech can be handled in the same way as high-level speech, with little influence from error and noise. According to the invention of claim 4, the feature quantity of the input speech can be obtained using a power spectrum that has undergone the logarithmic conversion of claim 1 or 2, so the error is reduced in bands where the spectrum is small.

[Brief explanation of drawings]

[Fig. 1] A block diagram for explaining one embodiment of the speech recognition device according to the present invention.

[Fig. 2] A diagram showing an example of the logarithmic conversion used in carrying out the present invention.

[Fig. 3] A diagram showing an example of a circuit that performs the logarithmic conversion shown in Fig. 2.

[Fig. 4] A diagram showing an example of the approximate calculation of the logarithmic conversion.

[Fig. 5] A diagram showing an example of a flowchart for executing the approximate calculation shown in Fig. 4.

[Fig. 6] A diagram showing an example of a program in which the flowchart shown in Fig. 5 is written in the C language.

[Explanation of symbols]

10: microphone, 20: acoustic analysis unit, 21: A/D conversion unit, 22: power calculation unit, 23: power spectrum calculation unit, 24: logarithmic conversion unit, 25: speech section detection unit, 26: LSFL correction calculation unit, 27: BTSP calculation unit, 30: registered speech storage unit, 40: recognition processing unit, 50: ROM.

Claims

[Claims]

1. A speech recognition device comprising: acoustoelectric conversion means for inputting speech; acoustic analysis means for acoustically analyzing the input speech obtained by the acoustoelectric conversion means; registered speech storage means for storing feature quantities of speech registered in advance; and recognition processing means for performing recognition processing using the feature quantity of the input speech obtained by the acoustic analysis means and the feature quantity of the registered speech stored in the registered speech storage means, wherein the acoustic analysis means has logarithmic conversion means, and the logarithmic conversion means converts the input signal based on a logarithmic curve if the input signal is at or above a predetermined threshold value and, if the input signal is at or below the threshold value, converts the input signal based on a straight line that connects to the logarithmic curve at the threshold value and passes through the origin.

2. The speech recognition device according to claim 1, wherein the logarithmic conversion means comprises comparison means for performing a comparison operation on digital signals, shift means for performing a shift operation on digital signals, and addition means for performing addition of digital signals, and performs an approximate calculation of the logarithmic conversion according to claim 1.

3. The speech recognition device according to claim 1 or 2, wherein the input signal supplied to the logarithmic conversion means is the power or amplitude of the speech input.

4. The speech recognition device according to claim 1 or 2, wherein the input signal supplied to the logarithmic conversion means is the power spectrum or amplitude spectrum of the speech input.