JP3298658B2 - Voice recognition method

Info

Publication number: JP3298658B2
Application number: JP09788092A
Authority: JP
Inventors: Ryosuke Hamasaki (濱崎良介); Hideki Kojima (小島英樹)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1992-04-17
Filing date: 1992-04-17
Publication date: 2002-07-02
Anticipated expiration: 2017-07-02
Also published as: JPH05297889A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a speech recognition system. More specifically, it relates to a noise-robust speech recognition system that recognizes speech by matching each template of a previously registered dictionary against an input pattern to compute a similarity or distance, and that improves the recognition rate under high-level noise by adding the noise pattern observed at recognition time to each template on the dictionary side.

[0002]

[Description of the Related Art] Several noise-robust speech recognition techniques have been proposed that first extract, from the noisy speech signal observed at recognition time, the noise component in the noise intervals before and after the speech interval. The spectral subtraction method subtracts this noise component from the noisy speech signal in the power-spectrum domain. The template addition method instead adds the recognition-time noise power spectrum, extracted in the same way, to each template of the previously registered dictionary. The effectiveness of both methods has been confirmed.

[0003] FIG. 9 shows a speech recognition system that performs noise processing by the template addition method described above. In the figure, 201 is a voice input unit, 202 a first frequency analysis unit, 202′ a second frequency analysis unit, 204 a first data compression unit, 204′ a second data compression unit, 205 a dictionary, 206 a noise adding unit, 207 a matching unit, and 208 a nonlinear processing unit.

[0004] In FIG. 9, at voice registration, when a speech signal is input from the voice input unit 201, the first frequency analysis unit 202 analyzes the speech at each frequency or frequency band (hereinafter referred to as a channel) and outputs the magnitude of the speech power spectrum for each channel. The first data compression unit 204 converts this output logarithmically, and the result is stored in the dictionary 205 as a template.

[0005] At recognition time, when a speech signal is input from the voice input unit 201, the first frequency analysis unit 202 likewise analyzes it for each channel and outputs the magnitude of its power spectrum per channel. The first data compression unit 204 converts this output logarithmically and supplies it to the matching unit 207.

[0006] For the noise interval, on the other hand, when noise is input from the voice input unit 201, the second frequency analysis unit 202′ analyzes the noise for each channel and supplies the result to the noise adding unit 206. Because each template in the dictionary 205 was stored as a logarithm at registration, as described above, the nonlinear processing unit 208 returns it to the power domain using the exponential function (the inverse of the logarithm) and supplies it to the noise adding unit 206.

[0007] The noise adding unit 206 adds the noise component to the dictionary template by summing the output of the nonlinear processing unit 208 and the output of the second frequency analysis unit 202′. The second data compression unit 204′ takes the logarithm of the output of the noise adding unit 206 and supplies it to the matching unit 207. The matching unit 207 matches the input speech pattern output by the first data compression unit 204 against the noise-processed dictionary template output by the second data compression unit 204′, and outputs the recognition result.
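The per-template cost of the conventional template addition step in FIG. 9 can be sketched as follows. This is a minimal illustration, not the original implementation: it assumes each dictionary template is a list of frames of natural-log channel powers, that the noise estimate is one linear-domain power value per channel, and it tallies exp/log calls with a hypothetical counter.

```python
import math

def add_noise_log_domain(template_log, noise_power, counter):
    """Conventional template addition (FIG. 9): exp -> add -> log per value.

    template_log: frames x channels of natural-log power (assumed format of 205).
    noise_power:  per-channel noise power from 202' (linear domain, assumed).
    counter:      hypothetical dict tallying nonlinear calls for illustration.
    """
    noisy = []
    for frame in template_log:
        out = []
        for t, n in zip(frame, noise_power):
            p = math.exp(t)             # unit 208: back to the power domain
            counter["nonlinear"] += 1
            out.append(math.log(p + n))  # unit 206 adds, unit 204' takes the log
            counter["nonlinear"] += 1
        noisy.append(out)
    return noisy

counter = {"nonlinear": 0}
dictionary = {  # hypothetical two-word dictionary, one frame of two channels each
    "word_a": [[math.log(4.0), math.log(9.0)]],
    "word_b": [[math.log(1.0), math.log(2.0)]],
}
noise = [1.0, 1.0]
noisy_templates = {w: add_noise_log_domain(t, noise, counter) for w, t in dictionary.items()}
print(counter["nonlinear"])  # 8: two nonlinear calls for each of the 4 template values
```

The count grows linearly with the number of templates, which is the cost problem the following paragraphs describe.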

[0008]

[Problems to Be Solved by the Invention] The spectral subtraction method described above is widely used because noise processing is applied to the input speech only once, which keeps processing simple. Its drawback is that it removes not only the noise component but also part of the speech component. The template addition method described above adds the noise component to the dictionary instead of subtracting it, so it causes no spurious distortion of the spectral pattern; however, the noise processing must be repeated for every template in the dictionary, so despite its demonstrated effectiveness it has not been practical.

[0009] That is, in the example shown in FIG. 9, each template requires two nonlinear operations: one in the nonlinear processing unit 208 to return the dictionary template to the power domain, and one in the data compression unit 204′ to take the logarithm again for matching. If the dictionary holds N templates, each recognition therefore requires 2N nonlinear operations and N additions. The processing load is large, and real-time processing is difficult.

[0010] Conventional speech recognizers also convert the output of each frequency-analyzed channel logarithmically, for the following reasons:
1) It reflects the Weber-Fechner law, under which the perceived intensity of a sensation is roughly proportional to the logarithm of the stimulus (sound pressure) intensity.
2) During matching, portions with low and high power levels are weighted equally.
3) Data can be compressed without reducing the dynamic range.
4) Spectral transformations that would otherwise require multiplication and division, such as power normalization, can be performed with addition and subtraction alone, which increases computation speed.

[0011] However, reasons 1) to 3) are satisfied by any function whose characteristics resemble those of the logarithm. As for 4), the logarithm is unnecessary when no multiplication or division such as power normalization is required; conversely, when the converted value of the sum or difference of two power spectra is needed, as here, the logarithm offers no advantage. The present invention has been made in view of these drawbacks of the prior art. Its object is to provide a speech recognition system that recognizes noise-corrupted speech by adding the noise power spectrum extracted at recognition time to each previously registered dictionary template, and that, by replacing the logarithmic conversion with conversion by the nonlinear function f(x) = b·exp(ax) + c, greatly reduces the amount of computation while still recognizing speech well under high-level noise.

[0012]

[Means for Solving the Problems] FIG. 1 illustrates the principle of the present invention, which is configured as shown in FIG. 1 to solve the above problems. The invention of claim 1 is a speech recognition system comprising: a voice input unit 1 that converts an uttered acoustic speech signal into an electrical signal; first and second frequency analysis units 2 and 2′ that frequency-analyze the input speech signal and output, for each analysis frame, an input speech pattern consisting of analysis data for a plurality of channels; first and second data compression units 4 and 4′ that compress the frequency patterns analyzed by the first and second frequency analysis units 2 and 2′ by nonlinear conversion while preserving the dynamic range; a dictionary 5 that stores templates created from training data after data compression; a noise adding unit 6 that, during speech recognition, adds the noise component obtained from the output of the second data compression unit 4′ to a template in the power-spectrum domain; and a matching unit 7 that matches the input speech pattern compressed by the first data compression unit 4 against the output of the noise adding unit 6 and computes the similarity or distance between them. In this system, the first and second data compression units 4 and 4′ use the nonlinear function

f(x) = b·exp(ax) + c   (a, b and c are constants),

a constant determining unit 3 is provided to determine the constants a, b and c, and the first and second data compression units 4 and 4′ use this nonlinear function to compress the frequency patterns output by the first and second frequency analysis units 2 and 2′.

[0013] According to the invention of claim 2, in the invention of claim 1, the noise adding unit 6 adds noise to the template read from the dictionary 5 according to

x3 = (x1 − c)·(x2 − c)/b + c

where x1 is the template value read from the dictionary 5, x2 is the noise component obtained from the data compression unit 4′, and x3 is the output of the noise adding unit 6. According to the invention of claim 3, in the invention of claim 1 or 2, the constants a, b and c are chosen so as to minimize the maximum error between the nonlinear function f(x) and the logarithmic function log(x) over the range of x handled by f(x).

[0014] According to the invention of claim 4, in the invention of claim 1 or 2, the constants a, b and c are chosen so as to minimize the integral of the absolute difference |f(x) − log(x)| over the range of x handled by f(x). According to the invention of claim 5, in the invention of claim 1 or 2, the constants a, b and c are chosen so as to minimize the integral of the squared error (f(x) − log(x))² over the same range.
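The squared-error criterion of claim 5 can be approximated numerically as below. This is a sketch under stated assumptions: the range of x (here 0.5 to 4.0), the number of sample points, and the search grid for a are all hypothetical, since the text does not give them, and the integral is replaced by a sum over a uniform grid. For a fixed a, f(x) is linear in b and c, so those two have a closed-form least-squares solution and only a needs a one-dimensional search.

```python
import math

def fit_exp_to_log(x_lo, x_hi, num_points=200, a_grid=None):
    """Fit f(x) = b*exp(a*x) + c to log(x) on [x_lo, x_hi] by least squares.

    The squared-error integral is approximated by a sum over a uniform grid.
    Returns (a, b, c, sse) for the best a in a_grid.
    """
    xs = [x_lo + (x_hi - x_lo) * i / (num_points - 1) for i in range(num_points)]
    ys = [math.log(x) for x in xs]
    if a_grid is None:
        a_grid = [-k / 100 for k in range(1, 501)]  # assumed search range: -5.0 .. -0.01
    best = None
    for a in a_grid:
        us = [math.exp(a * x) for x in xs]
        mu, my = sum(us) / len(us), sum(ys) / len(ys)
        var_u = sum((u - mu) ** 2 for u in us)
        cov_uy = sum((u - mu) * (y - my) for u, y in zip(us, ys))
        b = cov_uy / var_u          # closed-form least-squares slope for fixed a
        c = my - b * mu             # closed-form least-squares intercept
        sse = sum((b * u + c - y) ** 2 for u, y in zip(us, ys))
        if best is None or sse < best[3]:
            best = (a, b, c, sse)
    return best

a, b, c, sse = fit_exp_to_log(0.5, 4.0)
print(a, b, c, sse)
```

Claims 3 and 4 would swap the squared error for the maximum or the mean absolute error; b and c then lose this closed form and would need a joint search.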

[0015] According to the invention of claim 6, in the invention of claim 1 or 2, within the range of x handled by the nonlinear function f(x), f(x) and the logarithmic function log(x) are both expanded in Taylor series, and the constants a, b and c are determined by solving the simultaneous equations obtained by equating the coefficients of their first three terms.
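The Taylor-matching approach of claim 6 has a closed-form solution once an expansion point is fixed. The point x0 below is an assumption, since the text does not name one. Matching f(x0) = log(x0), f′(x0) = 1/x0 and f″(x0) = −1/x0² gives a = −1/x0 (from the ratio f″/f′), then b = −e and c = log(x0) + 1.

```python
import math

def taylor_match_constants(x0):
    """Constants of f(x) = b*exp(a*x) + c whose value, first and second
    derivatives at x0 equal those of log(x). x0 is an assumed choice.

    Equations: f(x0) = log(x0), f'(x0) = 1/x0, f''(x0) = -1/x0**2.
    Dividing f'' by f' gives a = -1/x0; the rest follows.
    """
    if x0 <= 0:
        raise ValueError("x0 must be positive for log(x0) to exist")
    a = -1.0 / x0
    b = 1.0 / (x0 * a * math.exp(a * x0))    # from a*b*exp(a*x0) = 1/x0; equals -e
    c = math.log(x0) - b * math.exp(a * x0)  # from f(x0) = log(x0); equals log(x0) + 1
    return a, b, c

a, b, c = taylor_match_constants(1.0)
print(a, b, c)  # approximately -1.0, -2.718281828459045, 1.0
```

At x0 = 1 the result satisfies c = −b·e^a, the same relation used for curve (c) of FIG. 3, so f(1) = log(1) = 0.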

[0016] According to the invention of claim 7, in the invention of claim 1 or 2, a recognition result counting unit is provided that counts the recognition results of the matching unit 7, and the constants a, b and c of the nonlinear function are varied according to the recognition rate to determine optimal constants. According to the invention of claim 8, in the invention of any one of claims 1 to 7, the constant c of the nonlinear function is set to zero.

[0017] According to the invention of claim 9, in the invention of claim 8, the output of the noise adding unit 6 is obtained by multiplying the template read from the dictionary 5 by the noise component obtained from the second data compression unit 4′. According to the invention of claim 10, in the invention of claim 9, the template values are divided by the constant b of the nonlinear function when the templates are stored in the dictionary 5.

[0018] According to the invention of claim 11, in the invention of claim 9, the input speech pattern compressed by the first data compression unit 4 is multiplied by the constant b of the nonlinear function, and the result is supplied to the matching unit 7. According to the invention of claim 12, in the invention of claim 9, the noise pattern compressed by the second data compression unit 4′ is divided by the constant b of the nonlinear function, and the result is supplied to the noise adding unit 6.

[0019]

[Operation] In the inventions of claims 1 and 2, at recognition time the frequency analysis unit 2 frequency-analyzes the input speech, and the data compression unit 4 compresses the result with the nonlinear function f(x) = b·exp(ax) + c and supplies it to the matching unit 7.

[0020] Meanwhile, the frequency analysis unit 2′ frequency-analyzes the noise signal, and the data compression unit 4′ compresses the result with the nonlinear function f(x) and supplies it to the noise adding unit 6. The noise adding unit 6 adds the compressed noise component to the template read from the dictionary 5 and supplies the result to the matching unit 7.

[0021] The matching unit 7 computes the similarity or distance between the output of the noise adding unit 6 and the output of the data compression unit 4, and outputs the recognition result. Because the nonlinear function f(x) = b·exp(ax) + c is used as the conversion function, the noise component can be added in the power-spectrum domain using only the four basic arithmetic operations, which greatly reduces the amount of computation.

[0022] By determining the constants a, b and c with the methods of claims 3 to 6, the nonlinear function f(x) = b·exp(ax) + c can be made to approximate the logarithm over the range of x it handles, giving characteristics similar to those obtained with the logarithm. Further, as in the invention of claim 7, providing a recognition result counting unit that counts the recognition results of the matching unit 7 and varying the constants a, b and c according to the recognition rate allows data compression that maximizes the recognition rate of the matching unit 7.

[0023] Furthermore, as in the inventions of claims 8 to 12, setting the constant c of the nonlinear function to zero removes the need to add c in the data compression units 4 and 4′ and to subtract c in the noise adding unit 6, further reducing computation. As in the invention of claim 9, the output of the noise adding unit 6 can then be obtained by multiplying the template read from the dictionary 5 by the noise component obtained from the second data compression unit 4′.

[0024] Setting the constant c to zero does not affect the distance computed by the matching unit 7 between two patterns when the city-block distance is used, because that distance depends only on the differences between the patterns. Likewise, for the Euclidean distance, which is computed from the squared differences between the two patterns, the difference in absolute magnitude caused by c is not essential. Further, as in the invention of claim 10, dividing the template values by the constant b of the nonlinear function when storing them in the dictionary 5 requires the output of the data compression unit 4 to be divided by b at registration, but at recognition time the noise adding unit 6 needs only multiplication and no division, which further reduces the computation at recognition time.

[0025] Further, as in the invention of claim 11, multiplying the input speech pattern compressed by the first data compression unit 4 by the constant b and supplying the result to the matching unit 7 requires multiplying the output of the data compression unit 4 by b at recognition time. In exchange, the noise adding unit 6 outputs the product of the template and the noise component without dividing by b (b times the noise-added template, which matches the scaling applied to the input pattern), so only multiplication is needed there. Each recognition thus saves as many divisions as there are templates, further reducing the computation at recognition time.

[0026] Furthermore, as in the invention of claim 12, dividing the noise pattern compressed by the second data compression unit 4′ by the constant b and supplying the result to the noise adding unit 6 requires dividing the output of the data compression unit 4′ by b at recognition time, but the noise adding unit 6 then needs only multiplication and no division. Each recognition thus saves as many divisions as there are templates, further reducing the computation at recognition time.
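With c = 0, f(s)·f(n)/b = f(s + n), so the variants of claims 9 to 12 differ only in where the single division by b happens. The sketch below checks this numerically. The constants A = −1, B = −e and all spectra are hypothetical values chosen for illustration, and the city-block helper is an assumed stand-in for the matching distance.

```python
import math

# Assumed constants for illustration; c = 0 as in claim 8.
A, B = -1.0, -math.e

def f(x):
    """Data compression with c = 0: f(x) = B*exp(A*x)."""
    return B * math.exp(A * x)

def city_block(p, q):
    """Sum of absolute per-channel differences."""
    return sum(abs(u - v) for u, v in zip(p, q))

speech = [0.4, 1.2, 2.0]   # hypothetical template power spectrum s
noise = [0.3, 0.3, 0.5]    # hypothetical noise power spectrum n
target = [f(s + n) for s, n in zip(speech, noise)]  # f(s + n), computed directly

tmpl = [f(s) for s in speech]  # x1: compressed template values
nz = [f(n) for n in noise]     # x2: compressed noise values

# Claim 9: x3 = x1 * x2 / B (no subtraction or addition of c is needed)
claim9 = [x1 * x2 / B for x1, x2 in zip(tmpl, nz)]

# Claim 10: divide by B once at registration; recognition only multiplies
stored = [x1 / B for x1 in tmpl]
claim10 = [x1 * x2 for x1, x2 in zip(stored, nz)]

# Claim 12: divide the noise pattern by B once per recognition instead
nz_over_b = [x2 / B for x2 in nz]
claim12 = [x1 * x2 for x1, x2 in zip(tmpl, nz_over_b)]

# Claim 11: skip the division (result is B * f(s + n)) and multiply the
# input pattern by B instead, so every distance is scaled by |B|
claim11 = [x1 * x2 for x1, x2 in zip(tmpl, nz)]
observed = [0.5, 1.6, 2.4]          # hypothetical noisy input power spectrum
fy = [f(v) for v in observed]
d_ref = city_block(target, fy)
d_11 = city_block(claim11, [B * v for v in fy])
print(d_11 / d_ref)                 # approximately |B| = e
```

Because claim 11 scales every template distance by the same factor |B|, the ranking of templates, and hence the recognition result, is unchanged.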

[0027]

[Embodiments] FIG. 2 shows a first embodiment of the present invention. In the figure, 11 is an analog/digital converter, 12 a speech interval detector, 2 and 2′ the first and second frequency analysis units, 3 a constant determining unit, 4 and 4′ the first and second data compression units, 5 a dictionary, 61 a noise adding unit, and 7 a matching unit.

[0028] In FIG. 2, the analog/digital converter 11 converts the input speech signal into a digital signal and supplies it to the speech interval detector 12. The speech interval detector 12 determines the speech interval from information such as the power of the input signal; the speech interval it detects is supplied to the first frequency analysis unit 2.

[0029] The speech interval detector 12 is not necessarily required when the word spotting method is used at recognition time. (Word spotting is a technique that, when comparing the pattern signal to be identified with a template, shifts one signal relative to the other and takes the distance at the position where the error is smallest as the distance between the pattern signal and the template.)
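The word spotting idea in the parenthetical above can be sketched as a sliding comparison. This is a hypothetical simplification: the template is shifted one frame at a time with no time warping, and the city-block frame distance is an assumed choice.

```python
def city_block(p, q):
    """Sum of absolute per-channel differences between two frames."""
    return sum(abs(u - v) for u, v in zip(p, q))

def word_spot(input_frames, template_frames):
    """Slide the template over the input one frame at a time and return
    (best_distance, best_offset): the distance at the least-error position."""
    n, m = len(input_frames), len(template_frames)
    if m > n:
        raise ValueError("template is longer than the input")
    best = None
    for offset in range(n - m + 1):
        d = sum(city_block(input_frames[offset + k], template_frames[k]) for k in range(m))
        if best is None or d < best[0]:
            best = (d, offset)
    return best

frames = [[0, 0], [0, 0], [3, 5], [4, 6], [0, 0]]  # hypothetical input frames
template = [[3, 5], [4, 6]]                        # hypothetical template
print(word_spot(frames, template))  # (0, 2)
```

Because the search itself locates the word within the input, no separate endpoint decision is needed, which is why the speech interval detector becomes optional.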

[0030] The first frequency analysis unit 2 and the second frequency analysis unit 2′ frequency-analyze the speech signal and the noise signal output by the speech interval detector 12. For example, they use a 19-channel band-pass filter bank and output the magnitude of the power spectrum for each channel. The outputs of the first and second frequency analysis units 2 and 2′ are supplied to the data compression units 4 and 4′, respectively, and compressed with the nonlinear function f(x) = b·exp(ax) + c. The constants a, b and c of the nonlinear function are determined by the constant determining unit 3.
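A rough software stand-in for the 19-channel analysis is sketched below. It is an assumption throughout: the band edges of the original filter bank are not given, so this version takes a DFT power spectrum of one frame and sums it over 19 equal-width bands, and the frame length and test tone are hypothetical.

```python
import cmath
import math

def band_powers(frame, num_channels=19):
    """Approximate a band-pass filter bank: DFT power spectrum of one frame,
    summed over equal-width frequency bands (band layout is an assumption)."""
    n = len(frame)
    half = n // 2
    spectrum = []
    for k in range(1, half + 1):  # skip DC, keep bins 1 .. n/2
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    edges = [round(i * half / num_channels) for i in range(num_channels + 1)]
    return [sum(spectrum[edges[i]:edges[i + 1]]) for i in range(num_channels)]

# Hypothetical 256-sample frame holding a pure tone at DFT bin 40
frame = [math.sin(2 * math.pi * 40 * t / 256) for t in range(256)]
powers = band_powers(frame)
print(len(powers), powers.index(max(powers)))  # 19 channels; the tone lands in channel 5
```

A production front end would more likely use an FFT and perceptually spaced bands; the direct DFT keeps the sketch dependency-free.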

[0031] The constant determining unit 3 determines the constants a, b and c so that the nonlinear function f(x) = b·exp(ax) + c approximates the conventionally used logarithmic function g(x) = log x. The following approximation methods, among others, can be used:
1) Determine a, b and c so as to minimize the maximum error between f(x) and g(x).
2) Determine a, b and c so as to minimize the integral of the absolute value of the difference between f(x) and g(x).
3) Determine a, b and c so as to minimize the integral of the squared error between f(x) and g(x).
4) Expand f(x) and g(x) in Taylor series, and determine the three constants by solving the system of three simultaneous equations in a, b and c obtained by equating the coefficients of their first three terms.

[0032] FIG. 3 shows the logarithmic function (a) and the nonlinear functions (b) and (c) given by the following equations:
(a) f(x) = log(x)
(b) f(x) = b·exp(ax) + c, with c = 0
(c) f(x) = b·exp(ax) − b·e^a, that is, the general form with c = −b·e^a, which makes f(1) = 0 = log(1)
As the figure shows, suitable choices of the constants a, b and c yield a nonlinear function that approximates the logarithm.

[0033] The dictionary 5 in FIG. 2 stores the templates input at voice registration, and the templates are read from the dictionary 5 at recognition time. The noise adding unit 61 adds noise to each template read from the dictionary. The matching unit 7 computes the distance between that result and the speech pattern signal output by the data compression unit 4, and outputs the recognition result for the input speech.

[0034] Next, the operation of the first embodiment of FIG. 2 is described. At voice registration, input speech is converted into a digital signal by the analog/digital converter 11, and its speech interval is detected by the speech interval detector 12. The frequency analysis unit 2 analyzes the output of the speech interval detector 12 for each channel with band-pass filters and outputs the magnitude of the power spectrum.

[0035] The speech analysis result is supplied to the data compression unit 4, compressed with the nonlinear function f(x) = b·exp(ax) + c, and registered in the dictionary 5 as a template. At recognition time, input speech is likewise converted into a digital signal by the analog/digital converter 11, and its speech interval is detected by the speech interval detector 12.

[0036] The noise-corrupted speech signal detected by the speech interval detector 12 is frequency-analyzed by the frequency analysis unit 2 as above, compressed by the data compression unit 4 with the nonlinear function f(x) = b·exp(ax) + c, and supplied to the matching unit 7. Meanwhile, the noise signal is supplied from the speech interval detector 12 to the frequency analysis unit 2′, frequency-analyzed in the same way, compressed by the data compression unit 4′ with f(x) = b·exp(ax) + c, and supplied to the noise adding unit 61.

[0037] The noise adding unit 61 adds the compressed noise component to the template read from the dictionary 5 and supplies the result to the matching unit 7. Let s be the power spectrum of the speech and n the power spectrum of the noise, and let f(s) and f(n) be their values converted by the nonlinear function f(x) = b·exp(ax) + c. The converted value f(s + n) of the power spectrum s + n of speech with superimposed noise can then be obtained as follows.

[0038]

f(n + s) = b·exp(an + as) + c
         = b·exp(an)·exp(as) + c
         = [{b·exp(an) + c} − c]·[{b·exp(as) + c} − c]/b + c
         = {f(n) − c}·{f(s) − c}/b + c

Therefore, the noise adding unit 61 subtracts the constant c from the converted speech value f(s) read from the dictionary 5, subtracts c from the converted noise value f(n) output by the data compression unit 4′, divides the product of the two by b, and then adds c. This yields the noise-superimposed pattern signal, which is supplied to the matching unit 7.
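The identity above can be checked numerically. The constants below are an assumed example (the Taylor match at x = 1, giving a = −1, b = −e, c = 1) and the speech and noise powers are hypothetical; the point is only that the arithmetic-only formula of the noise adding unit 61 reproduces f(s + n).

```python
import math

def compress(x, a, b, c):
    """Data compression units 4 and 4': f(x) = b*exp(a*x) + c."""
    return b * math.exp(a * x) + c

def add_noise(x1, x2, b, c):
    """Noise adding unit 61: x3 = (x1 - c)*(x2 - c)/b + c, arithmetic only."""
    return (x1 - c) * (x2 - c) / b + c

a, b, c = -1.0, -math.e, 1.0  # assumed constants (Taylor match at x = 1)
s, n = 0.8, 0.3               # hypothetical speech and noise power in one channel
direct = compress(s + n, a, b, c)
via_unit_61 = add_noise(compress(s, a, b, c), compress(n, a, b, c), b, c)
print(direct, via_unit_61)    # equal up to floating-point rounding
```

No exp or log is evaluated per template: the only nonlinear calls are the compressions of the input and noise, which happen once per recognition.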

[0039] The collation unit 7 compares two patterns. The first is the pattern signal output by the noise adding unit 61, which is a template with the recognition-time noise added. The second is the analyzed, noise-superimposed speech pattern output by the data compression unit 4. The collation unit 7 matches them using, for example, DP (dynamic programming) matching, and outputs the recognition result. Because this embodiment uses the nonlinear function f(x) = b·exp(ax) + c as the conversion function, the noise component can be added in the power-spectrum dimension using only the four basic arithmetic operations. The conventional example shown in FIG. 9 must perform nonlinear processing twice per template; this embodiment does not, so the amount of computation is greatly reduced.
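
The patent names DP matching only as one example. Below is a minimal sketch of a textbook DTW (dynamic time warping) matcher with a city-block local distance. It assumes each pattern is a list of frames and each frame is a list of compressed channel values. The step pattern, the lack of path-length normalization and the function names are simplifying assumptions, not details from the source.

```python
from typing import Dict, List, Sequence

Frame = Sequence[float]

def city_block(u: Frame, v: Frame) -> float:
    """Local distance between two frames: sum of absolute channel differences."""
    return sum(abs(x - y) for x, y in zip(u, v))

def dp_match(ref: List[Frame], inp: List[Frame]) -> float:
    """Cumulative DTW distance using steps (i-1, j), (i, j-1) and (i-1, j-1)."""
    inf = float("inf")
    rows, cols = len(ref), len(inp)
    d = [[inf] * (cols + 1) for _ in range(rows + 1)]
    d[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            local = city_block(ref[i - 1], inp[j - 1])
            d[i][j] = local + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[rows][cols]

def recognize(templates: Dict[str, List[Frame]], inp: List[Frame]) -> str:
    """Return the label of the (noise-added) template closest to the input."""
    return min(templates, key=lambda label: dp_match(templates[label], inp))

templates = {"A": [[0.0, 0.0], [1.0, 1.0]], "B": [[5.0, 5.0], [6.0, 6.0]]}
print(recognize(templates, [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]))
```

In the arrangement of [0039], `templates` would hold the outputs of the noise adding unit 61, and `inp` would be the output of the data compression unit 4.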

[0040] FIG. 4 shows a second embodiment of the present invention. It adds a recognition result counting unit 9 to the first embodiment shown in FIG. 2; the rest of the configuration is the same as in FIG. 2. The recognition result counting unit 9 counts the collation results of the collation unit 7. It then changes the constants of the nonlinear function f(x) = b·exp(ax) + c, which the constant determining unit 3 determines, so that the recognition rate of the collation unit 7 is maximized.
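
An exhaustive grid search is one simple way to carry out the search performed by the recognition result counting unit 9. The patent does not specify the search strategy, so this is a sketch under assumptions. The candidate grids are hypothetical. So is the callable `recognition_rate`, which would run the recognizer over a labeled evaluation set and return the fraction of results counted as correct.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple

Constants = Tuple[float, float, float]

def tune_constants(
    a_values: Iterable[float],
    b_values: Iterable[float],
    c_values: Iterable[float],
    recognition_rate: Callable[[float, float, float], float],
) -> Tuple[Optional[Constants], float]:
    """Try every (a, b, c) combination and keep the one with the highest rate."""
    best: Optional[Constants] = None
    best_rate = float("-inf")
    for a, b, c in product(a_values, b_values, c_values):
        rate = recognition_rate(a, b, c)  # hypothetical evaluation callback
        if rate > best_rate:
            best, best_rate = (a, b, c), rate
    return best, best_rate

# Toy stand-in for a real evaluation run (purely illustrative).
def toy_rate(a: float, b: float, c: float) -> float:
    return 1.0 - 0.1 * (abs(a + 0.5) + abs(b + 2.0) + abs(c - 3.0))

print(tune_constants([-1.0, -0.5, -0.1], [-3.0, -2.0], [2.0, 3.0], toy_rate))
```

Each call to `recognition_rate` means re-running recognition over the whole evaluation set. In practice, a coarse grid around log-approximating constants would likely be the affordable choice.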

[0041] In the first embodiment shown in FIG. 2, the constants of the nonlinear function f(x) = b·exp(ax) + c were chosen so that f(x) approximates the logarithmic function g(x) = log x. However, the conventional rationale for using a logarithm as the nonlinear function is not strictly tied to the recognition rate. From the standpoint of recognition rate, the logarithm is not necessarily the best choice.
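
The log-approximation rule of the first embodiment can be sketched as follows. This sketch implements only the squared-error criterion of claim 5. The integral is replaced by a sum over evenly spaced sample points, and a is searched over a grid. For each fixed a, the best b and c follow from ordinary linear least squares. The range [1, 10] and the grid for a are assumptions chosen for illustration.

```python
import math
from typing import List, Optional, Tuple

def fit_log_approx(
    x_min: float, x_max: float, a_grid: List[float], num_points: int = 200
) -> Optional[Tuple[float, float, float, float]]:
    """Return (a, b, c, sse) minimizing sum (b*exp(a*x) + c - log(x))**2 over samples."""
    xs = [x_min + (x_max - x_min) * k / (num_points - 1) for k in range(num_points)]
    ys = [math.log(x) for x in xs]
    m = float(num_points)
    best = None
    for a in a_grid:
        es = [math.exp(a * x) for x in xs]
        # Normal equations for fixed a: [[see, se], [se, m]] @ [b, c] = [sey, sy]
        se, see = sum(es), sum(e * e for e in es)
        sy, sey = sum(ys), sum(e * y for e, y in zip(es, ys))
        det = see * m - se * se
        if abs(det) < 1e-12:
            continue  # exp(a*x) is nearly constant over the range; skip this a
        b = (sey * m - se * sy) / det
        c = (see * sy - se * sey) / det
        sse = sum((b * e + c - y) ** 2 for e, y in zip(es, ys))
        if best is None or sse < best[3]:
            best = (a, b, c, sse)
    return best

a, b, c, sse = fit_log_approx(1.0, 10.0, [-k / 100 for k in range(1, 101)])
print(f"a={a:.2f} b={b:.4f} c={c:.4f} sse={sse:.5f}")
```

Claims 3 and 4 use different objectives: the maximum error and the integrated absolute error. These are not least-squares problems, so they would need a different inner solver for b and c, not just a different scoring line.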

[0042] Conversely, all three constants of the nonlinear function f(x) can be varied. Changing them according to the recognition rate, as in this embodiment, therefore makes it possible to choose the data compression that yields the best recognition rate. FIG. 5 shows a third embodiment of the present invention. It replaces the first and second data compression units 4, 4′ and the noise adding unit 61 of the embodiment of FIG. 2 with first and second data compression units 42, 42′ and a noise adding unit 62. The rest of the configuration is the same as in the first embodiment shown in FIG. 2.

[0043] In this embodiment, the data compression units 42, 42′ omit the addition of c in the nonlinear function f(x) = b·exp(ax) + c. The noise adding unit 62 likewise omits the subtraction of c. The amount of computation is therefore further reduced compared with the embodiment of FIG. 2. Omitting c does not affect the computed distance. When the collation unit 7 calculates the distance between two patterns, it operates on their difference, in which c cancels.

[0044] For example, if the collation unit 7 uses the city-block distance as its distance measure, the distance between two patterns is calculated as follows:
$$|f(x_1) - f(x_2)| = \bigl|b\exp(ax_1) + c - \{b\exp(ax_2) + c\}\bigr| = \bigl|b\exp(ax_1) - b\exp(ax_2)\bigr|$$
As the equation shows, the distance between the two patterns does not depend on c.
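
The Python check below illustrates [0044]. The constants and channel values are illustrative assumptions. The city-block distance between two compressed patterns comes out the same whatever value c takes.

```python
import math
from typing import List

def compress(x: float, a: float, b: float, c: float) -> float:
    """f(x) = b*exp(a*x) + c."""
    return b * math.exp(a * x) + c

def city_block(p: List[float], q: List[float]) -> float:
    """City-block (L1) distance between two equal-length patterns."""
    return sum(abs(u - v) for u, v in zip(p, q))

a, b = -0.5, -2.0                           # illustrative constants
x1, x2 = [0.3, 1.1, 2.4], [0.5, 0.9, 3.0]   # hypothetical channel powers

def distance_for(c: float) -> float:
    """Distance between the two patterns after compression with offset c."""
    return city_block([compress(x, a, b, c) for x in x1],
                      [compress(x, a, b, c) for x in x2])

print(distance_for(3.0), distance_for(0.0))
```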

[0045] FIG. 6 shows a fourth embodiment of the present invention. It replaces the noise adding unit 62 of the third embodiment shown in FIG. 5 with a noise adding unit 63 and adds a data matching unit 101. The rest of the configuration is the same as in the third embodiment of FIG. 5. In FIG. 6, at speech registration time, the data matching unit 101 divides the output of the data compression unit 42 by the constant b. The result is stored in the dictionary 5 as a template.

[0046] As before, let s be the speech power spectrum and f(s) its value after conversion by the nonlinear function f(x) = b·exp(ax). The dictionary 5 then stores templates of the form f(s)/b = exp(as). At speech recognition time, a template f(s)/b = exp(as) is read from the dictionary 5. The noise adding unit 63 multiplies it by the noise value f(n) = b·exp(an) output by the data compression unit 42′. The output of the noise adding unit 63 is therefore as follows.

[0047]
$$\frac{f(n)\,f(s)}{b} = \exp(as)\cdot b\exp(an) = b\exp(as+an) = f(n+s)$$
The output of the noise adding unit 63 is supplied to the collation unit 7. As in the third embodiment, the collation unit 7 computes its distance from the output of the data compression unit 42 and outputs the recognition result. In this embodiment, the output of the data compression unit 42 must be divided by the constant b at speech registration time. At speech recognition time, however, the noise adding unit 63 needs only a multiplication and no division, which further reduces the computation required for recognition.
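
The fourth embodiment can be sketched as below. The sketch assumes c = 0, as in the third embodiment, and uses arbitrary illustrative constants A and B. It also assumes a single noise frame averaged over the noise section; the text itself only refers to "the noise component". The division by B happens at registration, and recognition uses multiplication only.

```python
import math
from typing import List

A, B = -0.5, -2.0  # illustrative constants; c = 0

def compress(x: float) -> float:
    """Data compression units 42, 42': f(x) = B*exp(A*x)."""
    return B * math.exp(A * x)

def register_template(power_frames: List[List[float]]) -> List[List[float]]:
    """Registration: data matching unit 101 stores f(s)/B = exp(A*s)."""
    return [[compress(s) / B for s in frame] for frame in power_frames]

def add_noise(template: List[List[float]], noise_power: List[float]) -> List[List[float]]:
    """Recognition: noise adding unit 63 multiplies by f(n), giving f(s + n)."""
    fn = [compress(n) for n in noise_power]  # recomputed on each call in this sketch
    return [[t * m for t, m in zip(frame, fn)] for frame in template]

template = register_template([[1.0, 2.0], [0.4, 0.8]])
noisy = add_noise(template, [0.5, 0.25])
print(noisy)
```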

[0048] FIG. 7 shows a fifth embodiment of the present invention. It replaces the noise adding unit 62 of the third embodiment shown in FIG. 5 with the noise adding unit 63 and adds a data matching unit 102. The rest of the configuration is the same as in the third embodiment of FIG. 5. In FIG. 7, at speech recognition time, a template corresponding to f(s) = b·exp(as) is read from the dictionary 5. The noise adding unit 63 multiplies it by the noise value f(n) = b·exp(an) output by the data compression unit 42′. The output of the noise adding unit 63 is therefore as follows.

[0049]
$$f(n)\,f(s) = b\exp(as)\cdot b\exp(an) = b^2\exp(as+an) = b\,f(n+s)$$
Meanwhile, the data matching unit 102 multiplies the output of the data compression unit 42 by the constant b, so its output is b·f(n + s). The output of the noise adding unit 63 is supplied to the collation unit 7. As in the third embodiment, the collation unit 7 computes its distance from the output of the data matching unit 102 and outputs the recognition result.
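
Below is a sketch of the fifth embodiment under the same assumptions: c = 0 and illustrative constants A and B. Templates are stored as f(s), so the noise-added template comes out as B·f(s + n). The compressed input is therefore scaled by B once per utterance. Scaling both sides by B multiplies every city-block distance by |B|, so the ranking of templates is unchanged.

```python
import math
from typing import List

A, B = -0.5, -2.0  # illustrative constants; c = 0

def compress(x: float) -> float:
    """Data compression units 42, 42': f(x) = B*exp(A*x)."""
    return B * math.exp(A * x)

def add_noise(template_fs: List[float], fn: List[float]) -> List[float]:
    """Noise adding unit 63: f(s) * f(n) = B * f(s + n), multiplication only."""
    return [t * m for t, m in zip(template_fs, fn)]

def scale_input(input_f: List[float]) -> List[float]:
    """Data matching unit 102: multiply the compressed input by B."""
    return [B * v for v in input_f]

def city_block(p: List[float], q: List[float]) -> float:
    """City-block (L1) distance between two equal-length patterns."""
    return sum(abs(u - v) for u, v in zip(p, q))

s, n = [1.0, 2.0], [0.5, 0.25]  # hypothetical speech and noise powers
noisy_template = add_noise([compress(x) for x in s], [compress(x) for x in n])
scaled = scale_input([compress(x + y) for x, y in zip(s, n)])  # idealized input
print(city_block(noisy_template, scaled))
```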

[0050] In this embodiment, the output of the data compression unit 42 must be multiplied by the constant b at speech recognition time. The noise adding unit 63, however, needs only a multiplication and no division. Each recognition therefore saves as many divisions as there are templates, which further reduces the computation required for speech recognition. FIG. 8 shows a sixth embodiment of the present invention. It replaces the noise adding unit 62 of the third embodiment shown in FIG. 5 with the noise adding unit 63 and adds a data matching unit 103. The rest of the configuration is the same as in the third embodiment of FIG. 5.

[0051] In FIG. 8, at speech recognition time, the data matching unit 103 divides the output f(n) = b·exp(an) of the data compression unit 42′ by the constant b. The result, f(n)/b = exp(an), is supplied to the noise adding unit 63. The noise adding unit 63 multiplies f(s) = b·exp(as), read from the dictionary 5, by f(n)/b = exp(an) from the data matching unit 103. The output of the noise adding unit 63 is therefore as follows.

[0052]
$$\frac{f(n)\,f(s)}{b} = b\exp(as)\cdot\exp(an) = b\exp(as+an) = f(n+s)$$
The output of the noise adding unit 63 is supplied to the collation unit 7. As in the third embodiment, the collation unit 7 computes its distance from the output of the data compression unit 42 and outputs the recognition result. In this embodiment, the output of the data compression unit 42′ must be divided by the constant b at speech recognition time. The noise adding unit 63, however, needs only a multiplication and no division. Each recognition therefore saves as many divisions as there are templates, which further reduces the computation required for speech recognition.
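
Below is a sketch of the sixth embodiment under the same assumptions: c = 0, illustrative constants A and B, and one averaged noise frame. The noise pattern is divided by B once per utterance and templates are stored as plain f(s), so each template needs only multiplications. The `counter` dictionary is hypothetical instrumentation, added only to make the division count visible.

```python
import math
from typing import Dict, List

A, B = -0.5, -2.0  # illustrative constants; c = 0

def compress(x: float) -> float:
    """Data compression units 42, 42': f(x) = B*exp(A*x)."""
    return B * math.exp(A * x)

def scale_noise(fn: List[float], counter: Dict[str, int]) -> List[float]:
    """Data matching unit 103: f(n) / B = exp(A*n), done once per utterance."""
    counter["div"] += len(fn)
    return [m / B for m in fn]

def add_noise(template_fs: List[float], noise_scaled: List[float],
              counter: Dict[str, int]) -> List[float]:
    """Noise adding unit 63: f(s) * exp(A*n) = f(s + n), multiplication only."""
    counter["mul"] += len(template_fs)
    return [t * m for t, m in zip(template_fs, noise_scaled)]

counter = {"div": 0, "mul": 0}
templates = {"w1": [1.0, 2.0], "w2": [0.4, 0.8], "w3": [1.5, 0.1]}  # power values s
noise = scale_noise([compress(n) for n in [0.5, 0.25]], counter)
noisy = {label: add_noise([compress(s) for s in powers], noise, counter)
         for label, powers in templates.items()}
print(counter)
```

In this toy run, only two divisions are counted, no matter how many templates there are. In the third embodiment, the division by b inside the noise adding unit 62 would instead be repeated for every template value.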

[0053]

[Effects of the Invention]
As is apparent from the above description, the present invention uses the nonlinear function f(x) = b·exp(ax) + c as the conversion function. The noise component can therefore be added in the power-spectrum dimension using only the four basic arithmetic operations, and the amount of computation is greatly reduced.

[0054] In addition, a recognition result counting unit can count the recognition results of the collation unit. Changing the constants a, b, and c of the nonlinear function according to the resulting recognition rate then achieves the data compression that maximizes the recognition rate of the collation unit. Furthermore, setting the constant c of the nonlinear function to zero eliminates the addition of c in the data compression units and the subtraction of c in the noise adding unit, which further reduces the amount of computation.

[Brief description of the drawings]

FIG. 1 is a diagram illustrating the principle of the present invention.

FIG. 2 is a diagram showing a first embodiment of the present invention.

FIG. 3 is a diagram showing the logarithmic function and the nonlinear function.

FIG. 4 is a diagram showing a second embodiment of the present invention.

FIG. 5 is a diagram showing a third embodiment of the present invention.

FIG. 6 is a diagram showing a fourth embodiment of the present invention.

FIG. 7 is a diagram showing a fifth embodiment of the present invention.

FIG. 8 is a diagram showing a sixth embodiment of the present invention.

FIG. 9 is a diagram showing a conventional example.

[Explanation of symbols]

1 Speech input unit
2, 2′ Frequency analysis units
3 Constant determining unit
4, 4′, 42, 42′ Data compression units
5 Dictionary
6, 61, 62, 63 Noise adding units
7 Collation unit
9 Recognition result counting unit
11 Analog/digital conversion unit
12 Speech section detection unit
101, 102, 103 Data matching units

Continuation of the front page
(51) Int.Cl.⁷ (identification code, FI): G10L 21/02
(56) References cited: Sadaoki Furui, Digital Speech Processing, Tokai University Press, Japan, September 25, 1985, pp. 99-100
(58) Fields searched (Int.Cl.⁷, DB name): G10L 15/02, G10L 15/06, G10L 15/10, G10L 21/02; JICST file (JOIS)

Claims

(57) [Claims]

1. A speech recognition method comprising:
- a speech input unit that converts uttered speech into an electric signal;
- first and second frequency analysis units (2, 2′) that frequency-analyze the input speech signal and output an input speech pattern consisting of analysis data of a plurality of channels for each analysis frame;
- first and second data compression units (4, 4′) that compress, by nonlinear transformation and while preserving the dynamic range, the frequency patterns analyzed by the first and second frequency analysis units (2, 2′);
- a dictionary (5) that stores templates created from training data after data compression;
- a noise adding unit (6) that, at speech recognition time, adds the noise component obtained from the output of the second data compression unit (4′) to each template in the power-spectrum dimension; and
- a collation unit (7) that compares the input speech pattern compressed in the first data compression unit (4) with the output of the noise adding unit (6) and calculates the similarity or distance between them;

wherein the first and second data compression units (4, 4′) use as the nonlinear function f(x) = b·exp(ax) + c, where a, b, and c are constants; a constant determining unit (3) determines the constants a, b, and c of the nonlinear function; and the first and second data compression units (4, 4′) compress, using the nonlinear function, the frequency patterns output by the first and second frequency analysis units (2, 2′).

2. The speech recognition method according to claim 1, wherein the noise adding unit adds noise to each template read from the dictionary according to the equation x3 = (x1 − c)·(x2 − c)/b + c. In this equation:
- x1 is the value of the template read from the dictionary (5);
- x2 is the noise component obtained from the data compression unit (4′);
- x3 is the output of the noise adding unit (6).

3. The speech recognition method according to claim 1 or 2, wherein the constants a, b, and c are determined so that, within the range of x handled by the nonlinear function f(x), the maximum error between f(x) and the logarithmic function log(x) is minimized.

4. The speech recognition method according to claim 1, wherein the constants a, b, and c are determined so that, within the range of x handled by the nonlinear function f(x), the integral of the absolute value of the difference between f(x) and the logarithmic function log(x) is minimized.

5. The speech recognition method according to claim 1, wherein the constants a, b, and c are determined so that, within the range of x handled by the nonlinear function f(x), the integral of the squared error between f(x) and the logarithmic function log(x) is minimized.

6. The speech recognition method according to claim 1 or 2, wherein the constants a, b, and c are determined as follows. Within the range of x handled by the nonlinear function f(x), f(x) and the logarithmic function log(x) are each Taylor-expanded. The coefficients of corresponding terms, up to the third term, are then set equal, and the resulting simultaneous equations are solved for a, b, and c.

7. The speech recognition method according to claim 1 or 2, further comprising a recognition result counting unit that counts the recognition results in the collation unit, wherein the constants a, b, and c of the nonlinear function are changed based on the recognition rate to determine the optimum constants.

8. The speech recognition method according to claim 1, wherein the constant c of the nonlinear function is set to zero.

9. The speech recognition method according to claim 8, wherein the output of the noise adding unit (6) is obtained by multiplying the template read from the dictionary (5) by the noise component obtained from the second data compression unit (4′).

10. The speech recognition method according to claim 9, wherein, when a template is stored in the dictionary (5), its value is divided by the constant b of the nonlinear function.

11. The speech recognition method according to claim 9, wherein the input speech pattern compressed in the first data compression unit is multiplied by the constant b of the nonlinear function, and the result is supplied to the collation unit.

12. The speech recognition method according to claim 9, wherein the noise pattern compressed by the second data compression unit (4′) is divided by the constant b of the nonlinear function, and the result is supplied to the noise adding unit (6).