JPH0449715B2


Info

Publication number: JPH0449715B2
Application number: JP58007782A
Authority: JP
Inventors: Katsuyuki Futayada; Satoshi Fujii; Hideji Morii
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1983-01-19
Filing date: 1983-01-19
Publication date: 1992-08-12
Also published as: JPS59132000A

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of Industrial Application

The present invention relates to a method for creating standard speech patterns for use in speech recognition.

Configuration of the Conventional Example and Its Problems

In a speaker-independent speech recognition device, it is effective to recognize phonemes as a preliminary step to recognizing speech. FIG. 1 is a block diagram of the part that performs phoneme recognition. Reference numeral 1 denotes a comparison section and 2 a phoneme standard pattern storage section. The phoneme standard pattern storage section 2 holds one set of standard feature parameters for each phoneme. In the comparison section 1, the input feature parameters are compared with the standard pattern of each phoneme, and the symbol or number of the phoneme with the highest similarity is output as the result.
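The comparison in FIG. 1 can be illustrated with a minimal sketch. The source does not state which similarity measure or feature parameters are used, so the negative Euclidean distance, the three-dimensional feature vectors, and the names `similarity`, `recognize_phoneme`, and `standard_patterns` below are assumptions made purely for illustration.

```python
import math

def similarity(features, pattern):
    """Hypothetical similarity measure: negative Euclidean distance
    (a larger value means the two vectors are more alike)."""
    return -math.dist(features, pattern)

def recognize_phoneme(features, standard_patterns):
    """Role of comparison section 1: return the label of the stored
    pattern that is most similar to the input feature parameters."""
    return max(standard_patterns,
               key=lambda label: similarity(features, standard_patterns[label]))

# Role of storage section 2: one standard pattern per phoneme.
# The values are made up; real patterns would be trained from speech data.
standard_patterns = {
    "a": [1.0, 0.2, 0.1],
    "i": [0.1, 1.0, 0.3],
    "m": [0.2, 0.1, 0.9],
}

print(recognize_phoneme([0.9, 0.3, 0.0], standard_patterns))
```

Any other similarity measure could be substituted by changing `similarity` alone; the selection of the best-matching phoneme stays the same.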

In a system intended for unspecified speakers, the standard patterns must be created in advance using data from many speakers. In other words, they cannot be created in the environment where the system is actually used. The input speech, however, is affected by the environment and by the microphone characteristics, so the system is not necessarily used under the same conditions as those in which the standard patterns were created. As a result, matching between the input speech and the standard patterns may fail, leading to misrecognition. Conventional methods take no measures against this kind of misrecognition caused by the environment or the microphone characteristics.

Object of the Invention

An object of the present invention is to solve the above problem by creating standard patterns that take noise and microphone characteristics into account, and by using those patterns for recognition.

Configuration of the Invention

The present invention achieves the above object. It provides a method for creating standard speech patterns for speech recognition, characterized in that environmental noise is represented by model noise; the characteristic of the microphone with respect to noise is approximated by the frequency characteristic of a filter; this noise filter is applied to the model noise to create noise data that reflects the microphone characteristic; the speech data and this noise data are then added so that the signal-to-noise ratio is constant, producing noise-added speech data; and the standard speech patterns are created from this data.

Description of the Embodiments

A standard pattern creation method according to one embodiment of the present invention is described below.

(1) Assume that the noise is model noise (for example, HOTH spectrum noise) and prepare noise data.

(2) Design a filter that approximates the frequency characteristic of the microphone for noise (the far-field characteristic).

(3) Design a filter that approximates the frequency characteristic of the microphone for speech (the near-field characteristic).

(4) Apply the filter of (2) to the noise data of (1) to create noise data that reflects the microphone characteristic.

(5) Apply the filter of (3) to speech data recorded in a soundproof room with a microphone having a flat frequency characteristic (clean data) to create speech data that reflects the microphone characteristic.

(6) Add the speech data created in (5) and the noise data created in (4) so that the signal-to-noise ratio (S/N ratio) over the speech interval is constant, creating noise-added data.

(7) Create the standard patterns using the noise-added data.

Because the standard patterns created by the above procedure take the statistical properties of the noise and the characteristics of the microphone into account, using them reduces the influence of both and reduces misrecognition.
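The data flow of the procedure above might be sketched as follows. This is only an outline under stated assumptions: the function name `build_noise_added_data` is hypothetical, signals are plain lists of samples, and the two filters and the mixing rule are passed in as callables because the source leaves their implementation to the designer.

```python
def build_noise_added_data(clean_utterances, model_noise,
                           noise_filter, speech_filter, mix):
    """Return noise-added utterances from which standard patterns are built.

    noise_filter, speech_filter: callables mapping a sample list to a
        filtered sample list (far-field and near-field approximations).
    mix: callable (speech, noise) -> noise-added speech at the chosen S/N.
    """
    # Model noise as it would be picked up by the microphone.
    mic_noise = noise_filter(model_noise)
    noisy = []
    for clean in clean_utterances:
        # Clean speech as it would be picked up by the microphone.
        mic_speech = speech_filter(clean)
        # Combine speech with an equally long stretch of noise.
        noisy.append(mix(mic_speech, mic_noise[:len(mic_speech)]))
    return noisy

# Toy stand-ins, chosen only so that the flow can be followed by hand.
identity = lambda x: list(x)
halve = lambda x: [0.5 * v for v in x]
add = lambda s, n: [a + b for a, b in zip(s, n)]

result = build_noise_added_data([[1.0, 2.0], [3.0]], [0.2, 0.4, 0.6],
                                halve, identity, add)
print(result)
```

The noise-added utterances returned here would then go through whatever standard pattern training is already used for clean data.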

A specific example is now described in more detail.

The spectral characteristics of noise differ depending on the environment in which the speech recognition device is used, but environmental noise is known to exhibit, statistically, the HOTH spectral characteristic. The solid line 3 in FIG. 2 shows the HOTH spectral characteristic. HOTH spectrum noise, used as the model noise, is therefore the most representative of environmental noise. In this embodiment, the HOTH spectral characteristic is approximated by noise with a frequency characteristic of -6 dB/oct (broken line 4 in FIG. 2).
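As a reminder of what a slope of -6 dB/oct means, the relation can be written out. The symbols are assumptions introduced here and do not appear in the source: N(f) is the noise amplitude spectrum and A is an arbitrary constant. An amplitude that is inversely proportional to frequency loses about 6 dB each time the frequency doubles:

```latex
|N(f)| = \frac{A}{f}
\quad\Longrightarrow\quad
20\log_{10}\frac{|N(2f)|}{|N(f)|} = 20\log_{10}\frac{1}{2} \approx -6.02\ \text{dB}
```

Equivalently, the power spectrum of such noise is proportional to 1/f^2.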

Next, a method of approximating the frequency characteristics of the microphone is described. A close-talking microphone is used as the example here.

FIG. 3 shows the frequency characteristics of a typical close-talking microphone. The thin solid line 5 is the near-field characteristic, which applies to the input speech. The thick solid line 6 is the far-field characteristic, which applies to environmental noise. In the example of FIG. 3, the near-field characteristic 5 is flat from 100 to 2000 Hz and is only about 3 dB higher even around 5000 Hz, so it can be regarded as nearly flat over the speech band. For this reason, there is no need to design a filter that approximates the near-field characteristic.

The far-field characteristic 6 rises at roughly 6 dB/oct from the low-frequency range up to around 2000 Hz, and above that it takes the shape of a saturation curve. This shape can be approximated by a first-order high-pass filter. FIG. 4 shows the far-field characteristic 6 approximated by a first-order high-pass filter with a cutoff frequency of 1900 Hz (broken line 7). As is clear from the figure, the approximation is very good in the 100 to 6000 Hz range (the speech band). Even when the microphone characteristics are more complex, a filter can be designed along the same lines. If necessary, an approximating filter can also be designed for the near-field characteristic.
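The shape described here can be checked numerically with a small sketch. It assumes the textbook magnitude response of an ideal first-order high-pass filter, |H(f)| = (f/fc) / sqrt(1 + (f/fc)^2); the function name `highpass_gain_db` is hypothetical, and only the 1900 Hz cutoff comes from the text.

```python
import math

def highpass_gain_db(freq_hz, cutoff_hz=1900.0):
    """Gain in dB of an ideal first-order high-pass filter at freq_hz."""
    ratio = freq_hz / cutoff_hz
    return 20.0 * math.log10(ratio / math.sqrt(1.0 + ratio * ratio))

# Well below the cutoff the gain rises by about 6 dB per octave;
# well above it the curve flattens toward 0 dB (the "saturation" shape).
for f in (100, 200, 400, 1900, 6000):
    print(f, "Hz:", round(highpass_gain_db(f), 2), "dB")
```

At the cutoff frequency itself the gain of such a filter is about -3 dB, which is the usual definition of the cutoff.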

Next, the above filter is applied to the model noise (-6 dB/oct noise generated by integrating white noise) to produce noise that reflects the microphone characteristic.
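A minimal sketch of this step follows. Several things are assumptions, not details from the source: the 16 kHz sampling rate, the Gaussian white noise, the fixed random seed, and the particular discrete form of the first-order high-pass filter (a standard RC discretization with zero initial state). The function names are hypothetical.

```python
import math
import random

def integrate(samples):
    """Running sum: turns white noise into noise falling at about -6 dB/oct."""
    total, out = 0.0, []
    for x in samples:
        total += x
        out.append(total)
    return out

def first_order_highpass(samples, cutoff_hz, sample_rate_hz):
    """Discrete RC high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    a = rc / (rc + 1.0 / sample_rate_hz)
    out, prev_x, prev_y = [], 0.0, 0.0
    for x in samples:
        prev_y = a * (prev_y + x - prev_x)
        prev_x = x
        out.append(prev_y)
    return out

rng = random.Random(0)                      # fixed seed, for repeatability only
white = [rng.gauss(0.0, 1.0) for _ in range(16000)]
model_noise = integrate(white)              # model noise, about -6 dB/oct
mic_noise = first_order_highpass(model_noise,
                                 cutoff_hz=1900.0, sample_rate_hz=16000.0)
```

A production implementation would more likely design the filter with a signal-processing library; the explicit loop is used here only to keep the sketch free of dependencies.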

This noise is then added to the speech data (either the clean data, or the clean data with the near-field characteristic of the microphone applied) so that the S/N ratio is constant, producing noise-added speech data. The standard patterns are then created from this speech data. The procedure for creating the standard patterns is exactly the same as when they are created from clean data, so its description is omitted.
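Adding noise at a fixed S/N ratio amounts to scaling the noise before the addition. The sketch below assumes that the S/N ratio is measured as the ratio of mean powers over the speech interval and expressed in dB; the source does not give a target value, so the 20 dB used in the example is arbitrary, and the names `mean_power` and `add_noise_at_snr` are hypothetical.

```python
import math

def mean_power(samples):
    """Average of the squared sample values."""
    return sum(x * x for x in samples) / len(samples)

def add_noise_at_snr(speech, noise, snr_db, speech_interval=None):
    """Scale noise so that speech power / noise power, measured over the
    speech interval, equals snr_db, then add it to the speech.

    speech_interval: (start, end) sample indices of the speech portion;
        defaults to the whole signal. noise must be at least as long as speech.
    Returns the noise-added signal and the gain applied to the noise.
    """
    start, end = speech_interval if speech_interval else (0, len(speech))
    p_speech = mean_power(speech[start:end])
    p_noise = mean_power(noise[start:end])
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    noisy = [s + gain * n for s, n in zip(speech, noise)]
    return noisy, gain

speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
noisy, gain = add_noise_at_snr(speech, noise, snr_db=20.0)
print(gain)
```

Computing the gain per utterance, as here, keeps the S/N ratio the same for loud and quiet utterances alike.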

Once the application is decided, the microphone is fixed, so at most two or three models need to be considered. Although the properties of noise differ from one environment to another, only its statistical properties are used here. The method described in this embodiment is therefore generally applicable. Moreover, the standard patterns need to be created only once for each microphone. The whole sequence of steps for creating the standard patterns can be carried out on a computer, with no manual work required.

The effect of the above embodiment is evaluated in terms of the phoneme recognition rate. The phoneme recognition rate is defined as the ratio of the number of correctly recognized phonemes (strictly, the number of frames, where one frame is 10 msec of speech data) to the total number of phonemes (the total number of frames).
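The definition above translates directly into a frame-by-frame count. In this sketch the function name `phoneme_recognition_rate` and the label sequences are hypothetical; each list element stands for the phoneme label of one 10 msec frame.

```python
def phoneme_recognition_rate(recognized, reference):
    """Percentage of frames whose recognized label matches the reference label."""
    if len(recognized) != len(reference):
        raise ValueError("both sequences must cover the same frames")
    if not reference:
        raise ValueError("at least one frame is required")
    correct = sum(1 for r, t in zip(recognized, reference) if r == t)
    return 100.0 * correct / len(reference)

# Four frames, of which three are recognized correctly.
rate = phoneme_recognition_rate(["a", "a", "m", "n"], ["a", "a", "m", "m"])
print(rate)
```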

FIG. 5 shows the evaluation results for the five vowels and the nasals (/m/, /n/, and the moraic nasal). The solid line 8 is the result obtained with the standard patterns of this embodiment, and the broken line 9 is the result obtained with conventional standard patterns. The average recognition rate improved by 3.6%, and the rate for nasals improved by as much as 26%. The effect of this embodiment can therefore be said to be large. The evaluation in FIG. 5 covers the phonemes in 212 words uttered by 10 male speakers; with roughly 15,000 frames of data for each phoneme, the results are sufficiently reliable.

As described above, this embodiment is a generally applicable method that requires no manual work and is highly effective in improving the phoneme recognition rate.

Effects of the Invention

As described above, the present invention assumes that environmental noise is model noise, prepares a noise approximation filter that approximates the frequency characteristic of the microphone for noise, applies the noise approximation filter to the model noise to create noise data, adds the speech data and the noise data so that the signal-to-noise ratio is constant to create noise-added speech data, and creates the standard patterns using the noise-added speech data. This suppresses the influence of noise and microphone characteristics and improves the recognition rate.

Brief Description of the Drawings

FIG. 1 is a block diagram of a method for phoneme recognition by pattern matching; FIG. 2 shows the frequency characteristic of the model noise; FIG. 3 is a characteristic diagram of a close-talking microphone; FIG. 4 shows the far-field characteristic approximated by a filter characteristic; and FIG. 5 shows phoneme recognition rates illustrating the effect of the present invention.

1: comparison section; 2: phoneme standard pattern storage section.

Claims

1. A method for creating standard speech patterns, characterized by: assuming that environmental noise is model noise; preparing a noise approximation filter that approximates the frequency characteristic of a microphone for noise; applying the noise approximation filter to the model noise to create noise data; adding speech data and the noise data so that the signal-to-noise ratio is constant to create noise-added speech data; and creating standard patterns using the noise-added speech data.

2. The method for creating standard speech patterns according to claim 1, wherein the speech data is created using a speech approximation filter that approximates the frequency characteristic of the microphone for speech.