JPH0567039B2

Info

Publication number: JPH0567039B2
Application number: JP60135283A
Authority: JP
Inventors: Shin Kamya
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1985-06-20
Filing date: 1985-06-20
Publication date: 1993-09-24
Also published as: JPS61292699A

Description

DETAILED DESCRIPTION OF THE INVENTION

<Technical Field> The present invention relates to a voice-passing filter that removes acoustic noise by passing only speech.

<Prior Art> Conventional approaches to separating speech from noise have reduced noise by detecting only one specific type of noise, such as white noise or impulsive noise, and suppressing or removing it.

However, because the types of noise are unlimited, conventional methods that suppress or remove noise one type at a time cannot cope with all noise.

<Purpose> The present invention has been made in view of these conventional problems. Its purpose is to provide a voice-passing filter that can effectively remove many types of noise by detecting speech and passing only that speech.

<Example> The present invention focuses on the fact that Japanese speech has vowel-consonant pairs as its basic structure. It first detects a vowel section, and then passes only that section together with the sections of predetermined length before and after it, which are highly likely to be consonant sections. In other words, it passes only speech.

The vowel-like sections mentioned above are determined by the following three conditions.

1. Sections where the matching distance to the standard vowel patterns is small.
2. Sections where the spectral change is small (stationary parts of speech).
3. Sections where the power is high.

The voice-passing filter of the present invention detects vowel-like sections based on these three conditions and passes only those sections. It is described in more detail below with reference to the figure.

The figure is a block diagram of the voice-passing filter according to the present invention. In the figure, 1 is a speech analysis unit that obtains the power P and the spectrum y; 2 is a memory that stores the spectra of the standard patterns of the five Japanese vowels; 3 is a matching unit that compares the spectrum y output from the speech analysis unit 1 with the spectra from the memory 2 to obtain the matching distance d; 4 is a spectral change calculation unit that calculates and outputs the spectral change y'; 5 is a delay unit that delays the speech waveform by a fixed time of T frames and sends it to the control unit 7; and 6 is a determination unit that determines vowel-like sections from the power P, the spectral change y', and the matching distance d to the standard patterns.

Specifically, the speech analysis unit 1 samples the speech signal at 16 kHz (the sample value at time t is denoted S(t)) and applies a 16 ms Hanning window. At each 8 ms frame period it obtains the power P, which is the sum of squares of the sample values, and the spectrum y, which is obtained by a Fourier transform within the window (the power and spectrum of the t-th frame are denoted P(t) and y(t)). The spectrum y(t) is sent to the matching unit 3, and the power P(t) is sent to the determination unit 6.
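The analysis step can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the function names (`hanning`, `magnitude_spectrum`, `analyze`) are hypothetical, a naive DFT stands in for whatever Fourier transform the actual device uses, and computing the power over the windowed samples (rather than the raw samples) is an assumption, since the text does not say which is meant. Only the 16 kHz rate, the 16 ms window, and the 8 ms frame period come from the description.

```python
import cmath
import math

SAMPLE_RATE_HZ = 16_000   # sampling rate given in the description
WINDOW_LEN = 256          # 16 ms window at 16 kHz
FRAME_SHIFT = 128         # 8 ms frame period at 16 kHz


def hanning(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2, computed naively (assumed stand-in for an FFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        spectrum.append(abs(acc))
    return spectrum


def analyze(samples, window_len=WINDOW_LEN, frame_shift=FRAME_SHIFT):
    """Return per-frame power P(t) and spectrum y(t) as two lists."""
    window = hanning(window_len)
    powers, spectra = [], []
    for start in range(0, len(samples) - window_len + 1, frame_shift):
        frame = samples[start:start + window_len]
        windowed = [s * w for s, w in zip(frame, window)]
        # Assumption: power is the sum of squares of the windowed samples.
        powers.append(sum(v * v for v in windowed))
        spectra.append(magnitude_spectrum(windowed))
    return powers, spectra
```

With the default parameters each frame covers 256 samples and consecutive frames overlap by half, so one second of input yields about 124 frames.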

The matching unit 3 compares the spectrum y(t) at each time t with the spectra of the standard patterns of the five Japanese vowels held in the memory 2, obtains the Euclidean distance between the spectra (the matching distance) d(i, t), where i = 1, 2, 3, 4, 5 correspond to the vowels /a/, /i/, /u/, /e/, /o/, and outputs it to the determination unit 6. In addition, the spectral change calculation unit 4 obtains the Euclidean distance between spectra several frames apart and sends it to the determination unit 6 as the spectral change y'(t).
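The two distance computations might look like the sketch below. The function names and the `gap` parameter are hypothetical. The description says only that the spectra are "several frames apart", so comparing frame t with frame t - gap, and clamping at the start of the signal, are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def matching_distances(spectrum, vowel_templates):
    """d(i, t) for the five vowels; list index 0..4 maps to /a/, /i/, /u/, /e/, /o/."""
    return [euclidean(spectrum, template) for template in vowel_templates]


def spectral_change(spectra, t, gap=2):
    """y'(t): distance between frame t and the frame `gap` frames earlier.

    Assumption: the earlier frame index is clamped to 0 near the start.
    """
    earlier = max(t - gap, 0)
    return euclidean(spectra[earlier], spectra[t])
```

Note that the description's vowel index i runs from 1 to 5, while the list returned here is indexed from 0, so d(2, t) and d(3, t) are at positions 1 and 2.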

When the matching distance d(i, t), the power P(t), and the spectral change y'(t) have been input to the determination unit 6 in this way, the determination unit 6 checks whether condition 1) or condition 2) below is satisfied. If so, the frame is judged to lie within a vowel section and the vowel flag VF(t) = 1 is output to the control unit 7; otherwise VF(t) = 0 is output.

Condition 1: The power P(t) is greater than threshold a1, the spectral change y'(t) is smaller than threshold a2, and the minimum matching distance min d(i, t) [i = 1 to 5] is smaller than threshold a3.

Condition 2: The spectral change y'(t) is greater than threshold a4, and the matching distance d(2, t) or d(3, t) is smaller than threshold a5. (This accounts for the devoicing of the vowels /i/ and /u/ that follow voiceless consonants.)

The control unit 7 then outputs O(t) = S(t-T) if at least a predetermined number of frames x satisfying VF(x) = 1 exist in the interval t-2T < x < t, and outputs O(t) = 0 otherwise.
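The decision and gating logic described above can be sketched as follows. The function names `vowel_flag` and `gate` are hypothetical, and the thresholds a1 to a5, the delay T (`delay_t`), and the required frame count (`min_count`) are left as parameters because the description gives no values for them. Treating S(t-T) as the block of samples belonging to frame t-T, and emitting silence while t-T is still before the start of the signal, are assumptions of this sketch.

```python
def vowel_flag(power, change, distances, a1, a2, a3, a4, a5):
    """Return VF(t): 1 if condition 1 or condition 2 holds, else 0.

    `distances` holds d(1, t)..d(5, t) at list positions 0..4.
    """
    cond1 = power > a1 and change < a2 and min(distances) < a3
    # Condition 2 looks only at /i/ (position 1) and /u/ (position 2).
    cond2 = change > a4 and (distances[1] < a5 or distances[2] < a5)
    return 1 if (cond1 or cond2) else 0


def gate(frames, flags, delay_t, min_count):
    """Return O(t) per frame: the delayed frame S(t - T) or a block of zeros."""
    out = []
    for t in range(len(frames)):
        first = max(t - 2 * delay_t + 1, 0)
        votes = sum(flags[first:t])        # frames x with t - 2T < x < t
        source = t - delay_t
        if source >= 0 and votes >= min_count:
            out.append(list(frames[source]))
        else:
            out.append([0] * len(frames[t]))
    return out
```

Because the interval t-2T < x < t is centered on frame t-T, the delayed frame is passed whenever enough vowel frames lie within T-1 frames on either side of it, which is how the consonant sections adjacent to a vowel get through.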

In this way, the voice-passing filter of the present invention detects, from the input sound, vowel sections together with the sections of predetermined length before and after them, that is, vowel-like sections, and passes only these sections. Because it passes only speech, it can effectively reduce many types of acoustic noise.

<Effects> As described in detail above, the voice-passing filter of the present invention comprises: a speech analysis unit that obtains the power and spectrum of an input sound; a matching unit that compares the spectrum of a prestored standard vowel pattern with the spectrum output from the speech analysis unit to obtain a matching distance; a spectral change calculation unit that, from the spectra output from the speech analysis unit, calculates and outputs the spectral change between spectra several frames apart; a determination unit that determines whether a first condition (the power is greater than a first threshold, the spectral change is smaller than a second threshold, and the minimum matching distance is smaller than a third threshold) or a second condition (the spectral change is greater than a fourth threshold and the matching distance is smaller than a fifth threshold) is satisfied; and a control unit that, in response to the output of the determination unit, judges the input sound to be speech and outputs it when the first or second condition is satisfied.

With this configuration, the filter detects vowel-like sections in a continuous input sound and outputs only the input sound in those sections as speech. Because the first and second conditions are used to detect vowel-like sections, and the devoicing of vowels that follow voiceless consonants is also taken into account, the filter can remove many types of acoustic noise and reliably output only speech. Furthermore, installing the voice-passing filter of the present invention in various devices provides the following effects.

Personal radio: Raising the input gain in normal use also amplifies small noises, which can be unpleasant. If the gain is raised after the signal has passed through the voice-passing filter of the present invention, only the speech is emphasized, which makes it much easier to hear.

Telephone: Because the voice-passing filter of the present invention does not pass noise from airplanes, cars, and the like, using it in a telephone allows speech to be transmitted clearly on calls made from airports or near roads.

Tape recorder: Even content recorded in a noisy environment can be played back with the noise cut, so that only the speech is output clearly.

Speech recognition device: If noise arrives while the device is waiting for speech input, the recognizer mistakes it for speech input and starts operating. The recognition result caused by the noise can be rejected on the basis of the recognition score, but any speech that arrives in the meantime is not accepted. Using the filter of the present invention avoids such noise-triggered malfunctions of the recognizer and therefore also prevents missed speech input.

[Brief explanation of the drawing]

The figure is a block diagram of the voice-passing filter according to the present invention. 1 is the speech analysis unit, 2 is the standard vowel pattern memory, 3 is the matching unit, 4 is the spectral change calculation unit, 5 is the delay unit, 6 is the determination unit, and 7 is the control unit.

Claims

[Scope of Claims] 1. A voice-passing filter comprising: a speech analysis unit that obtains the power and spectrum of an input sound; a matching unit that compares the spectrum of a prestored standard vowel pattern with the spectrum output from the speech analysis unit to obtain a matching distance; a spectral change calculation unit that, from the spectra output from the speech analysis unit, calculates and outputs a spectral change between spectra several frames apart; a determination unit that determines whether a first condition, that the power is greater than a first threshold, the spectral change is smaller than a second threshold, and the minimum value of the matching distance is smaller than a third threshold, or a second condition, that the spectral change is greater than a fourth threshold and the matching distance is smaller than a fifth threshold, is satisfied; and a control unit that, in response to an output of the determination unit, determines that the input sound is speech and outputs the input sound when the first or second condition is satisfied.