JPH03236000A

JPH03236000A - Audio signal processor

Info

Publication number: JPH03236000A
Application number: JP2033211A
Authority: JP
Inventors: Akira Nohara; 明野原; Joji Kane; 丈二加根
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1990-02-13
Filing date: 1990-02-13
Publication date: 1991-10-21
Anticipated expiration: 2014-10-06
Also published as: JP2959792B2

Abstract

PURPOSE: To remove noise and improve articulation by detecting the vowel and consonant areas of an audio signal mixed with noise, setting a cancel coefficient according to the detection result, and removing a predicted noise component. CONSTITUTION: Cepstrum analysis is applied to the frequency analysis output of the audio input signal to detect the cepstrum peak and to calculate the mean level; a vowel/consonant decision means 5 decides vowels and consonants from the peak detection information and the mean value information, and a cancel coefficient setting means 17 uses the decision results to set the cancel coefficient. Meanwhile, a noise predicting means 6 predicts the noise components of the band-divided audio signal; a canceling means 8 receives the noise prediction output, the audio signal, and the cancel coefficient signal and cancels the noise components according to the cancel rate; and a signal composing means 9 synthesizes the cancel output. Consequently, noise is suppressed and an audio signal processor with good articulation is obtained.

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of Industrial Application

The present invention relates to an audio signal processing device that can be used for speech processing and the like.

Prior Art

FIG. 8 is a block diagram of a conventional signal processing device. Reference numeral 11 denotes a filter control unit that receives a signal mixed with noise and detects the signal or the noise, 12 denotes a BPF group having a large number of band-pass filters, and 13 denotes an adder. The filter control unit 11 controls the filter coefficients of the BPF group 12 according to the noise or the signal in the input signal. The BPF group 12 is a group of band-pass filters that divides the input signal into appropriate bands and whose pass-band characteristics are determined by the control signal from the filter control unit 11.

The operation of the conventional signal processing device configured as described above is explained below.

An input signal in which noise is superimposed on speech is supplied to the filter control unit 11. From the supplied signal, the filter control unit 11 obtains the noise component for each band of the BPF group 12 and supplies the BPF group 12 with filter coefficients that keep it from passing those noise components.

The BPF group 12 divides the input signal into appropriate bands, passes each band according to the filter coefficients supplied by the filter control unit 11, and supplies the result to the adder 13. The adder 13 mixes the signals that the BPF group 12 has divided into bands and produces the output.

In this way, the BPF group 12 lowers the pass level of those bands of the input signal that contain noise. As a result, a signal with attenuated noise components is obtained.

Problems to be Solved by the Invention

However, the amount of noise and the intelligibility of speech do not necessarily correspond. Conventional signal processing devices therefore suppress noise but do not improve intelligibility.

The present invention solves this problem of conventional signal processing devices. Its object is to provide an audio signal processing device that determines vowels and consonants and thereby suppresses noise while also providing good intelligibility.

Means for Solving the Problems

The present invention according to claim 1 is a signal processing device comprising: frequency analysis means for frequency-analyzing an audio input signal; cepstrum analysis means for performing cepstrum analysis on the frequency analysis output of the frequency analysis means; peak detection means for detecting a cepstrum peak in the cepstrum analysis output of the cepstrum analysis means; average value calculation means for calculating the average level of the cepstrum analysis output of the cepstrum analysis means; vowel/consonant determination means for determining vowels and consonants from the peak detection information of the peak detection means and the average value information of the average value calculation means, a vowel being determined from the peak and a consonant from the level of the average value information; cancellation coefficient setting means for setting a cancellation coefficient using the determination result of the vowel/consonant determination means; noise prediction means that receives the Fourier-transformed audio signal and predicts its noise component; canceling means that receives the noise prediction output of the noise prediction means, the audio signal, and the cancellation coefficient signal set by the cancellation coefficient setting means, and cancels the noise component from the audio signal taking the cancellation rate into account; and signal synthesizing means for synthesizing the cancellation output of the canceling means.

The present invention according to claim 2 is a signal processing device comprising: band division means for Fourier-transforming an audio input signal; cepstrum analysis means for performing cepstrum analysis on the band division output of the band division means; peak detection means for detecting a cepstrum peak in the cepstrum analysis output of the cepstrum analysis means; average value calculation means for calculating the average level of the cepstrum analysis output of the cepstrum analysis means; vowel/consonant determination means for determining vowels and consonants from the peak detection information of the peak detection means and the average value information of the average value calculation means, a vowel being determined from the peak and a consonant from the level of the average value information; cancellation coefficient setting means for setting a cancellation coefficient using the determination result of the vowel/consonant determination means; noise prediction means that receives the Fourier-transformed audio signal and predicts its noise component; canceling means that receives the noise prediction output of the noise prediction means, the audio signal, and the cancellation coefficient signal set by the cancellation coefficient setting means, and cancels the noise component from the audio signal taking the cancellation rate into account; and band synthesizing means for band-synthesizing the cancellation output of the canceling means.

The present invention according to claim 3 is the audio signal processing device of claim 2, in which the vowel/consonant determination means comprises at least: a first comparator that compares the peak detected by the peak detection means with a threshold value set by a threshold setting section; a second comparator that compares the average value calculated by the average value calculation means with a predetermined threshold value set by a threshold setting section; and a vowel/consonant determination circuit that determines vowels and consonants from the comparison results of the first and second comparators and outputs the result.

Operation

In the present invention, the frequency analysis means frequency-analyzes the audio input signal, the cepstrum analysis means performs cepstrum analysis on the frequency analysis output, the peak detection means detects the cepstrum peak in the cepstrum analysis output, and the average value calculation means calculates the average level of the cepstrum analysis output. The vowel/consonant determination means determines vowels and consonants from the peak detection information and the average value information, judging a vowel from the peak and a consonant from the level of the average value information. The cancellation coefficient setting means then sets a cancellation coefficient using that determination result. Meanwhile, the noise prediction means receives the band-divided audio signal and predicts its noise component. The canceling means receives the noise prediction output, the audio signal, and the cancellation coefficient signal set by the cancellation coefficient setting means, and cancels the noise component from the audio signal taking the cancellation rate into account. The signal synthesizing means synthesizes the cancellation output of the canceling means.

Embodiments

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a block diagram of an audio signal processing device according to one embodiment of the present invention. In FIG. 1, reference numeral 1 denotes band division means that divides the signal into frequency bands, an example of frequency analysis means that performs frequency analysis on a signal; here it is specifically FFT means that Fourier-transforms the signal. Reference numeral 2 denotes cepstrum analysis means that performs cepstrum analysis, 3 denotes peak detection means that detects the peak of the cepstrum distribution, 4 denotes average value calculation means for the cepstrum distribution, and 5 denotes vowel/consonant determination means that determines vowels and consonants. The FFT means 1 applies a fast Fourier transform to the audio signal input and supplies the result to the cepstrum analysis means 2. The cepstrum analysis means 2 obtains the cepstrum of that spectrum signal and supplies it to the peak detection means 3 and the average value calculation means 4, as shown in FIGS. 2(a) and 2(b). The peak detection means 3 finds the peak of the cepstrum obtained by the cepstrum analysis means 2 and supplies it to the vowel/consonant determination means 5.
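The chain from the FFT means 1 through the cepstrum analysis means 2 to the peak detection means 3 and the average value calculation means 4 can be illustrated with a minimal sketch. It assumes the common real-cepstrum definition (inverse transform of the log magnitude spectrum), which the patent does not spell out. The function names, the `eps` floor, and the quefrency search window `lo`/`hi` are hypothetical choices made for illustration only.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (stands in for FFT means 1)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def real_cepstrum(x, eps=1e-12):
    """Assumed definition: inverse DFT of the log magnitude spectrum."""
    n_pts = len(x)
    # eps (an assumption) keeps log() finite where the spectrum is zero.
    log_mag = [math.log(max(abs(v), eps)) for v in dft(x)]
    return [
        sum(log_mag[k] * cmath.exp(2j * cmath.pi * k * n / n_pts)
            for k in range(n_pts)).real / n_pts
        for n in range(n_pts)
    ]


def peak_and_mean(cep, lo, hi):
    """Return (quefrency of peak, peak value, mean) over cep[lo:hi]."""
    window = cep[lo:hi]
    offset = max(range(len(window)), key=window.__getitem__)
    return lo + offset, window[offset], sum(window) / len(window)


# A pulse train with period 8 plays the role of a voiced (vowel-like) frame.
frame = [1.0 if n % 8 == 0 else 0.0 for n in range(64)]
quefrency, peak, mean = peak_and_mean(real_cepstrum(frame), 2, 13)
print(quefrency, peak > mean)
```

For this periodic frame the cepstrum peak falls at the pulse period, which is the property the peak detection means relies on to flag vowel sections.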

Meanwhile, the average value calculation means 4 calculates the average value of the cepstrum obtained by the cepstrum analysis means 2 and supplies it to the vowel/consonant determination means 5. The vowel/consonant determination means 5 uses the cepstrum peak supplied by the peak detection means 3 and the cepstrum average value supplied by the average value calculation means 4 to determine the vowels and consonants of the audio signal input, and outputs the determination result. Reference numeral 6 denotes noise prediction means that receives the output signal of the FFT means 1 and predicts the noise component, 8 denotes canceling means that removes noise as described later, and 9 denotes band synthesizing means, an example of signal synthesizing means; here it is specifically IFFT means that performs an inverse Fourier transform.

In more detail, the noise prediction means 6 predicts the noise component of each channel from the speech/noise input divided into m channels and supplies the prediction to the canceling means 8. The noise prediction works, for example, as shown in FIG. 3. With frequency on the x-axis, speech level on the y-axis, and time on the z-axis, the data p1, p2, ..., pi at frequency f1 are taken and the next value pj is predicted.

For example, the average of the noise portions p1 to pi is taken as pj.

Alternatively, while a speech signal portion continues, pj may further be multiplied by an attenuation coefficient. The canceling means 8 receives the m-channel signals from the FFT means 1 and the noise prediction means 6, cancels the noise in each channel according to the cancellation coefficient input, for example by subtraction, and supplies the result to the IFFT means 9. That is, the predicted noise component is multiplied by the cancellation coefficient and then canceled. As a general example of a cancellation method, cancellation on the time axis subtracts the predicted noise waveform (b) from the noisy speech signal (a), as shown in FIG. 4, so that only the signal is extracted (c).
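The noise prediction described above (averaging the past noise data p1 to pi of a channel to obtain pj, and attenuating the estimate while speech continues) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the `speech_run` counter, and the geometric `decay` factor are hypothetical, since the text says only that an attenuation coefficient is applied.

```python
def predict_noise(noise_history, speech_run=0, decay=0.9):
    """Predict the next noise level pj for one channel.

    noise_history: past noise-only levels p1..pi for this channel.
    speech_run:    number of consecutive speech frames since the last
                   noise-only frame (hypothetical bookkeeping).
    decay:         assumed attenuation coefficient applied once per
                   consecutive speech frame.
    """
    if not noise_history:
        return 0.0
    estimate = sum(noise_history) / len(noise_history)  # mean of p1..pi
    return estimate * (decay ** speech_run)


# One prediction per channel for an m-channel input (here m = 2).
histories = [[2.0, 4.0, 6.0], [1.0, 1.0, 1.0]]
predicted = [predict_noise(h) for h in histories]
print(predicted)
```

Keeping one history per channel matches the per-channel prediction that the noise prediction means 6 performs on the m-channel input.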

Cancellation on the frequency axis, as shown in FIG. 5, Fourier-transforms the noisy speech signal (a) to obtain its spectrum (b), subtracts the predicted noise spectrum (c) from it (d), and inverse-Fourier-transforms the result to obtain a speech signal free of noise (e). The IFFT means 9 inverse-Fourier-transforms the m-channel signals supplied by the canceling means 8 to obtain the speech output.
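The frequency-axis cancellation of FIG. 5 resembles what is usually called magnitude spectral subtraction, and a minimal sketch under that assumption is given here. The patent does not state how phase is handled or what happens when the subtraction would go negative; this sketch keeps the original phase and clamps magnitudes at zero, both of which are assumptions. All function and parameter names are hypothetical.

```python
import cmath


def dft(x):
    """Naive DFT, standing in for the FFT means."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def idft(spec):
    """Naive inverse DFT, standing in for the IFFT means."""
    n_pts = len(spec)
    return [
        sum(spec[k] * cmath.exp(2j * cmath.pi * k * n / n_pts)
            for k in range(n_pts)) / n_pts
        for n in range(n_pts)
    ]


def spectral_cancel(noisy, noise_mag, alpha=1.0):
    """Subtract alpha * predicted noise magnitude from each bin.

    noisy:     time-domain frame, signal (a) in FIG. 5.
    noise_mag: predicted noise magnitude per bin, spectrum (c).
    alpha:     cancellation coefficient (1.0 removes the full estimate).
    """
    cleaned = []
    for bin_value, noise in zip(dft(noisy), noise_mag):
        mag = abs(bin_value)
        new_mag = max(mag - alpha * noise, 0.0)  # clamp: an assumption
        # Rescale the bin so its phase is kept and only the magnitude changes.
        cleaned.append(bin_value * (new_mag / mag) if mag > 0 else 0j)
    return [v.real for v in idft(cleaned)]
```

With `alpha` below 1.0 part of the predicted noise is left in place, which is how a small cancellation coefficient preserves speech components that overlap the noise estimate.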

The cancellation coefficient setting means 7 uses the vowel/consonant section information determined by the vowel/consonant determination means 5 to set the cancellation coefficient appropriately. For example, in speech sections the cancellation coefficient is made small, so that noise components are deliberately left in place and intelligibility is preserved, while in the remaining noise portions the cancellation coefficient is made large so that the noise components are removed completely. Because the present invention reliably detects consonants as well as vowels, the resulting speech is sufficiently intelligible.

The operation of the audio signal processing device of this embodiment, configured as described above, is explained below.

The audio signal input is fast-Fourier-transformed by the FFT means 1, its cepstrum is obtained by the cepstrum analysis means 2, and the peak of the cepstrum is obtained by the peak detection means 3. The average value of the cepstrum is obtained by the average value calculation means 4. When the vowel/consonant determination means 5 receives from the peak detection means 3 a signal indicating that a peak has been detected, it judges that the audio signal input is in a vowel section. For consonants, it judges that the audio signal input is in a consonant section when, for example, the cepstrum average value supplied by the average value calculation means 4 is larger than a predetermined value, or when the rate of increase (differential coefficient) of the cepstrum average value is larger than a predetermined value. As its result it outputs either a signal indicating vowel or consonant, or a signal indicating a speech section that includes both vowels and consonants.

Meanwhile, the noise prediction means 6 predicts the noise component of the noisy speech/noise input for each channel. The canceling means 8 then removes, channel by channel, the noise component supplied by the noise prediction means 6 from the speech/noise signal. The noise removal rate at this point is set for each channel by the cancellation coefficient input so as to improve intelligibility.

For example, as described above, in speech sections the cancellation coefficient is made small, so that noise components are deliberately left in place and intelligibility is preserved, while in the remaining noise portions the cancellation coefficient is made large so that the noise components are removed completely. Because the present invention reliably detects consonants as well as vowels, the resulting speech is sufficiently intelligible.

The IFFT means 9 then inverse-Fourier-transforms the noise-removed m-channel signals obtained from the canceling means 8 and outputs the result as a speech signal.

As described above, according to this embodiment the noise removal rate of the canceling means 8 can be set appropriately for each band through the cancellation coefficient input, and by choosing that cancellation coefficient accurately to match the speech, a clear, noise-suppressed speech output is obtained.

Next, another aspect of the present invention is described.

FIG. 6 is a block diagram showing one embodiment of it. Means identical to those of the embodiment of FIG. 1 carry the same numbers. That is, 1 denotes FFT means that fast-Fourier-transforms the audio signal, 2 denotes cepstrum analysis means that obtains the cepstrum of the Fourier-transformed spectrum signal, 3 denotes peak detection means that finds the peak from the cepstrum analysis result, 4 denotes average value calculation means that calculates the average value of the cepstrum, 6 denotes noise prediction means, 8 denotes canceling means, 9 denotes IFFT means, and 7 denotes cancellation coefficient setting means. The vowel/consonant determination means 5 comprises the following. A first comparator 52 is a circuit that compares the peak information obtained by the peak detection means 3 with a predetermined threshold value set by a first threshold setting section 51 and outputs the result. The first threshold setting section 51 sets that threshold value according to the average value obtained by the average value calculation means 4.

A second comparator 53 is a circuit that compares a predetermined threshold value set by a second threshold setting section 54 with the average value obtained by the average value calculation means 4 and outputs the result.

A vowel/consonant determination circuit 55 is a circuit that determines whether the input audio signal is a vowel or a consonant from the comparison result of the first comparator 52 and the comparison result of the second comparator 53.

Next, the operation of the above embodiment is described.

The FFT means 1 fast-Fourier-transforms the audio signal. The cepstrum analysis means 2 obtains the cepstrum of the Fourier-transformed signal.

The peak detection means 3 detects the peak of that cepstrum. Meanwhile, the average value calculation means 4 calculates the average value of the same cepstrum.

Next, the first threshold setting section 51 sets the threshold value used to decide whether the peak obtained by the peak detection means 3 is large enough to be judged a vowel. It determines this threshold value with reference to the average value obtained by the average value calculation means 4. For example, when the average value is large, the threshold is set high so that only peaks that reliably indicate a vowel are selected.

The first comparator 52 compares the threshold value set by the first threshold setting section 51 with the peak detected by the peak detection means 3 and outputs the comparison result.

Meanwhile, the second threshold setting section 54 sets a predetermined threshold value. This may be a threshold for the average value itself, or a threshold for the differential coefficient that indicates how fast the average value is increasing. The second comparator 53 compares the average value obtained by the average value calculation means 4 with the threshold value set by the second threshold setting section 54 and outputs the result. That is, it compares the calculated average value with the threshold for the average value, or it compares the increase in the calculated average value with the threshold for the differential coefficient.

The vowel/consonant determination circuit 55 determines vowels and consonants from the comparison results of the first comparator 52 and the second comparator 53. If the comparison result of the first comparator 52 shows that a peak has been reliably detected, the section is judged to be a vowel. If the comparison result of the second comparator 53 shows that the average value exceeds the threshold for the average value, the section is judged to be a consonant. Alternatively, the increase in the average value is compared with the threshold for the differential coefficient, and the section is judged to be a consonant if the increase exceeds that threshold.
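The comparator arrangement of sections 51 to 55 can be sketched per frame as follows. This is an illustration under assumptions, not the circuit of the patent: the linear rule that raises the first threshold with the average value, the default numbers, and the choice to let a vowel decision take priority over a consonant decision are all hypothetical, as are the names `Thresholds`, `first_threshold`, and `classify`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    peak_base: float = 1.0   # assumed floor of the first threshold
    peak_gain: float = 2.0   # assumed growth of the first threshold with the mean
    mean_level: float = 0.5  # second threshold on the average value itself
    mean_slope: float = 0.2  # second threshold on the increase of the average


def first_threshold(mean, t):
    """Section 51: a larger average value gives a higher peak threshold."""
    return t.peak_base + t.peak_gain * mean


def classify(peak, mean, prev_mean, t=Thresholds()):
    """Sections 52, 53, 55: label one frame as vowel, consonant, or noise."""
    cmp1 = peak > first_threshold(mean, t)                  # first comparator 52
    cmp2 = (mean > t.mean_level
            or (mean - prev_mean) > t.mean_slope)           # second comparator 53
    if cmp1:
        return "vowel"      # priority over consonant is an assumption
    if cmp2:
        return "consonant"
    return "noise"


print(classify(peak=3.0, mean=0.1, prev_mean=0.1))
```

Tying the first threshold to the average value means a frame with a generally high cepstrum level needs a proportionally stronger peak before it is accepted as a vowel.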

As a determination method for the vowel/consonant determination means 5, the nature of vowel and consonant sections in speech may also be taken into account, for example the fact that speech is built from a consonant followed by a vowel. In that case the determination output for a consonant is issued, reaching back to the first consonant section, only once a consonant section and a vowel section have both appeared. That is, to distinguish noise from consonants more reliably, a section judged to be a consonant from its average value is nevertheless judged to be noise if no vowel section follows it.
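The retroactive rule above can be illustrated on a list of per-frame labels. This is a minimal offline sketch: it assumes the whole label sequence is available, that "follows" means the vowel frame comes immediately after the consonant run, and that a consonant run at the very end of the sequence counts as unconfirmed. The function name and label strings are hypothetical.

```python
def confirm_consonants(labels):
    """Keep a run of 'consonant' frames only if a 'vowel' frame follows it.

    Unconfirmed consonant runs are relabeled as 'noise'.
    """
    out = list(labels)
    n = len(out)
    i = 0
    while i < n:
        if out[i] != "consonant":
            i += 1
            continue
        # Find the end of this consonant run.
        j = i
        while j < n and out[j] == "consonant":
            j += 1
        followed_by_vowel = j < n and out[j] == "vowel"
        if not followed_by_vowel:
            for k in range(i, j):
                out[k] = "noise"
        i = j
    return out


frames = ["consonant", "consonant", "vowel", "noise", "consonant", "noise"]
print(confirm_consonants(frames))
```

A real-time device would instead have to hold a consonant decision pending until the next frames arrive, which adds latency equal to the length of the consonant run.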

The cancellation coefficient setting means 7 sets an appropriate cancellation coefficient from the speech information on the vowel/consonant sections determined by the vowel/consonant determination means 5.

Meanwhile, the noise prediction means 6 predicts the noise component of the noisy speech/noise input for each channel. The canceling means 8 removes, channel by channel, the noise component supplied by the noise prediction means 6 from the audio signal. The noise removal rate at this point is set for each channel by the cancellation coefficient supplied by the cancellation coefficient setting means 7.

That is, if the predicted noise component is a_i, the noisy signal is b_i, and the cancellation coefficient is α_i, the output c_i of the canceling means 8 is c_i = b_i − α_i × a_i.

The cancellation coefficient α_i takes values such as those shown in FIG. 7. FIG. 7(a) shows the cancellation coefficient in each band. Here f0 to f3 is the entire band of the speech/noise input; this band f0 to f3 is divided into m channels and a cancellation coefficient is set for each. The band f1 to f2 is the band that contains speech, and it is reliably determined by the vowel/consonant determination means 5 as described above. In this speech band the cancellation coefficient is made small (close to 0) so that as little noise as possible is removed. This improves intelligibility, because human hearing can pick out speech even when some noise is present.

In the non-speech bands f0 to f1 and f2 to f3, the cancellation coefficient is set to 1 so that noise is removed sufficiently. The cancellation coefficient of FIG. 7(b) is used when it is known with certainty that there is no speech at all and the input can only be noise; it is set to 1 so that noise is removed sufficiently.

For example, this applies when, judging from the peak frequency, no vowel appears for an extended period: the input cannot be judged to be a speech signal and is therefore judged to be noise.

It is desirable to be able to switch between the cancellation coefficients of FIGS. 7(a) and 7(b) as appropriate.
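Switching between the two coefficient profiles of FIG. 7 and applying the stated relation (output = noisy signal minus cancellation coefficient times predicted noise) per channel can be sketched as follows. The speech-band value of 0.1 is only a placeholder for "close to 0", and the function names and the way the speech band is passed in as channel indices are hypothetical.

```python
def cancel_coefficients(m, speech_band=None, alpha_speech=0.1):
    """Build one cancellation coefficient per channel.

    speech_band: (lo, hi) inclusive channel indices that contain speech,
                 as in FIG. 7(a), or None when no speech is present,
                 as in FIG. 7(b).
    alpha_speech: assumed small coefficient used inside the speech band.
    """
    alphas = [1.0] * m  # non-speech channels: remove the full noise estimate
    if speech_band is not None:
        lo, hi = speech_band
        for i in range(lo, hi + 1):
            alphas[i] = alpha_speech
    return alphas


def cancel(noisy, predicted_noise, alphas):
    """Per channel: output = noisy - alpha * predicted noise."""
    return [b - alpha * a for b, a, alpha in zip(noisy, predicted_noise, alphas)]


# Profile (a): speech detected in channels 2 and 3 of a 6-channel input.
alphas_a = cancel_coefficients(6, speech_band=(2, 3))
# Profile (b): no speech detected, so every channel is fully canceled.
alphas_b = cancel_coefficients(6)
print(alphas_a, alphas_b)
```

Passing `speech_band=None` selects the all-ones profile, so the switch between FIG. 7(a) and FIG. 7(b) reduces to whether the vowel/consonant determination reports a speech band.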

The present invention can be implemented in software on a computer, but it can also be implemented with dedicated hardware circuits.

As is clear from the above description, the audio signal processing device according to the present invention detects the vowel/consonant sections of an audio signal mixed with noise, its cancellation coefficient setting means sets an appropriate cancellation coefficient from that detection, and the predicted noise component is removed appropriately using that cancellation coefficient. The device can therefore remove noise while also providing good intelligibility.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing an embodiment of the audio signal processing device according to the present invention; FIG. 2 is a graph showing a cepstrum peak in that embodiment; FIG. 3 is a graph for explaining the noise prediction method of that embodiment; FIGS. 4 and 5 are waveform diagrams for explaining the cancellation method of that embodiment; FIG. 6 is a block diagram showing another embodiment of the audio signal processing device according to the present invention; FIG. 7 is a graph for explaining the cancellation coefficients of that embodiment; and FIG. 8 is a block diagram showing a conventional audio signal processing device.

1 ... frequency analysis means (band division means, FFT means), 2 ... cepstrum analysis means, 3 ... peak detection means, 4 ... average value calculation means, 5 ... vowel/consonant determination means, 51 ... first threshold setting section, 52 ... first comparator, 53 ... second comparator, 54 ... second threshold setting section, 55 ... vowel/consonant determination circuit, 6 ... noise prediction means, 7 ... cancellation coefficient setting means, 8 ... canceling means, 9 ... signal synthesis means (band synthesis means, IFFT means).

Claims

[Claims]

(1) A signal processing device characterized by comprising: frequency analysis means for frequency-analyzing an audio input signal; cepstrum analysis means for performing cepstrum analysis on the frequency analysis output of the frequency analysis means; peak detection means for detecting a cepstrum peak in the cepstrum analysis output of the cepstrum analysis means; average value calculation means for calculating an average level of the cepstrum analysis output of the cepstrum analysis means; vowel/consonant determination means for determining vowels and consonants from the peak detection information of the peak detection means and the average value information of the average value calculation means, a vowel being determined on the basis of the peak and a consonant on the basis of the level of the average value information; cancellation coefficient setting means for setting a cancellation coefficient using the determination result of the vowel/consonant determination means; noise prediction means for receiving the Fourier-transformed audio signal and predicting its noise component; canceling means for receiving the noise prediction output of the noise prediction means, the audio signal, and the cancellation coefficient signal set by the cancellation coefficient setting means, and canceling the noise component from the audio signal in consideration of the cancellation rate; and signal synthesizing means for synthesizing the cancellation output of the canceling means.

(2) An audio signal processing device characterized by comprising: band dividing means for dividing an audio input signal into bands; cepstrum analysis means for performing cepstrum analysis on the band division output of the band dividing means; peak detection means for detecting a cepstrum peak in the cepstrum analysis output of the cepstrum analysis means; average value calculation means for calculating an average level of the cepstrum analysis output of the cepstrum analysis means; vowel/consonant determination means for determining vowels and consonants from the peak detection information of the peak detection means and the average value information of the average value calculation means, a vowel being determined on the basis of the peak and a consonant on the basis of the level of the average value information; cancellation coefficient setting means for setting a cancellation coefficient using the determination result of the vowel/consonant determination means; noise prediction means for receiving the Fourier-transformed audio signal and predicting its noise component; canceling means for receiving the noise prediction output of the noise prediction means, the audio signal, and the cancellation coefficient signal set by the cancellation coefficient setting means, and canceling the noise component from the audio signal in consideration of the cancellation rate; and band synthesis means for band-synthesizing the cancellation output of the canceling means.

(3) The audio signal processing device according to claim 2, wherein the vowel/consonant determination means comprises at least: a first comparator that compares the peak detected by the peak detection means with a predetermined threshold value set by a threshold setting section; a second comparator that compares the average value calculated by the average value calculation means with a predetermined threshold value set by a threshold setting section; and a vowel/consonant determination circuit that determines vowels and consonants on the basis of the comparison results of the first and second comparators and outputs the result.
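As an informal illustration only, and not part of the claims, the two-comparator arrangement can be sketched as below. The function name, the threshold values, the "neither" outcome, and the priority given to the vowel test are all assumptions; the text states only that a vowel is determined from the cepstrum peak and a consonant from the average level.

```python
def classify_frame(peak, average, peak_threshold, average_threshold):
    """Return "vowel", "consonant", or "neither" for one frame.

    peak, average -- cepstrum peak and cepstrum average level for the frame
    Both thresholds and the ordering of the two tests are assumptions.
    """
    peak_over = peak > peak_threshold            # first comparator
    average_over = average > average_threshold   # second comparator
    if peak_over:
        return "vowel"
    if average_over:
        return "consonant"
    return "neither"


print(classify_frame(0.9, 0.2, peak_threshold=0.5, average_threshold=0.3))
print(classify_frame(0.1, 0.4, peak_threshold=0.5, average_threshold=0.3))
```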