JP5815614B2

JP5815614B2 - Reverberation suppression apparatus and method, program, and recording medium

Info

Publication number: JP5815614B2
Application number: JP2013168219A
Authority: JP
Inventors: 小林 和則; 大室 仲; 木下 慶介; 中谷 智広
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-08-13
Filing date: 2013-08-13
Publication date: 2015-11-17
Anticipated expiration: 2033-08-13
Also published as: JP2015037238A

Description

The present invention relates to a reverberation suppression apparatus and method, a program, and a recording medium for suppressing components caused by room reverberation in a signal picked up by a microphone.

When speech is picked up indoors, reverberant components reflected from the walls, floor, and other surfaces are picked up together with the direct sound, and the speech is degraded. For example, in a hands-free audio conference in a large conference room, or in a call on a mobile terminal in a highly reverberant place, reverberation makes the speech hard to understand.

Methods for suppressing reverberant components have therefore been proposed to reduce this degradation in sound quality. For example, the reverberation suppression apparatus 900 using multi-step linear prediction, disclosed in Non-Patent Document 1, is known.

FIG. 17 shows the functional configuration of the reverberation suppression apparatus 900, whose operation is briefly described here. The reverberation suppression apparatus 900 includes a whitening unit 910, a multi-step linear prediction unit 920, a reverberation calculation unit 930, an FFT unit 940, an FFT unit 950, a spectral subtraction unit 960, and an inverse FFT unit 970. The whitening unit 910 whitens the time-domain picked-up signal by using linear prediction with a short tap length to remove the frequency characteristics caused by the autocorrelation of speech. The multi-step linear prediction unit 920 performs multi-step linear prediction with a long tap length on the whitened time-domain picked-up signal and calculates filter coefficients for predicting the reverberant component. Assuming a sampling frequency of, for example, 8 kHz, a long tap length means roughly 5,000 to 6,000 taps, corresponding to a reverberation time of 600 to 700 ms. The reverberation calculation unit 930 predicts the reverberant component by filtering the time-domain picked-up signal with the calculated filter coefficients.

The FFT unit 940 converts the predicted reverberant component into a frequency-domain reverberant component by a short-time Fourier transform. The FFT unit 950 converts the time-domain picked-up signal into a frequency-domain picked-up signal by a short-time Fourier transform.

The spectral subtraction unit 960 calculates a gain for suppressing reverberation from the per-frequency power of the frequency-domain reverberant component and the per-frequency power of the frequency-domain picked-up signal, and multiplies the frequency-domain picked-up signal by that gain to output a reverberation-suppressed signal. The reverberation-suppressed signal is converted into a time-domain reverberation-suppressed signal by an inverse Fourier transform.

Non-Patent Document 1: Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, and Masato Miyoshi, “Suppression of Late Reverberation Effect on Speech Signal Using Long-Term Multiple-step Linear Prediction,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, no. 4, May 2009.

However, because the conventional reverberation suppression apparatus 900 uses linear prediction with a long tap length, its computational cost is enormous. With linear prediction by the Levinson-Durbin algorithm, the cost is on the order of the square of the tap length. The example above requires a computation on the order of 6 × 10⁶.

The present invention has been made in view of this problem, and its object is to provide a reverberation suppression apparatus and method, a program, and a recording medium that suppress reverberation with a low computational cost.

The reverberation suppression apparatus of the present invention includes an FFT unit, a power calculation unit, a delay unit, a subtraction unit, an adaptive algorithm unit, a frequency domain filter unit, a reverberation suppression gain calculation unit, a multiplication unit, and an inverse FFT unit. The FFT unit converts the picked-up signal into a frequency-domain picked-up signal. The power calculation unit outputs a power spectrum obtained by calculating, for each frequency, the power of the frequency-domain picked-up signal output from the FFT unit. The delay unit outputs a delayed power spectrum obtained by delaying the power spectrum output from the power calculation unit by a predetermined delay amount. The subtraction unit obtains the estimated power of the direct sound by subtracting, from the power spectrum output from the power calculation unit, a filtered signal obtained by multiplying the delayed power spectrum output from the delay unit by filter coefficients. The adaptive algorithm unit receives the delayed power spectrum output from the delay unit and the estimated power of the direct sound, and updates the filter coefficients so as to minimize the estimated power of the direct sound. The frequency domain filter unit outputs the filtered signal obtained by filtering the delayed power spectrum output from the delay unit with the filter coefficients. The reverberation suppression gain calculation unit receives the power spectrum output from the power calculation unit and the estimated power of the direct sound output from the subtraction unit, and calculates a reverberation suppression gain for suppressing the reverberant sound. The multiplication unit multiplies the frequency-domain picked-up signal by the reverberation suppression gain and outputs a reverberation-suppressed signal. The inverse FFT unit converts the reverberation-suppressed signal output from the multiplication unit into a time-domain reverberation-suppressed signal.

According to the reverberation suppression apparatus of the present invention, the filter coefficients are updated in the power spectrum domain by an adaptive algorithm so as to minimize the estimated power of the direct sound, which is the output signal of the subtraction unit. Reverberation can therefore be suppressed with a computational cost on the order of the first power of the filter tap length. The specific effect is described later, but the computational cost can be reduced substantially compared with the conventional method.

FIG. 1 is a diagram showing a functional configuration example of the reverberation suppression apparatus 100 of the present invention.
FIG. 2 is a diagram showing the operation flow of the reverberation suppression apparatus 100.
FIG. 3 is a diagram showing a functional configuration example of the frequency domain filter unit 160.
FIG. 4 is a diagram showing a functional configuration example of the adaptive algorithm unit 150.
FIG. 5 is a diagram showing a functional configuration example of the reverberation suppression gain calculation unit 170.
FIG. 6 is a diagram showing a functional configuration example of the adaptive algorithm unit 250 of the reverberation suppression apparatus 200 of the present invention.
FIG. 7 is a diagram showing the operation flow of the adaptive algorithm unit 250.
FIG. 8 is a diagram showing a functional configuration example of the adaptive algorithm unit 350 of the reverberation suppression apparatus 300 of the present invention.
FIG. 9 is a diagram showing the operation flow of the adaptive algorithm unit 350.
FIG. 10 is a diagram showing a functional configuration example of the reverberation suppression gain calculation unit 470 of the reverberation suppression apparatus 400 of the present invention.
FIG. 11 is a diagram showing the first half of the operation flow of the reverberation suppression gain calculation unit 470.
FIG. 12 is a diagram showing the second half of the operation flow of the reverberation suppression gain calculation unit 470.
FIG. 13 is a diagram showing a functional configuration example of the reverberation suppression apparatus 500 of the present invention.
FIG. 14 is a diagram showing a functional configuration example of the reverberation suppression apparatus 600 of the present invention.
FIG. 15 is a diagram showing a functional configuration example of the reverberation suppression apparatus 700 of the present invention.
FIG. 16 is a diagram showing the results of an evaluation experiment.
FIG. 17 is a diagram showing the functional configuration of the conventional reverberation suppression apparatus 900.

Embodiments of the present invention are described below with reference to the drawings. Identical elements in different drawings carry the same reference numerals, and their description is not repeated.

FIG. 1 shows a functional configuration example of the reverberation suppression apparatus 100 of the present invention, and FIG. 2 shows its operation flow. The reverberation suppression apparatus 100 includes an FFT unit 110, a power calculation unit 120, a delay unit 130, a subtraction unit 140, an adaptive algorithm unit 150, a frequency domain filter unit 160, a reverberation suppression gain calculation unit 170, a multiplication unit 180, an inverse FFT unit 190, and a control unit 195. The reverberation suppression apparatus 100 is realized by loading a predetermined program into a computer composed of, for example, a ROM, a RAM, and a CPU, and having the CPU execute that program. The same applies to each apparatus described below.

The FFT unit 110 converts the picked-up signal into a frequency-domain picked-up signal (step S110). The picked-up signal consists of the direct sound picked up by a microphone (not shown) and the reverberant sound reflected from the walls, floor, and other surfaces of the room. It is a digital signal sampled at, for example, 8 kHz. The A/D converter that digitizes the picked-up signal and the D/A converter that converts a digital signal back into a continuous signal are omitted from FIG. 1. Using a short-time Fourier transform, the FFT unit 110 converts the sampled picked-up signal into the frequency-domain picked-up signal at intervals of one frame of, for example, 128 samples (t = 16 ms), with a window size of 32 ms. The window is, for example, the square root of a Hanning window.

The power calculation unit 120 receives the frequency-domain picked-up signal output from the FFT unit 110, calculates its power spectrum, and outputs it (step S120). The delay unit 130 outputs a delayed power spectrum obtained by delaying the power spectrum output from the power calculation unit 120 by a predetermined delay amount (step S130). The predetermined delay amount is a delay time of, for example, several tens of milliseconds to about 100 ms.

The subtraction unit 140 obtains the estimated power of the direct sound by subtracting, from the power spectrum output from the power calculation unit 120, the filtered signal obtained by multiplying the delayed power spectrum by the filter coefficients determined by the adaptive algorithm unit 150 (step S140). The adaptive algorithm unit 150 receives the delayed power spectrum and the estimated power of the direct sound, and updates the filter coefficients so as to minimize the estimated power of the direct sound (step S150).

The frequency domain filter unit 160 outputs the filtered signal obtained by filtering the delayed power spectrum with the filter coefficients (step S160). The reverberation suppression gain calculation unit 170 receives the power spectrum and the estimated power of the direct sound, and calculates a reverberation suppression gain for suppressing the reverberant sound (step S170).

The multiplication unit 180 multiplies the frequency-domain picked-up signal by the reverberation suppression gain and outputs a reverberation-suppressed signal (step S180). The inverse FFT unit 190 applies an inverse Fourier transform to the reverberation-suppressed signal at the same interval and with the same number of points as the FFT unit 110, multiplies the output by a window, and overlap-adds the result to obtain a time-domain reverberation-suppressed signal (step S190). Steps S110 to S190 are repeated, advancing the frame each time (step S196), until operation stops (No in step S195). This repetition is controlled by the control unit 195. The control unit 195 controls the time-series operation of the reverberation suppression apparatus 100 and is not a special component.
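The framing in step S110 and the windowed overlap-add in step S190 can be sketched as follows. This is a minimal illustration and not the patented implementation: it assumes a periodic Hann window, a 256-sample window (32 ms at 8 kHz) and a 128-sample frame shift, and it leaves out the FFT, the gain multiplication and the inverse FFT so that only the window behaviour is visible. The function names are hypothetical.

```python
import math

FRAME_SHIFT = 128   # samples per frame step (16 ms at 8 kHz)
WINDOW_SIZE = 256   # samples per window (32 ms at 8 kHz)

def sqrt_hann(size):
    """Square root of a periodic Hann window (assumed window shape)."""
    return [math.sqrt(0.5 - 0.5 * math.cos(2.0 * math.pi * n / size))
            for n in range(size)]

def analyze(signal, window, shift):
    """Cut the signal into overlapping frames and apply the analysis window."""
    size = len(window)
    return [[signal[start + n] * window[n] for n in range(size)]
            for start in range(0, len(signal) - size + 1, shift)]

def overlap_add(frames, window, shift):
    """Apply the synthesis window to every frame and overlap-add the frames."""
    size = len(window)
    out = [0.0] * (shift * (len(frames) - 1) + size)
    for i, frame in enumerate(frames):
        for n in range(size):
            out[i * shift + n] += frame[n] * window[n]
    return out

window = sqrt_hann(WINDOW_SIZE)
signal = [math.sin(0.05 * n) for n in range(FRAME_SHIFT * 10)]
# In the apparatus, the FFT, gain multiplication and inverse FFT sit between these two calls.
frames = analyze(signal, window, FRAME_SHIFT)
rebuilt = overlap_add(frames, window, FRAME_SHIFT)
```

Because the analysis and synthesis windows multiply to a Hann window, and Hann windows shifted by half their length sum to one, every sample covered by two frames is reconstructed exactly when the gain is 1. That is one plausible reason for choosing the square-root window.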

In the reverberation suppression apparatus 100, the filter coefficients are updated by an adaptive algorithm so as to minimize the error signal output from the subtraction unit 140 (the estimated power of the direct sound). In effect, the power spectrum before the delay is predicted from the delayed power spectrum produced by the delay unit 130. The error signal output from the subtraction unit 140 is therefore the signal component that could not be predicted, and the component subtracted in the subtraction unit 140 is the signal component that could be predicted.

Room reverberation, on the other hand, is the phenomenon in which sound travelling from the source to the microphone is reflected by walls and other surfaces and reaches the microphone later than the direct sound. Because reflected sound always arrives later than the direct sound, the reverberant component can be predicted from sound that reached the microphone in the past. The direct sound, in contrast, arrives first and therefore cannot be fully predicted from past arrivals. Consequently, the error signal output from the subtraction unit 140, which is the component that cannot be predicted from past signals, is the estimated power of the direct sound.

Therefore, if the frequency-domain picked-up signal is multiplied by a gain equal to the estimated power of the direct sound divided by the power of the input signal, the power spectrum of the input signal (the picked-up signal) is reshaped into the power spectrum of the direct sound component, and the reverberant sound is suppressed. In this way the reverberation suppression apparatus 100 suppresses reverberant sound using an adaptive algorithm in the power spectrum domain. Using an adaptive algorithm makes it possible to suppress reverberation with a computational cost on the order of the first power of the tap length of the frequency domain filter unit 160.

The operation of the reverberation suppression apparatus 100 is described in more detail below, using more specific functional configuration examples of each unit.

[Frequency domain filter unit]
FIG. 3 shows a functional configuration example of the frequency domain filter unit 160. The frequency domain filter unit 160 includes signal buffer means 161 and convolution calculation means 162.

The signal buffer means 161 stores the past M values of the delayed power spectrum |X(ω, t−D)|² output from the delay unit 130. Here t is the frame number, ω is the frequency, D is the delay amount applied by the delay unit 130, and X(ω, t) is the frequency-domain picked-up signal output from the FFT unit 110.

The convolution calculation means 162 convolves, for each frequency, the M-tap filter coefficients F(ω, m, t), m = 0, …, M−1, held in the adaptive algorithm unit 150 with the delayed power spectrum |X(ω, t−D)|², calculates the filtered signal |Y(ω, t−D)|² by the following equation, and outputs it to the subtraction unit 140.
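The equation itself is not reproduced in this text. A form consistent with the definitions above, offered as a reconstruction rather than the original wording, would be the convolution of the M coefficients with the M buffered values of the delayed power spectrum:

```latex
|Y(\omega, t-D)|^2 = \sum_{m=0}^{M-1} F(\omega, m, t)\, |X(\omega, t-D-m)|^2
```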

[Adaptive algorithm unit]
FIG. 4 shows a functional configuration example of the adaptive algorithm unit 150. The adaptive algorithm unit 150 includes update vector calculation means 151, step size multiplication means 152, addition means 153, and filter coefficient holding means 154.

The update vector calculation means 151 receives the delayed power spectrum |X(ω, t−D)|² and the output signal E(ω, t) of the subtraction unit 140, and calculates an update vector U(ω, m, t) using an adaptive algorithm. As shown in the following equation, the output signal E(ω, t) of the subtraction unit 140 is the error signal obtained by subtracting the filtered signal |Y(ω, t−D)|² from the power spectrum |X(ω, t)|². It is the signal component that could not be predicted from past signals, and it is the estimated power of the direct sound.
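The equation is missing from this text. Written out from the sentence above, it would read:

```latex
E(\omega, t) = |X(\omega, t)|^2 - |Y(\omega, t-D)|^2
```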

The adaptive algorithm is, for example, the NLMS method (Reference 1: Simon Haykin, Adaptive Filter Theory, Prentice-Hall, 1986). The update vector calculation means 151 calculates the update vector U(ω, m, t) by the following equation.
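The equation is missing from this text. Assuming the standard NLMS normalization applied to the buffered values of the delayed power spectrum, a form consistent with the surrounding definitions would be:

```latex
U(\omega, m, t) = \frac{E(\omega, t)\, |X(\omega, t-D-m)|^2}{\sum_{n=0}^{M-1} \left( |X(\omega, t-D-n)|^2 \right)^2}
```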

Here, n is the index of the filter coefficient.

The step size multiplication means 152 multiplies the update vector U(ω, m, t) by a preset step size α in the range 0 to 2 and outputs the resulting coefficient update. The addition means 153 adds this update to the filter coefficients of the previous frame held in the filter coefficient holding means 154, and outputs the sum as the filter coefficients F(ω, m, t+1) for the next frame.

An adaptive algorithm such as the NLMS method operates so as to minimize the mean square of the error signal E(ω, t) output from the subtraction unit 140. The current frequency-domain picked-up signal X(ω, t) is thus predicted as far as possible from the delayed power spectrum |X(ω, t−D)|², which lies D in the past, and the signal that remains after the predicted component is removed is the error signal E(ω, t). Only the component that cannot be predicted from past signals remains, so the error signal E(ω, t) is the estimated power of the direct sound.
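The filtering, subtraction and coefficient update for a single frequency bin can be sketched as below. This is an illustrative sketch under assumptions, not the patent's own code: the function name, the default step size and the small constant `eps` that guards the division are all hypothetical, and the update uses the usual NLMS normalization.

```python
def nlms_step(power_now, delayed_powers, coeffs, step_size=0.5, eps=1e-12):
    """One adaptive update for a single frequency bin.

    power_now      -- |X(w, t)|^2
    delayed_powers -- [|X(w, t-D-m)|^2 for m = 0..M-1], newest first
    coeffs         -- [F(w, m, t) for m = 0..M-1]
    Returns (E(w, t), [F(w, m, t+1) for m = 0..M-1]).
    """
    # Frequency domain filter unit: predicted (reverberant) power.
    filtered = sum(f * p for f, p in zip(coeffs, delayed_powers))
    # Subtraction unit: the unpredicted part is the direct sound power estimate.
    error = power_now - filtered
    # NLMS normalization; eps is an assumed guard against division by zero.
    norm = sum(p * p for p in delayed_powers) + eps
    new_coeffs = [f + step_size * error * p / norm
                  for f, p in zip(coeffs, delayed_powers)]
    return error, new_coeffs

# Toy input: a power envelope that decays by a factor 0.8 per frame,
# predicted from the previous frame (D = 1 frame, M = 1 tap).
powers = [0.8 ** t for t in range(30)]
coeffs = [0.0]
error = 0.0
for t in range(1, len(powers)):
    error, coeffs = nlms_step(powers[t], [powers[t - 1]], coeffs)
```

On this toy decay the single coefficient approaches 0.8 and the error approaches zero, which matches the idea that a purely reverberant tail is fully predictable from the past. The cost per bin and frame is proportional to M.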

[Reverberation suppression gain calculation unit]
FIG. 5 shows a functional configuration example of the reverberation suppression gain calculation unit 170. The reverberation suppression gain calculation unit 170 includes division means 171, maximum value limiting means 172, and time smoothing means 173.

The division means 171 receives the power spectrum |X(ω, t)|² and the estimated power E(ω, t) of the direct sound, calculates their ratio, and raises the ratio to the power β to obtain the gain G′(ω, t) (equation (5)).
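Equation (5) is not reproduced in this text. Taking the ratio as the estimated direct sound power divided by the input power, which is how the gain is described elsewhere in this document, a consistent form would be:

```latex
G'(\omega, t) = \left( \frac{E(\omega, t)}{|X(\omega, t)|^2} \right)^{\beta}
```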

Here β is a preset constant; the larger its value, the stronger the suppression of reverberation. β is set roughly between 0.5 and 1.0.

The maximum value limiting means 172 limits the gain G′(ω, t) output from the division means 171 to an upper bound of 1, as shown in equations (6) and (7).
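Equations (6) and (7) are not reproduced in this text. An upper limit of 1 can be written as the following two cases; which case carries which equation number is not recoverable here:

```latex
G''(\omega, t) =
\begin{cases}
1 & \text{if } G'(\omega, t) > 1 \\
G'(\omega, t) & \text{otherwise}
\end{cases}
```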

The time smoothing means 173 smooths the limited gain G″(ω, t) output from the maximum value limiting means 172 over time and outputs the reverberation suppression gain G(ω, t). Time smoothing is realized, for example, by the following equation.
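The equation is missing from this text. A first-order recursive average is one form that fits the description of the smoothing coefficient γ; it is given here as an assumption:

```latex
G(\omega, t) = \gamma\, G(\omega, t-1) + (1 - \gamma)\, G''(\omega, t)
```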

Here γ is a preset smoothing coefficient. It takes a value in the range 0 to 1, and the closer it is to 1, the longer the time constant of the smoothing.

The reverberation suppression gain G(ω, t) output from the reverberation suppression gain calculation unit 170 is calculated from the estimated power of the direct sound (the error signal) divided by the power of the picked-up signal. Multiplying the frequency-domain picked-up signal X(ω, t) by this gain G(ω, t) therefore yields an output in which the reverberant component is suppressed.
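The three stages of the gain calculation (division and exponent, upper limit, time smoothing) can be sketched for one frequency bin as follows. The recursive smoothing form, the clamping of a negative power estimate to zero, the constant `eps`, and the function name are assumptions made for this sketch; the source only states the roles of β and γ.

```python
def suppression_gain(power_now, direct_power, previous_gain,
                     beta=0.5, gamma=0.5, eps=1e-12):
    """Reverberation suppression gain G(w, t) for one frequency bin.

    power_now     -- |X(w, t)|^2
    direct_power  -- E(w, t), the estimated direct sound power
    previous_gain -- G(w, t-1)
    """
    # Division means: ratio of direct power to input power, raised to beta.
    # A negative estimate is clamped to 0 (assumption) so the power is defined.
    ratio = max(direct_power, 0.0) / (power_now + eps)
    raw_gain = ratio ** beta
    # Maximum value limiting means: the gain never exceeds 1.
    limited_gain = min(raw_gain, 1.0)
    # Time smoothing means: assumed first-order recursive average.
    return gamma * previous_gain + (1.0 - gamma) * limited_gain

gain = suppression_gain(power_now=4.0, direct_power=1.0, previous_gain=1.0)
```

With these inputs the instantaneous gain is the square root of 1/4, and averaging it with the previous gain of 1 gives a smoothed gain of about 0.75. The multiplication unit would then scale X(ω, t) by this value.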

The reverberation suppression apparatus 100 converts the picked-up signal into a frequency-domain signal using an FFT and an inverse FFT and estimates the power of the direct sound with an adaptive algorithm. It can therefore suppress reverberant sound with a low computational cost, on the order of the first power of the tap length of the frequency domain filter unit 160.

FIG. 6 shows a functional configuration example of the adaptive algorithm unit 250 of the reverberation suppression apparatus 200 of the present invention, and FIG. 7 shows its operation flow. The reverberation suppression apparatus 200 differs from the reverberation suppression apparatus 100 only in that the adaptive algorithm unit 150 is replaced by the adaptive algorithm unit 250, so its overall functional configuration is not shown.

The adaptive algorithm unit 250 differs from the adaptive algorithm unit 150 described above (FIG. 4) in that it includes step size setting means 255. The step size setting means 255 updates the step size α according to the value of the update vector U(ω, m, t) output from the update vector calculation means 151. For example, when the update vector U(ω, m, t) is positive (Yes in step S2551), a preset small step size α₁ is set (equation (9)) (step S2552); when the update vector U(ω, m, t) is negative (No in step S2551), a preset large step size α₂ is set (equation (10)) (step S2553).
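Equations (9) and (10) are not reproduced in this text. From the description they amount to the following selection, where treating U = 0 like the negative case follows the Yes/No branch of step S2551 and is otherwise an assumption:

```latex
\alpha =
\begin{cases}
\alpha_1 & \text{if } U(\omega, m, t) > 0 \quad (9) \\
\alpha_2 & \text{otherwise} \quad (10)
\end{cases}
\qquad \alpha_1 < \alpha_2
```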

Setting the step size in this way makes it possible to control the estimation error of the direct sound. When the step size is not controlled, estimation errors occur equally in the positive and negative directions. With the control described above, the estimation error of the direct sound can be biased toward the positive direction. This prevents the power of the direct sound from being underestimated and so limits the degradation of sound quality.

In addition to the reduction in computational cost obtained with the reverberation suppression apparatus 100, the reverberation suppression apparatus 200 thus also outputs high-quality reverberation-suppressed sound.

FIG. 8 shows a functional configuration example of the adaptive algorithm unit 350 of the reverberation suppression apparatus 300 of the present invention, and FIG. 9 shows its operation flow. The reverberation suppression apparatus 300 differs from the reverberation suppression apparatuses 100 and 200 in that the adaptive algorithm unit 350 includes non-negative constraint means 356.

When an updated filter coefficient F(ω, m, t+1) output from the addition means 153 is negative, the non-negative constraint means 356 replaces it with 0 (step S3562), so that the filter coefficients F(ω, m, t+1) never become negative.

In the present invention the reverberant component is estimated from signal power, which takes only positive values, so the correct solution has positive filter coefficients. A negative filter coefficient F(ω, m, t+1) is an estimation error, and replacing it with 0 corrects it to a more accurate value. The reverberation suppression apparatus 300 can thus determine the filter coefficients more accurately than the reverberation suppression apparatuses 100 and 200 and can perform high-quality reverberation suppression.
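The sign-dependent step size of the adaptive algorithm unit 250 and the non-negative constraint of the adaptive algorithm unit 350 can be combined in one coefficient update, sketched below. The function name and the two step size values are hypothetical; the source only says that α₁ is small and α₂ is large.

```python
def constrained_update(coeffs, update_vector, alpha_small=0.05, alpha_large=0.5):
    """Return F(w, m, t+1) from F(w, m, t) and the update vector U(w, m, t).

    alpha_small plays the role of the small step size (positive update),
    alpha_large the role of the large step size (otherwise).
    """
    new_coeffs = []
    for f, u in zip(coeffs, update_vector):
        # Step size setting means: a small step when the coefficient would grow.
        alpha = alpha_small if u > 0 else alpha_large
        updated = f + alpha * u
        # Non-negative constraint means: negative coefficients are replaced by 0.
        new_coeffs.append(max(updated, 0.0))
    return new_coeffs

new_coeffs = constrained_update([0.2, 0.2], [1.0, -1.0])
```

In the example, the first coefficient grows slowly to about 0.25, while the second would fall to -0.3 and is clamped to 0. Growing the coefficients slowly and shrinking them quickly keeps the predicted reverberant power on the low side, so the direct sound estimate errs on the high side.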

FIG. 10 shows a functional configuration example of the reverberation suppression gain calculation unit 470 of the reverberation suppression apparatus 400 of the present invention, and FIGS. 11 and 12 show its operation flow. The reverberation suppression apparatus 400 differs from the reverberation suppression apparatuses 100, 200, and 300 only in that the reverberation suppression gain calculation unit 170 is replaced by the reverberation suppression gain calculation unit 470, so its overall functional configuration is not shown.

The reverberation suppression gain calculation unit 470 differs from the reverberation suppression gain calculation unit 170 (FIG. 5) in that it further includes masking level calculation means 474 and maximum value selection means 476. The masking level calculation means 474 obtains an auditory masking level Q(ω, t) from the estimated power E(ω, t) of the direct sound output from the subtraction unit 140. Auditory masking is the phenomenon in which, when the power at a frequency K is P(K), sound components below an auditory masking level that can be calculated as a function of P(K) are inaudible to human hearing.

聴覚マスキングレベルＱ（ω，ｔ）は、例えば次のようにして求めることができる。ある周波数ωの聴覚マスキングレベルＱ（ω，ｔ）を求める際に、その周波数の直接音の推定パワーＥ（ω，ｔ）と、その１つ下の周波数の仮の聴覚マスキングレベルＱ′（ω−１，ｔ）にそれぞれ係数ａ，ｂを乗じた値を比較（ステップＳ４７４４）し、大きい値を仮の聴覚マスキングレベルＱ′（ω，ｔ）とする（ステップＳ４７４５，Ｓ４７４６）。これを周波数ωの最小値から順に、ω＝最大値になるまで繰り返し実施する（ステップＳ４７４７のＮｏのループ）。係数ａ，ｂは聴覚マスキングの特性に基づいて予め設定される１未満、０以上の定数である。なお、係数ａ，ｂは周波数ωに応じて異なる値に設定しても良い。 The auditory masking level Q(ω, t) can be obtained, for example, as follows. To obtain the auditory masking level at a certain frequency ω, the estimated power E(ω, t) of the direct sound at that frequency multiplied by a coefficient a is compared with the provisional auditory masking level Q′(ω−1, t) of the next lower frequency multiplied by a coefficient b (step S4744), and the larger value is set as the provisional auditory masking level Q′(ω, t) (steps S4745 and S4746). This is repeated in order from the minimum value of the frequency ω until ω reaches its maximum value (the No loop of step S4747). The coefficients a and b are constants greater than or equal to 0 and less than 1 that are preset based on the characteristics of auditory masking. The coefficients a and b may be set to different values depending on the frequency ω.

周波数ω＝最大値になると（ステップＳ４７４７のＹｅｓ、結合子Ａ）、次に周波数ωの最大値から順に、その周波数ωの仮の聴覚マスキングレベルＱ′（ω，ｔ）と、１つ上の周波数の仮の聴覚マスキングレベルＱ′（ω＋１，ｔ）に係数ｃを乗じた値を比較（ステップＳ４７５１）し、大きい値を聴覚マスキングレベルＱ（ω，ｔ）とする（ステップＳ４７５２，Ｓ４７５３）。係数ｃは、聴覚マスキングの特性に基づいて予め設定される１未満０以上の定数である。また、係数ｃは周波数ωに応じて異なる値に設定しても良い。 When the frequency ω reaches its maximum value (Yes in step S4747, connector A), then, in order from the maximum value of the frequency ω, the provisional auditory masking level Q′(ω, t) of that frequency is compared with the provisional auditory masking level Q′(ω+1, t) of the next higher frequency multiplied by a coefficient c (step S4751), and the larger value is set as the auditory masking level Q(ω, t) (steps S4752 and S4753). The coefficient c is a constant greater than or equal to 0 and less than 1 that is preset based on the characteristics of auditory masking. The coefficient c may also be set to a different value depending on the frequency ω.

以上の方法により聴覚マスキングレベルＱ（ω，ｔ）を求めることができる。 The auditory masking level Q (ω, t) can be obtained by the above method.
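
The two passes described above (upward over frequency, then downward) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `masking_level`, the default coefficient values, and the treatment of the lowest and highest bins (no neighbor contributes) are assumptions. The downward pass follows the wording of the text literally and uses the provisional level of the upper bin.

```python
def masking_level(E, a=0.5, b=0.5, c=0.25):
    """Return the auditory masking level Q for one frame.

    E holds the estimated direct-sound power per frequency bin,
    ordered from the lowest to the highest frequency.
    """
    n = len(E)

    # Upward pass: Q'(w) = max(a * E(w), b * Q'(w - 1)).
    q_prov = [0.0] * n
    prev = 0.0  # assumption: nothing below the lowest bin contributes
    for w in range(n):
        q_prov[w] = max(a * E[w], b * prev)
        prev = q_prov[w]

    # Downward pass: Q(w) = max(Q'(w), c * Q'(w + 1)).
    # Assumption: the highest bin keeps its provisional level.
    q = list(q_prov)
    for w in range(n - 2, -1, -1):
        q[w] = max(q_prov[w], c * q_prov[w + 1])
    return q


# A single strong component at bin 2 spreads masking to its neighbors.
print(masking_level([0.0, 0.0, 8.0, 0.0, 0.0]))  # [0.0, 1.0, 4.0, 2.0, 1.0]
```

With b larger than c, masking spreads further toward higher frequencies than toward lower ones, which is why the two passes use separate coefficients.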

最大値選択手段４７５は、減算部１４０が出力する誤差信号Ｅ（ω，ｔ）と、聴覚マスキングレベルＱ（ω，ｔ）を比較（ステップＳ４７６１）し、大きい方の値を、新たな直接音の推定パワー（誤差信号Ｅ（ω，ｔ））として除算手段１７１に出力する。 The maximum value selection means 475 compares the error signal E(ω, t) output from the subtraction unit 140 with the auditory masking level Q(ω, t) (step S4761), and outputs the larger value to the dividing means 171 as the new estimated power of the direct sound (error signal E(ω, t)).
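
The selection step amounts to an element-wise maximum over frequency. A short sketch, assuming `E` and `Q` are lists with one value per frequency bin (the function name `select_max` and the sample values are hypothetical):

```python
def select_max(E, Q):
    """Per frequency bin, keep the larger of the estimated direct-sound
    power E and the auditory masking level Q."""
    return [max(e, q) for e, q in zip(E, Q)]


E = [0.0, 0.25, 8.0]
Q = [0.0, 1.0, 4.0]
print(select_max(E, Q))  # [0.0, 1.0, 8.0]
```

In bin 1 the masking level replaces the smaller estimate, so the value passed on to the gain calculation never falls below what is audible.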

以上の方法により、残響抑圧装置４００は、聴覚マスキング特性を利用して、聴感上聞こえない残響成分を抑圧しないようにすることができる。したがって、残響抑圧装置４００は、残響音の不要な抑圧をしないようにすることができ、直接音の劣化を少なくする効果を奏する。 With the above method, the dereverberation apparatus 400 uses the auditory masking characteristic to avoid suppressing reverberation components that are inaudible. The dereverberation apparatus 400 therefore avoids unnecessary suppression of reverberant sound, which reduces degradation of the direct sound.

図１３に、この発明の残響抑圧装置５００の機能構成例を示す。残響抑圧装置５００は、残響抑圧装置１００に、適応区間検出部５１１の構成を追加したものである。 FIG. 13 shows a functional configuration example of the dereverberation apparatus 500 of the present invention. The dereverberation apparatus 500 is obtained by adding the configuration of the adaptive section detection unit 511 to the dereverberation apparatus 100.

適応区間検出部５１１は、遅延部１３０の出力する遅延パワースペクトル｜Ｘ（ω，ｔ−Ｄ）｜^２の大きさと、予め設定した閾値を比較し、遅延パワースペクトルの大きさが閾値を超えた場合にのみ、適応アルゴリズム部１５０によるフィルタ係数の更新が行われるように制御する。 The adaptive interval detection unit 511 compares the magnitude of the delay power spectrum | X (ω, t−D) | ² output from the delay unit 130 with a preset threshold value, and the magnitude of the delay power spectrum exceeds the threshold value. Only in this case, control is performed so that the filter coefficient is updated by the adaptive algorithm unit 150.

この制御を付加することで、収音信号の信号レベルが小さく、周囲雑音の影響を受け易い区間での、フィルタ係数の更新を停止させることができるので、より高精度に残響を抑圧するためのフィルタ係数を求めることができる。 By adding this control, it is possible to stop the update of the filter coefficient in the section where the signal level of the collected sound signal is small and susceptible to ambient noise, so that reverberation can be suppressed with higher accuracy. Filter coefficients can be determined.
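
The gating performed by the adaptive interval detection unit 511 can be sketched as below. This is an illustration under assumptions: the comparison is taken to be per frequency bin, `delta` stands for whatever increment the adaptive algorithm would apply in this frame, and all names are hypothetical.

```python
def should_update(delayed_power, threshold):
    """True only when the delayed power spectrum exceeds the threshold."""
    return delayed_power > threshold


def gated_update(coeff, delta, delayed_power, threshold):
    """Apply the adaptive update only in frames with enough signal power;
    otherwise hold the previous filter coefficient."""
    if should_update(delayed_power, threshold):
        return coeff + delta
    return coeff


print(gated_update(0.5, 0.25, delayed_power=2.0, threshold=1.0))  # 0.75
print(gated_update(0.5, 0.25, delayed_power=0.5, threshold=1.0))  # 0.5
```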

なお、残響抑圧装置５００を、残響抑圧装置１００に適応区間検出部５１１を追加した構成で説明を行ったが、上記した残響抑圧装置２００，３００，４００の何れにも適応区間検出部５１１の構成を追加することで、同様の効果を得ることができる。 Although the dereverberation apparatus 500 has been described as the dereverberation apparatus 100 with the adaptive interval detection unit 511 added, the same effect can be obtained by adding the adaptive interval detection unit 511 to any of the dereverberation apparatuses 200, 300, and 400 described above.

図１４に、この発明の残響抑圧装置６００の機能構成例を示す。残響抑圧装置６００は、残響抑圧装置１００に、帯域集約部６１１と帯域展開部６１２の構成を追加したものである。 FIG. 14 shows a functional configuration example of a dereverberation apparatus 600 of the present invention. The dereverberation device 600 is obtained by adding the configurations of a band aggregation unit 611 and a band expansion unit 612 to the dereverberation device 100.

帯域集約部６１１は、パワー計算部１２０が出力するパワースペクトル｜Ｘ（ω，ｔ）｜^２の周波数ωを集約し、より少ない周波数分割数になる周波数ω′に変換する。周波数ωを複数のグループに分け、そのグループ単位で、パワー計算部１２０の出力するパワースペクトル｜Ｘ（ω，ｔ）｜^２の総和を取り、その値を新たな周波数ω′のパワースペクトル｜Ｘ′（ω′，ｔ）｜^２として出力する（式（１６））。 The band aggregating unit 611 aggregates the frequencies ω of the power spectrum |X(ω, t)|² output from the power calculation unit 120 and converts them to frequencies ω′ with a smaller number of frequency divisions. The frequencies ω are divided into a plurality of groups, the sum of the power spectrum |X(ω, t)|² output from the power calculation unit 120 is taken for each group, and that value is output as the power spectrum |X′(ω′, t)|² of the new frequency ω′ (equation (16)).

ここでΩ（ω′）は、ω′のグループに属する周波数ωの集合である。 Here, Ω (ω ′) is a set of frequencies ω belonging to the group of ω ′.
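
Equation (16) itself is not reproduced in this text. From the description it presumably has the form |X′(ω′, t)|² = Σ_{ω∈Ω(ω′)} |X(ω, t)|². A minimal sketch of that aggregation follows, where `groups[k]` plays the role of Ω(ω′) for the k-th aggregated band; the function name, the grouping, and the sample values are hypothetical.

```python
def aggregate_bands(power, groups):
    """Sum the power spectrum over each group of FFT bins.

    power[w] is |X(w, t)|^2 for one frame; groups[k] lists the bin
    indices that belong to aggregated band k.
    """
    return [sum(power[w] for w in group) for group in groups]


# Wider groups at higher frequencies, as the description later suggests.
groups = [[0], [1], [2, 3], [4, 5, 6, 7]]
power = [1.0, 2.0, 3.0, 4.0, 0.5, 0.5, 0.5, 0.5]
print(aggregate_bands(power, groups))  # [1.0, 2.0, 7.0, 2.0]
```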

帯域集約部６１１の出力する周波数ωを集約したパワースペクトル｜Ｘ′（ω′，ｔ）｜^２は、遅延部１３０、残響抑圧ゲイン計算部１７０、減算部１４０、にそれぞれ供給され、残響抑圧ゲインは周波数ω′の単位で処理される。 The aggregated power spectrum |X′(ω′, t)|² output from the band aggregating unit 611 is supplied to the delay unit 130, the reverberation suppression gain calculation unit 170, and the subtraction unit 140, and the reverberation suppression gain is processed in units of the frequency ω′.

帯域展開部６１２は、残響抑圧ゲイン計算部１７０で求められた残響抑圧ゲインＧ′（ω′，ｔ）の周波数ω′を、ＦＦＴ部１１０の周波数分割数になる周波数ωに変換する。ＦＦＴ部１１０の周波数ωが属する集約後の周波数ω′の残響抑圧ゲインを、周波数ωの残響抑圧ゲインにコピーすることで周波数ωに展開する（式（１７））。 The band expanding unit 612 converts the frequency ω′ of the reverberation suppression gain G′(ω′, t) obtained by the reverberation suppression gain calculation unit 170 into the frequency ω, which has the number of frequency divisions of the FFT unit 110. The gain is expanded to the frequency ω by copying, for each frequency ω of the FFT unit 110, the reverberation suppression gain of the aggregated frequency ω′ to which ω belongs (equation (17)).
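
Equation (17) is likewise not reproduced here. The description implies G(ω, t) = G′(ω′, t) for every ω in Ω(ω′). A sketch of that copy operation, with hypothetical names and values and the same list-of-groups representation of Ω(ω′) as an assumption:

```python
def expand_bands(gain_agg, groups, n_bins):
    """Copy each aggregated-band gain to every FFT bin in its group."""
    gain = [0.0] * n_bins
    for g, group in zip(gain_agg, groups):
        for w in group:
            gain[w] = g
    return gain


groups = [[0], [1], [2, 3], [4, 5, 6, 7]]
gain_agg = [1.0, 0.75, 0.5, 0.25]
print(expand_bands(gain_agg, groups, 8))
# [1.0, 0.75, 0.5, 0.5, 0.25, 0.25, 0.25, 0.25]
```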

このように周波数ωを集約して残響抑圧ゲインを演算することにより、演算量を減らすことができる。なお、聴覚特性は周波数に対して対数的な感度を有しているので、高い周波数ほど周波数の集約数を大きくしても聴感上の劣化は少ないので、周波数帯域によって集約する周波数ω′の大きさを変えるようにして良い。 Aggregating the frequencies ω in this way before computing the reverberation suppression gain reduces the amount of computation. Because hearing has logarithmic sensitivity with respect to frequency, aggregating a larger number of frequencies at higher frequencies causes little audible degradation, so the size of each aggregated frequency ω′ may be varied according to the frequency band.

なお、残響抑圧装置６００を、残響抑圧装置１００に帯域集約部６１１と帯域展開部６１２を追加した構成で説明を行ったが、上記した残響抑圧装置２００，３００，４００，５００の何れにも帯域集約部６１１と帯域展開部６１２の構成を追加することで、同様の効果を得ることができる。 Although the dereverberation apparatus 600 has been described as the dereverberation apparatus 100 with the band aggregating unit 611 and the band expanding unit 612 added, the same effect can be obtained by adding the band aggregating unit 611 and the band expanding unit 612 to any of the dereverberation apparatuses 200, 300, 400, and 500 described above.

図１５に、この発明の残響抑圧装置７００の機能構成例を示す。残響抑圧装置７００は、残響抑圧装置１００に、係数乗算部７１１と第２周波数領域フィルタ部７１２と減算部７１３の構成を追加したものである。 FIG. 15 shows a functional configuration example of a dereverberation apparatus 700 of the present invention. The dereverberation device 700 is obtained by adding a configuration of a coefficient multiplication unit 711, a second frequency domain filter unit 712, and a subtraction unit 713 to the dereverberation device 100.

係数乗算部７１１は、適応アルゴリズム部１５０が出力するフィルタ係数Ｆ（ω，ｍ，ｔ）の各々に、予め設定した係数Ｈ（ｍ）を乗算し、変換後のフィルタ係数Ｆ′（ω，ｍ，ｔ）を出力する（式（１８））。 The coefficient multiplication unit 711 multiplies each of the filter coefficients F(ω, m, t) output from the adaptive algorithm unit 150 by a preset coefficient H(m) and outputs the converted filter coefficients F′(ω, m, t) (equation (18)).

係数Ｈ（ｍ）の内、ｍが小さい部分の係数を１よりも小さく設定することで、人の口腔内の反響特性を残響成分として誤推定してしまうことを減らすことができる。人の口腔内の反響特性は、部屋の残響と比べ、短時間の応答であるため、ｍが小さい部分のフィルタ係数に大きく影響する。ｍが小さい部分のフィルタ係数に１よりも小さい係数Ｈ（ｍ）を乗算することで、その影響を軽減することができる。 By setting the coefficient of the portion where m is small among the coefficients H (m) to be smaller than 1, it is possible to reduce erroneous estimation of the reverberation characteristics in the human oral cavity as reverberation components. Since the reverberation characteristic in the human mouth is a response in a short time as compared with the reverberation in the room, it greatly affects the filter coefficient of the portion where m is small. By multiplying the filter coefficient of the portion where m is small by a coefficient H (m) smaller than 1, the influence can be reduced.

第２周波数領域フィルタ部７１２は、係数Ｈ（ｍ）を乗算した後のフィルタ係数を遅延部１３０の出力する遅延パワースペクトル｜Ｘ（ω，ｔ−Ｄ）｜^２に乗算する。減算部７１３は、パワー計算部１２０が出力するパワースペクトルから、第２周波数領域フィルタ部７１２の出力信号を減算する。減算部７１３の出力は、残響抑圧ゲイン計算部１７０に誤差信号の代わりに入力される。 The second frequency domain filter unit 712 multiplies the delay power spectrum | X (ω, t−D) | ² output from the delay unit 130 by the filter coefficient after multiplication by the coefficient H (m). The subtraction unit 713 subtracts the output signal of the second frequency domain filter unit 712 from the power spectrum output from the power calculation unit 120. The output of the subtraction unit 713 is input to the reverberation suppression gain calculation unit 170 instead of the error signal.
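
The path through the coefficient multiplication unit 711, the second frequency domain filter unit 712, and the subtraction unit 713 can be sketched for a single frequency bin as follows. Several points are assumptions made for illustration: equation (18) is taken to be F′(ω, m, t) = H(m)·F(ω, m, t) as the description states in words, the filter output is taken to be a sum over the filter order m, `delayed_powers[m]` stands for the delayed power spectrum seen by tap m, and all names and values are hypothetical.

```python
def weight_coeffs(F, H):
    """F'(m) = H(m) * F(m) for one frequency bin."""
    return [h * f for h, f in zip(H, F)]


def gain_input(power_now, delayed_powers, F, H):
    """Power spectrum minus the reverberation estimate obtained with the
    weighted coefficients; this value replaces the error signal at the
    input of the gain calculation."""
    F_weighted = weight_coeffs(F, H)
    reverb = sum(f * p for f, p in zip(F_weighted, delayed_powers))
    return power_now - reverb


F = [0.5, 0.25]   # adaptive filter coefficients, m = 0, 1
H = [0.5, 1.0]    # H(m) < 1 for small m de-emphasizes short-time responses
print(weight_coeffs(F, H))                 # [0.25, 0.25]
print(gain_input(4.0, [2.0, 4.0], F, H))   # 2.5
```

Because only the copy of the coefficients used for the gain is weighted, the adaptive filter itself keeps learning from the unmodified error signal.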

残響抑圧装置７００によれば、人の口腔内の反響特性を残響成分として誤推定してしまうことを減らすことができる。なお、残響抑圧装置７００を、残響抑圧装置１００に係数乗算部７１１と第２周波数領域フィルタ部７１２と減算部７１３を追加した構成で説明を行ったが、上記した残響抑圧装置２００，３００，４００，５００，６００の何れにも同じ機能構成部を追加することで、同様の効果を得ることができる。 According to the reverberation suppression apparatus 700, it is possible to reduce erroneous estimation of the reverberation characteristics in the human mouth as a reverberation component. The dereverberation apparatus 700 has been described with a configuration in which a coefficient multiplication unit 711, a second frequency domain filter unit 712, and a subtraction unit 713 are added to the dereverberation apparatus 100, but the above-described dereverberation apparatuses 200, 300, and 400 are described. , 500, 600, the same effect can be obtained by adding the same functional component.

〔評価実験〕 [Evaluation experiment]
本発明の効果を確認する目的で、残響抑圧性能を従来法と比較する評価実験を行った。評価実験は、鏡像法を用いた計算機シミュレーションにより疑似的に残響の付いた音声を生成し、その音声を残響抑圧処理した結果を、従来法とこの発明の方法を比較することで行った。評価はＰＥＳＱ（Perceptual Speech Quality Measure）により実施した。 To confirm the effect of the present invention, an evaluation experiment was performed comparing its reverberation suppression performance with that of the conventional method. In the experiment, artificially reverberated speech was generated by computer simulation using the image method, and the results of applying reverberation suppression to that speech were compared between the conventional method and the method of the present invention. Evaluation was performed using PESQ (Perceptual Speech Quality Measure).
残響音声の生成の条件は次の通りである。６ｍ×１０ｍ×３ｍの部屋で無指向性の音源と無指向性のマイクロホンが１ｍの距離で配置されているとし、壁面の反射係数を変化させてマイクロホン受音信号を生成した。音源信号には８ｋＨｚサンプリングの音声（女声１０ｓ，男声１０ｓ）を用いた。 The conditions for generating the reverberant speech were as follows. An omnidirectional sound source and an omnidirectional microphone were assumed to be placed 1 m apart in a 6 m × 10 m × 3 m room, and the microphone signals were generated while varying the reflection coefficient of the walls. Speech sampled at 8 kHz (10 s of female voice and 10 s of male voice) was used as the source signal.

図１６に、原音と比べた時のＰＥＳＱを示す。横軸は残響時間（ｍｓ）、縦軸はＰＥＳＱ値である。残響時間が増えるに従い、処理前の信号はＰＥＳＱ値が低下する。従来法、この発明の方法ともに、残響時間の長いデータにおいて、低下したＰＥＳＱ値を０.１程度改善している。改善量は、従来法とこの発明の方法で同程度である。 FIG. 16 shows the PESQ when compared with the original sound. The horizontal axis represents the reverberation time (ms), and the vertical axis represents the PESQ value. As the reverberation time increases, the PESQ value of the signal before processing decreases. In both the conventional method and the method of the present invention, the lowered PESQ value is improved by about 0.1 in data with a long reverberation time. The amount of improvement is comparable between the conventional method and the method of the present invention.

一方、演算時間を比較すると、従来法では２０ｓの音声を処理するのに約６ｓの時間を要した。演算時間は、Ｃｏｒｅ−ｉ７，３.４ＧＨｚの条件で処理した時間である。この発明の方法では、同じ２０ｓの音声を処理するのに約１００ｍｓの時間で演算を終了した。このように同程度のＰＥＳＱを、大幅に短い演算時間である従来法の約１/６０の演算量で得られることが確認できた。 On the other hand, comparing computation times, the conventional method required about 6 s to process 20 s of speech. The computation times were measured on a Core i7 at 3.4 GHz. The method of the present invention finished processing the same 20 s of speech in about 100 ms. It was thus confirmed that a comparable PESQ can be obtained in a much shorter computation time, with about 1/60 of the computation of the conventional method.
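
The 1/60 figure follows directly from the two reported processing times. As a quick check (the function name is hypothetical; times are in milliseconds so the arithmetic stays exact):

```python
def speedup(conventional_ms, proposed_ms):
    """Ratio of the two processing times for the same input."""
    return conventional_ms / proposed_ms


audio_ms = 20_000        # 20 s of speech
conventional_ms = 6_000  # about 6 s with the conventional method
proposed_ms = 100        # about 100 ms with the method of the invention

print(speedup(conventional_ms, proposed_ms))  # 60.0
print(conventional_ms / audio_ms)             # 0.3   (fraction of real time)
print(proposed_ms / audio_ms)                 # 0.005 (fraction of real time)
```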

このように、室内インパルス応答をパワースペクトルの領域で近似し、適応フィルタによる逐次学習を用いることで、少ない演算量で従来法と変わらない残響抑圧性能を実現することが可能となった。 Thus, by approximating the room impulse response in the power spectrum domain and using sequential learning with an adaptive filter, reverberation suppression performance equal to that of the conventional method can be achieved with a small amount of computation.

上記装置における処理手段をコンピュータによって実現する場合、各装置が有すべき機能の処理内容はプログラムによって記述される。そして、このプログラムをコンピュータで実行することにより、各装置における処理手段がコンピュータ上で実現される。 When the processing means in the above apparatus is realized by a computer, the processing contents of the functions that each apparatus should have are described by a program. Then, by executing this program on the computer, the processing means in each apparatus is realized on the computer.

この処理内容を記述したプログラムは、コンピュータで読み取り可能な記録媒体に記録しておくことができる。コンピュータで読み取り可能な記録媒体としては、例えば、磁気記録装置、光ディスク、光磁気記録媒体、半導体メモリ等どのようなものでもよい。具体的には、例えば、磁気記録装置として、ハードディスク装置、フレキシブルディスク、磁気テープ等を、光ディスクとして、DVD（Digital Versatile Disc）、DVD-RAM（Random Access Memory）、CD-ROM（Compact Disc Read Only Memory）、CD-R（Recordable）/RW（ReWritable）等を、光磁気記録媒体として、MO（Magneto Optical disc）等を、半導体メモリとしてEEP-ROM（Electronically Erasable and Programmable-Read Only Memory）等を用いることができる。 The program describing the processing contents can be recorded on a computer-readable recording medium. As the computer-readable recording medium, for example, any recording medium such as a magnetic recording device, an optical disk, a magneto-optical recording medium, and a semiconductor memory may be used. Specifically, for example, as a magnetic recording device, a hard disk device, a flexible disk, a magnetic tape or the like, and as an optical disk, a DVD (Digital Versatile Disc), a DVD-RAM (Random Access Memory), a CD-ROM (Compact Disc Read Only) Memory), CD-R (Recordable) / RW (ReWritable), etc., magneto-optical recording media, MO (Magneto Optical disc), etc., semiconductor memory, EEP-ROM (Electronically Erasable and Programmable-Read Only Memory), etc. Can be used.

また、このプログラムの流通は、例えば、そのプログラムを記録したDVD、CD-ROM等の可搬型記録媒体を販売、譲渡、貸与等することによって行う。さらに、このプログラムをサーバコンピュータの記録装置に格納しておき、ネットワークを介して、サーバコンピュータから他のコンピュータにそのプログラムを転送することにより、このプログラムを流通させる構成としてもよい。 This program is distributed by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM in which the program is recorded. Further, the program may be distributed by storing the program in a recording device of a server computer and transferring the program from the server computer to another computer via a network.

また、各手段は、コンピュータ上で所定のプログラムを実行させることにより構成することにしてもよいし、これらの処理内容の少なくとも一部をハードウェア的に実現することとしてもよい。 Each means may be configured by executing a predetermined program on a computer, or at least a part of these processing contents may be realized by hardware.

Claims

An FFT unit that converts the collected sound signal into a frequency domain sound collected signal in the frequency domain;
With the frequency domain collected signal as an input, a power calculation unit that calculates and outputs a power spectrum for each frequency of the frequency domain collected signal; and
A delay unit that outputs a delayed power spectrum obtained by delaying the power spectrum by a predetermined delay amount;
From the power spectrum, a subtracting unit for subtracting the filtered signal obtained by multiplying the delayed power spectrum by a filter coefficient to obtain the estimated power of the direct sound;
An adaptive algorithm unit that receives the delay power spectrum and the estimated power of the direct sound as inputs, and updates a filter coefficient so as to minimize the estimated power of the direct sound;
A frequency domain filter unit that outputs a filtered signal obtained by multiplying the delayed power spectrum by the filter coefficient;
A reverberation suppression gain calculator for calculating a reverberation suppression gain for suppressing the reverberation sound, using the power spectrum and the estimated power of the direct sound as inputs,
A multiplier that multiplies the frequency domain collected signal by the reverberation suppression gain and outputs a reverberation suppression signal;
An inverse FFT unit for converting the dereverberation signal into a dereverberation signal in the time domain;
Equipped with,
The adaptive algorithm part is
An update vector calculation means for calculating an update vector using an adaptive algorithm, using the delay power spectrum and the estimated power of the direct sound as inputs,
The signal processing frame number is t, the frequency is ω, the filter order is m, the update vector value is U (ω, m, t), the step size is α,

Step size setting means and
Multiplication means for outputting a filter coefficient obtained by multiplying the update vector by the step size;
Adding means for adding the filter coefficient of the previous frame to the filter coefficient output by the multiplication means;
Filter coefficient holding means for holding the output of the adding means as a filter coefficient of the next frame;
A dereverberation device comprising:

In the dereverberation device according to claim 1,
The reverberation suppression gain calculation unit is
Masking level calculation means for calculating auditory masking level Q (ω, t) from the estimated power of the direct sound;
A maximum value selecting means for comparing the estimated power of the direct sound and the auditory masking level Q (ω, t), and selecting a larger value as a new direct sound power spectrum in the calculation of the reverberation suppression gain;
A dereverberation apparatus comprising:

In the dereverberation device according to claim 1 or 2,
Furthermore,
A dereverberation apparatus comprising an adaptive interval detection unit that compares the magnitude of the delayed power spectrum with a preset threshold and performs control so that the filter coefficient is updated by the adaptive algorithm unit only when the magnitude of the delayed power spectrum exceeds the threshold.

In the dereverberation device according to any one of claims 1 to 3,
Furthermore,
A band aggregating unit that aggregates the frequency division ω of the power spectrum and converts it to a smaller frequency division ω ′ as shown in the following equation:

Where Ω (ω ′) is a set of frequencies ω belonging to the group of ω ′,
A band expansion unit that converts the frequency division ω ′ of the reverberation suppression gain into the frequency division ω of the FFT unit as shown in the following equation;

A dereverberation apparatus comprising:

In the dereverberation device according to any one of claims 1 to 4,
Furthermore,
A coefficient multiplication unit that, where the filter coefficient is F(ω, m, t), multiplies each F(ω, m, t) by a preset coefficient H(m) and outputs the converted filter coefficient F′(ω, m, t);
A second frequency domain filter unit that multiplies the delayed power spectrum by the converted filter coefficient F ′ (ω, m, t);
A second subtraction unit that outputs a signal obtained by subtracting the output signal of the second frequency domain filter unit from the power spectrum to the reverberation suppression gain calculation unit instead of the estimated power of the direct sound;
A dereverberation apparatus comprising:

An FFT process for converting the collected sound signal into a frequency domain sound collected signal in the frequency domain;
Power calculation process of calculating and outputting a power spectrum for each frequency of the frequency domain sound collection signal, using the frequency domain sound collection signal as an input,
A delay process for outputting a delayed power spectrum obtained by delaying the power spectrum by a predetermined delay amount;
From the power spectrum, subtracting the filtered signal obtained by multiplying the delayed power spectrum by a filter coefficient to obtain the estimated power of the direct sound; and
An adaptive algorithm process that takes the delayed power spectrum and the estimated power of the direct sound as inputs and updates the filter coefficients to minimize the estimated power of the direct sound;
A frequency domain filtering process for outputting a filtered signal obtained by multiplying the delayed power spectrum by the filter coefficient;
A reverberation suppression gain calculation process for calculating a reverberation suppression gain for suppressing the reverberation sound, using the power spectrum and the estimated power of the direct sound as inputs.
A multiplication process for multiplying the frequency domain collected signal by the reverberation suppression gain and outputting a reverberation suppression signal;
An inverse FFT process for converting the dereverberation signal into a time domain dereverberation signal;
Equipped with,
The adaptive algorithm process is
An update vector calculation step for calculating an update vector using an adaptive algorithm, using the delay power spectrum and the estimated power of the direct sound as inputs;
The signal processing frame number is t, the frequency is ω, the filter order is m, the update vector value is U (ω, m, t), the step size is α,

Step size setting step and
A multiplication step of outputting a filter coefficient obtained by multiplying the update vector by the step size;
An addition step of adding the filter coefficient of the previous frame to the filter coefficient output in the multiplication step;
A filter coefficient holding step of holding the output of the addition step as a filter coefficient of the next frame;
A reverberation suppression method characterized by comprising:

The dereverberation method according to claim 6, wherein
The above reverberation suppression gain calculation process is:
A masking level calculating step for calculating an auditory masking level Q (ω, t) from the estimated power of the direct sound;
A maximum value selection step of comparing the estimated power of the direct sound and the auditory masking level Q (ω, t), and selecting a larger value as a new direct sound power spectrum for the calculation of the reverberation suppression gain;
A reverberation suppression method characterized by comprising:

The dereverberation method according to claim 6 or 7,
Furthermore,
A reverberation suppression method comprising an adaptive interval detection process that compares the magnitude of the delayed power spectrum with a preset threshold and performs control so that the filter coefficient is updated by the adaptive algorithm process only when the magnitude of the delayed power spectrum exceeds the threshold.

The dereverberation method according to any one of claims 6 to 8,
Furthermore,
A band aggregation process for aggregating the frequency division ω of the power spectrum and converting it to a smaller frequency division ω ′ as shown in the following equation:

Where Ω (ω ′) is a set of frequencies ω belonging to the group of ω ′,
A band expansion process for converting the frequency division ω ′ of the reverberation suppression gain into the frequency division ω of the FFT process as shown in the following equation:

A reverberation suppression method comprising:

The reverberation suppression method according to any one of claims 6 to 9,
Furthermore,
A coefficient multiplication process that, where the filter coefficient is F(ω, m, t), multiplies each F(ω, m, t) by a preset coefficient H(m) and outputs the converted filter coefficient F′(ω, m, t);
A second frequency domain filtering process of multiplying the delayed power spectrum by the converted filter coefficient F ′ (ω, m, t);
A second subtraction process for outputting a signal obtained by subtracting the output signal of the second frequency domain filter process from the power spectrum to the reverberation suppression gain calculation process instead of the estimated power of the direct sound;
A reverberation suppression method comprising:

A program for causing a computer to function as the dereverberation device according to any one of claims 1 to 5.

A computer-readable recording medium on which the program according to claim 11 is recorded.