JP5077847B2

JP5077847B2 - Reverberation time estimation apparatus and reverberation time estimation method

Info

Publication number: JP5077847B2
Application number: JP2008095540A
Authority: JP
Inventors: 祐史鵜木; 壮太平松
Original assignee: Japan Advanced Institute of Science and Technology
Current assignee: Japan Advanced Institute of Science and Technology
Priority date: 2008-03-04
Filing date: 2008-03-04
Publication date: 2012-11-21
Anticipated expiration: 2028-03-04
Also published as: JP2009211021A

Description

The present invention relates to a reverberation time estimation apparatus and method for estimating the reverberation time of a system. In particular, it relates to a reverberation time estimation apparatus and method that blindly estimate the reverberation time, without using the original sound signal, from a frequency-series modulation spectrum obtained from a time-series acoustic signal.

The reverberation time is an indispensable parameter for characterizing the reverberation of a room. In recent years it has been used for F0 (fundamental frequency) estimation, speech recognition, recovery of the original sound signal, and similar processing of acoustic signals. Obtaining the reverberation time has conventionally required measuring the transfer characteristics of the room, and that measurement is subject to many constraints, such as the sound source, the state of the people and objects in the room, and how quiet the room is. Conventional reverberation time measurement also takes a relatively long time, so measurement can be impossible in an environment whose reverberation characteristics change from moment to moment. From this viewpoint, a technique capable of measuring the reverberation time in real time is desired.

To obtain the reverberation time in real time, blind estimation is required, that is, estimation of the reverberation time from the observed acoustic signal alone, without measuring the transfer characteristics of the system. Here, "system" refers to a space assumed for the analysis of various events. Non-Patent Document 1 discloses a method for recovering the power envelope of the original source signal from the power envelope of an acoustic signal affected by reverberation. This recovery method uses the modulation transfer function, which expresses the input/output relationship of the power envelope of sound transmitted through a room (a convolution in the time domain, a product in the frequency domain), that is, the degree of modulation, as a function of modulation frequency. More specifically, the inverse of the system's modulation transfer function (an inverse filter) is applied to the power envelope of the observed acoustic signal to recover the power envelope of the original sound (the sound from the source, with no reverberation added). Non-Patent Document 1 presupposes that, when the inverse filter restores the power envelope of the reverberant signal to the same shape as that of the original sound, the reverberation time parameter of the inverse filter equals the reverberation time of the system. On that premise, it discloses estimating, as the reverberation time, the reverberation time parameter obtained in the process of computing the inverse filter.
Non-Patent Document 1: Furukawa, Unoki, Akagi, "Recovery method of reverberant speech power envelope based on MTF", IEICE Technical Report, The Institute of Electronics, Information and Communication Engineers, April 2002, EA2002-15, SP2002-15, pp. 49-54

However, in the reverberant speech power envelope recovery method disclosed in Non-Patent Document 1, the reverberation time obtained in the process of computing the inverse filter agrees well with the actual reverberation time only up to about 0.5 seconds. Beyond that, the difference between the two gradually grows, and the agreement can no longer be called sufficient.

The present invention has been made in view of these circumstances. Its object is to provide a reverberation time estimation apparatus and a reverberation time estimation method capable of estimating the reverberation time accurately without measuring the transfer characteristics of the system.

To solve the above problem, the reverberation time estimation apparatus of the present invention comprises: power envelope generation means for generating, from a time-series acoustic signal to which reverberation has been added, a time-series power envelope corresponding to the acoustic signal; modulation spectrum generation means for generating a frequency-series modulation spectrum from the power envelope generated by the power envelope generation means; and reverberation time estimation means for estimating, from the modulation spectrum generated by the modulation spectrum generation means, the reverberation time corresponding to a transfer function that describes the reverberation characteristics of the system in which the acoustic signal was observed.

In the above invention, it is preferable to further include main modulation frequency specifying means for specifying a main modulation frequency, that is, a modulation frequency at which the frequency-series modulation spectrum takes a high value. The reverberation time estimation means is then preferably configured to estimate the reverberation time corresponding to the transfer function for which, when the inverse of that transfer function is applied to the frequency-series modulation spectrum, the resulting value at the main modulation frequency substantially matches the value, at the same main modulation frequency, of the frequency-series modulation spectrum of the time-series original sound signal, that is, of the original sound to which no reverberation has been added.

In this case, the main modulation frequency specifying means is preferably configured to compute the autocorrelation function of the power envelope and to specify, as the main modulation frequency, the reciprocal of the time shift at which the autocorrelation function shows a peak.

In the above invention, it is preferable to further include a low-pass filter applied to the power envelope generated by the power envelope generation means, with the main modulation frequency specifying means configured to specify the main modulation frequency from the power envelope output by the low-pass filter.

In the above invention, it is preferable to further include band division means for dividing the acoustic signal into a plurality of channels, and channel determination means for determining, from among the channels produced by the band division means, the channels to be used for reverberation time estimation.

In this case, the power envelope generation means is preferably configured to generate a power envelope for each channel produced by the band division means. The apparatus preferably further includes high-level part detection means for detecting, in each generated power envelope, high-level parts that exceed a predetermined reference value, and the channel determination means is preferably configured to determine the channels to be used for reverberation time estimation on the basis of the high-level parts detected by the high-level part detection means.

In this case, the channel determination means is preferably configured to determine whether a minute peak exists between two high-level parts detected by the high-level part detection means and, if one exists, to exclude that channel from the channels used for estimation.

In the above invention, the channel determination means is also preferably configured to determine whether a valley exists within a high-level part detected by the high-level part detection means and, if one exists, to exclude that channel from the channels used for estimation.
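The channel screening described above can be sketched in Python as follows. This is only an illustration under stated assumptions: this text does not define "valley" or "minute peak" precisely, so the sketch treats a valley as a local minimum strictly inside a high-level part and a minute peak as a local maximum in the gap between two high-level parts. All function names and the reference value are hypothetical.

```python
def high_level_parts(envelope, reference):
    """Return (start, end) index pairs of runs where the envelope exceeds the reference value."""
    parts, start = [], None
    for i, v in enumerate(envelope):
        if v > reference and start is None:
            start = i
        elif v <= reference and start is not None:
            parts.append((start, i - 1))
            start = None
    if start is not None:
        parts.append((start, len(envelope) - 1))
    return parts

def has_valley(envelope, part):
    """Assumed definition: a local minimum lies strictly inside the high-level part."""
    lo, hi = part
    return any(envelope[i - 1] > envelope[i] < envelope[i + 1] for i in range(lo + 1, hi))

def has_minute_peak_between(envelope, first, second):
    """Assumed definition: a local maximum lies in the gap between two high-level parts."""
    return any(envelope[i - 1] < envelope[i] > envelope[i + 1]
               for i in range(first[1] + 1, second[0]))

def channel_is_usable(envelope, reference):
    """Exclude a channel if any high-level part has a valley or any gap has a minute peak."""
    parts = high_level_parts(envelope, reference)
    if any(has_valley(envelope, p) for p in parts):
        return False
    return not any(has_minute_peak_between(envelope, a, b)
                   for a, b in zip(parts, parts[1:]))

clean = [0, 1, 5, 6, 5, 1, 0, 1, 4, 7, 4, 0]          # two clean high-level parts
dipped = [0, 1, 5, 6, 5, 1, 0, 4, 7, 3, 8, 4, 0]      # second part contains a valley
noisy = [0, 1, 5, 6, 5, 1, 1.5, 1, 4, 7, 4, 0]        # small peak in the gap
```

With a reference value of 2, `channel_is_usable` keeps the first envelope and rejects the other two.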

The reverberation time estimation method of the present invention comprises the steps of: generating, from a time-series acoustic signal to which reverberation has been added, a time-series power envelope corresponding to the acoustic signal; generating a frequency-series modulation spectrum from the generated power envelope; and estimating, from the generated modulation spectrum, the reverberation time corresponding to a transfer function that describes the reverberation characteristics of the system in which the acoustic signal was observed.

According to the reverberation time estimation apparatus and reverberation time estimation method of the present invention, the reverberation time can be estimated accurately without measuring the transfer characteristics of the system.

Preferred embodiments of the present invention are described below with reference to the drawings.

(Embodiment 1)
Embodiment 1 is a reverberation time estimation apparatus composed mainly of a hardware digital signal processing circuit.

[Configuration of the reverberation time estimation apparatus]
FIG. 1 is a block diagram showing the configuration of the reverberation time estimation apparatus according to Embodiment 1 of the present invention. As shown in FIG. 1, the reverberation time estimation apparatus 1 according to this embodiment includes a microphone 2 for picking up the sound of the room, an A/D converter 3 that performs A/D conversion on the analog acoustic signal captured by the microphone 2, a digital signal processing circuit 4 that performs signal processing on the digital acoustic signal output from the A/D converter 3, an arithmetic circuit 5 that receives the processing results of the digital signal processing circuit 4 and executes the reverberation time estimation process, a memory 6, and a liquid crystal display unit 7 that displays the reverberation time estimated by the arithmetic circuit 5.

As shown in FIG. 1, the digital signal processing circuit 4 has the following functional blocks: a power envelope generation unit 41, a low-pass filter 42, a main modulation frequency acquisition unit 43, and a normalized modulation spectrum generation unit 44. The power envelope generation unit 41 generates a power envelope signal from the digital acoustic signal output from the A/D converter 3. The power envelope signal is the square of the temporal envelope of the amplitude of the acoustic signal. The power envelope generation unit 41 is configured to perform the signal processing expressed by the following equation.

Here, LPF[·] denotes a low-pass filter and Hilbert(·) denotes the Hilbert transform. Considering the rhythm of fluctuation in conversational speech, the main acoustic information lies at modulation frequencies up to 20 Hz, so in this embodiment the cutoff frequency of this low-pass filter is set to 20 Hz.
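The equation for the power envelope is not reproduced in this text. The Python sketch below illustrates the processing described above, assuming the equation takes the common form e_y^2(t) = LPF[|y(t) + j·Hilbert(y(t))|^2]. The function names, the naive DFT used to form the analytic signal, and the one-pole filter standing in for LPF[·] are illustrative assumptions, not the circuit of the embodiment.

```python
import cmath
import math

def analytic_signal(y):
    """Return y(t) + j*Hilbert(y(t)) using a naive O(N^2) DFT (short signals only)."""
    n = len(y)
    spectrum = [sum(y[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    # Keep DC (and Nyquist for even N) once, double the positive frequencies,
    # and zero the negative frequencies.
    half = n // 2
    for k in range(1, n):
        if k < half or (n % 2 == 1 and k == half):
            spectrum[k] *= 2.0
        elif not (n % 2 == 0 and k == half):
            spectrum[k] = 0.0
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def one_pole_lowpass(x, cutoff_hz, fs):
    """Simple one-pole IIR low-pass; an assumed stand-in for LPF[.]."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    out, state = [], 0.0
    for v in x:
        state += alpha * (v - state)
        out.append(state)
    return out

def power_envelope(y, fs, cutoff_hz=20.0):
    """e_y^2(t): low-pass-filtered squared magnitude of the analytic signal."""
    squared = [abs(z) ** 2 for z in analytic_signal(y)]
    return one_pole_lowpass(squared, cutoff_hz, fs)

# Demo: a carrier at DFT bin 16, amplitude-modulated at bin 2 (N = 64 samples).
N = 64
y = [(1 + 0.5 * math.cos(2 * math.pi * 2 * t / N)) * math.cos(2 * math.pi * 16 * t / N)
     for t in range(N)]
raw = [abs(z) ** 2 for z in analytic_signal(y)]   # unfiltered squared envelope
env = power_envelope(y, fs=64.0)
```

For this synthetic signal the unfiltered squared envelope `raw` follows (1 + 0.5·cos(·))^2, which is the squared amplitude envelope the paragraph describes.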

The low-pass filter 42 is placed in front of the main modulation frequency acquisition unit 43, so that the unit receives the power envelope signal with its high-frequency components removed by the low-pass filter 42. The main modulation frequency acquisition unit 43 computes the autocorrelation function of the power envelope signal and determines the main modulation frequency f_md from the time shift (lag) at which the autocorrelation function peaks, taking the reciprocal of that time shift. Because the main modulation frequency f_md is a relatively low modulation frequency and is expected to appear below 2 Hz, the cutoff frequency of the low-pass filter 42 is set to 2 Hz in this embodiment. Placing the low-pass filter 42 in front of the main modulation frequency acquisition unit 43 in this way prevents erroneous detection of the main modulation frequency f_md. In this embodiment, a single modulation frequency is acquired as the main modulation frequency. The invention is not limited to this, and a plurality of main modulation frequencies may be acquired. From the viewpoint of circuit complexity and cost, however, fewer is better, and one or two is particularly preferable. Likewise, although the cutoff frequency of the low-pass filter 42 is 2 Hz in this embodiment, other cutoff frequencies such as 5 Hz or 10 Hz may be used. A high cutoff frequency risks leaving many high-frequency noise components in the power envelope, so a value of 10 Hz or less is preferable.
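As a rough illustration of the autocorrelation step, the following Python sketch estimates f_md as the reciprocal of the lag of the first autocorrelation peak after lag 0. The function name, the unbiased autocorrelation estimate, and the peak-picking rule are assumptions made for the example; the embodiment does not specify them.

```python
import math

def main_modulation_frequency(envelope, fs):
    """Estimate f_md as fs / lag, where lag is the first autocorrelation peak after lag 0."""
    n = len(envelope)
    mean = sum(envelope) / n
    x = [v - mean for v in envelope]
    max_lag = n // 2
    # Unbiased autocorrelation estimate for lags 0 .. max_lag.
    r = [sum(x[t] * x[t + lag] for t in range(n - lag)) / (n - lag)
         for lag in range(max_lag + 1)]
    lag = 1
    while lag < max_lag and r[lag] > 0.0:   # step out of the main lobe around lag 0
        lag += 1
    for k in range(lag, max_lag):
        if r[k] > 0.0 and r[k] > r[k - 1] and r[k] >= r[k + 1]:
            return fs / k                   # reciprocal of the peak time shift k / fs
    return None                             # no periodic component found

fs = 100.0                                  # hypothetical envelope sampling rate in Hz
envelope = [1.0 + math.cos(2 * math.pi * 2.0 * t / fs) for t in range(300)]
f_md = main_modulation_frequency(envelope, fs)
```

For a power envelope that is a 2 Hz sinusoid, the estimate lands at approximately 2 Hz, in line with the statement that the main modulation frequency of a sinusoidal envelope equals its frequency.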

The normalized modulation spectrum generation unit 44 has an FFT circuit and generates, from the power envelope signal (a time-series digital signal), a modulation spectrum signal (a frequency-series digital signal). The unit is configured to normalize the magnitude of the modulation spectrum by its DC component (the modulation spectrum at a modulation frequency of 0 Hz).
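A minimal Python sketch of the DC-normalized modulation spectrum is given below. It uses a naive DFT in place of the FFT circuit, and the function name and the one-sided magnitude convention are assumptions for illustration only.

```python
import cmath
import math

def normalized_modulation_spectrum(power_env, fs):
    """Return (frequencies, magnitudes), with magnitudes divided by the DC (0 Hz) value."""
    n = len(power_env)
    bins = n // 2 + 1
    mags = [abs(sum(power_env[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(bins)]
    dc = mags[0]
    freqs = [k * fs / n for k in range(bins)]
    return freqs, [m / dc for m in mags]

fs = 100.0                                   # hypothetical envelope sampling rate in Hz
power_env = [1.0 + math.cos(2 * math.pi * 5.0 * t / fs) for t in range(100)]
freqs, spectrum = normalized_modulation_spectrum(power_env, fs)
```

With this convention a fully modulated 5 Hz envelope gives a normalized value of 0.5 at the 5 Hz bin, because the raw DFT magnitude is not doubled; other scaling conventions are equally possible.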

The main modulation frequency and modulation spectrum data output from the digital signal processing circuit 4 are supplied to the arithmetic circuit 5. The arithmetic circuit 5 is a processor dedicated to the reverberation time estimation process, implemented as an FPGA, an ASIC, or the like. It estimates the reverberation time of the room using the modulation transfer function (hereinafter, MTF), which expresses the degree of modulation in the room (the input/output relationship of the power envelope of sound transmitted through the room) as a function of modulation frequency.

The MTF is now described in detail. When the transmission path can be described as a linear passive system, the MTF is defined by the following equation as the Fourier transform of the squared impulse response.

Here, f_m is the modulation frequency and h(t) is the room impulse response. The statistical approximation of the impulse response commonly used in room acoustics is given by the following equation.

Here, T_R is the reverberation time and n(t) is white noise. Using equation (3) above, the MTF at a given reverberation time T_R is expressed as the ratio of output power to input power at modulation frequency f_m. When the room is a diffuse sound field, wave theory shows that the decay process is exponential. In this case, the magnitude of the MTF (the degree of modulation) is given by the following equation.

FIG. 2 is a graph of the MTF (degree of modulation) of equation (4). The horizontal axis shows the modulation frequency and the vertical axis shows the MTF (degree of modulation). For example, as shown in FIG. 2, the MTF at a modulation frequency of f_m = 10 Hz is 0.402. This means that if the component of the input power envelope at f_m = 10 Hz has a degree of modulation of 1, the component of the output power envelope at the same modulation frequency is attenuated to a degree of modulation of 0.402. In a sense, this attenuation can be interpreted as a low-pass filter characteristic with respect to modulation frequency.
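The MTF equations themselves are not reproduced in this text. The sketch below assumes the form widely used for an exponentially decaying response, m(f_m, T_R) = 1 / sqrt(1 + (2π f_m T_R / 13.8)^2). The reverberation time used for FIG. 2 is also not stated here; T_R = 0.5 s is assumed because, under this formula, it reproduces the quoted value of about 0.402 at 10 Hz.

```python
import math

def mtf(f_m, t_r):
    """Assumed MTF magnitude for an exponential decay with reverberation time t_r (seconds)."""
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_m * t_r / 13.8) ** 2)

value_at_10hz = mtf(10.0, 0.5)      # about 0.402 under the assumed T_R = 0.5 s
value_at_dc = mtf(0.0, 0.5)         # 1.0: the DC component is not attenuated
```

The function decreases monotonically in both f_m and T_R, which is the low-pass behavior with respect to modulation frequency noted above.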

Next, the principle of reverberation time estimation in this embodiment is described. FIG. 3A is a graph showing the power envelope of an acoustic signal with no reverberation added, and FIG. 3B is a graph showing its modulation spectrum. FIG. 4A is a graph showing the power envelope of the acoustic signal when reverberation is added to the sound of FIG. 3A, and FIG. 4B is a graph showing its modulation spectrum. The main modulation frequency of the acoustic signals in FIGS. 3A and 4A is 5 Hz in both cases, and the reverberation time T_R of the reverberation added to the signal in FIG. 4A is 2.0 seconds. If applying an inverse MTF filter to the modulation spectrum of FIG. 4B returns its value at the main modulation frequency (5 Hz) to the value of the modulation spectrum of FIG. 3B at the same frequency, then the appropriate reverberation time can be estimated from the parameter of that inverse filter.

As FIGS. 3B and 4B show, the modulation spectrum near a modulation frequency of 0 Hz (the DC component) is not affected by reverberation. Moreover, because the MTF expresses the ratio of output power to input power, and because the normalized Fourier transform of the system's power envelope is the MTF, the modulation spectrum of the original sound decreases in accordance with the MTF of the system when reverberation is added. This embodiment relies on the fact that, in the modulation spectrum of sound with no reverberation added, the power at the main modulation frequency is in many cases sufficiently close to the power near 0 Hz. It therefore assumes that the power at 0 Hz of the observed modulation spectrum equals the power of the original sound's modulation spectrum both at 0 Hz and at the main modulation frequency. It then finds the MTF that reduces the power of the original sound's modulation spectrum at the main modulation frequency to the power of the observed modulation spectrum at the main modulation frequency, and estimates the reverberation time parameter of that MTF as the reverberation time of the system.
In other words, the reverberation time estimation apparatus according to this embodiment estimates the MTF, through its inverse characteristic, for which applying the inverse MTF filter to the observed modulation spectrum makes the power at the main modulation frequency substantially match the power of the original sound's modulation spectrum at the main modulation frequency (that is, the power of the observed modulation spectrum at 0 Hz). The broken line in FIG. 4B shows the MTF that reduces the power of the original sound's modulation spectrum at the main modulation frequency to the power of the observed modulation spectrum at the main modulation frequency; the reverberation time corresponding to this MTF is the estimate.
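The principle above can be checked numerically. The sketch below is not the procedure of the embodiment, which searches over candidate reverberation times. It is an algebraic inversion of an assumed MTF formula, m = 1 / sqrt(1 + (2π f_m T_R / 13.8)^2), applied to a synthetic observation for the FIG. 4B setting (f_md = 5 Hz, T_R = 2.0 s). The function names are hypothetical.

```python
import math

def mtf(f_m, t_r):
    """Assumed MTF magnitude for an exponential decay with reverberation time t_r."""
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_m * t_r / 13.8) ** 2)

def reverberation_time_from_ratio(f_md, ratio):
    """Solve mtf(f_md, T_R) = ratio for T_R, where ratio = |E_y(f_md)| / |E_y(0)|, 0 < ratio <= 1."""
    return 13.8 / (2.0 * math.pi * f_md) * math.sqrt(1.0 / ratio ** 2 - 1.0)

# Synthetic "observed" DC-normalized value at the main modulation frequency.
observed_ratio = mtf(5.0, 2.0)
t_r_hat = reverberation_time_from_ratio(5.0, observed_ratio)
```

The closed form only holds when the source's modulation spectrum at f_md really equals its DC value, which is exactly the assumption stated in the paragraph above.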

Based on this concept, the arithmetic circuit 5 is specifically configured to execute the following processing. The memory 6 stores the inverse characteristic of the MTF, which has the reverberation time as a parameter, as function data that the arithmetic circuit 5 can process. This inverse filter (inverse characteristic) function data is prepared for each of a plurality of reverberation times given in advance as estimation candidates. The inverse filter is now described in more detail. Following the MTF concept, the relationship among the acoustic signal of the source with no reverberation added (hereinafter, the source signal) x(t), the room impulse response h(t), and the observed acoustic signal obtained as their convolution (hereinafter, the reverberant signal) y(t) is modeled as follows.

Here, "*" denotes convolution, e_x(t) and e_h(t) are the envelopes of x(t) and h(t) respectively, n_1(t) and n_2(t) are mutually uncorrelated white noise signals, and a is an amplitude term. From the mutual independence of n_1(t) and n_2(t), the following relationship is known to hold between the reverberant signal and its power envelope.

Here, <·> denotes the ensemble average. This relationship shows that e_y^2(t) is obtained as the convolution of e_x^2(t) and e_h^2(t).
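The equations referred to in the passage above are not reproduced in this text. Under the definitions given there, the model is commonly written as shown below. This is a reconstruction offered as an assumption: the grouping and numbering may differ from the original equations (5) to (10), and the constant 6.9 follows from requiring the amplitude envelope to decay by 60 dB at t = T_R (ln 10^3 ≈ 6.9).

```latex
\begin{align}
y(t) &= x(t) * h(t) \\
x(t) &= e_x(t)\, n_1(t) \\
h(t) &= e_h(t)\, n_2(t) \\
e_h(t) &= a \exp\!\left(-\frac{6.9\, t}{T_R}\right) \\
\langle n_k(t)\, n_k(t-\tau) \rangle &= \delta(\tau), \quad k = 1, 2 \\
\langle y^2(t) \rangle &= e_x^2(t) * e_h^2(t) = e_y^2(t)
\end{align}
```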

Next, consider the relationship in the frequency domain. Because equations (5) to (10) are in practice used in discrete time, the z-transform is used here as the frequency transform. Writing the z-transform as Z[·], the z-domain modulation transfer function of equation (8) is expressed by the following equation.

Next, because the input/output relationship for the power envelope is Z[e_y^2(t)] / Z[e_x^2(t)] = Z[e_h^2(t)], the modulation spectrum of the source signal, Z[e_x^2(t)], can be obtained by the following equation.

Here, f_s is the sampling frequency. The second term of equation (12) (equation (11)) represents exactly the characteristic of the inverse filter (reverberation is an integration process, whereas recovery corresponds to a differentiation process), and it can be realized with a first-order IIR filter. Function data representing the inverse filter of equation (12) is stored in the memory 6 for each of the plurality of reverberation times. The arithmetic circuit 5 is configured to read the inverse filter for each reverberation time from the memory 6, apply it to the main modulation frequency and modulation spectrum received from the digital signal processing circuit 4, and compute, for each reverberation time, the modulation spectrum of the source signal at the main modulation frequency.
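Equations (11) and (12) are not reproduced in this text. The Python sketch below assumes that the sampled power envelope of the room response is e_h^2[n] = a^2 · c^n with c = exp(-13.8 / (T_R f_s)), so that its z-transform is a^2 / (1 - c z^-1). Under that assumption the reverberation acts as a first-order recursive filter on the power envelope, and the inverse is the first-order difference shown. The function names are hypothetical.

```python
import math

def decay_coefficient(t_r, fs):
    """c = exp(-13.8 / (T_R * f_s)): per-sample decay of e_h^2(t) = a^2 * exp(-13.8 t / T_R)."""
    return math.exp(-13.8 / (t_r * fs))

def add_reverberation(env_x, t_r, fs, a=1.0):
    """Forward model: convolve e_x^2 with the sampled e_h^2, written recursively."""
    c, out, prev = decay_coefficient(t_r, fs), [], 0.0
    for v in env_x:
        prev = a * a * v + c * prev
        out.append(prev)
    return out

def inverse_filter(env_y, t_r, fs, a=1.0):
    """Inverse model: e_x^2[n] = (e_y^2[n] - c * e_y^2[n-1]) / a^2."""
    c, out, prev = decay_coefficient(t_r, fs), [], 0.0
    for v in env_y:
        out.append((v - c * prev) / (a * a))
        prev = v
    return out

fs = 100.0                                             # hypothetical envelope sampling rate
env_x = [1.0 + math.cos(2 * math.pi * 5.0 * t / fs) for t in range(200)]
env_y = add_reverberation(env_x, t_r=2.0, fs=fs)       # reverberant power envelope
restored = inverse_filter(env_y, t_r=2.0, fs=fs)       # inverse filter with the true T_R
max_error = max(abs(p - q) for p, q in zip(env_x, restored))
```

When the inverse filter uses the same T_R as the forward model, the source envelope is recovered to within rounding error; a mismatched T_R leaves the 5 Hz component over- or under-corrected, which is what the candidate comparison exploits.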

The arithmetic circuit 5 is further configured to select, from the modulation spectrum values of the source signal at the main modulation frequency computed in this way for the respective reverberation times, the one whose magnitude is closest to 0 dB (1 when expressed as a degree of modulation), and to estimate the corresponding reverberation time as the reverberation time of the room. The principle is explained below.

Let each modulation spectrum be denoted E(f_m). Taking m(f_m) = E_h(f_m) into account and expressing the convolution on the right-hand side of equation (10) in terms of (logarithmic) modulation spectra gives the following equation.

Now consider the relationship between the modulation spectrum and the modulation transfer function at f_m = f_md. If the reverberation time T_R could be estimated perfectly, the modulation spectrum of the source signal and that of the recovered signal would coincide at the main modulation frequency f_md. In addition, because the modulation transfer function at f_m = 0, m(0), is 1 for every T_R, the modulation spectra of the source signal and the reverberant signal always coincide at f_m = 0. These conditions lead to the following equation.

Substituting this into equation (13) gives the following equation.

Here, the modulation transfer function m(f_m) is a function of f_m, but the reverberation time parameter T_R can also be regarded, formally, as a variable. In blind estimation T_R is unknown, so writing m(f_m) of equation (4) as m(f_m, T_R), the reverberation time T_R to be found can be expressed by the following equation.

This equation makes blind estimation of the reverberation time possible.
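The final equation is not reproduced in this text. The following Python sketch illustrates the selection rule described above: for each candidate reverberation time, the inverse MTF is applied to the DC-normalized observed value at f_md, and the candidate whose result is closest to 0 dB is chosen. The MTF formula, the candidate grid, the dB scaling, and the function names are assumptions for illustration.

```python
import math

def mtf(f_m, t_r):
    """Assumed MTF magnitude for an exponential decay with reverberation time t_r."""
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_m * t_r / 13.8) ** 2)

def estimate_reverberation_time(ratio_at_fmd, f_md, candidates):
    """Pick the candidate T_R whose inverse MTF restores the f_md component closest to 0 dB."""
    def restored_db(t_r):
        # Applying the inverse filter divides the observed value by the candidate MTF.
        return 20.0 * math.log10(ratio_at_fmd / mtf(f_md, t_r))
    return min(candidates, key=lambda t_r: abs(restored_db(t_r)))

candidates = [round(0.1 * k, 1) for k in range(1, 51)]   # 0.1 s to 5.0 s, hypothetical grid
observed = mtf(5.0, 2.0)                                  # synthetic observation at f_md = 5 Hz
t_r_hat = estimate_reverberation_time(observed, 5.0, candidates)
```

The resolution of the estimate is set by the spacing of the stored candidates, so a finer grid trades memory for accuracy.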

[Operation of the reverberation time estimation apparatus]
Next, the operation of the reverberation time estimation apparatus is described. In the room whose reverberation time is to be measured, the operator samples sound with the microphone 2 of the reverberation time estimation apparatus 1. The analog acoustic signal output from the microphone 2 is converted into a digital acoustic signal by the A/D converter 3, and this digital acoustic signal (the reverberant signal) is supplied to the digital signal processing circuit 4. There, the power envelope generation unit 41 converts the digital acoustic signal into a power envelope, and the resulting power envelope signal is supplied to both the low-pass filter 42 and the normalized modulation spectrum generation unit 44.

The power envelope signal from which the high-frequency components have been removed by the low-pass filter 42 is supplied to the main modulation frequency acquisition unit 43. The main modulation frequency acquisition unit 43 acquires the main modulation frequency by finding the time shift (lag) of the autocorrelation function. The main modulation frequency acquired here is the modulation frequency that shows a particularly high modulation spectrum value within the whole modulation frequency range. For example, when the power envelope is a sine wave, the main modulation frequency coincides with the frequency of that sine wave.
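The autocorrelation step can be sketched as follows. This is a minimal illustration, not the circuit's actual implementation: the function name `main_modulation_frequency`, the `f_max` search limit, and the rule "take the lag of the highest autocorrelation peak after lag zero" are assumptions made for the example.

```python
import numpy as np

def main_modulation_frequency(envelope, fs, f_max=20.0):
    """Estimate the dominant modulation frequency (Hz) of a power envelope.

    Assumption: the dominant period is the lag of the highest
    autocorrelation value, searched at lags of at least 1/f_max seconds.
    """
    x = np.asarray(envelope, dtype=float)
    x = x - x.mean()                       # remove DC before correlating
    n = x.size
    # Linear autocorrelation via FFT (zero-padded to avoid wrap-around).
    spec = np.fft.rfft(x, 2 * n)
    acf = np.fft.irfft(spec * np.conj(spec))[:n]
    min_lag = max(1, int(fs / f_max))      # skip the main lobe around lag 0
    max_lag = n // 2                       # keep lags with enough overlap
    lag = min_lag + int(np.argmax(acf[min_lag:max_lag]))
    return fs / lag

# Usage: a sinusoidal power envelope at 5 Hz sampled at 200 Hz.
fs = 200
t = np.arange(4 * fs) / fs
estimate = main_modulation_frequency(1.0 + np.sin(2 * np.pi * 5.0 * t), fs)
```

For a sinusoidal envelope the estimate lands on the sine frequency, which is the behavior the text describes.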

In the normalized modulation spectrum generation unit 44, a Fourier transform is applied to the power envelope signal to obtain its modulation spectrum in the frequency domain. This modulation spectrum is normalized by the value of its DC component.
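A minimal sketch of the DC normalization follows. The function name is hypothetical, and the one-sided amplitude scaling (non-DC bins doubled) is an assumption, chosen so that a fully modulated sinusoidal envelope reads 0 dB at its modulation frequency.

```python
import numpy as np

def normalized_modulation_spectrum(envelope, fs):
    """Return (freqs, spectrum), with the spectrum divided by its DC value."""
    e = np.asarray(envelope, dtype=float)
    freqs = np.fft.rfftfreq(e.size, d=1.0 / fs)
    amp = np.abs(np.fft.rfft(e)) / e.size
    amp[1:] *= 2.0                 # one-sided amplitude (assumed convention)
    if e.size % 2 == 0:
        amp[-1] /= 2.0             # the Nyquist bin has no mirror image
    return freqs, amp / amp[0]     # equals 1.0 (0 dB) at 0 Hz by construction

# Usage: envelope 1 + sin(2*pi*5*t), i.e. modulation index 1 at 5 Hz.
fs = 200
t = np.arange(4 * fs) / fs
freqs, spectrum = normalized_modulation_spectrum(1.0 + np.sin(2 * np.pi * 5.0 * t), fs)
k = int(np.argmin(np.abs(freqs - 5.0)))   # bin closest to 5 Hz
```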

The main modulation frequency and modulation spectrum data output from the digital signal processing circuit 4 are supplied to the arithmetic circuit 5. The arithmetic circuit 5 reads from the memory 6 the data of the MTF inverse filters corresponding to a plurality of reverberation times, and applies the inverse filter for each reverberation time to the main modulation frequency and modulation spectrum received from the digital signal processing circuit 4. In this way, the modulation spectrum of the sound source signal at the main modulation frequency is calculated for each candidate reverberation time.

Next, from the modulation spectra of the sound source signal at the main modulation frequency obtained in this way for the respective reverberation times, the arithmetic circuit 5 selects the one whose magnitude is closest to 0 dB and takes the corresponding reverberation time as the estimate of the room's reverberation time. The arithmetic circuit 5 then drives and controls the liquid crystal display unit 7 to display the estimated reverberation time.
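The candidate search performed by the arithmetic circuit 5 can be sketched as a small grid search. The MTF formula used here is the standard one for exponentially decaying reverberation and is assumed to correspond to equation (4); the function names and the candidate grid are hypothetical.

```python
import numpy as np

def mtf(f_m, t_r):
    """MTF of exponentially decaying reverberation (assumed form of equation (4))."""
    return 1.0 / np.sqrt(1.0 + (2.0 * np.pi * f_m * t_r / 13.8) ** 2)

def estimate_reverberation_time(e_y_fmd, f_md, candidates):
    """Pick the candidate T_R whose inverse-filtered spectrum is closest to 0 dB.

    e_y_fmd: DC-normalized modulation spectrum of the reverberant signal at f_md.
    """
    candidates = np.asarray(candidates, dtype=float)
    restored = e_y_fmd / mtf(f_md, candidates)      # inverse MTF filtering
    restored_db = 20.0 * np.log10(restored)
    return float(candidates[np.argmin(np.abs(restored_db))])

# Usage: a source with 0 dB at f_md = 5 Hz, observed through T_R = 0.5 s.
candidates = np.arange(0.05, 3.0001, 0.05)          # hypothetical candidate grid (s)
observed = mtf(5.0, 0.5)
print(estimate_reverberation_time(observed, 5.0, candidates))
```

Storing one inverse filter per candidate, as the memory 6 does, corresponds to precomputing `1 / mtf(f, candidate)` for each entry of the grid.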

[Evaluation experiment]
The artificial AM signal of equation (6) was used as the sound source signal for evaluation. Its power envelope was a sine wave with a modulation frequency of 5 Hz (modulation index 1), and the signal was obtained by multiplying this envelope by a white noise carrier. For this power envelope, the modulation spectrum at 0 Hz and the modulation spectrum at 5 Hz took the same value. The room impulse response defined by equation (7) was then used. For each reverberation time T_R, 100 different white noise carriers were prepared. Five reverberation times were used in this evaluation: 0.1, 0.3, 0.5, 1.0, and 2.0 seconds. A total of 500 impulse responses were therefore prepared, and the reverberant signals were created by convolving them with the artificial signal.
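The stimulus construction can be sketched as follows, under stated assumptions: the AM signal is taken to be the square root of the power envelope times Gaussian white noise, and the impulse response is taken to be white noise with an exponential decay that reaches -60 dB at T_R. These are common forms assumed here in place of equations (6) and (7); the function name and default parameters are hypothetical.

```python
import numpy as np

def make_reverberant_am_noise(t_r, fs=20000, duration=2.0, f_mod=5.0, seed=0):
    """Return (source, impulse_response, reverberant) for one stimulus."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(duration * fs)) / fs
    power_env = 1.0 + np.sin(2.0 * np.pi * f_mod * t)      # modulation index 1
    source = np.sqrt(power_env) * rng.standard_normal(t.size)
    # Noise carrier with exponential decay: amplitude exp(-6.9) ~ -60 dB at t_r.
    t_h = np.arange(int(t_r * fs)) / fs
    h = np.exp(-6.9 * t_h / t_r) * rng.standard_normal(t_h.size)
    # FFT-based linear convolution, truncated to the source length.
    n = source.size + h.size - 1
    full = np.fft.irfft(np.fft.rfft(source, n) * np.fft.rfft(h, n), n)
    return source, h, full[: source.size]

# Usage with a reduced sampling rate to keep the example light.
source, h, reverberant = make_reverberant_am_noise(0.5, fs=2000)
```

Repeating the call with different `seed` values gives different noise carriers for the same reverberation time.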

The reverberation time was estimated for these 500 reverberant signals by two methods: the same estimation method as that performed by the reverberation time estimation apparatus 1 according to the present embodiment (hereinafter, the present method), and the reverberation time estimation method used in the reverberant speech power envelope restoration method described in Non-Patent Document 1 (hereinafter, the conventional method). FIG. 5 is a graph showing the results of the evaluation experiment. The solid line shows the results of the reverberation time estimation apparatus 1 according to the present embodiment, and the broken line shows the results of the conventional method. The dotted straight line shows the ideal estimate. Each value plotted for the present method and the conventional method is the mean of the estimates obtained from the 100 reverberant signals for that reverberation time T_R, and the error bars show the standard deviation at each reverberation time.

Compared with the ideal line, the conventional method tends to underestimate as the reverberation time becomes longer, and its estimates tend to saturate once the reverberation time exceeds 1 second. The cause is that, when the power envelope generated from the reverberant signal is restored, high-frequency components of the power envelope that the low-pass filter could not remove are emphasized by the inverse filtering (a differentiation operation). Depending on their phase, these components affect the formation of the dips (which define the modulation index) that determine the accuracy of the reverberation time estimate. In contrast, the estimates of the present method almost coincide with the ideal line, which shows that the reverberation time is estimated accurately. In both methods, as the reverberation time increases, the inverse filter restores the high-frequency components of the power envelope abnormally. The conventional method estimates the reverberation time in the time domain and is therefore directly affected by these high-frequency components, whereas the present method estimates it in the modulation frequency domain from the main frequency component and is not affected by them. This is why the present method works well.

(Embodiment 2)
Embodiment 2 is a reverberation time estimation apparatus whose configuration is particularly suited to estimating reverberation time from human speech.

[Application of the reverberation time estimation method to human speech signals]
FIGS. 6A to 6D are graphs showing power envelopes built from two sets of one period of a 10 Hz sine wave, together with their modulation spectra. The distance between the two sine-wave sets is 0.1 seconds in FIG. 6A, 0.2 seconds in FIG. 6B, 0.5 seconds in FIG. 6C, and 1.0 second in FIG. 6D. As the flat interval between the two sets is lengthened, a first peak appears on the modulation spectrum at a modulation frequency f_mO near 0 Hz, and the power at that peak, E_x(f_mO) (marked with circles in FIGS. 6A to 6D), tends to approach the power at the main modulation frequency, E_x(f_md). Here, f_mO is the modulation frequency closest to 0 Hz, excluding 0 Hz itself. Referring to FIG. 2, it can be seen that the power at such an f_mO is not attenuated by reverberation. Therefore, if the reverberation time is estimated using the power at f_mO as the "reference value" for restoring the power at the main modulation frequency f_md, it should be possible to estimate the reverberation time accurately even for power envelopes of this shape.

FIG. 7 is a graph showing the relationship between the distance (time interval) between two sine-wave sets such as those in FIGS. 6A to 6D and the difference between the power E(f_mO) at the reference frequency f_mO and the power E(f_md) at the main modulation frequency f_md. The vertical axis shows E(f_mO) - E(f_md), and the horizontal axis shows the distance between the two sine-wave sets. In FIG. 7, the frequency of the single-period sine wave used for the power envelope is 5, 10, or 20 Hz. The figure shows that there are cases where E(f_mO) = E(f_md). In addition, because f_md appears as the modulation frequency corresponding to the time difference between the single-period sine waves, the figure shows that f_md is a comparatively low modulation frequency when E(f_mO) = E(f_md).

The reference value (power) at f_mO is not entirely free from attenuation by reverberation. How strongly reverberation affects the reference value is therefore an important question. If reverberation attenuates the reference value substantially, the estimated reverberation time becomes shorter than the theoretical value.

FIG. 8 is a graph showing the MTF for which the modulation index at the reference value in the clean state is taken as 1 (M(f_m, T_R), solid line) and the MTF for which the modulation index at the reference value attenuated by reverberation is taken as 1 (M'(f_m, T_R), broken line). As the figure shows, when the reference value is attenuated, M(f_m, T_R) and M'(f_m, T_R), which coincide at the power E_y(f_md) at the main modulation frequency f_md of the modulation spectrum, are different functions, so the values of T_Rc and T_Rw for which M(f_m, T_Rc) = M'(f_m, T_Rw) also differ.

The value of f_mO depends on the FFT length. FIG. 9A is a graph showing the relationship between the reference frequency f_mO and the MTF value for several reverberation times, and FIG. 9B is a graph showing the relationship between the reverberation time error and the modulation frequency when the power E_y(f_mO) at the reference frequency f_mO is used as the reference value. FIG. 9A shows the MTF values at f_mO = 0.025, 0.05, and 0.1 Hz for reverberation times T_R of 0.1, 0.3, 0.5, 1.0, and 2.0 seconds. With a sampling frequency f_s of 20 kHz, the figure shows the attenuation of the reference value when the FFT length is set to 30, 20, and 10 seconds to raise the frequency resolution. FIG. 9B shows the relationship between the reverberation time error and the modulation frequency for each of the reference frequencies f_mO = 0.025, 0.05, and 0.1 Hz. The figure shows that the error grows as the FFT length becomes shorter, the reverberation time longer, and the modulation frequency lower. However, when the reference value is placed at f_mO = 0.025 or 0.5 Hz, the reverberation time error is at most on the order of 10^-3 even for T_R = 2.0 seconds, where the error in the low modulation frequency range is largest. That is, if the FFT length is about 20 seconds, using the power at the modulation frequency component closest to 0 Hz (excluding 0 Hz) as the reference value is considered to introduce almost no error into the reverberation time estimate.
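The size of the attenuation at the reference frequency can be checked numerically. The sketch below assumes the standard MTF of exponentially decaying reverberation as the form of equation (4) and evaluates it at the three reference frequencies and five reverberation times quoted for FIG. 9A; it does not reproduce the figure itself.

```python
import numpy as np

def mtf(f_m, t_r):
    """MTF of exponentially decaying reverberation (assumed form of equation (4))."""
    return 1.0 / np.sqrt(1.0 + (2.0 * np.pi * f_m * t_r / 13.8) ** 2)

reference_freqs = [0.025, 0.05, 0.1]        # f_mO values quoted for FIG. 9A (Hz)
reverb_times = [0.1, 0.3, 0.5, 1.0, 2.0]    # T_R values (s)

# One row of MTF values per reference frequency, one column per T_R.
table = {f: [float(mtf(f, t)) for t in reverb_times] for f in reference_freqs}
for f, row in table.items():
    print(f"f_mO = {f:5.3f} Hz:", " ".join(f"{v:.5f}" for v in row))
```

Under this assumed MTF the smallest value in the table, at f_mO = 0.1 Hz and T_R = 2.0 s, is about 0.996, so the reference value loses well under 1 % of its magnitude in every listed case.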

Meanwhile, because high-frequency energy is concentrated at the onsets and offsets of human speech, the power envelopes of band-divided speech often have a shape with a long flat interval between peaks, as in FIG. 6D. FIGS. 10A and 10B show the power envelope and the modulation spectrum of band-divided human speech. FIG. 10A is for the sound source signal without reverberation, and FIG. 10B is for the reverberant signal with reverberation added. In the modulation spectrum of the sound source signal, the power at the modulation frequency very close to 0 Hz equals the power of the main component. It is therefore considered that the reverberation time can be estimated by dividing the speech signal into bands.

[Configuration of reverberation time estimation apparatus]
Based on the above principle, the present embodiment divides the acoustic signal (speech signal) into bands, obtains an estimate of the reverberation time from the power envelope of each band-divided channel, selects the channels suitable for processing on the basis of each channel's power envelope, and adopts the mean of the estimates from the selected channels as the final reverberation time estimate. In addition, the reverberation time estimation apparatus according to the present embodiment uses the modulation spectrum value at the modulation frequency closest to 0 Hz, excluding 0 Hz, as the reference value and generates a modulation spectrum normalized by this reference value.

FIG. 11 is a block diagram showing the configuration of the reverberation time estimation apparatus according to Embodiment 2 of the present invention. As shown in FIG. 11, the reverberation time estimation apparatus 201 according to the present embodiment includes a digital signal processing circuit 204 that has, in addition to the functional blocks of the power envelope generation unit 41, the low-pass filter 42, the main modulation frequency acquisition unit 43, and the normalized modulation spectrum generation unit 48, the functional blocks of a band division unit 45 and a channel selection unit 46.

The band division unit 45 includes a constant-bandwidth band-pass filter bank with a bandwidth of 100 Hz (not shown) and divides the digital acoustic signal output from the A/D converter 3 into 100 channels. Each channel signal produced by the band division unit 45 is supplied to the power envelope generation unit 41, which generates a power envelope signal for each channel.
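A constant-bandwidth band division of this kind can be sketched with ideal (brick-wall) FFT masks. The text only specifies 100 channels of 100 Hz bandwidth; the filter shape, the function name, and the parameter names below are assumptions made for illustration.

```python
import numpy as np

def split_into_bands(signal, fs, bandwidth=100.0, n_channels=100):
    """Brick-wall FFT filter bank.

    Channel k passes the band [k * bandwidth, (k + 1) * bandwidth) Hz.
    """
    x = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    channels = np.empty((n_channels, x.size))
    for k in range(n_channels):
        mask = (freqs >= k * bandwidth) & (freqs < (k + 1) * bandwidth)
        channels[k] = np.fft.irfft(spectrum * mask, x.size)
    return channels

# Usage: a 250 Hz tone sampled at 20 kHz should fall in channel 2 (200-300 Hz).
fs = 20000
t = np.arange(2000) / fs
channels = split_into_bands(np.sin(2 * np.pi * 250.0 * t), fs)
strongest = int(np.argmax((channels ** 2).sum(axis=1)))
```

With a 20 kHz sampling frequency, 100 channels of 100 Hz cover the band from 0 Hz up to the 10 kHz Nyquist frequency.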

The power envelope signal output from the power envelope generation unit 41 is supplied to the low-pass filter 42 and the normalized modulation spectrum generation unit 48, and also to the channel selection unit 46. The normalized modulation spectrum generation unit 48 of the present embodiment takes as the reference value the power at the reference frequency f_mO, which is the modulation frequency closest to 0 Hz excluding 0 Hz, and outputs a modulation spectrum signal normalized by this reference value.

The channel selection unit 46 has a high-level section detection unit 46a, a first selection unit 46b, a second selection unit 46c, and a third selection unit 46d, and selects, from the band-divided channels, the channels to be used for reverberation time estimation.

FIGS. 12A to 12C are schematic diagrams showing how the channel selection unit 46 selects the channels used for reverberation time estimation. As shown in FIG. 12A, the high-level section detection unit 46a first detects the rise time b(n) and the fall time e(n) of each high-level section of the power envelope signal on the basis of a predetermined reference level (the broken line in the figure). Next, within the interval from the rise time b(n) to the fall time e(n), the time of maximum power is detected and taken as the peak time p(n). The detected rise time b(n), peak time p(n), and fall time e(n) form one set. In this way, the high-level section detection unit 46a detects the sets for all high-level sections contained in the power envelope signal.
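The detection of the (b(n), p(n), e(n)) sets can be sketched as simple threshold crossing. The function name and the `level` parameter are hypothetical, and the indices are sample positions rather than times.

```python
import numpy as np

def detect_high_level_sections(envelope, level):
    """Return a list of (b, p, e) sample indices: rise, peak, and fall."""
    env = np.asarray(envelope, dtype=float)
    # Pad with False so sections touching either end are still closed.
    above = np.concatenate(([False], env > level, [False]))
    edges = np.diff(above.astype(int))
    rises = np.flatnonzero(edges == 1)        # first index above the level
    falls = np.flatnonzero(edges == -1) - 1   # last index above the level
    sections = []
    for b, e in zip(rises, falls):
        p = b + int(np.argmax(env[b:e + 1]))  # maximum power inside the section
        sections.append((int(b), p, int(e)))
    return sections

# Usage: two high-level sections above a reference level of 0.5.
sections = detect_high_level_sections([0, 1, 3, 2, 0, 0, 4, 5, 1, 0], level=0.5)
```

The later selection stages would then inspect the gaps between consecutive sets and the shape inside each set.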

As shown in FIG. 12B, the first selection unit 46b determines whether a small peak exists between the fall time e(n-1) of the preceding set and the rise time b(n). If a small peak exists, the set is excluded from reverberation time estimation.

As shown in FIG. 12B, when no such small peak exists, the second selection unit 46c determines whether the set contains a large dip. The second selection unit 46c includes a low-pass filter (not shown), and dips are detected by means of this low-pass filter. Sets containing such a dip are excluded from reverberation time estimation.

When, as described above, two sets with neither a small peak nor a dip occur consecutively in one power envelope, the third selection unit 46d obtains the power difference and the time difference between the two peaks, as shown in FIG. 12C, and determines whether each is at or above a predetermined reference value. If both the power difference and the time difference are at or above their reference values, the channel is selected for use in reverberation time estimation. The selection result obtained by the channel selection unit 46 is supplied to the arithmetic circuit 205.

As in Embodiment 1, the arithmetic circuit 205 receives the output data from the main modulation frequency acquisition unit 43 and the normalized modulation spectrum generation unit 48 and calculates an estimate of the reverberation time for each channel. The arithmetic circuit 205 also receives the output data from the channel selection unit 46, calculates the mean of the reverberation time estimates of the channels selected for estimation, and takes this mean as the reverberation time estimate. The estimated reverberation time is displayed on the liquid crystal display unit 7 under the drive control of the arithmetic circuit 205.

The remaining configuration of the reverberation time estimation apparatus 201 according to the present embodiment is the same as that of the reverberation time estimation apparatus 1 according to Embodiment 1. The same components are therefore given the same reference numerals, and their description is omitted.

[Evaluation experiment]
Eight sentences uttered by a female speaker were used as the acoustic signals for evaluation, and an evaluation experiment was conducted on the same reverberation time estimation method as that performed by the reverberation time estimation apparatus according to the present embodiment (hereinafter, the present method). The content of the utterances (spoken in Japanese and given here in translation) is as follows.
(1) "Those who wish to register for the First International Conference on Interpreting should state their address, name, and whether they will present or attend only on the designated application form, and apply to the conference secretariat."
(2) "Yes, this is the secretariat of the First International Conference on Interpreting Telephony."
(3) "Hello. I would like to apply to attend the International Conference on Interpreting. What procedure should I follow?"
(4) "To attend the international conference on interpreting telephony, you need to register using the designated application form."
(5) "How much does it cost if I only attend without presenting?"
(6) "If you wish to present, the participation fee, including the proceedings and the registration fee, is 40,000 yen."
(7) "If you attend only, you can register on the day, and the fee including the proceedings is 35,000 yen."
(8) "How can I obtain the application form for registration?"
Five reverberation times T_R were used: 0.1, 0.3, 0.5, 1.0, and 2.0 seconds. Fifty impulse responses of the form of equation (7) were prepared for each reverberation time. The total number of stimuli was 2,000 (8 utterances × 5 conditions × 50 carriers). For both the present method and the conventional method, the mean of the reverberation time estimates obtained over all utterances was calculated and evaluated.

FIG. 13 is a graph showing the results of the evaluation experiment. The figure is drawn in the same way as FIG. 5. With the present method, estimates generally close to the ideal values were obtained for reverberation times T_R of 0.1, 0.3, 0.5, and 1.0 seconds. For T_R = 2.0 seconds the estimate is slightly too large, but the result is still good.

(Embodiment 3)
Embodiment 3 is a reverberation time estimation apparatus realized by a computer executing a computer program for estimating reverberation time. The reverberation time estimation apparatus 301 according to the present embodiment implements in software substantially the same processing as the reverberation time estimation apparatus 201 according to Embodiment 2.

[Configuration of reverberation time estimation apparatus]
FIG. 14 is a block diagram showing the configuration of the reverberation time estimation apparatus according to Embodiment 3 of the present invention. As shown in FIG. 14, the computer 301a includes a main body 311, an image display unit 312, and an input unit 313. The main body 311 includes a CPU 311a, a ROM 311b, a RAM 311c, a hard disk 311d, a reading device 311e, an input/output interface 311f, and an image output interface 311g, which are connected to one another by a bus 311i.

The CPU 311a can execute computer programs loaded into the RAM 311c. When the CPU 311a executes the reverberation time estimation program 314a described later, the computer 301a functions as the reverberation time estimation apparatus 301.

The ROM 311b is a mask ROM, PROM, EPROM, EEPROM, or the like, and stores the computer programs executed by the CPU 311a and the data used by them.

The RAM 311c is an SRAM, DRAM, or the like. The RAM 311c is used for reading out the reverberation time estimation program 314a recorded on the hard disk 311d, and serves as the work area of the CPU 311a when the CPU 311a executes a computer program.

Various computer programs to be executed by the CPU 311a, such as an operating system and application programs, and the data used in executing them are installed on the hard disk 311d. The reverberation time estimation program 314a described later is also installed on the hard disk 311d.

The hard disk 311d stores the inverse characteristic of the MTF, which has the reverberation time as a parameter, as function data that the CPU 311a can process. This inverse filter (inverse characteristic) function data is prepared for each of a plurality of reverberation times given in advance as estimation candidates. The details of the inverse filter function data are the same as in Embodiment 1.

The reading device 311e is a flexible disk drive, CD-ROM drive, DVD-ROM drive, or the like, and can read computer programs or data recorded on a portable recording medium 314. The portable recording medium 314 stores the reverberation time estimation program 314a for causing a computer to function as a reverberation time estimation apparatus. The computer 301a can read the reverberation time estimation program 314a from the portable recording medium 314 and install it on the hard disk 311d.

The reverberation time estimation program 314a need not be provided on the portable recording medium 314. It can also be provided over a telecommunication line (wired or wireless) from an external device communicably connected to the computer 301a. For example, the reverberation time estimation program 314a may be stored on the hard disk of a server computer on the Internet, and the computer 301a may access the server computer, download the program, and install it on the hard disk 311d.

A multitasking operating system such as Windows (registered trademark), manufactured and sold by Microsoft Corporation, is installed on the hard disk 311d. In the following description, the reverberation time estimation program 314a according to the present embodiment is assumed to run on this operating system.

The input/output interface 311f comprises, for example, a serial interface such as USB, IEEE 1394, or RS-232C, a parallel interface such as SCSI, IDE, or IEEE 1284, and an analog interface consisting of a D/A converter, an A/D converter, and the like. An input unit 313 having a keyboard and a mouse is connected to the input/output interface 311f, and the user can input data to the computer 301a by using the input unit 313. The input unit 313 is also provided with a microphone, which converts sound into an electrical signal; the input/output interface 311f A/D-converts this signal to obtain a digital acoustic signal.

The image output interface 311g is connected to an image display unit 312 composed of an LCD, a CRT, or the like, and outputs to the image display unit 312 a video signal corresponding to the image data supplied by the CPU 311a. The image display unit 312 displays an image (screen) according to the input video signal.

[Operation of the reverberation time estimation apparatus]
Next, the operation of the reverberation time estimation apparatus 301 is described. FIG. 15 is a flowchart showing the flow of operation of the reverberation time estimation apparatus 301 according to the present embodiment. First, the operator samples sound (speech) with the microphone of the input unit 313 in the room whose reverberation time is to be measured. The analog acoustic signal output from the microphone is converted into digital acoustic data, such as PCM, by the A/D converter of the input/output interface 311f, and this acoustic data is supplied to the CPU 311a.

The CPU 311a first divides the acoustic data into a plurality of bands (step S1). The acoustic data is divided into channels, each with a bandwidth of 100 Hz. Next, the CPU 311a generates a power envelope from the acoustic data of each channel and stores each power envelope in the RAM 311c (step S2). The CPU 311a selects one channel (step S3), reads the power envelope of that channel from the RAM 311c, and removes its high-frequency components using a predetermined cutoff frequency (step S4). The CPU 311a then computes the autocorrelation function of the low-pass-filtered power envelope and calculates the main modulation frequency from it (step S5).
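Step S5 can be illustrated with a minimal sketch. It assumes the power envelope has already been low-pass filtered (step S4) and takes the main modulation frequency as the inverse of the first positive lag at which the autocorrelation function peaks. The function names, the sampling rate `fs`, the `max_lag` parameter, and the peak-picking rule are assumptions made for illustration and are not taken from the patent.

```python
import math

def autocorrelation(x, max_lag):
    """Autocorrelation of the mean-removed sequence x for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c = [v - mean for v in x]
    return [sum(c[i] * c[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def main_modulation_frequency(envelope, fs, max_lag):
    """Return fs / lag for the first positive-lag autocorrelation peak (Hz).

    Hypothetical peak rule: the first lag whose value is positive and is a
    local maximum. Returns None if no such lag exists.
    """
    r = autocorrelation(envelope, max_lag)
    for lag in range(1, max_lag):
        if r[lag] > 0 and r[lag] > r[lag - 1] and r[lag] >= r[lag + 1]:
            return fs / lag
    return None

# Synthetic envelope sampled at 100 Hz with a 4 Hz modulation (illustrative data).
fs = 100
envelope = [1.0 + math.cos(2 * math.pi * 4.0 * n / fs) for n in range(400)]
f_main = main_modulation_frequency(envelope, fs, max_lag=100)
```

On real speech envelopes the peak rule would probably need to be more robust, for example by requiring a minimum peak height.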

Next, the CPU 311a reads the power envelope of the channel from the RAM 311c again and performs a Fourier transform on it (step S6). Taking as a reference value the power at the modulation frequency closest to 0 Hz, excluding 0 Hz itself, the CPU 311a computes a normalized modulation spectrum by normalizing the modulation spectrum obtained in step S6 by that reference value (step S7). The CPU 311a then estimates the reverberation time based on the main modulation frequency calculated in step S5 and the normalized modulation spectrum calculated in step S7 (step S8). The estimation proceeds as follows. First, the CPU 311a reads the inverse filter for each reverberation time from the hard disk 311d and applies each inverse filter to the main modulation frequency calculated in step S5 and the modulation spectrum calculated in step S7, thereby calculating, for each reverberation time, the modulation spectrum of the source signal at the main modulation frequency. Next, from the modulation spectra obtained in this way, the CPU 311a selects the one whose magnitude is closest to 0 dB (1 when expressed as a modulation index) and takes the corresponding reverberation time as the estimate of the room's reverberation time. The CPU 311a stores the estimate in the RAM 311c in association with the channel that was processed.
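The search in step S8 can be sketched as a grid search over candidate reverberation times. The sketch assumes the commonly used exponential-decay model of the modulation transfer function, m(f_m) = [1 + (2*pi*f_m*T_R/13.8)^2]^(-1/2), as the transfer function whose inverse is applied. That formula, the candidate grid, the 20*log10 convention, and all function names are assumptions for illustration, not details confirmed by this section.

```python
import math

def mtf(f_m, t_r):
    """Assumed MTF model for exponentially decaying reverberation."""
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_m * t_r / 13.8) ** 2)

def estimate_reverberation_time(observed, f_main, candidates):
    """Pick the candidate T_R whose inverse filter restores `observed`
    (the normalized modulation index at f_main) closest to 0 dB."""
    def restored_db(t_r):
        # Applying the inverse filter means dividing by the MTF value.
        return 20.0 * math.log10(observed / mtf(f_main, t_r))
    return min(candidates, key=lambda t_r: abs(restored_db(t_r)))

# Candidate reverberation times from 0.1 s to 3.0 s (hypothetical grid).
candidates = [round(0.1 * k, 1) for k in range(1, 31)]

# Simulate an observation: a source with modulation index 1 at 4 Hz,
# attenuated by a room with T_R = 1.2 s.
observed = mtf(4.0, 1.2)
t_hat = estimate_reverberation_time(observed, 4.0, candidates)
```

Because the assumed MTF decreases monotonically with T_R at a fixed modulation frequency, the candidate closest to 0 dB is unique on such a grid. Using 10*log10 on power values instead would select the same candidate.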

Next, the CPU 311a reads the power envelope of the channel from the RAM 311c once more and, based on this power envelope, determines whether the channel is suitable for use in estimating the reverberation time (step S9). This determination consists of the following processes: (1) detecting the high-level portions of the power envelope (each a set of rise time, peak time, and fall time); (2) determining whether a minute peak exists between adjacent high-level portions; (3) determining whether a valley exists within a high-level portion; and, when neither a minute peak nor a valley exists, (4) determining whether the power difference and the time difference between adjacent high-level portions are equal to or greater than predetermined reference values. The details of these processes are the same as those described in Embodiment 2 and are therefore omitted here.
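Checks (1) to (3) of step S9 can be sketched roughly as follows. The details are given in Embodiment 2 and are not reproduced in this passage, so the definitions used here are assumptions: a high-level portion is a maximal run of samples above a hypothetical `threshold`, a valley is a strict local minimum inside such a run, and a minute peak is a strict local maximum in the gap between two runs. Check (4) is omitted because its reference values are not stated here.

```python
def high_level_runs(env, threshold):
    """Return (start, end) index pairs of maximal runs with env > threshold."""
    runs, start = [], None
    for i, v in enumerate(env):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(env) - 1))
    return runs

def has_local_extremum(env, lo, hi, kind):
    """True if env has a strict local max/min at some index in [lo, hi]."""
    for i in range(max(lo, 1), min(hi, len(env) - 2) + 1):
        if kind == "max" and env[i] > env[i - 1] and env[i] > env[i + 1]:
            return True
        if kind == "min" and env[i] < env[i - 1] and env[i] < env[i + 1]:
            return True
    return False

def channel_is_suitable(env, threshold):
    """Reject a channel with a valley inside a high-level portion or a
    minute peak between adjacent high-level portions (assumed definitions)."""
    runs = high_level_runs(env, threshold)
    for start, end in runs:
        if has_local_extremum(env, start + 1, end - 1, "min"):
            return False  # valley inside a high-level portion
    for (_, end_a), (start_b, _) in zip(runs, runs[1:]):
        if has_local_extremum(env, end_a + 1, start_b - 1, "max"):
            return False  # minute peak in the gap between portions
    return True

clean = [0, 1, 3, 5, 3, 1, 0, 0, 1, 3, 5, 3, 1, 0]
with_minute_peak = [0, 1, 3, 5, 3, 1, 0, 1.5, 0, 1, 3, 5, 3, 1, 0]
with_valley = [0, 3, 5, 3, 5, 3, 0]
```

A sample-level extremum test like this would be sensitive to noise, so a practical version would likely smooth the envelope or require a minimum peak or valley depth.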

Next, the CPU 311a determines whether steps S4 to S9 have been performed for all channels (step S10). If a channel remains for which steps S4 to S9 have not been completed (NO in step S10), the CPU 311a selects one of the unprocessed channels (step S11) and returns to step S4. If steps S4 to S9 have been completed for all channels (YES in step S10), the CPU 311a calculates the average of the reverberation time estimates of the channels that were judged in step S9 to be suitable for reverberation time estimation (step S12), drives the image output interface 311g to display this average on the image display unit 312 as the reverberation time estimate for the acoustic signal (step S13), and ends the processing.
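The aggregation in step S12 amounts to averaging the per-channel estimates over the channels that passed the suitability check. A minimal sketch follows, with hypothetical names and example values. Returning `None` when no channel qualifies is an assumption, since the text does not say how that case is handled.

```python
def average_over_suitable_channels(estimates, suitable):
    """Average the per-channel reverberation time estimates (seconds)
    over the channels flagged as suitable; None if no channel qualifies."""
    selected = [t for t, ok in zip(estimates, suitable) if ok]
    if not selected:
        return None
    return sum(selected) / len(selected)

# Illustrative per-channel results: channel 1 was judged unsuitable.
estimates = [1.0, 1.4, 0.9, 1.2]
suitable = [True, False, True, True]
t_room = average_over_suitable_channels(estimates, suitable)
```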

(Other embodiments)
In Embodiment 1 above, the normalized modulation spectrum generation unit 44 is configured to generate a modulation spectrum signal in which the magnitude of the modulation spectrum is normalized by the modulation spectrum of the DC component (the modulation spectrum at a modulation frequency of 0 Hz). The invention is not limited to this. For example, the magnitude of the modulation spectrum may instead be normalized by the power value at the modulation frequency closest to 0 Hz, excluding 0 Hz itself.

Likewise, in Embodiments 2 and 3 above, the normalized modulation spectrum generation unit 48 is configured to take as a reference value the modulation spectrum value at the modulation frequency closest to 0 Hz, excluding 0 Hz itself, and to generate a modulation spectrum normalized by this reference value. The invention is not limited to this. The unit may instead be configured to generate a modulation spectrum signal in which the magnitude of the modulation spectrum is normalized by the modulation spectrum of the DC component.
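The two normalization choices discussed above can be compared with a small sketch. It computes a magnitude spectrum of a power envelope with a direct DFT and divides it either by the DC bin or by the lowest non-zero-frequency bin. Using the DFT magnitude rather than a power value, and the function and parameter names, are assumptions made for illustration.

```python
import cmath
import math

def modulation_spectrum(envelope):
    """Magnitude of the DFT of the power envelope for bins 0..n/2."""
    n = len(envelope)
    return [abs(sum(envelope[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def normalize(spectrum, reference="dc"):
    """Normalize by the DC bin ("dc") or by the bin closest to 0 Hz
    excluding 0 Hz ("lowest_nonzero"). Assumes the reference bin is non-zero."""
    ref = spectrum[0] if reference == "dc" else spectrum[1]
    return [s / ref for s in spectrum]

# Synthetic envelope: DC level 1, plus components in bins 1 and 3.
n = 32
envelope = [1.0
            + 0.5 * math.cos(2 * math.pi * 1 * t / n)
            + 0.25 * math.cos(2 * math.pi * 3 * t / n)
            for t in range(n)]
spec = modulation_spectrum(envelope)
by_dc = normalize(spec, "dc")
by_lowest = normalize(spec, "lowest_nonzero")
```

With DC normalization, bin 3 of this envelope comes out at 0.125. With normalization by the lowest non-zero bin, it comes out at 0.5. This shows how strongly the choice of reference rescales the spectrum.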

In Embodiment 3 above, the acoustic data is band-divided into a plurality of channels, a reverberation time estimate is computed for each channel, channels suitable for reverberation time estimation are selected, and the estimates of the selected channels are averaged to obtain the reverberation time estimate for the acoustic data. The invention is not limited to this. A configuration is also possible in which band division and channel selection are not performed, a single power envelope is computed for the acoustic data, and the reverberation time estimate is computed from that envelope.

In Embodiments 2 and 3 above, the main modulation frequency is obtained and the normalized modulation spectrum is generated for the power envelopes of all channels, so that a reverberation time estimate is obtained for each channel individually. Channels suitable for reverberation time estimation are then selected from all channels, and the estimates obtained from the power envelopes of the selected channels are averaged. The invention is not limited to this. A configuration is also possible in which the suitable channels are first selected from the power envelopes of all channels, reverberation time estimates are obtained only from the power envelopes of the selected channels, and their average is calculated. In this case the main modulation frequency is obtained and the normalized modulation spectrum is generated only for the power envelopes of the selected channels, so the processing is more efficient.

In Embodiments 1 to 3 above, the inverse filter of the MTF, which has the reverberation time as a parameter, is stored in the memory 6 or the RAM 311c for each of a predetermined set of reverberation times, as function data that the arithmetic circuit 5 or the CPU 311a can process. The inverse filter for each reverberation time is applied to the acquired main modulation frequency and modulation spectrum to calculate, for each reverberation time, the modulation spectrum of the source signal at the main modulation frequency. Of these, the one whose magnitude is closest to 0 dB is selected, and its reverberation time is taken as the estimate of the room's reverberation time. The invention is not limited to this configuration. The inverse filter may be stored in the memory 6 or the RAM 311c as a lookup table rather than as function data, and the arithmetic circuit 5 or the CPU 311a may refer to this lookup table to obtain the modulation spectrum of the source signal at the main modulation frequency for each reverberation time.

In each of the above embodiments, the reverberation time estimation apparatus is a single apparatus, but the invention is not limited to this form. For example, the individual means of the reverberation time estimation apparatus may be provided in separate devices, and the processing of each of the above embodiments may be realized by those devices exchanging data over a communication network or the like.
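The lookup-table variant of the inverse filter mentioned above might look like the following sketch. Inverse filter gains are precomputed on a grid of modulation frequencies for each candidate reverberation time and read back by nearest-grid-point lookup. The exponential-decay MTF model, the grid spacing, the candidate values, and all names are assumptions for illustration only.

```python
import math

def inverse_mtf(f_m, t_r):
    """Gain of the inverse filter under the assumed MTF model
    m(f_m) = 1 / sqrt(1 + (2*pi*f_m*t_r/13.8)**2)."""
    return math.sqrt(1.0 + (2.0 * math.pi * f_m * t_r / 13.8) ** 2)

def build_table(t_candidates, f_grid):
    """Precompute inverse filter gains: one row per candidate T_R."""
    return {t_r: [inverse_mtf(f, t_r) for f in f_grid] for t_r in t_candidates}

def lookup(table, f_grid, t_r, f_m):
    """Read the gain for t_r at the grid frequency nearest to f_m."""
    idx = min(range(len(f_grid)), key=lambda i: abs(f_grid[i] - f_m))
    return table[t_r][idx]

f_grid = [0.5 * k for k in range(1, 41)]   # 0.5 Hz to 20 Hz (hypothetical grid)
t_candidates = [0.5, 1.0, 2.0]             # seconds (hypothetical candidates)
table = build_table(t_candidates, f_grid)
gain = lookup(table, f_grid, 1.0, 4.1)     # nearest grid point is 4.0 Hz
```

A table trades memory for computation and limits frequency resolution to the grid, so interpolating between grid points could be considered where finer resolution matters.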

The reverberation time estimation apparatus and reverberation time estimation method of the present invention are useful as, for example, an apparatus and method for blindly estimating a reverberation time, without using the original sound signal, from a frequency-sequence modulation spectrum obtained from a time-series acoustic signal.

Block diagram showing the configuration of the reverberation time estimation apparatus according to Embodiment 1 of the present invention.
Graph showing the MTF (modulation index).
Graph showing the power envelope of an acoustic signal to which no reverberation is added.
Graph showing the modulation spectrum of an acoustic signal to which no reverberation is added.
Graph showing the power envelope of the acoustic signal when reverberation is added to the sound shown in FIG. 3A.
Graph showing the modulation spectrum of the acoustic signal shown in FIG. 4A.
Graph showing the results of an evaluation experiment on a reverberation time estimation method identical to the one performed by the reverberation time estimation apparatus according to Embodiment 1.
Graph showing a power envelope made from two sets of one period of a 10 Hz sine wave, and its modulation spectrum (time difference between sets: 0.1 s).
Graph showing a power envelope made from two sets of one period of a 10 Hz sine wave, and its modulation spectrum (time difference between sets: 0.2 s).
Graph showing a power envelope made from two sets of one period of a 10 Hz sine wave, and its modulation spectrum (time difference between sets: 0.5 s).
Graph showing a power envelope made from two sets of one period of a 10 Hz sine wave, and its modulation spectrum (time difference between sets: 1.0 s).
Graph showing the relationship between the time interval between the two sine-wave sets shown in FIGS. 6A to 6D and the difference in power value between the reference frequency and the main modulation frequency.
Graph showing the MTF with the modulation index at the reference value in the clean state set to 1, and the MTF with the modulation index at the reference value attenuated by reverberation set to 1.
Graph showing the relationship between the reference frequency and the MTF value for several reverberation times.
Graph showing the relationship between the reverberation time error and the modulation frequency when the power at the reference frequency is used as the reference value.
Graphs of the power envelope and the modulation spectrum of band-divided human speech (without added reverberation).
Graphs of the power envelope and the modulation spectrum of band-divided human speech (with added reverberation).
Block diagram showing the configuration of the reverberation time estimation apparatus according to Embodiment 2 of the present invention.
Schematic diagram for explaining how the channel selection unit selects the channels used for reverberation time estimation.
Schematic diagram for explaining how the channel selection unit selects the channels used for reverberation time estimation.
Schematic diagram for explaining how the channel selection unit selects the channels used for reverberation time estimation.
Graph showing the results of an evaluation experiment on a reverberation time estimation method identical to the one performed by the reverberation time estimation apparatus according to Embodiment 2.
Block diagram showing the configuration of the reverberation time estimation apparatus according to Embodiment 3 of the present invention.
Flowchart showing the flow of operation of the reverberation time estimation apparatus according to Embodiment 3.

Claims

1. A reverberation time estimation apparatus comprising:
power envelope generating means for generating, based on a time-series acoustic signal to which reverberation is added, a time-series power envelope corresponding to the acoustic signal;
modulation spectrum generating means for generating a modulation spectrum of a frequency sequence based on the power envelope generated by the power envelope generating means; and
reverberation time estimation means for estimating, based on the modulation spectrum generated by the modulation spectrum generating means, a reverberation time corresponding to a transfer function relating to the reverberation characteristics of the system in which the acoustic signal is observed.

2. The reverberation time estimation apparatus, further comprising main modulation frequency specifying means for specifying, in the modulation spectrum of the frequency sequence, a main modulation frequency at which the modulation spectrum is larger than in the surrounding frequency region,
wherein the reverberation time estimation means is configured to estimate the reverberation time corresponding to the transfer function such that, when the inverse transfer function of the transfer function is applied to the modulation spectrum of the frequency sequence, the modulation spectrum at the main modulation frequency after application approximately matches the modulation spectrum, at the main modulation frequency, of the frequency-sequence modulation spectrum corresponding to a time-series original sound signal to which no reverberation is added.

3. The reverberation time estimation apparatus, wherein the main modulation frequency specifying means is configured to obtain an autocorrelation function of the power envelope and to specify, as the main modulation frequency, the inverse of the time shift at which the autocorrelation function exhibits a peak.

4. The reverberation time estimation apparatus according to claim 2 or 3, further comprising a low-pass filter applied to the power envelope generated by the power envelope generating means,
wherein the main modulation frequency specifying means is configured to specify the main modulation frequency based on the power envelope output from the low-pass filter.

5. The reverberation time estimation apparatus according to any one of claims 1 to 4, further comprising:
band dividing means for dividing the acoustic signal into a plurality of channels; and
channel determining means for determining, from among the channels divided by the band dividing means, a channel to be used for reverberation time estimation.

6. The reverberation time estimation apparatus according to claim 5, wherein the power envelope generating means is configured to generate a power envelope for each channel band-divided by the band dividing means,
the apparatus further comprises high level portion detecting means for detecting, in the power envelope generated by the power envelope generating means, a high level portion exceeding a predetermined reference value, and
the channel determining means is configured to determine the channel used for reverberation time estimation based on the high level portion detected by the high level portion detecting means.

7. The reverberation time estimation apparatus according to claim 6, wherein the channel determining means is configured to determine whether a minute peak exists between two high level portions detected by the high level portion detecting means and, if a minute peak exists, to exclude the channel from the channels used for estimation.

8. The reverberation time estimation apparatus according to claim 6 or 7, wherein the channel determining means is configured to determine whether a valley exists in a high level portion detected by the high level portion detecting means and, if a valley exists, to exclude the channel from the channels used for estimation.

9. A reverberation time estimation method comprising:
generating, based on a time-series acoustic signal to which reverberation is added, a time-series power envelope corresponding to the acoustic signal;
generating a modulation spectrum of a frequency sequence based on the generated power envelope; and
estimating, based on the generated modulation spectrum, a reverberation time corresponding to a transfer function relating to the reverberation characteristics of the system in which the acoustic signal is observed.