JP6630605B2 - Impulse response estimation device and program

Info

Publication number: JP6630605B2
Application number: JP2016057525A
Authority: JP
Inventors: 陽佐々木; 敏行西口; 一穂小野
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2016-03-22
Filing date: 2016-03-22
Publication date: 2020-01-15
Anticipated expiration: 2036-03-22
Also published as: JP2017173456A

Description

The present invention relates to an impulse response estimation device and a program for estimating an impulse response.

In program audio production, multiple audio recordings made at different times and places are often mixed into a single program. However, the result can sound unnatural and poorly blended, for example when an audio signal recorded in a space with almost no reverberation is mixed with one recorded in a richly reverberant space. To avoid this problem, reverberation can be added to the audio signal that has almost no reverberation, which makes it possible to produce a program that sounds natural and well blended.

In addition, when an audio signal with relatively few channels, such as mono or 2ch stereo, is converted to a multichannel format such as 5.1ch surround or 22.2ch surround, channel conversion with a sense of spatial spread can be achieved by assigning, to the channels located beside or behind the listening point, signals to which reverberation simulating indirect sound (for example, wall reflections) has been added.

When reverberation is added during mixing or channel conversion, the choice of what reverberation to add relies largely on the experience and intuition of the program producer. For example, if an audio signal recorded in a space with short reverberation is mixed with an audio signal to which extremely long reverberation has been added, the resulting content sounds unnatural.

Normally, an acoustic signal recorded in a given space (the original sound field) consists of direct sound, which arrives at the listening point directly, and indirect sound (reverberant sound), which arrives after being reflected by walls and other surfaces. The indirect sound is the direct sound convolved with the impulse response between the sound source and the listening point. Therefore, if the direct sound and the impulse response are known, the reverberant sound can be generated by convolving the direct sound with the impulse response. Accordingly, by estimating the impulse response from an audio signal that contains reverberation (the original sound) and convolving it with another audio signal that contains no reverberation (direct sound), reverberation close to that of the original sound field can be added. With this approach, reverberation can be added without unnaturalness even when multiple recordings made at different times and places are mixed.
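The convolution step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `add_reverb` and the choice to trim the output to the length of the direct sound are assumptions made for the example.

```python
import numpy as np

def add_reverb(direct, impulse_response):
    """Generate indirect (reverberant) sound r = h * d by linear convolution.

    The result is trimmed to the length of the direct sound so that it can be
    mixed sample by sample with other signals (an assumption of this sketch).
    """
    direct = np.asarray(direct, dtype=float)
    impulse_response = np.asarray(impulse_response, dtype=float)
    return np.convolve(direct, impulse_response)[: len(direct)]

# A unit impulse as direct sound simply reproduces the impulse response.
direct = np.array([1.0, 0.0, 0.0, 0.0])
h = np.array([0.0, 0.5, 0.25])
reverb = add_reverb(direct, h)
```

For long impulse responses an FFT-based convolution would usually be faster; `np.convolve` is used here only to keep the sketch short.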

In general, when both the indirect sound and the direct sound contained in an original sound recorded in a given space are known, the impulse response, which represents the reverberation component of that space, can be estimated using an adaptive filter method. The adaptive filter method estimates the impulse response by training sample by sample over a certain period so as to minimize the mean squared error between the indirect sound signal contained in the original sound and the indirect sound signal generated by convolving the direct sound with the impulse response. It is known, however, that effective estimation is possible only when a linear relationship holds between the two signals (see, for example, Non-Patent Document 1).
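As an illustration of sample-by-sample adaptive estimation, the sketch below uses the normalized LMS (NLMS) update, one common adaptive filter algorithm. The function name `nlms_estimate`, the step size, and the regularization constant `eps` are assumptions of this example and are not taken from the patent or from Non-Patent Document 1.

```python
import numpy as np

def nlms_estimate(direct, indirect, num_taps, step=0.5, eps=1e-8):
    """Estimate an impulse response of num_taps samples with NLMS.

    direct   : known direct sound d(k)
    indirect : known indirect sound r(k), assumed to be h * d (linear relation)
    """
    direct = np.asarray(direct, dtype=float)
    indirect = np.asarray(indirect, dtype=float)
    h = np.zeros(num_taps)    # current impulse response estimate
    buf = np.zeros(num_taps)  # most recent direct samples, newest first
    for d_k, r_k in zip(direct, indirect):
        buf = np.roll(buf, 1)
        buf[0] = d_k
        err = r_k - h @ buf   # error between actual and predicted indirect sound
        h += step * err * buf / (buf @ buf + eps)  # normalized gradient step
    return h

# Synthetic check: a linear, time-invariant relation is recovered.
rng = np.random.default_rng(0)
d = rng.standard_normal(4000)
h_true = np.array([0.0, 0.6, -0.3, 0.1])
r = np.convolve(d, h_true)[: len(d)]
h_est = nlms_estimate(d, r, num_taps=4)
```

The synthetic check relies on exactly the linear relationship the text mentions; when that relationship varies over time, the estimate keeps drifting instead of settling.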

In ordinary recording, however, the direct sound and the indirect sound are not recorded separately; in most cases only their mixture is recorded, which makes the impulse response difficult to estimate. A technique is therefore desired that uses an existing reverberation separation method to separate the recorded mixture into direct sound and indirect sound and then estimates the impulse response from them. Known reverberation separation methods include, for example, the technique described in Non-Patent Document 2 and the software "UNVEIL" from Zynaptiq.

Non-Patent Document 1: John Usher, "Acoustic impulse response measurement using speech and music signals", AES 128th Convention, 2010.
Non-Patent Document 2: K. Kinoshita et al., "Blind Upmix Of Stereo Music Signals Using Multi-Step Linear Prediction Based Reverberation Extraction", ICASSP, pp. 49-52, 2010.

However, when the reverberation separation method is implemented with nonlinear or time-varying processing, the relationship between the output direct sound and indirect sound changes from moment to moment depending on the input mixture, so the estimated impulse response does not converge to a constant value. A longer learning time can therefore yield an unnatural impulse response, and the impulse response needs to be estimated over as short a signal section as possible. On the other hand, when the acoustic signal has frequency components that are unevenly distributed over time, as with a musical signal, the learning time must be made longer in order to estimate an impulse response that covers more frequency components. These two requirements conflict.

In view of these circumstances, an object of the present invention is to provide an impulse response estimation device and a program capable of accurately estimating an impulse response from only an acoustic signal that is a mixture recorded in a reverberant original sound field.

To solve the above problem, an impulse response estimation device according to the present invention comprises: a reverberation separation unit that separates an acoustic signal into direct sound and indirect sound; a division unit that divides the direct sound and the indirect sound into a plurality of frames to generate divided direct sounds and divided indirect sounds; and an impulse response estimation unit that estimates, from the divided direct sounds and the divided indirect sounds, the impulse response of the original sound field in which the acoustic signal was recorded.

Furthermore, the impulse response estimation device according to the present invention further comprises a frame averaging unit that generates an average direct sound and an average indirect sound by averaging the divided direct sounds and the divided indirect sounds, and the impulse response estimation unit estimates the impulse response using the average direct sound and the average indirect sound.

Furthermore, the impulse response estimation device according to the present invention further comprises a frame selection unit that selects the frame in which the difference between the frequency characteristics of the divided direct sound and the divided indirect sound is largest, and the impulse response estimation unit estimates the impulse response from the divided direct sound and the divided indirect sound of the frame selected by the frame selection unit.

Furthermore, in the impulse response estimation device according to the present invention, the frame selection unit selects the frame having the largest spectral distortion, or the largest spectral distortion weighted equally across all octaves.

To solve the above problem, a program according to the present invention causes a computer to function as the above impulse response estimation device.

According to the present invention, an impulse response can be accurately estimated from only an acoustic signal recorded in a reverberant original sound field. A reasonable impulse response can be estimated even when a reverberation separation method based on nonlinear or time-varying processing is used. Furthermore, the impulse response can be estimated with a short learning time even when the input is an acoustic signal whose frequency components are unevenly distributed over time.

FIG. 1 is a block diagram showing a configuration example of the impulse response estimation device according to the first embodiment of the present invention.
FIG. 2 is a block diagram showing a configuration example of the impulse response estimation device according to the second embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings.

(First Embodiment)
An impulse response estimation device according to the first embodiment of the present invention is described below. FIG. 1 shows a configuration example of this device. The impulse response estimation device 1 shown in FIG. 1 comprises a reverberation separation unit 11, a direct sound division unit 12, an indirect sound division unit 13, a direct sound frame averaging unit 14, an indirect sound frame averaging unit 15, and an impulse response estimation unit 16.

The impulse response estimation device 1 receives as input an acoustic signal x(k) recorded in a given space. The acoustic signal x(k) is a mixture of the direct sound d(k) and the indirect sound r(k), and is expressed by Expression (1). In this specification, an "acoustic signal" may include any sound, such as voices and musical instruments.
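The image for Expression (1) is not reproduced in this text. Since x(k) is described as a mixture of the direct sound and the indirect sound, it can presumably be written as the following sum; this is a reconstruction from the surrounding description, not a copy of the original formula.

```latex
x(k) = d(k) + r(k) \tag{1}
```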

The reverberation separation unit 11 performs reverberation separation processing to separate the acoustic signal x(k) into the direct sound d(k) and the indirect sound r(k), outputs the direct sound to the direct sound division unit 12, and outputs the indirect sound to the indirect sound division unit 13. Any known reverberation separation process can be used here.

The direct sound division unit 12 divides the direct sound d(k) input from the reverberation separation unit 11 into frames of a predetermined length T to generate a plurality of divided direct sounds d_n(k) (n = 1, 2, ..., N), and outputs them to the direct sound frame averaging unit 14. Here, k = 1, 2, ..., T.

The indirect sound division unit 13 divides the indirect sound r(k) input from the reverberation separation unit 11 into frames of a predetermined length T to generate a plurality of divided indirect sounds r_n(k) (n = 1, 2, ..., N), and outputs them to the indirect sound frame averaging unit 15. Here, k = 1, 2, ..., T.

The direct sound d(k) and the indirect sound r(k) may be divided, with T as the frame length, either without overlap as in Expression (2) or with overlap as in Expression (3).

The frame length T can be chosen freely by the user, but it is preferably equal to or longer than the length of the impulse response to be extracted. If the direct sound d(k) and the indirect sound r(k) cannot be divided evenly by the predetermined frame length T, zeros may be appended to the ends of both signals.
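The framing described above, with or without overlap and with zero padding at the end of the signal, can be sketched as follows. Expressions (2) and (3) are not reproduced in this text, so the hop-size parameterization (`hop` equal to the frame length for no overlap, smaller for overlap) and the function name `split_frames` are assumptions of this example.

```python
import numpy as np

def split_frames(signal, frame_len, hop=None):
    """Split a 1-D signal into frames of length frame_len.

    hop=None (or hop == frame_len) gives non-overlapping frames;
    hop < frame_len gives overlapping frames. The tail of the signal is
    zero-padded so that the last frame is complete.
    Returns an array of shape (N, frame_len).
    """
    signal = np.asarray(signal, dtype=float)
    hop = frame_len if hop is None else hop
    if not 0 < hop <= frame_len:
        raise ValueError("hop must satisfy 0 < hop <= frame_len")
    # Smallest number of frames that covers every sample.
    n_frames = max(1, int(np.ceil((len(signal) - frame_len) / hop)) + 1)
    padded = np.zeros((n_frames - 1) * hop + frame_len)
    padded[: len(signal)] = signal
    return np.stack([padded[i * hop : i * hop + frame_len] for i in range(n_frames)])

x = np.arange(1, 11)                      # ten samples: 1..10
no_overlap = split_frames(x, 4)           # frames start at samples 0, 4, 8
half_overlap = split_frames(x, 4, hop=2)  # frames start at samples 0, 2, 4, 6
```

The same function would be applied with identical parameters to both d(k) and r(k), so that frame n of the direct sound lines up with frame n of the indirect sound.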

The direct sound frame averaging unit 14 generates, from the divided direct sounds d_n(k) produced by the direct sound division unit 12, an average direct sound /d(k) in which the direct sound d(k) is averaged over frames. In this specification, "/" denotes the overbar used as the averaging symbol. The average direct sound /d(k) is expressed by Expression (4), where N is the number of frames (number of divisions) of the direct sound.

The indirect sound frame averaging unit 15 generates, from the divided indirect sounds r_n(k) produced by the indirect sound division unit 13, an average indirect sound /r(k) in which the indirect sound r(k) is averaged over frames. The average indirect sound /r(k) is expressed by Expression (5), where N is the number of divisions (number of frames) of the indirect sound.
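Expressions (4) and (5) are not reproduced in this text. Read literally, averaging the N divided sounds sample by sample would give /d(k) = (1/N) Σ_n d_n(k), and likewise for /r(k). The sketch below implements that reading; the function name `average_frames` and the (N, T) array layout are assumptions of this example.

```python
import numpy as np

def average_frames(frames):
    """Average N frames sample by sample.

    frames : array of shape (N, T), one divided sound per row
    returns: array of shape (T,), the frame-averaged signal
    """
    frames = np.asarray(frames, dtype=float)
    return frames.mean(axis=0)

# Two frames of length 3: the average is taken at each sample index k.
direct_frames = np.array([[1.0, 2.0, 3.0],
                          [3.0, 4.0, 5.0]])
avg_direct = average_frames(direct_frames)
```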

The indirect sound r(k) can be expressed using the impulse response h(k) as in Expression (6), where * denotes convolution.

Similarly, the divided indirect sound r_n(k) can be expressed as in Expression (7), using the impulse response h_n(k) of each divided frame.

Here, e_n(k) is the influence of reverberation from the preceding frame. Assuming that the per-frame impulse response h_n(k) is invariant, the relationship between the average direct sound /d(k) and the average indirect sound /r(k) can be expressed as in Expression (8), where /e(k) is the frame average of the influence of reverberation from the preceding frame.

Here, /h(k) denotes the averaged impulse response and can be expressed by Expression (9).
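The images for Expressions (6) to (9) are not reproduced in this text. The relations described in the surrounding paragraphs can plausibly be written as below, using an overbar for the "/" averaging symbol. This is a reconstruction from the prose, not a copy of the original formulas, and Expression (9) is left out because its exact form cannot be recovered from the text.

```latex
\begin{align*}
r(k) &= h(k) * d(k) \tag{6} \\
r_n(k) &= h_n(k) * d_n(k) + e_n(k) \tag{7} \\
\bar{r}(k) &= \bar{h}(k) * \bar{d}(k) + \bar{e}(k) \tag{8}
\end{align*}
```

The step from (7) to (8) follows from the linearity of convolution: if h_n(k) = h(k) for every frame, averaging (7) over n gives (1/N) Σ_n r_n(k) = h(k) * (1/N) Σ_n d_n(k) + (1/N) Σ_n e_n(k). Under that invariance assumption the averaged impulse response coincides with each h_n(k).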

The impulse response estimation unit 16 obtains the averaged impulse response /h(k) from the average direct sound /d(k) and the average indirect sound /r(k) using any known method, such as an adaptive filter method or the least squares method, and outputs it externally as the impulse response of the original sound field in which the acoustic signal was recorded. Assuming that /e(k) is uncorrelated with the average direct sound /d(k), the influence of /e(k) can be removed by using the adaptive filter method or the least squares method. Many adaptive algorithms have been proposed for the adaptive filter method, including the LMS algorithm, the N-LMS algorithm, and the projection algorithm; any algorithm that achieves the separation while preserving sound quality may be used.
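As one possible reading of the least squares option, the sketch below builds a convolution matrix from the average direct sound and solves for the impulse response in the least squares sense. The function name `least_squares_ir`, the parameter `ir_len`, and the use of `np.linalg.lstsq` are assumptions of this example; the patent leaves the concrete method open.

```python
import numpy as np

def least_squares_ir(avg_direct, avg_indirect, ir_len):
    """Least squares estimate of an impulse response of ir_len samples.

    Solves min_h || A h - r ||^2, where column j of A is the average direct
    sound delayed by j samples, so that (A h)[k] = sum_j h[j] * d[k - j].
    """
    d = np.asarray(avg_direct, dtype=float)
    r = np.asarray(avg_indirect, dtype=float)
    T = len(d)
    if ir_len > T:
        raise ValueError("ir_len must not exceed the frame length")
    A = np.zeros((T, ir_len))
    for j in range(ir_len):
        A[j:, j] = d[: T - j]
    h, *_ = np.linalg.lstsq(A, r, rcond=None)
    return h

# Synthetic check with a known response and no previous-frame term.
rng = np.random.default_rng(1)
d_avg = rng.standard_normal(64)
h_true = np.array([0.5, 0.0, -0.2, 0.1])
r_avg = np.convolve(d_avg, h_true)[:64]
h_est = least_squares_ir(d_avg, r_avg, ir_len=4)
```

In this synthetic check the residual term is zero; with a nonzero term that is uncorrelated with the direct sound, the least squares solution would be expected to approach the true response as the frame length grows.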

Because the impulse response estimation unit 16 estimates the impulse response from the average direct sound /d(k) and the average indirect sound /r(k), an impulse response /h(k) that is averaged over the entire signal and contains many frequency components can be obtained, even when the reverberation separation unit 11 uses nonlinear or time-varying processing and the input-output relationship between the separated direct sound and indirect sound therefore varies.

(Second Embodiment)
Next, an impulse response estimation device according to the second embodiment of the present invention is described. FIG. 2 shows a configuration example of this device. The impulse response estimation device 2 shown in FIG. 2 comprises a reverberation separation unit 11, a direct sound division unit 12, an indirect sound division unit 13, a frame selection unit 17, and an impulse response estimation unit 16. The reverberation separation unit 11, the direct sound division unit 12, and the indirect sound division unit 13 are the same as in the first embodiment, so their description is omitted.

The frame selection unit 17 receives the divided direct sounds d_n(k) from the direct sound division unit 12 and the divided indirect sounds r_n(k) from the indirect sound division unit 13. The wider the frequency range covered by the direct sound and the indirect sound, the higher the accuracy of the impulse response estimate. The frame selection unit 17 therefore selects the frame in which the difference between the frequency characteristics of the divided direct sound d_n(k) and the divided indirect sound r_n(k) is largest. That is, it computes the difference in frequency characteristics for each frame and, if the difference is largest for the frame n = 3, for example, outputs the divided direct sound d_3(k) and the divided indirect sound r_3(k) to the impulse response estimation unit 16.

A specific example of the frame selection processing in the frame selection unit 17 is given below. As described above, the relationship between the divided direct sound d_n(k) and the divided indirect sound r_n(k) is expressed by Expression (7), using the impulse response h_n(k) of each divided frame.

Furthermore, applying the discrete Fourier transform (DFT) to the divided indirect sound r_n(k) yields the relation of Expression (10).

Here, R_n(ω_m) and D_n(ω_m) are the frequency-domain representations of the divided indirect sound r_n(k) and the divided direct sound d_n(k), respectively, and H_n(ω_m) is the transfer function. ω_m is the angular frequency, and m is the index of the discrete frequencies produced by the DFT. From Expression (10), the transfer function represents the relative relationship between the direct sound and the indirect sound.
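The image for Expression (10) is not reproduced in this text. If the previous-frame term e_n(k) is treated as negligible (an assumption made here; the original formula may carry an additional term), the DFT turns the convolution into a per-frequency product, which suggests a relation of the following form:

```latex
R_n(\omega_m) = H_n(\omega_m)\, D_n(\omega_m)
\quad\Longrightarrow\quad
H_n(\omega_m) = \frac{R_n(\omega_m)}{D_n(\omega_m)}
```

Written this way, the transfer function is the ratio of the indirect sound spectrum to the direct sound spectrum at each frequency, which matches the statement that it expresses their relative relationship.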

As an index of the difference between the frequency characteristics of the direct sound and the indirect sound, the spectral distortion SD is given by Expression (11). A frame with a larger spectral distortion SD has a larger integrated difference between the amplitude characteristics of the direct sound and the indirect sound along the frequency axis, and is therefore considered to contain more reverberation components. The frame selection unit 17 may therefore compute the spectral distortion SD for every frame in advance and output the divided direct sound d_n(k) and the divided indirect sound r_n(k) of the frame with the largest SD to the impulse response estimation unit 16. Here, M is the number of DFT points.
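Frame selection by spectral distortion can be sketched as follows. Expression (11) is not reproduced in this text, so the sketch assumes one common definition of spectral distortion, the root-mean-square difference in dB between the two magnitude spectra; the patent's exact formula may differ. The function names and the `eps` guard against log of zero are also assumptions.

```python
import numpy as np

def spectral_distortion(direct_frame, indirect_frame, eps=1e-12):
    """RMS difference in dB between the magnitude spectra of two frames."""
    D = np.abs(np.fft.rfft(direct_frame))
    R = np.abs(np.fft.rfft(indirect_frame))
    diff_db = 20.0 * np.log10(R + eps) - 20.0 * np.log10(D + eps)
    return float(np.sqrt(np.mean(diff_db ** 2)))

def select_frame(direct_frames, indirect_frames):
    """Return the index of the frame with the largest spectral distortion."""
    scores = [spectral_distortion(d, r)
              for d, r in zip(direct_frames, indirect_frames)]
    return int(np.argmax(scores))

# Toy frames: a unit impulse has a flat spectrum, so scaling it by a gain g
# changes every frequency bin by 20*log10(g) dB.
impulse = np.zeros(8)
impulse[0] = 1.0
direct_frames = [impulse, impulse, impulse]
indirect_frames = [impulse, 0.5 * impulse, 0.1 * impulse]
best = select_frame(direct_frames, indirect_frames)
```

The selected pair `direct_frames[best]`, `indirect_frames[best]` would then be handed to whichever impulse response estimator is in use.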

As another index of the difference between the frequency characteristics of the direct sound and the indirect sound, the spectral distortion β weighted equally across all octaves is given by Expression (12). A frame with a larger evaluation function β has a larger integrated difference between the amplitude characteristics of the direct sound and the indirect sound along the frequency axis when evaluated with an averaged error that is weighted equally across all octaves to match human auditory characteristics, and is therefore considered to contain more reverberation components. The evaluation function β may therefore be computed for every frame in advance, and the divided direct sound d_n(k) and the divided indirect sound r_n(k) of the frame with the largest β output to the impulse response estimation unit 16.
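One way to weight all octaves equally on a linearly spaced DFT grid is to weight bin m by 1/m, since the number of bins in an octave grows in proportion to frequency. The sketch below uses that weighting. Expression (12) is not reproduced in this text, so the weighting, the exclusion of the DC bin, and the function name `octave_weighted_distortion` are assumptions of this example rather than the patent's definition of β.

```python
import numpy as np

def octave_weighted_distortion(direct_frame, indirect_frame, eps=1e-12):
    """Spectral distortion in dB with weights proportional to 1/m.

    The DC bin is skipped because it belongs to no octave. The weights are
    normalized to sum to 1, so a uniform dB offset yields that offset.
    """
    D = np.abs(np.fft.rfft(direct_frame))[1:]
    R = np.abs(np.fft.rfft(indirect_frame))[1:]
    m = np.arange(1, len(D) + 1)
    w = 1.0 / m
    w /= w.sum()
    diff_db = 20.0 * np.log10(R + eps) - 20.0 * np.log10(D + eps)
    return float(np.sqrt(np.sum(w * diff_db ** 2)))

# Toy comparison: the same 20 dB dip placed in a low bin or in a high bin.
n = 64
flat = np.ones(n // 2 + 1)
direct = np.fft.irfft(flat, n=n)
low_dip, high_dip = flat.copy(), flat.copy()
low_dip[1] = 0.1    # dip at DFT bin 1
high_dip[30] = 0.1  # dip at DFT bin 30
beta_low = octave_weighted_distortion(direct, np.fft.irfft(low_dip, n=n))
beta_high = octave_weighted_distortion(direct, np.fft.irfft(high_dip, n=n))
```

With this weighting a deviation in a low bin counts for more than the same deviation in a high bin, because a single low bin spans a larger fraction of an octave.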

A computer can suitably be used to function as the impulse response estimation devices 1 and 2 described above. This is achieved by storing, in the storage unit of the computer, a program describing the processing that implements each function of the impulse response estimation devices 1 and 2, and having the CPU of the computer read and execute the program. The program can be recorded on a computer-readable recording medium.

As described above, the impulse response estimation devices 1 and 2 according to the present invention, and the corresponding program, estimate the impulse response using direct sound and indirect sound divided into a plurality of frames. A reasonable impulse response can therefore be obtained even when the reverberation separation unit 11 uses nonlinear or time-varying processing and the input-output relationship between the separated direct sound and indirect sound varies. In addition, even when the acoustic signal has frequency components that are unevenly distributed over time, as with a musical signal, an impulse response that takes into account all frequencies contained in the acoustic signal can be estimated with a short learning time.

Although the above embodiments have been described as representative examples, it will be apparent to those skilled in the art that many changes and substitutions can be made within the spirit and scope of the present invention. The present invention should therefore not be construed as limited by the above embodiments, and various modifications and changes are possible without departing from the scope of the claims. For example, several of the blocks shown in the configuration diagrams of the embodiments can be combined into one, or a single block can be divided.

The present invention can be used, for example, for audio editing work in program production.

1, 2 Impulse response estimation device
11 Reverberation separation unit
12 Direct sound division unit
13 Indirect sound division unit
14 Direct sound frame averaging unit
15 Indirect sound frame averaging unit
16 Impulse response estimation unit
17 Frame selection unit

Claims

1. An impulse response estimation device comprising:
a reverberation separation unit that separates an acoustic signal into direct sound and indirect sound;
a division unit that divides the direct sound and the indirect sound into a plurality of frames to generate divided direct sounds and divided indirect sounds; and
an impulse response estimation unit that estimates, from the divided direct sounds and the divided indirect sounds, an impulse response of the original sound field in which the acoustic signal was recorded.

2. The impulse response estimation device according to claim 1, further comprising a frame averaging unit that generates an average direct sound and an average indirect sound by averaging the divided direct sounds and the divided indirect sounds,
wherein the impulse response estimation unit estimates the impulse response using the average direct sound and the average indirect sound.

3. The impulse response estimation device according to claim 1, further comprising a frame selection unit that selects the frame having the largest difference in frequency characteristics between the divided direct sound and the divided indirect sound,
wherein the impulse response estimation unit estimates the impulse response from the divided direct sound and the divided indirect sound of the frame selected by the frame selection unit.

4. The impulse response estimation device according to claim 3, wherein the frame selection unit selects the frame having the largest spectral distortion, or the largest spectral distortion weighted equally across all octaves.

5. A program for causing a computer to function as the impulse response estimation device according to any one of claims 1 to 4.