JP4886616B2

JP4886616B2 - Sound collection device, sound collection method, sound collection program using the method, and recording medium

Info

Publication number: JP4886616B2
Application number: JP2007166491A
Authority: JP
Inventors: 裕輔日岡; 和則小林; 賢一古家; 章俊片岡
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2007-06-25
Filing date: 2007-06-25
Publication date: 2012-02-29
Anticipated expiration: 2027-06-25
Also published as: JP2009005261A

Abstract

PROBLEM TO BE SOLVED: To provide a sound pickup apparatus that improves noise suppression performance even in a reverberant environment and improves the sound quality of the desired signal that is picked up. SOLUTION: A sound pickup apparatus has six sound pickup parts, a processing target signal generator, a power spectrum estimator, a reverberation spectrum estimator, a gain factor calculator, and a multiplier. The processing target signal generator generates a processing target signal from the signal output by one or more predetermined microphones or sound pickup parts. The power spectrum estimator estimates, for each frequency, the signal quantity of the desired sound source with the reverberation signal removed and the signal quantity of the other sound sources, from the sound pickup signal obtained at each sound pickup part and the signal quantity of the reverberation sound. The reverberation spectrum estimator calculates, for each frequency, the signal quantity of the reverberation sound from the signal quantity of the desired sound source and the signal quantity of the other sound sources estimated by the power spectrum estimator. COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a sound collection device, a sound collection method, a sound collection program using the method, and a recording medium for collecting speech hands-free, for example for voice calls or device operation. It is particularly relevant when many noise sources exist other than the desired sound source that emits the speech to be captured.

Assuming a hands-free microphone in an environment with many background noise sources, a method has been proposed that emphasizes a desired sound source at a specific position by estimating the desired sound power from the outputs of multiple beamformers (Non-Patent Document 1).
Yusuke Hioka, Kazunori Kobayashi, Kenichi Furuya, Akitoshi Kataoka, "Emphasis of a sound source at a specific position using a small microphone array pair", Proceedings of the 2006 Spring Meeting of the Acoustical Society of Japan, pp. 621-622, 2006.

Because the technique of Non-Patent Document 1 does not take reverberation into account, its noise suppression performance falls below the theoretical performance in a reverberant environment such as an ordinary room. The degradation is especially severe in highly reverberant environments. As a result, the quality of the collected desired sound deteriorates.
The sound collection device of the present invention was made to solve this problem. Its object is to improve noise suppression performance even in a reverberant environment and to improve the sound quality of the collected desired signal.

The sound collection device of the present invention includes six or more sound collection units, a processing target signal generation unit, a power spectrum estimation unit, a reverberation spectrum estimation unit, a gain coefficient calculation unit, and a multiplication unit. The six or more sound collection units each collect sound from a different region using the output signals of a microphone array made up of a plurality of microphones. Here, "different" means that the regions are not identical; they may partly overlap. The power spectrum estimation unit estimates, for each frequency, the signal amount of the desired sound source with the reverberation signal removed and the signal amount of the other sound sources, from the collected signal obtained by each sound collection unit and the signal amount of the reverberant sound. The reverberation spectrum estimation unit obtains the signal amount of the reverberant sound for each frequency from the signal amount of the desired sound source and the signal amount of the other sound sources estimated by the power spectrum estimation unit. The gain coefficient calculation unit obtains a gain coefficient for each frequency from the ratio of the signal amount of the desired sound source to the signal amount of all sound sources, including the desired sound source. The multiplication unit multiplies the processing target signal by the gain coefficient calculated by the gain coefficient calculation unit.

For example, when six sound collection units (first to sixth) are provided, the first and second sound collection units use the output signals of a microphone array made up of a plurality of microphones to collect, from mutually different positions, sound in an angular region that includes the desired sound source position. The third and fourth sound collection units use the output signals of the microphone array to collect, from the mutually different positions, sound in an angular region that does not include the desired sound source position. The fifth sound collection unit collects, from the midpoint between the mutually different positions, sound in an angular region that includes the desired sound source position. The sixth sound collection unit collects, from the midpoint, sound in an angular region that does not include the desired sound source position. The processing target signal generation unit generates the processing target signal from the signals of one or more predetermined microphones or sound collection units.

Alternatively, for example, the first and second sound collection units use the output signals of a microphone array made up of a plurality of microphones to collect sound, from mutually different positions, while suppressing sound from one part of the angular region that does not include the desired sound source position. The third and fourth sound collection units use the output signals of the microphone array to collect sound, from the mutually different positions, while suppressing sound from the angular region that includes the desired sound source position. The fifth and sixth sound collection units use the output signals of the microphone array to collect sound, from the mutually different positions, while suppressing sound from a part of the angular region not including the desired sound source position that differs from the part suppressed by the first and second sound collection units.

The reverberation spectrum estimation unit may include a gain matrix multiplication unit, which converts the signal amount of the desired sound source and the signal amount of the other sound sources into a signal amount for each sound collection unit, and a weighted addition unit, which records the signal amount for each sound collection unit and forms a weighted sum of multiple past per-unit signal amounts.
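The two-stage structure described above (gain matrix multiplication followed by a weighted sum over past values) can be sketched as follows. This is only an illustration under stated assumptions: the class name, the weight values, the history length, and the 2x2 gain matrix are hypothetical and are not given in this section of the patent.

```python
from collections import deque


class ReverbSpectrumEstimator:
    """Sketch for one frequency bin; all numeric values are hypothetical."""

    def __init__(self, gain_matrix, weights):
        self.gain_matrix = gain_matrix   # rows: sound collection units, columns: sources
        self.weights = weights           # weights[0] applies to the most recent past frame
        self.history = deque(maxlen=len(weights))

    def update(self, source_powers):
        # Gain matrix multiplication: source powers -> power seen by each unit.
        per_unit = [sum(g * x for g, x in zip(row, source_powers))
                    for row in self.gain_matrix]
        self.history.appendleft(per_unit)

    def estimate(self):
        # Weighted addition of the stored past per-unit powers.
        z = [0.0] * len(self.gain_matrix)
        for w, past in zip(self.weights, self.history):
            for i, p in enumerate(past):
                z[i] += w * p
        return z


est = ReverbSpectrumEstimator([[1.0, 0.0], [0.5, 0.5]], weights=[0.5, 0.25])
est.update([2.0, 4.0])
first = est.estimate()
est.update([4.0, 0.0])
second = est.estimate()
```

Older frames receive smaller weights here on the assumption that reverberant energy decays over time; the patent text in this section does not state how the weights are chosen.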

According to the sound collection device of the present invention, the signal amount of the reverberant sound is obtained, and the signal amount of the desired sound source is obtained from the collected signal after the reverberant signal amount has been subtracted. Noise suppression performance can therefore be improved even in a reverberant environment, and high-quality sound collection becomes possible.

FIG. 1 shows an example of how the present invention is used. Two small microphone arrays 3L and 3R are placed at different positions separated by some distance (for example, about the same as the distance from the microphone arrays 3L and 3R to the desired sound source 1), and the processing described below is applied to the signals received by their microphones. As a result, the sound of the desired sound source 1 is emphasized and collected, and the sound of the background noise sources 2 is suppressed.
Before describing the present invention, the technique disclosed in an unpublished patent application (Japanese Patent Application No. 2006-52502) is described first. FIG. 2 shows the overall configuration of the sound collection device of Japanese Patent Application No. 2006-52502, and the device is outlined with reference to this figure. In this example, the received signals generated by the microphones of microphone array 3L are input to the first sound collection unit 4-1 and the third sound collection unit 4-3. Likewise, the received signals generated by the microphones of microphone array 3R are input to the second sound collection unit 4-2 and the fourth sound collection unit 4-4. The signals of the microphones located at the center of microphone arrays 3L and 3R are input to the fifth sound collection unit 4-5 and the sixth sound collection unit 4-6. The two microphone arrays 3L and 3R need not have the same number of microphones.

As shown in FIG. 4, each of the first sound collection unit 4-1 to the fourth sound collection unit 4-4 consists of M filter processing units 41, which receive the microphone signals x_1 to x_M, and an addition unit 42, which adds the output signals of the M filter processing units 41. Each filter processing unit 41 is, for example, an FIR filter; it processes each frequency component of the collected signal digitally to set the directivity of microphone arrays 3L and 3R. Such techniques are described, for example, in Juro Ohga, Yoshio Yamasaki, Yutaka Kaneda, "Acoustic Systems and Digital Processing", The Institute of Electronics, Information and Communication Engineers, March 25, 1995, and can be realized with well-known methods.

Here, the directivity of the first sound collection unit 4-1 and of the second sound collection unit 4-2 is set so that their sound collection ranges are the angular regions Θ_L and Θ_R, which include the position of the desired sound source 1 shown in FIG. 3 as seen from approximately the center of microphone arrays 3L and 3R. The directivity of the third sound collection unit 4-3 and of the fourth sound collection unit 4-4 is set so that their sound collection ranges are the angular regions \bar{Θ}_L and \bar{Θ}_R, which do not include the position of the desired sound source 1 shown in FIG. 3 as seen from approximately the center of microphone arrays 3L and 3R. The directivity of the fifth sound collection unit 4-5 is set so that its sound collection range is the angular region Θ_C, which includes the position of the desired sound source 1 as seen from approximately the midpoint between microphone arrays 3L and 3R. The directivity of the sixth sound collection unit 4-6 is set so that its sound collection range is the angular region \bar{Θ}_C, which does not include the position of the desired sound source 1 as seen from approximately the midpoint between microphone arrays 3L and 3R.

The signals collected with the directivities of the first to sixth sound collection units 4-1 to 4-6 are converted into frequency-domain signals by the frequency domain conversion unit 5. For this conversion, the input signal is split into frames of short duration (for example, about 256 samples at a sampling frequency of 16000 Hz), and a discrete Fourier transform is applied to each frame. The discrete Fourier transform can be computed with, for example, the fast Fourier transform (FFT). The converted signal is thereby divided into a plurality of frequency components.
The collected signals converted into the frequency domain are input to the addition unit 6 and to the sound source signal component estimation unit 7. The addition unit 6 receives the output signals of the first sound collection unit 4-1 and the second sound collection unit 4-2 and adds them component by component, that is, for each identical frequency component.
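The framing and per-frame transform described above can be sketched as follows. The frame length of 256 samples and the 16000 Hz sampling frequency come from the text; the 50% frame overlap (`hop=128`), the absence of a window function, and the 1000 Hz test tone are assumptions made only for this illustration. A naive DFT is used to keep the sketch dependency-free; an FFT returns the same values faster.

```python
import cmath
import math


def frames(signal, frame_len=256, hop=128):
    """Split a sample sequence into frames; the hop size is an assumption."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]


def dft(frame):
    """Naive discrete Fourier transform of one frame."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
            for k in range(n)]


fs = 16000
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(512)]
spectra = [dft(f) for f in frames(tone)]       # one spectrum Y(omega, l) per frame l

# Strongest bin of the first frame among the non-negative frequencies.
peak_bin = max(range(128), key=lambda k: abs(spectra[0][k]))
```

With these values each bin is 16000 / 256 = 62.5 Hz wide, so the 1000 Hz tone falls in bin 16.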

All output signals of the first sound collection unit 4-1 to the sixth sound collection unit 4-6 are input to the sound source signal component estimation unit 7, which estimates the signal amount of each sound source for each frequency component. Once the signal amount of each sound source has been estimated, the ratio of the signal amount of the desired sound source 1 to the signal amount of the other sound sources, that is, the SN ratio, can be obtained. This SN ratio is obtained for each frequency component and used as a gain coefficient: the multiplication unit 9 multiplies the signal supplied by the addition unit 6, whose main component is the signal of the desired sound source 1, by the gain coefficient for each frequency component. This suppresses the background noise contained in that signal. The output of the multiplication unit 9 is converted into a time-domain signal by the inverse frequency domain conversion unit 10 and output as the noise-suppressed signal. The above is an outline of the invention of Japanese Patent Application No. 2006-52502.

The configuration and operation of each unit are described in detail below. FIG. 4 shows the configuration of the first to fourth sound collection units 4-1 to 4-4. The first sound collection unit 4-1 is described as an example, but the same processing is performed in the second sound collection unit 4-2, the third sound collection unit 4-3, and the fourth sound collection unit 4-4. These sound collection units 4-1 to 4-4 function as side beamformers, because they view the desired sound source 1 from directions on either side of it and are set either to a characteristic whose sound collection range is the angular region including the desired sound source position or to a characteristic whose sound collection range is the angular region not including it. The signals x_{L m_L}(n) (m_L = 1, 2, ..., M_L) input to the first sound collection unit 4-1 are passed to the filter processing units 41. Each filter processing unit 41 outputs the signal x'_{L m_L}(n) obtained by convolving the input signal x_{L m_L}(n) with a filter coefficient w_{L m_L}(n) given in advance (how it is determined is described later), as in Equation (1).

The output signal of each filter processing unit 41 is input to the addition unit 42. The addition unit 42 adds its input signals as in Equation (2) to obtain the output signal y_{SL}(n) of the first sound collection unit 4-1.
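Equations (1) and (2) are not reproduced in this text, but the surrounding description (per-microphone FIR convolution followed by a sum over microphones) corresponds to a filter-and-sum beamformer. A minimal sketch follows, assuming causal FIR filters and output truncated to the input length; the function names, the two-microphone input, and the filter taps are hypothetical.

```python
def fir_filter(x, w):
    """Causal FIR convolution: y[n] = sum_k w[k] * x[n - k]."""
    return [sum(w[k] * x[n - k] for k in range(len(w)) if n - k >= 0)
            for n in range(len(x))]


def filter_and_sum(mic_signals, filters):
    """Filter each microphone signal, then add the results sample by sample."""
    filtered = [fir_filter(x, w) for x, w in zip(mic_signals, filters)]
    return [sum(samples) for samples in zip(*filtered)]


# Hypothetical two-microphone case: the wavefront reaches mic 2 one sample
# after mic 1, so mic 1 is delayed by one sample to align the two signals.
mics = [[1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0]]
weights = [[0.0, 0.5],
           [0.5, 0.0]]
y = filter_and_sum(mics, weights)
```

After alignment the two half-amplitude impulses add coherently at sample 1, which is the mechanism by which such a beamformer emphasizes one direction.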

The filter coefficients w_{L m_L}(n) are designed, for example by the least squares method, so that the directivity D_{LSPB}(ω, θ) of the first sound collection unit has the characteristic shown in Equation (3). The second, third, and fourth sound collection units are likewise designed to satisfy the conditions of Equations (4) to (6), respectively. Θ denotes the directions around the desired signal (for example, within about ±10° of the desired signal direction), and \bar{Θ} denotes all other directions. D_{...}(ω, θ) in Equations (3) to (6) denotes the directivity of each sound collection unit.

The first sound collection unit 4-1 emphasizes and collects only the sound emitted from the direction of the desired sound source 1 as seen from microphone array 3L. The third sound collection unit emphasizes and collects only the sound emitted from directions other than that of the desired sound source as seen from microphone array 3L. The second sound collection unit 4-2 emphasizes and collects only the sound emitted from the direction of the desired sound source 1 as seen from microphone array 3R. The fourth sound collection unit 4-4 emphasizes and collects only the sound emitted from directions other than that of the desired sound source 1 as seen from microphone array 3R.

FIG. 5 shows the processing flow in the fifth sound collection unit 4-5 and the sixth sound collection unit 4-6, which function as front beamformers. The front beamformers receive the signal x_{L(M_L/2)}(n) from the microphone at the center of microphone array 3L and the signal x_{R(M_R/2)}(n) from the microphone at the center of microphone array 3R, which are input to the filter processing units 51 and 52, respectively. The filter processing units 51 and 52 convolve the input signals x_{L(M_L/2)}(n) and x_{R(M_R/2)}(n) with filter coefficients w_{C(M_L/2)}(n) and w_{C(M_R/2)}(n) given in advance, as in Equations (7) and (8), and output the results x'_{L(M_L/2)}(n) and x'_{R(M_R/2)}(n).

The filter coefficients w_{C(M_L/2)}(n) and w_{C(M_R/2)}(n) should preferably have the same phase characteristics; for example, a single impulse signal is used. In the fifth sound collection unit 4-5, the output signals x'_{L(M_L/2)}(n) and x'_{R(M_R/2)}(n) of the filter processing units 51 and 52 are input to the addition unit 53. The addition unit 53 adds them as in Equation (10) and outputs the signal y_{SC}(n). The fifth sound collection unit 4-5 thus emphasizes and collects only the sound emitted from the direction of the desired sound source 1 as seen from the midpoint between microphone array 3L and microphone array 3R.

y_{SC}(n) = x'_{L(M_L/2)}(n) + x'_{R(M_R/2)}(n)   (10)
In the sixth sound collection unit 4-6, the output signals x'_{L(M_L/2)}(n) and x'_{R(M_R/2)}(n) of the filter processing units 51 and 52 are input to the subtraction unit 54. The subtraction unit 54 subtracts them as in Equation (11) and outputs the signal y_{NC}(n). The sixth sound collection unit 4-6 thus emphasizes and collects only the sound emitted from directions other than that of the desired sound source 1 as seen from the midpoint between microphone array 3L and microphone array 3R.

y_{NC}(n) = x'_{L(M_L/2)}(n) - x'_{R(M_R/2)}(n)   (11)
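The effect of the sum and difference in Equations (10) and (11) can be illustrated with a short sketch. It assumes single-impulse filters (so the filtered signals equal the inputs) and an idealized source directly in front that reaches both center microphones in phase; the 500 Hz test tone and the function name are hypothetical.

```python
import math


def front_beams(x_left, x_right):
    """Return (y_SC, y_NC): the sum and the difference of the two center-mic signals."""
    y_sc = [a + b for a, b in zip(x_left, x_right)]   # Equation (10)
    y_nc = [a - b for a, b in zip(x_left, x_right)]   # Equation (11)
    return y_sc, y_nc


fs = 16000
in_phase = [math.sin(2 * math.pi * 500 * n / fs) for n in range(64)]

# A frontal source arrives identically at both microphones in this idealized case.
y_sc, y_nc = front_beams(in_phase, in_phase)
```

The in-phase component doubles in `y_sc` and cancels in `y_nc`, which is why the sum output favors the desired direction and the difference output favors the other directions.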
FIG. 6 shows the processing flow in the sound source signal component estimation unit 7. The frequency components Y_{SL}(ω, l), Y_{NL}(ω, l), Y_{SC}(ω, l), Y_{NC}(ω, l), Y_{SR}(ω, l), and Y_{NR}(ω, l) input to the sound source signal component estimation unit 7 are each passed to the power calculation unit 61, which outputs the power values |Y_{SL}(ω, l)|^2, |Y_{NL}(ω, l)|^2, |Y_{SC}(ω, l)|^2, |Y_{NC}(ω, l)|^2, |Y_{SR}(ω, l)|^2, and |Y_{NR}(ω, l)|^2 to the vectorization unit 62. The vectorization unit 62 collects the power values of the output signals of the first to sixth sound collection units 4-1 to 4-6 into vector form, as in Equation (12), and outputs the power vector Y(ω, l).

The power vector Y(ω, l) is input to the multiplication unit 63. The other input of the multiplication unit 63, the power estimation matrix T^+, is the output of the pseudo-inverse matrix calculation unit 64. The pseudo-inverse matrix calculation unit 64 receives the gain matrix T defined by Equation (19) and outputs its pseudo-inverse T^+.
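Multiplying the power vector by the pseudo-inverse of the gain matrix amounts to a least-squares estimate of the source powers. The sketch below illustrates this under stated assumptions: the 6x4 gain values are hypothetical (the actual matrix is defined by an equation not reproduced here), the matrix is assumed to have full column rank, and the pseudo-inverse is computed as (T^T T)^{-1} T^T.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a @ x = b for square a by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + rhs[:] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def pseudo_inverse(t):
    """Moore-Penrose pseudo-inverse for a full-column-rank matrix."""
    tt = transpose(t)
    return solve(matmul(tt, t), tt)


# Hypothetical gains: 6 beamformer outputs (rows) by 4 source powers (columns).
T = [[1.0, 0.2, 0.3, 0.1],
     [0.1, 0.9, 0.8, 0.9],
     [1.0, 0.1, 0.9, 0.1],
     [0.1, 0.9, 0.2, 0.9],
     [1.0, 0.1, 0.3, 0.2],
     [0.1, 0.9, 0.8, 0.8]]

x_true = [[2.0], [0.5], [1.0], [0.25]]      # source powers at one (omega, l)
Y = matmul(T, x_true)                        # beamformer output power vector
x_opt = matmul(pseudo_inverse(T), Y)         # X_opt = T^+ Y
```

In this noise-free case the estimate recovers `x_true` up to rounding. With real measurements the least-squares solution can contain negative entries, which is not a valid power; how that case is handled is not described in this section.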

Each element of the gain inverse matrix T is the gain of the directivity set for the fifth sound collection unit 4-5, the sixth sound collection unit 4-6, and the first to fourth sound collection units 4-1 to 4-4 toward the Θ_x direction or the \bar{Θ}_x direction. For example, the averages of the directivity over frequency and direction shown in Equations (14) to (17) are used.

α_x is the average of the directivity set for the first, second, and fifth sound collection units 4-1, 4-2, and 4-5 toward the directions around the desired sound. β_x is the average of the directivity set for the first, second, and fifth sound collection units 4-1, 4-2, and 4-5 toward the directions around the desired signal. γ_x is the average of the directivity set for the third, fourth, and sixth sound collection units 4-3, 4-4, and 4-6 toward the directions around the desired signal. δ_x is the average of the directivity set for the third, fourth, and sixth sound collection units 4-3, 4-4, and 4-6 toward directions other than those around the desired signal. In Equations (14) to (17), the subscript x stands for R, C, or L.

The multiplication unit 9 multiplies the input beamformer output power vector by the power estimation matrix for each frequency component, as in Equation (18), and outputs the estimated signal power vector X_{opt}(ω, l).
X_{opt}(ω, l) = T^+ Y(ω, l)   (18)
FIG. 7 shows the processing flow in the gain coefficient calculation unit 8. The estimated signal power vector X_{opt}(ω, l) from the sound source signal component estimation unit 7 shown in FIG. 6 is input to the vector element extraction unit 81. As shown in Equation (19), the vector element extraction unit 81 outputs the first component of the estimated signal power vector as the estimated signal power |S(ω, l)|^2, the second component as the estimated left-direction noise power |N_L(ω, l)|^2, the third component as the estimated front-direction noise power |N_C(ω, l)|^2, and the fourth component as the estimated right-direction noise power |N_R(ω, l)|^2. These are input to the SN ratio estimation unit 82.

The SN ratio estimation unit 82 calculates the estimated SN ratio ESNR(ω, l) using Equation (20).

The estimated SN ratio ESNR(ω, l) output by the SN ratio estimation unit 82 is used as the gain coefficient R(ω, l).
The gain coefficient R(ω, l) is calculated for each frequency component. In frequency components with little noise, R(ω, l) is close to 1 and the desired signal component is output almost unchanged. In frequency components with much noise, R(ω, l) is close to 0 and the signal component is strongly attenuated, which suppresses the noise. By multiplying the signal Y_S(ω, l) from the addition unit 6, whose main component is the desired signal, by the gain coefficient R(ω, l) for each frequency component, the noise is suppressed per frequency component, and the SN ratio of the signal converted back to the time domain by the inverse frequency domain conversion unit 10 is improved.
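Equation (20) is not reproduced in this text. The sketch below assumes the form stated in the summary of the invention, namely the ratio of the desired source power to the power of all sources including the desired one, which also matches the behavior described above (close to 1 with little noise, close to 0 with much noise). Clipping negative power estimates to zero and the small `floor` value are additional assumptions of this illustration.

```python
def gain_coefficient(s_pow, noise_pows, floor=1e-12):
    """R = |S|^2 / (|S|^2 + sum of noise powers), with negative estimates clipped to 0."""
    s = max(s_pow, 0.0)
    total = s + sum(max(p, 0.0) for p in noise_pows)
    return s / max(total, floor)


def apply_gain(y_s, gains):
    """Multiply each frequency component of the target signal by its gain."""
    return [r * y for r, y in zip(gains, y_s)]


# Two frequency components: one noise-free, one dominated by noise.
gains = [gain_coefficient(1.0, [0.0, 0.0, 0.0]),
         gain_coefficient(1.0, [1.0, 1.0, 1.0])]
y_out = apply_gain([2 + 0j, 4 + 0j], gains)
```

The first component passes unchanged, while the second, where the desired source holds a quarter of the total power, is scaled by 0.25.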

[First Embodiment]
FIG. 8 shows a configuration example of the entire sound collection device according to the first embodiment of the present invention. It differs from the overall configuration of the sound collection device of Japanese Patent Application No. 2006-52502 shown in FIG. 2 in the power spectrum estimation unit 110, the reverberation spectrum estimation unit 120, and the processing target signal generation unit 140. FIG. 9 shows the processing flow of the sound collection device of the first embodiment.

The first and second sound collection units 4-1 and 4-2 use the output signals of microphone arrays, each made up of a plurality of microphones, to collect the sounds y_SL(n) and y_SR(n) of an angular region that includes the desired sound source position from mutually different positions (S4-1, S4-2). The third and fourth sound collection units 4-3 and 4-4 use the output signals of the microphone arrays to collect the sounds y_NL(n) and y_NR(n) of an angular region that does not include the desired sound source position from the mutually different positions (S4-3, S4-4). The fifth sound collection unit 4-5 collects the sound y_SC(n) of an angular region that includes the desired sound source position from the midpoint between the mutually different positions (S4-5). The sixth sound collection unit 4-6 collects the sound y_NC(n) of an angular region that does not include the desired sound source position from the midpoint (S4-6).
The frequency domain transform unit 5 converts the signals y_SL(n), y_SR(n), y_NL(n), y_NR(n), y_SC(n), and y_NC(n) collected by the sound collection units 4-1 to 4-6 into the frequency-domain signals Y_SL(ω, l), Y_SR(ω, l), Y_NL(ω, l), Y_NR(ω, l), Y_SC(ω, l), and Y_NC(ω, l). The processing target signal generation unit 140 takes the average of the frequency-domain signal Y_SL(ω, l) from the first sound collection unit 4-1 and the frequency-domain signal Y_SR(ω, l) from the second sound collection unit 4-2 as the processing target signal Y_S(ω, l) (S140). The power spectrum estimation unit 110 estimates, for each frequency, the signal amount of the desired sound source with the reverberation removed and the signal amounts of the other sound sources, X_opt(ω, l), from the frequency-domain signals Y_SL(ω, l), Y_SR(ω, l), Y_NL(ω, l), Y_NR(ω, l), Y_SC(ω, l), and Y_NC(ω, l) obtained by the sound collection units 4-1 to 4-6 and from the signal amount of the reverberant sound Z*_est(ω, l) (S110). The reverberation spectrum estimation unit 120 obtains the signal amount of the reverberant sound Z*_est(ω, l) for each frequency from the signal amounts X_opt(ω, l) of the desired sound source and the other sound sources estimated by the power spectrum estimation unit 110 (S120). The gain coefficient calculation unit 8 obtains a gain coefficient R(ω, l) for each frequency from the ratio of the signal amount of the desired sound source to the signal amount of all sound sources, including the desired sound source (S8). The multiplication unit 9 multiplies the processing target signal Y_S(ω, l) by the gain coefficient R(ω, l) calculated by the gain coefficient calculation unit 8 (S9). The inverse frequency domain transform unit 10 converts the gain-weighted processing target signal R(ω, l)Y_S(ω, l) into the time domain.
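Steps S8 and S9 of this flow can be sketched for a single frequency bin as follows. This is a minimal illustration, not the patent's implementation: the function names, the clipping of the gain to the range 0 to 1, and the small `floor` guard against division by zero are assumptions added here.

```python
def gain_coefficient(desired_power, other_powers, floor=1e-12):
    """Ratio of desired-source power to the power of all sources (step S8).

    ASSUMPTION: negative power estimates are clipped to zero and a small
    `floor` avoids division by zero; the source text does not specify this.
    """
    desired = max(desired_power, 0.0)
    total = desired + sum(max(p, 0.0) for p in other_powers)
    if total <= floor:
        return 0.0
    return min(1.0, desired / total)


def apply_gain(gain, y_s):
    """Multiply the processing target signal Y_S by the gain R (step S9)."""
    return gain * y_s


# Hypothetical powers for one bin: desired source 4.0, other sources 1.0 and 3.0.
r = gain_coefficient(4.0, [1.0, 3.0])
enhanced = apply_gain(r, complex(2.0, -2.0))
```

In a full system the same two calls would run for every frequency bin of every frame before the inverse transform.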

Next, the components that differ from the sound collection device of FIG. 2 are described in detail. FIG. 10 shows a functional configuration example of the processing target signal generation unit 140, which consists of an addition unit 141 and a division unit 142. The addition unit 141 adds the frequency-domain signal Y_SL(ω, l) from the first sound collection unit 4-1 and the frequency-domain signal Y_SR(ω, l) from the second sound collection unit 4-2. The division unit 142 divides the sum by 2 and outputs the average as the processing target signal Y_S(ω, l). In the sound collection device of FIG. 2, the addition unit 6 added Y_SL(ω, l) and Y_SR(ω, l) and used the sum itself as the processing target signal Y_S(ω, l). The only difference is whether the sum is divided by 2. This changes only the overall volume of the signal; the waveform is the same, so the two are equivalent from a signal processing standpoint. For the same reason, dividing by a value other than 2 is also an equivalent process.
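The equivalence claimed here can be checked numerically. The sketch below uses made-up values for one frequency bin; `processing_target` is a hypothetical helper name, not a name from the source.

```python
def processing_target(y_sl, y_sr, divisor):
    """Add the two frequency-domain signals and divide the sum by `divisor`."""
    return (y_sl + y_sr) / divisor


# Hypothetical values of Y_SL and Y_SR for one bin.
y_sl, y_sr = complex(1.0, 2.0), complex(3.0, -1.0)

summed = processing_target(y_sl, y_sr, 1)    # device of FIG. 2: sum only
averaged = processing_target(y_sl, y_sr, 2)  # first embodiment: average

# The two results differ only by a constant factor, i.e. by overall volume.
ratio = summed / averaged
```

Because the factor is the same for every bin and frame, the output waveform keeps its shape and only its level changes.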

FIG. 11 shows a functional configuration example of the power spectrum estimation unit 110. It differs from the sound source signal component estimation unit 7 of FIG. 6 in that a subtraction unit 111 is provided between the vectorization unit 62 and the multiplication unit 63. The subtraction unit 111 subtracts the estimated signal amount of the reverberant sound Z*_est(ω, l) from the vectorized signal Y(ω, l) as in the following expression and inputs the result Y′(ω, l) to the multiplication unit 63.
$$Y'(\omega, l) = Y(\omega, l) - Z^{*}_{\mathrm{est}}(\omega, l) \tag{21}$$
The other processing is the same as in the sound source signal component estimation unit 7.

FIG. 12 shows a functional configuration example of the reverberation spectrum estimation unit 120. The reverberation spectrum estimation unit 120 consists of a gain matrix multiplication unit 125 and a weighted addition unit 126. The gain matrix multiplication unit 125 converts the signal amounts X_opt(ω, l) of the desired sound source and the other sound sources into a signal amount Z_est(ω, l) for each sound collection unit. The gain matrix T′ holds the gains of the directivity of each sound collection unit with respect to the reverberation component and may be given, for example, by Expression (22). (Expression (22) and the definitions of its elements are equation images in the original publication and are not reproduced in this text.)

The weighted addition unit 126 records the signal amount Z_est(ω, l) for each sound collection unit and computes a weighted sum of the per-unit signal amounts of a plurality of past frames. Specifically, to compute a weighted sum of the per-unit signal amounts Z_est(ω, l) of the past N frames, it suffices to provide N delay units 121_1 to 121_N, N weight multiplication units 122_1 to 122_N, and N−1 addition units 123_1 to 123_(N−1). The first delay unit 121_1 records the per-unit signal amount Z_est(ω, l) and delays it by one frame. The first weight multiplication unit 122_1 multiplies the output of the first delay unit 121_1 (the per-unit signal amount from one frame earlier) by the weight ρ_1. The n-th delay unit 121_n records the per-unit signal amount from n−1 frames earlier and delays it by one frame. The n-th weight multiplication unit 122_n multiplies the output of the n-th delay unit 121_n (the per-unit signal amount from n frames earlier) by the weight ρ_n. The n-th addition unit 123_n adds the output of the n-th weight multiplication unit 122_n to the output of the (n+1)-th addition unit 123_(n+1). The first addition unit 123_1 adds the output of the first weight multiplication unit 122_1 to the output of the second addition unit 123_2 and outputs the signal amount of the reverberant sound Z*_est(ω, l). Processing in this way yields a weighted sum in which the per-unit signal amount from n frames earlier is weighted by ρ_n. The weight ρ_n is a parameter representing the power decay of the reverberation component over time and is given, for example, from the reverberation time T_60 by Expression (29).
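The chain of delay units, weight multiplication units, and addition units amounts to a weighted sum over the last N frames, Z*_est(l) = ρ_1 Z_est(l−1) + ... + ρ_N Z_est(l−N), for each frequency. The sketch below shows one way to arrange this for a single frequency bin; the class name `WeightedAdder` and its methods are hypothetical, and a `deque` stands in for the delay line.

```python
from collections import deque


class WeightedAdder:
    """Weighted sum of the per-unit signal amounts of the past N frames."""

    def __init__(self, weights, num_units):
        self.weights = list(weights)        # rho_1 .. rho_N
        self.num_units = num_units          # number of sound collection units
        # Delay line, newest frame first; frames older than N are dropped.
        self.history = deque(maxlen=len(self.weights))

    def push(self, z_est):
        """Record the per-unit signal amount Z_est of the frame just processed."""
        self.history.appendleft(list(z_est))

    def estimate(self):
        """Return Z*_est for the current frame from the recorded past frames."""
        total = [0.0] * self.num_units
        for rho, z_past in zip(self.weights, self.history):
            for i, z in enumerate(z_past):
                total[i] += rho * z
        return total


# Hypothetical values: two sound collection units, two past frames.
adder = WeightedAdder(weights=[0.5, 0.25], num_units=2)
adder.push([4.0, 8.0])
first = adder.estimate()    # 0.5 * [4, 8]
adder.push([2.0, 0.0])
second = adder.estimate()   # 0.5 * [2, 0] + 0.25 * [4, 8]
```

A real implementation would keep one such delay line per frequency bin, or store the history as a two-dimensional array indexed by bin and unit.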

Here, L_S is the number of samples in one frame and F_S is the sampling frequency.
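Expression (29) is not reproduced in this text. As an illustration only, the sketch below assumes a plain exponential power decay of 60 dB over T_60, which gives ρ_n = 10^(−6 n L_S / (F_S T_60)). This form is an assumption that follows from the usual definition of T_60 and may differ from the patent's Expression (29); the frame length and sampling rate in the example are also made up.

```python
def decay_weights(t60_s, frame_len, fs_hz, num_frames):
    """Power-decay weights rho_1 .. rho_N for a 60 dB decay over t60_s seconds.

    ASSUMPTION: rho_n = 10 ** (-6 * n * L_S / (F_S * T_60)); this is not
    taken from the source text.
    """
    frame_s = frame_len / fs_hz  # duration of one frame in seconds
    return [10.0 ** (-6.0 * n * frame_s / t60_s) for n in range(1, num_frames + 1)]


# Hypothetical settings: 250 ms reverberation time, 512-sample frames at 16 kHz.
weights = decay_weights(t60_s=0.25, frame_len=512, fs_hz=16000, num_frames=4)
```

Under this assumption a longer reverberation time gives weights closer to 1, so more past frames contribute noticeably to the reverberation estimate.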

Next, the principle by which the present invention removes reverberation is described. FIG. 13 shows a model of noise generation. FIG. 14 shows the influence of reverberation on the power spectrum in each frame. A reverberant sound reaches the microphone later than the direct sound emitted at some time 0 (counted here in time frames), delayed by a time that depends on the length of its propagation path and reduced in magnitude at a constant attenuation rate. In the example of FIG. 13, the same sound as the direct sound emitted at time 0 affects the frames at times 1 to 3 as reverberation. Consequently, as shown in FIG. 14, the estimated power spectrum in a given frame l has direct-sound components from past frames superimposed on it as reverberation. The attenuation rate corresponds to the weight ρ_n of the reverberation spectrum estimation unit 120. The weight ρ_n is determined by the acoustic characteristics of the room and can be calculated theoretically from Expression (29) using, for example, the reverberation time T_60, which is one measure of those characteristics.
In the sound collection device of the present invention, the past direct-sound components can be obtained as the past per-unit signal amounts Z_est(ω, l). The gain matrix multiplication unit 125 therefore converts the estimates into the signal amount Z_est(ω, l) for each sound collection unit, and the weighted addition unit 126 records it and computes a weighted sum of the per-unit signal amounts of a plurality of past frames. The signal amount of the reverberant sound Z*_est(ω, l) is obtained in this way, and the power spectrum estimation unit 110 subtracts it from the vectorized signal Y(ω, l). The sound collection device of the first embodiment can therefore reduce the influence of reverberation.
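The feedback between the power spectrum estimation unit and the reverberation spectrum estimation unit can be sketched as a frame-by-frame loop for one frequency bin. Everything concrete here is an assumption made for illustration: the function names, the toy identity matrices used for the power estimation matrix and the gain matrix T′, and the weights.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def dereverb_frames(power_frames, t_pinv, t_rev, weights):
    """Run the estimate-and-subtract loop over successive frames of one bin.

    power_frames: per-frame power vectors Y(l), one entry per sound collection unit
    t_pinv:       power estimation matrix applied to the cleaned powers
    t_rev:        gain matrix T' mapping source powers to per-unit amounts
    weights:      rho_1 .. rho_N
    """
    history = []    # Z_est of past frames, newest first
    estimates = []
    for y in power_frames:
        # Weighted sum of past per-unit amounts gives Z*_est for this frame.
        z_star = [0.0] * len(y)
        for rho, z_past in zip(weights, history):
            z_star = [s + rho * z for s, z in zip(z_star, z_past)]
        y_clean = [a - b for a, b in zip(y, z_star)]   # Expression (21)
        x_opt = matvec(t_pinv, y_clean)                # source power estimates
        z_est = matvec(t_rev, x_opt)                   # per-unit signal amounts
        history = [z_est] + history[: len(weights) - 1]
        estimates.append(x_opt)
    return estimates


# Toy case: one burst of power 4 in frame 0, then decaying tails of 2 and 1.
identity = [[1.0, 0.0], [0.0, 1.0]]
frames = [[4.0, 0.0], [2.0, 0.0], [1.0, 0.0]]
estimates = dereverb_frames(frames, identity, identity, weights=[0.5, 0.25])
```

In this toy case the tails in frames 1 and 2 match the weighted history exactly, so the loop attributes all of the power to frame 0. Real signals would leave a residual, and negative values after subtraction would need handling that the source text does not describe.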

[Second Embodiment]
FIG. 15 shows a configuration example of the entire sound collection device according to the second embodiment of the present invention. It differs from the first embodiment (FIG. 8) in the sound collection units 4′-1 to 4′-6, the processing target signal generation unit 140′, the power spectrum estimation unit 110′, and the reverberation spectrum estimation unit 120′. The components that differ from the first embodiment are described below.

FIG. 16 shows the sound source position regions used to explain the settings of the sound collection units 4′-1 to 4′-6. FIG. 17 shows a functional configuration example of the first sound collection unit 4′-1. The microphone array 3L receives the signals x_LmL(n) (m_L = 1, 2, ..., M_L). Each filter processing unit 41′ outputs the signal x′_LmL(n) obtained by substituting a predetermined filter coefficient w_LmL(n) (the method of determining it is described later) and the input signal x_LmL(n) into the convolution operation of Expression (30).

The output signal of each filter processing unit 41′ is input to the addition unit 42′. The addition unit 42′ adds its input signals as in Expression (31) to obtain the output signal y_LL(n) of the first sound collection unit 4′-1.
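The filter-then-add structure of a sound collection unit (a filter-and-sum beamformer) can be sketched as follows. Expressions (30) and (31) are not reproduced in this text, so the causal FIR convolution used here, the function names, and the sample values are assumptions for illustration.

```python
def convolve(w, x):
    """Causal FIR filtering: out[n] = sum over k of w[k] * x[n - k]."""
    return [
        sum(w[k] * x[n - k] for k in range(len(w)) if n - k >= 0)
        for n in range(len(x))
    ]


def filter_and_sum(filters, channels):
    """Filter each microphone signal with its own coefficients, then add them."""
    filtered = [convolve(w, x) for w, x in zip(filters, channels)]
    return [sum(samples) for samples in zip(*filtered)]


# Hypothetical two-microphone case: the pulse reaches microphone 0 one sample
# before microphone 1, so delaying channel 0 by one sample aligns the two.
channels = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
filters = [[0.0, 1.0], [1.0, 0.0]]
output = filter_and_sum(filters, channels)
```

Sounds from the aligned direction add coherently, while sounds from other directions add with mismatched delays and are attenuated, which is what gives each unit its directivity.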

Here, the filter coefficients w_LmL(n) are designed, for example by the least squares method, so that the directivity D_LSB(ω, θ) of the first sound collection unit 4′-1 has the characteristic given by Expression (32). The third and fifth sound collection units are likewise designed to satisfy the conditions of Expressions (33) and (34), respectively. Θ_L1 to Θ_L3 denote the angular regions seen from the microphone array 3L shown in FIG. 16.

That is, the first sound collection unit 4′-1 collects sound while suppressing the sound of the angular region Θ_L1. The third sound collection unit 4′-3 collects sound while suppressing the sound of the angular region Θ_L2. The fifth sound collection unit 4′-5 collects sound while suppressing the sound of the angular region Θ_L3.
Similarly, as shown in Expressions (35) to (37), the second sound collection unit 4′-2 of the microphone array 3R collects sound while suppressing the sound of the angular region Θ_R1. The fourth sound collection unit 4′-4 collects sound while suppressing the sound of the angular region Θ_R2. The sixth sound collection unit 4′-6 collects sound while suppressing the sound of the angular region Θ_R3.

FIG. 18 shows a functional configuration example of the processing target signal generation unit 140′, which consists of an addition unit 141′ and a division unit 142′. The addition unit 141′ adds the frequency-domain signals Y_LL(ω, l) from the first sound collection unit 4′-1, Y_LR(ω, l) from the second sound collection unit 4′-2, Y_RL(ω, l) from the fifth sound collection unit 4′-5, and Y_RR(ω, l) from the sixth sound collection unit 4′-6 as in the following expression and outputs the sum Y′_B(ω, l).
$$Y'_{B}(\omega, l) = Y_{LL}(\omega, l) + Y_{LR}(\omega, l) + Y_{RL}(\omega, l) + Y_{RR}(\omega, l) \tag{38}$$

The division unit 142′ divides the sum Y′_B(ω, l) by 4 as in the following expression and outputs the average as the processing target signal Y_B(ω, l).
$$Y_{B}(\omega, l) = Y'_{B}(\omega, l) / 4 \tag{39}$$
As explained for the first embodiment, the waveform is the same whatever number the division unit 142′ divides by, so the result is equivalent from a signal processing standpoint. Dividing by a value other than 4 is therefore also an equivalent process.

FIG. 19 shows a functional configuration example of the power spectrum estimation unit 110′, which consists of a power calculation unit 61′, a vectorization unit 62′, a subtraction unit 111′, a multiplication unit 63′, and a pseudo-inverse matrix calculation unit 64′. The power calculation unit 61′ calculates and outputs the power values |Y_LL(ω, l)|², |Y_CL(ω, l)|², |Y_RL(ω, l)|², |Y_LR(ω, l)|², |Y_CR(ω, l)|², and |Y_RR(ω, l)|² from the frequency-domain signals Y_LL(ω, l), Y_CL(ω, l), Y_RL(ω, l), Y_LR(ω, l), Y_CR(ω, l), and Y_RR(ω, l) from the sound collection units. The vectorization unit 62′ outputs the power vector Y(ω, l), which gathers the power values in vector form as in Expression (40).

The subtraction unit 111′ subtracts the estimated signal amount of the reverberant sound Z*_est(ω, l) from the vectorized signal Y(ω, l) as in the following expression and inputs the result Y′(ω, l) to the multiplication unit 63′.
$$Y'(\omega, l) = Y(\omega, l) - Z^{*}_{\mathrm{est}}(\omega, l) \tag{41}$$
The other input of the multiplication unit 63′, the power estimation matrix T⁺, is the output of the pseudo-inverse matrix calculation unit 64′. The pseudo-inverse matrix calculation unit 64′ receives the gain matrix T defined by Expression (42) and outputs its pseudo-inverse T⁺.
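The subtraction followed by multiplication with the pseudo-inverse can be sketched in plain Python. The gain matrix values, the matrix shape (six sound collection units, three source regions), and the function names are hypothetical, and the pseudo-inverse is computed here as (TᵀT)⁻¹Tᵀ, which assumes T has full column rank.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def inverse(m):
    """Gauss-Jordan inverse of a small square matrix with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pseudo_inverse(t):
    """T+ = (T^T T)^-1 T^T. ASSUMPTION: T has full column rank."""
    t_t = transpose(t)
    return matmul(inverse(matmul(t_t, t)), t_t)


def estimate_source_powers(t, y, z_star):
    """Subtract the reverberation estimate, then multiply by T+."""
    y_clean = [a - b for a, b in zip(y, z_star)]
    return [sum(a * b for a, b in zip(row, y_clean)) for row in pseudo_inverse(t)]


# Hypothetical directivity gains for one frequency: rows are units, columns regions.
T = [[1.0, 0.2, 0.1], [1.0, 0.1, 0.2], [0.1, 1.0, 0.2],
     [0.2, 1.0, 0.1], [0.1, 0.2, 1.0], [0.2, 0.1, 1.0]]
x_true = [2.0, 1.0, 0.5]
z_star = [0.3] * 6
y = [sum(g * x for g, x in zip(row, x_true)) + z for row, z in zip(T, z_star)]
x_opt = estimate_source_powers(T, y, z_star)
```

Since T depends only on the frequency and not on the frame, its pseudo-inverse could be computed once per frequency and reused for every frame.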

Each element of the gain matrix T(ω) is the gain of the directivity of one of the sound collection units 4′-1 to 4′-6 toward the Θ_1, Θ_2, or Θ_3 direction. For example, the averages of the directivity over direction given by Expressions (43) to (45) are used.
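Averaging a directivity over an angular region can be sketched as follows. Expressions (43) to (45) are not reproduced in this text, so this sketch makes several assumptions: a far-field plane-wave model of a uniform linear array, narrowband complex weights in place of the time-domain filter coefficients, and an average of the power gain |D|² on a uniform grid of angles. The function names and numeric values are hypothetical.

```python
import cmath
import math


def directivity(weights, spacing_m, freq_hz, theta_rad, speed_of_sound=343.0):
    """Far-field response D(omega, theta) of a uniform linear array.

    ASSUMPTION: plane-wave model, theta measured from the array broadside.
    """
    k = 2.0 * math.pi * freq_hz / speed_of_sound
    return sum(
        w * cmath.exp(-1j * k * m * spacing_m * math.sin(theta_rad))
        for m, w in enumerate(weights)
    )


def mean_power_gain(weights, spacing_m, freq_hz, theta_lo, theta_hi, steps=181):
    """Average of |D|^2 over an angular region, sampled on a uniform grid."""
    thetas = [theta_lo + (theta_hi - theta_lo) * i / (steps - 1) for i in range(steps)]
    return sum(abs(directivity(weights, spacing_m, freq_hz, t)) ** 2 for t in thetas) / steps


# Hypothetical four-microphone array with 4 cm spacing and uniform weights at 2 kHz.
uniform = [0.25] * 4
front_gain = mean_power_gain(uniform, 0.04, 2000.0, math.radians(-15), math.radians(15))
side_gain = mean_power_gain(uniform, 0.04, 2000.0, math.radians(45), math.radians(75))
```

One such average per sound collection unit and per region would fill one element of the gain matrix at that frequency.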

α_x(ω) is the average of the directivity of the first sound collection unit 4′-1 and the second sound collection unit 4′-2 toward the angular region Θ_x at frequency ω. β_x(ω) is the average of the directivity of the third sound collection unit 4′-3 and the fourth sound collection unit 4′-4 toward the angular region Θ_x at frequency ω. γ_x(ω) is the average of the directivity of the fifth sound collection unit 4′-5 and the sixth sound collection unit 4′-6 toward the angular region Θ_x at frequency ω. Here x is one of L1, L2, L3, R1, R2, and R3. The multiplication unit 63′ multiplies the reverberation-subtracted signal Y′(ω, l) by the pseudo-inverse T⁺ as in Expression (46) and outputs the estimated signal power vector X_opt(ω, l).

$$X_{\mathrm{opt}}(\omega, l) = T^{+} Y'(\omega, l) \tag{46}$$
FIG. 20 shows a functional configuration example of the reverberation spectrum estimation unit 120′. It differs from the reverberation spectrum estimation unit 120 of the first embodiment (FIG. 12) in the gain matrix multiplication unit 125′. In the gain matrix multiplication unit 125′, the gain matrix T′ is given, for example, by Expression (47).

(Expression (47) and the definitions of its elements are equation images in the original publication and are not reproduced in this text.) The weighted addition unit 126 is the same as in the reverberation spectrum estimation unit 120 of the first embodiment (FIG. 12), so its description is omitted.
With the configuration described above, the sound collection device of the second embodiment reduces reverberant sound in the same way as the first embodiment.

[Third Embodiment]
FIG. 21 shows a configuration example of the entire sound collection device according to the third embodiment of the present invention. It differs from the sound collection device of the second embodiment in the processing target signal generation unit 140″. The role of the processing target signal generation unit is to generate, from the collected signals, a sound close to the desired sound. Removing noise and reverberation from that sound can then be expected to give high-quality sound collection. When the desired sound source is close to a particular microphone, it is reasonable to use the frequency-domain signal of the signal collected by that microphone as the processing target signal. The processing target signal generation unit 140″ can also be used in the first embodiment.

With the configuration described above, the sound collection device of the third embodiment also reduces reverberant sound in the same way as the first and second embodiments.

[Experimental Example]
Next, experimental results for the sound collection device of the second embodiment are presented. FIG. 22 shows the experimental environment. Each microphone array has four microphones arranged in a line at equal intervals of 4 cm. Coordinates are in meters, and the array centers are located at (0.4, 0) and (−0.4, 0). The desired sound source (the target talker) is at (0, 0.5). Three different background noise sources (other talkers) are placed at (−1.6, 2.5), (1.6, 1.0), and (0.0, 2.5). FIG. 23 shows the measured amount of background noise suppression in two environments with different reverberation. Experimental environment 1 has a reverberation time of 250 ms (comparable to a typical bedroom), and experimental environment 2 has a reverberation time of 500 ms (comparable to a typical conference room). The results show that, under both reverberation conditions, the sound collection device of the present invention achieves a greater amount of reverberation suppression than the sound collection device of Japanese Patent Application No. 2006-52502. FIG. 24 shows the results of a subjective evaluation of the quality of the sound collected by the sound collection device of the second embodiment. Ten subjects compared the sound collected by the sound collection device of the present invention with the sound collected by the sound collection device of Japanese Patent Application No. 2006-52502 on a five-point scale (2: much better, 1: better, 0: the same, −1: worse, −2: much worse), and the figure shows the mean rating. Many subjects rated the sound as better, which indicates that the quality of the collected sound is improved.
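The geometry of this experiment can be worked out from the stated coordinates. In the sketch below, the coordinates come from the text, while the labels `array_a` and `array_b`, the choice of the +y axis as the reference direction, and the function names are assumptions made for illustration.

```python
import math

# Coordinates in meters, as stated for the experiment.
ARRAY_CENTERS = {"array_a": (0.4, 0.0), "array_b": (-0.4, 0.0)}
SOURCES = {
    "desired": (0.0, 0.5),
    "noise_1": (-1.6, 2.5),
    "noise_2": (1.6, 1.0),
    "noise_3": (0.0, 2.5),
}


def bearing_deg(center, source):
    """Angle of the source seen from an array center.

    ASSUMPTION: measured from the +y axis, positive toward +x.
    """
    dx, dy = source[0] - center[0], source[1] - center[1]
    return math.degrees(math.atan2(dx, dy))


def distance_m(center, source):
    """Straight-line distance from an array center to a source."""
    return math.hypot(source[0] - center[0], source[1] - center[1])


bearings = {
    (array, name): bearing_deg(center, position)
    for array, center in ARRAY_CENTERS.items()
    for name, position in SOURCES.items()
}
```

The desired source sits symmetrically between the two arrays, so its bearings from the two centers are equal in magnitude and opposite in sign, and it is much closer to both arrays than any of the noise sources.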

FIG. 25 shows a functional configuration example of a computer. The sound collection device of the present invention can be realized by loading, into the recording unit 2020 of a computer 2000, a program that causes the computer 2000 to operate as each component of the present invention, and by operating the processing unit 2010, the input unit 2030, the output unit 2040, and so on. The program can be loaded by recording it on a computer-readable recording medium and reading it from that medium, or by reading a program stored on a server or the like into the computer over a telecommunication line.

FIG. 1 shows an example of a situation in which the present invention is used.
FIG. 2 shows the overall configuration of the sound collection device of Japanese Patent Application No. 2006-52502.
FIG. 3 is a plan view for explaining the directivity of the first to sixth sound collection units.
FIG. 4 is a block diagram for explaining the configuration of the first to fourth sound collection units.
FIG. 5 shows the configuration of the fifth sound collection unit 4-5 and the sixth sound collection unit 4-6.
FIG. 6 shows the configuration of the sound source signal component estimation unit 7.
FIG. 7 shows the configuration of the gain coefficient calculation unit 8.
FIG. 8 shows a configuration example of the entire sound collection device of the first embodiment.
FIG. 9 shows the processing flow of the sound collection device of the first embodiment.
FIG. 10 shows a functional configuration example of the processing target signal generation unit 140.
FIG. 11 shows a functional configuration example of the power spectrum estimation unit 110.
FIG. 12 shows a functional configuration example of the reverberation spectrum estimation unit 120.
FIG. 13 shows a model of noise generation.
FIG. 14 shows the influence of reverberation on the power spectrum in each frame.
FIG. 15 shows a configuration example of the entire sound collection device of the second embodiment.
FIG. 16 shows the sound source position regions used to explain the settings of the sound collection units 4′-1 to 4′-6.
FIG. 17 shows a functional configuration example of the first sound collection unit 4′-1.
FIG. 18 shows a functional configuration example of the processing target signal generation unit 140′.
FIG. 19 shows a functional configuration example of the power spectrum estimation unit 110′.
FIG. 20 shows a functional configuration example of the reverberation spectrum estimation unit 120′.
FIG. 21 shows a configuration example of the entire sound collection device of the third embodiment.
FIG. 22 shows the experimental environment.
FIG. 23 shows the measured amount of background noise suppression in two environments with different reverberation.
FIG. 24 shows the results of a subjective evaluation of the quality of the sound collected by the sound collection device of the second embodiment.
FIG. 25 shows a functional configuration example of a computer.

Explanation of symbols

110, 110′ Power spectrum estimation unit
111 Subtraction unit
120, 120′ Reverberation spectrum estimation unit
125, 125′ Gain matrix multiplication unit
126 Weighted addition unit
140, 140′, 140″ Processing target signal generation unit

Claims

Six or more sound collection units that collect the sounds of mutually different regions using output signals of a microphone array made up of a plurality of microphones;
A processing target signal generation unit that generates a processing target signal from signals from one or more predetermined microphones or sound collection units;
A power spectrum estimation unit that estimates, for each frequency, the signal amount of the desired sound source with the reverberation signal removed and the signal amounts of other sound sources, from the collected sound signals obtained by the sound collection units and the signal amount of the reverberant sound;
A reverberation spectrum estimation unit that obtains a signal amount of reverberation sound used for the next processing by the power spectrum estimation unit for each frequency from the signal amount of the desired sound source and the signal amount of other sound sources estimated by the power spectrum estimation unit ;
A gain coefficient calculation unit for obtaining a gain coefficient for each frequency from the ratio of the signal amount of the desired sound source and the signal amounts of all sound sources including the signal amount of the desired sound source;
A multiplier that multiplies the signal to be processed by the gain coefficient calculated by the gain coefficient calculator;
A sound collecting device.

First and second sound collection units that collect sound in an angular region including a desired sound source position from different positions using output signals of a microphone array configured to include a plurality of microphones;
A third and a fourth sound collecting unit for collecting sound in an angular region not including the desired sound source position from different positions using an output signal of the microphone array;
A fifth sound collection unit for collecting sound in an angle region including the desired sound source position from an intermediate point between the different positions;
A sixth sound collecting unit that picks up sound in an angular region not including the desired sound source position from the intermediate point;
A processing target signal generation unit that generates a processing target signal from signals from one or more predetermined microphones or sound collection units;
A power spectrum estimation unit that estimates, for each frequency, the signal amount of the desired sound source with the reverberation signal removed and the signal amounts of other sound sources, from the collected sound signals obtained by the sound collection units and the signal amount of the reverberant sound;
A reverberation spectrum estimation unit that obtains a signal amount of reverberation sound used for the next processing by the power spectrum estimation unit for each frequency from the signal amount of the desired sound source and the signal amount of other sound sources estimated by the power spectrum estimation unit ;
A gain coefficient calculation unit for obtaining a gain coefficient for each frequency from the ratio of the signal amount of the desired sound source and the signal amounts of all sound sources including the signal amount of the desired sound source;
A multiplier that multiplies the signal to be processed by the gain coefficient calculated by the gain coefficient calculator;
A sound collecting device.

First and second collections that collect sound by suppressing a part of sound in an angular region that does not include a desired sound source position from different positions using output signals of a microphone array that includes a plurality of microphones. The clef,
Third and fourth sound collection units for collecting sound by suppressing sound in an angle region including the desired sound source position from positions different from each other using output signals of the microphone array;
Fifth and sixth sound collection units that collect sound, from positions different from each other using the output signals of the microphone array, while suppressing a part of the sound in the angular region not including the desired sound source position, the part being different from that suppressed by the first and second sound collection units;
A processing target signal generation unit that generates a processing target signal from the signals of one or more predetermined microphones or sound collection units;
A power spectrum estimation unit that estimates, for each frequency, the signal amount of the desired sound source from which the reverberation signal has been removed and the signal amount of the other sound sources, from the signal amounts of the collected sound signals obtained by the sound collection units and of the reverberation;
A reverberation spectrum estimation unit that obtains, for each frequency, the signal amount of reverberation to be used in the next processing by the power spectrum estimation unit, from the signal amount of the desired sound source and the signal amount of the other sound sources estimated by the power spectrum estimation unit;
A gain coefficient calculation unit for obtaining a gain coefficient for each frequency from the ratio of the signal amount of the desired sound source and the signal amounts of all sound sources including the signal amount of the desired sound source;
A multiplier that multiplies the signal to be processed by the gain coefficient calculated by the gain coefficient calculator;
A sound collecting device.
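
The power spectrum estimation unit recited above can be sketched for a single frequency bin as follows. The claims do not specify the estimator, so this sketch makes several assumptions: each sound collection unit's output power is modeled as a known linear mix of the desired-source power and the other-source power plus reverberation, the reverberation estimate from the previous frame is subtracted first, and the two source powers are then recovered by ordinary least squares. The name `estimate_source_powers` and the example gain values are hypothetical.

```python
def estimate_source_powers(beam_power, reverb_power, gain_matrix):
    """Estimate (desired, other) source power for one frequency bin.

    beam_power:   output power of each sound collection unit (one per unit)
    reverb_power: reverberation power estimated for each unit in the previous frame
    gain_matrix:  one row per unit, [gain toward desired region, gain toward other region]
    Assumption: a least-squares fit is used; the claims do not name the estimator.
    """
    # Remove the reverberation component, keeping power non-negative.
    y = [max(b - r, 0.0) for b, r in zip(beam_power, reverb_power)]

    # Normal equations (A^T A) x = A^T y for the two unknown source powers.
    a11 = sum(row[0] * row[0] for row in gain_matrix)
    a12 = sum(row[0] * row[1] for row in gain_matrix)
    a22 = sum(row[1] * row[1] for row in gain_matrix)
    b1 = sum(row[0] * v for row, v in zip(gain_matrix, y))
    b2 = sum(row[1] * v for row, v in zip(gain_matrix, y))

    det = a11 * a22 - a12 * a12
    if det == 0.0:
        raise ValueError("gain matrix does not separate the two regions")

    p_desired = (a22 * b1 - a12 * b2) / det
    p_other = (a11 * b2 - a12 * b1) / det
    return max(p_desired, 0.0), max(p_other, 0.0)


# Six sound collection units; idealized gains chosen only for illustration.
gain_matrix = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
beam_power = [3.0, 3.0, 5.0, 5.0, 3.0, 5.0]
reverb_power = [1.0] * 6
p_desired, p_other = estimate_source_powers(beam_power, reverb_power, gain_matrix)
```

Using more sound collection units than unknowns makes the system overdetermined, which is one plausible reason for using six or more units.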

The sound collection device according to any one of claims 1 to 3,
The reverberation spectrum estimation unit includes:
A gain matrix multiplication unit that converts the signal amount of the desired sound source and the signal amount of the other sound source into a signal amount for each of the sound collection units;
And a weighted addition unit that records the signal amount for each of the sound collection units and computes a weighted sum of a plurality of past signal amounts for each of the sound collection units.
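
The two parts of the reverberation spectrum estimation unit can be sketched for a single frequency bin as follows. The sketch assumes two source powers (desired and other), a fixed gain matrix with one row per sound collection unit, and a fixed set of weights over the most recent frames. The class name `ReverbSpectrumEstimator`, its parameters, and the example values are hypothetical, and whether the current frame is included in the weighted sum is also an assumption.

```python
from collections import deque


class ReverbSpectrumEstimator:
    """Reverberation power per sound collection unit, for one frequency bin."""

    def __init__(self, gain_matrix, weights):
        self.gain_matrix = gain_matrix  # rows: units, columns: [desired, other]
        self.weights = weights          # weights[0] applies to the newest frame
        self.history = deque(maxlen=len(weights))  # recorded per-unit powers

    def update(self, p_desired, p_other):
        # Gain matrix multiplication: source powers -> power at each unit.
        unit_power = [row[0] * p_desired + row[1] * p_other
                      for row in self.gain_matrix]
        # Record this frame; the oldest frame is dropped automatically.
        self.history.appendleft(unit_power)
        # Weighted addition over the recorded frames, per unit.
        n_units = len(self.gain_matrix)
        return [sum(w * past[i] for w, past in zip(self.weights, self.history))
                for i in range(n_units)]


# Two units and a two-frame memory, chosen only to keep the example small.
estimator = ReverbSpectrumEstimator([[1.0, 0.5], [0.0, 1.0]], [0.5, 0.25])
first = estimator.update(2.0, 2.0)
second = estimator.update(4.0, 0.0)
```

The value returned for one frame is what the power spectrum estimation would subtract in the next frame, which matches the "used for the next processing" wording of the claims.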

A sound collection step for collecting sounds of six or more different areas using an output signal of a microphone array configured with a plurality of microphones;
A processing target signal generation step of generating a processing target signal from the signals of one or more predetermined microphones or from signals obtained in the sound collection step;
A power spectrum estimation step of estimating, for each frequency, the signal amount of the desired sound source from which the reverberation signal has been removed and the signal amount of the other sound sources, from the signal amounts of the collected sound signals obtained in the sound collection step and of the reverberation;
A reverberation spectrum estimation step of obtaining, for each frequency, the signal amount of reverberation to be used in the next processing by the power spectrum estimation step, from the signal amount of the desired sound source and the signal amount of the other sound sources estimated in the power spectrum estimation step;
A gain coefficient calculating step for obtaining a gain coefficient for each frequency from the ratio of the signal amount of the desired sound source and the signal amounts of all sound sources including the signal amount of the desired sound source;
A multiplication step of multiplying the processing target signal by the gain coefficient calculated in the gain coefficient calculation step;
A sound collection method.

First and second sound collecting steps for collecting sound in an angular region including a desired sound source position from different positions using output signals of a microphone array configured to include a plurality of microphones;
Third and fourth sound collecting steps for collecting sound in an angular region not including the desired sound source position from different positions using the output signal of the microphone array;
A fifth sound collecting step for collecting sound in an angle region including the desired sound source position from an intermediate point between the different positions;
A sixth sound collecting step for picking up sound in an angular region not including the desired sound source position from the intermediate point;
A processing target signal generation step of generating a processing target signal from the signals of one or more predetermined microphones or from signals obtained in the sound collecting steps;
A power spectrum estimation step of estimating, for each frequency, the signal amount of the desired sound source from which the reverberation signal has been removed and the signal amount of the other sound sources, from the signal amounts of the collected sound signals obtained in the sound collecting steps and of the reverberation;
A reverberation spectrum estimation step of obtaining, for each frequency, the signal amount of reverberation to be used in the next processing by the power spectrum estimation step, from the signal amount of the desired sound source and the signal amount of the other sound sources estimated in the power spectrum estimation step;
A gain coefficient calculating step for obtaining a gain coefficient for each frequency from the ratio of the signal amount of the desired sound source and the signal amounts of all sound sources including the signal amount of the desired sound source;
A multiplication step of multiplying the processing target signal by the gain coefficient calculated in the gain coefficient calculation step;
A sound collection method.

First and second sound collecting steps for collecting sound, from positions different from each other using the output signals of a microphone array including a plurality of microphones, while suppressing a part of the sound in an angular region not including a desired sound source position;
Third and fourth sound collecting steps for collecting sound by suppressing sound in an angular region including the desired sound source position from different positions using the output signal of the microphone array;
Fifth and sixth sound collecting steps for collecting sound, from positions different from each other using the output signals of the microphone array, while suppressing a part of the sound in the angular region not including the desired sound source position, the part being different from that suppressed in the first and second sound collecting steps;
A processing target signal generation step of generating a processing target signal from the signals of one or more predetermined microphones or from signals obtained in the sound collecting steps;
A power spectrum estimation step of estimating, for each frequency, the signal amount of the desired sound source from which the reverberation signal has been removed and the signal amount of the other sound sources, from the signal amounts of the collected sound signals obtained in the sound collecting steps and of the reverberation;
A reverberation spectrum estimation step of obtaining, for each frequency, the signal amount of reverberation to be used in the next processing by the power spectrum estimation step, from the signal amount of the desired sound source and the signal amount of the other sound sources estimated in the power spectrum estimation step;
A gain coefficient calculating step for obtaining a gain coefficient for each frequency from the ratio of the signal amount of the desired sound source and the signal amounts of all sound sources including the signal amount of the desired sound source;
A multiplication step of multiplying the processing target signal by the gain coefficient calculated in the gain coefficient calculation step;
A sound collection method.

The sound collection method according to any one of claims 5 to 7,
The reverberation spectrum estimation step includes:
A gain matrix multiplication sub-step for converting the signal amount of the desired sound source and the signal amount of the other sound sources into a signal amount for each of the sound collecting steps;
And a weighted addition sub-step for recording the signal amount for each of the sound collecting steps and computing a weighted sum of a plurality of past signal amounts for each of the sound collecting steps.

A sound collection program for operating a computer as the sound collection device according to claim 1.

A computer-readable recording medium on which the sound collecting program according to claim 9 is recorded.