JP5139111B2 - Method and apparatus for extracting sound from moving sound source - Google Patents

Publication number
JP5139111B2
Authority
JP
Japan
Prior art keywords
sound source
observation
signal vector
time
target sound
Prior art date
Legal status
Expired - Fee Related
Application number
JP2008034445A
Other languages
Japanese (ja)
Other versions
JP2008219884A (en)
Inventor
Hiroshi Nakajima (中島弘史)
Kazuhiro Nakadai (中臺一博)
Current Assignee
Honda Motor Co Ltd
Original Assignee
Honda Motor Co Ltd
Priority date
Filing date
Publication date
Application filed by Honda Motor Co Ltd filed Critical Honda Motor Co Ltd
Publication of JP2008219884A publication Critical patent/JP2008219884A/en
Application granted granted Critical
Publication of JP5139111B2 publication Critical patent/JP5139111B2/en

Abstract

PROBLEM TO BE SOLVED: To provide a method for correctly extracting sound from a moving sound source without the output discontinuities, or the frequency and amplitude fluctuations caused by the Doppler effect.

SOLUTION: The method obtains the position of the sound source and, as a function of that position, obtains the time-varying convolution matrices H(p(k), n) from the source to each of a plurality of observation points, which convert the source signal vector s into the observation signal vectors x(n) at the observation points. Using these matrices, it then obtains a beamforming coefficient matrix G(n) that converts the observation signal vectors at the observation points into the source signal vector of the target source, and obtains the source signal vector y of the target source from the observation signal vectors and the beamforming coefficient matrix.

COPYRIGHT: (C) 2008, JPO&INPIT

Description

The present invention relates to a method and an apparatus for extracting sound from a moving sound source.

Beamforming is performed in order to accurately extract the source signal of a sound source of interest (hereinafter, the target sound source). Beamforming is a method that processes the signals observed by a plurality of microphones with appropriate filters so as to emphasize and extract the signal of the target sound source. A widely known concrete beamforming method is delay-and-sum array processing, which aligns the phases of sound arriving from a given direction so as to emphasize it. More accurate beamforming methods have also been proposed (for example, Patent Document 1).

Conventionally, on the other hand, beamforming for a moving sound source has simply applied, as-is, beamforming methods designed for stationary sources. Specifically, for example, the region through which the sound source moves is divided, a set of beamforming filter coefficients is computed for each sub-region, an output value for the sound from the moving source is computed with each sub-region's coefficients, and the largest of the output values is taken as the sound from the moving source. Such conventional methods, however, suffer from output discontinuities caused by switching coefficients between sub-regions, and from frequency changes and amplitude fluctuations caused by the Doppler effect.
Japanese Patent Laid-Open No. 2006-270903

Therefore, there is a need for a method and an apparatus that accurately extract sound from a moving sound source without the output discontinuities, or the frequency changes and amplitude fluctuations caused by the Doppler effect.

In the method for extracting sound from a moving sound source according to the present invention, the position of the sound source is obtained, and, as a function of that position, the time-varying convolution matrices from the source to each of a plurality of observation points are obtained; these matrices convert the source signal vector of the source into the observation signal vector at each observation point. Using the time-varying convolution matrices from the source to the respective observation points, a beamforming coefficient matrix that converts the observation signal vectors at the observation points into the source signal vector of the target source is then obtained, and the source signal vector of the target source is obtained from the observation signal vectors at the observation points and the beamforming coefficient matrix.

The apparatus for extracting sound from a moving sound source according to the present invention includes a sound data acquisition unit that acquires observation signal vectors of sound at a plurality of observation points, and a position detection unit that detects the position of the sound source. The apparatus further includes a time-varying convolution matrix storage unit that stores the time-varying convolution matrices from the source to each observation point, which, as a function of the position of the source, convert the source signal vector of the source into the observation signal vector at each observation point, and an arithmetic processing unit that obtains the source signal vector of the target source from the observation signal vectors at the plurality of observation points and the time-varying convolution matrices from the source to the respective observation points.

According to the present invention, beamforming coefficients are not switched according to the position of the moving source; instead, the optimum beamforming coefficients are used at every instant. Sound from a moving target source can therefore be extracted accurately, without output discontinuities or the frequency changes and amplitude fluctuations caused by the Doppler effect.

Let p(t) be the position of the moving sound source and s(t) its signal (volume velocity). The observed signal (sound pressure) x(t) at position q is then the solution of the following equation [P. M. Morse and K. U. Ingard, "Theoretical Acoustics", Princeton, USA, pp. 717-732, 1968]. In the following equations, p and q denote position vectors.

    (∇² − (1/c²) ∂²/∂t²) x(q, t) = −s(t) δ(q − p(t))    (1)

Compared with the wave equation for a stationary source, equation (1) differs in that the delta function on the right-hand side is also a function of time; the sound pressure at the observation point therefore varies with the velocity of the source, among other factors. Equation (1) is time-varying but linear. Consequently, if s(t) is decomposed into the integral of the impulse inputs at times tₛ,

    sₜₛ(t) = s(tₛ) δ(t − tₛ),    (2)

that is,

    s(t) = ∫ sₜₛ(t) dtₛ,

then the observed signal x(t) can be computed as the integral

    x(t) = ∫ xₜₛ(t) dtₛ    (3)

of the responses xₜₛ(t) to the impulse inputs sₜₛ(t). The response xₜₛ(t) is the solution of equation (1) when

    s(t) = sₜₛ(t).    (4)

Substituting equation (4) into equation (1), writing xₜₛ(q, t) for the corresponding solution, and rearranging gives

    (∇² − (1/c²) ∂²/∂t²) xₜₛ(q, t) = −s(tₛ) δ(t − tₛ) δ(q − p(tₛ)).    (5)

Equation (5) coincides with the equation that gives the impulse response for a stationary source located at p(tₛ). Therefore,

    xₜₛ(t) = s(tₛ) h(t − tₛ, p(tₛ)),    (6)

where h(t, p) is the impulse response from a stationary source at position p to the observation position q. Equation (3) can thus be rewritten as

    x(t) = ∫ s(tₛ) h(t − tₛ, p(tₛ)) dtₛ.    (7)

This equation shows that, even for a moving source, the output can be obtained from the source signal and the stationary impulse responses, provided that the stationary impulse response from every position the source can take is known. In this specification, equation (7) is defined as the time-varying convolution operation. It has been shown experimentally that this operation can be computed approximately in a discrete-time system as well, in the same form as equation (7) [Tomonao Okuyama, Hiroshi Matsuhisa, Hideo Utsuno, "Sound pressure calculation during movement in a virtual space using transfer functions", Noise and Vibration Study Group Report N-2006-46, 2006].

    x(k) = Σₖₛ h(k − kₛ, p(kₛ)) s(kₛ)    (8)

Here, k and kₛ denote discrete times. The sampling frequency must be set greater than twice the upper limit frequency of the source signal, taking into account the Doppler shift caused by the movement. Equation (8) can be expressed with a vector and a matrix as follows.

    x = Hᵀ s,    (9)

where the (kₛ, k) entry of H is

    H[kₛ, k] = h(k − kₛ, p(kₛ)).    (10)

Here, s is the source signal vector, x is the observation signal vector, and H is the time-varying convolution matrix [M. Matsumoto, M. Tohyama and H. Yanagawa, "A method of interpolating binaural impulse responses for moving sound images," Acoust. Sci. & Tech. 24, 5, pp. 284-292, 2003]. Each row of H and each entry of s correspond to a time of the source signal; each column of H and each entry of x correspond to a time of the observation signal. If the movement pattern of the source is defined as a position-vector function p(k) of discrete time k, H is determined by the movement pattern p(k) and the observation point q. The origin of discrete time is taken to be 1, the length of the source signal is Lₛ, and the length of the impulse response is Lₕ.
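The matrix form above can be made concrete with a minimal sketch (ours, not the patent's implementation). It builds the time-varying convolution matrix row by row from a table of stationary impulse responses, one per (quantized) source position; the table h_table and its values are illustrative assumptions.

```python
import numpy as np

def build_H(p, h_table, L_s, L_h):
    """Time-varying convolution matrix with the convention of the text:
    rows <-> source-signal times k_s, columns <-> observation times k,
    H[k_s, k] = h(k - k_s, p[k_s]), so that x = H.T @ s (matrix form above).

    h_table : dict mapping a (quantized) position to its stationary
              impulse response of length L_h (assumed known in advance).
    """
    H = np.zeros((L_s, L_s + L_h - 1))
    for k_s in range(L_s):
        # Place the stationary IR for the source's position at time k_s,
        # shifted to start at observation time k_s.
        H[k_s, k_s:k_s + L_h] = h_table[p[k_s]]
    return H

# Assumed stationary responses for three positions along the path.
h_table = {0: np.array([1.0, 0.3]),
           1: np.array([0.8, 0.2]),
           2: np.array([0.6, 0.1])}
p = [0, 1, 2]                       # source position at each source time
s = np.array([1.0, -1.0, 0.5])
H = build_H(p, h_table, L_s=3, L_h=2)
x = H.T @ s                         # observed signal
```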

FIG. 1 shows the configuration of an apparatus for extracting sound from a moving sound source according to one embodiment of the present invention.

The apparatus for extracting sound from a moving sound source includes a sound data acquisition unit 101, a position detection unit 103 that detects the position of the moving source, an arithmetic processing unit 105 that extracts the moving source by operations such as beamforming, and a time-varying convolution matrix storage unit 107 that stores the time-varying convolution matrices.

The position detection unit 103 detects the position of the moving sound source relative to the sound data acquisition unit 101, using, for example, a laser rangefinder or a rangefinder based on the phase difference of radio waves. The time-varying convolution matrix storage unit 107 may store the time-varying convolution matrices for arbitrary source positions and retrieve the matrix corresponding to the detected position of the moving source. Alternatively, when the path of the moving source is fixed, the storage unit 107 may store the time-varying convolution matrices for positions along that path and retrieve the matrix corresponding to the detected position of the moving source.

In this specification, a moving sound source includes any source that moves relative to the sound data acquisition unit 101; in other words, the position of the moving source is its position relative to the observation points. Therefore, when the source is fixed at a given position and the sound data acquisition unit 101 is mounted on, for example, a robot and moves with it, the source fixed at that position can be regarded as a moving sound source.

FIG. 2 illustrates the functions of the apparatus for extracting sound from a moving sound source. The sound data acquisition unit 101 includes N microphones. x(n) denotes the observation signal vector obtained by observing the sound s of the moving sound source 201 with the n-th microphone. G(n) is the beamforming coefficient matrix for the observation signal vector of the n-th microphone, and y is the extracted moving-source signal vector. The arithmetic processing unit 105 obtains the beamforming coefficients G(n), and then obtains the moving-source signal vector y from the observation signal vectors x(n) and the beamforming coefficients G(n). How G(n) is obtained is described later.

FIG. 3 is a flowchart of a method of extracting a moving sound source according to one embodiment of the present invention. The following steps are performed at every sample of the observation signal vectors x(n), or at every predetermined number of samples.

In step S010 of FIG. 3, the arithmetic processing unit 105 obtains the current position of the moving sound source from the information supplied by the position detection unit 103, and retrieves from the time-varying convolution matrix storage unit 107 the time-varying convolution matrix H(p(k), n) from the moving source at that position to the n-th microphone. The matrix has Lₛ + Lₕ − 1 rows and Lₛ columns.

In step S020 of FIG. 3, the arithmetic processing unit 105 obtains, from the observation signal vectors x(n) of the microphones and the time-varying convolution matrices H(p(k), n) from the moving source to the microphones, the beamforming coefficient matrices G(1), G(2), ..., G(N) for the observation signal vectors, together with the moving-source signal vector y. The moving-source signal vector y is

    y = Gᵀ x.

Here, each entry of y and each column of Gᵀ correspond to a time of the source signal; each row of Gᵀ and each entry of x correspond to a time of the observation signals of the N microphones.

On the other hand, the observation signals x can be expressed as

    x = Hᵀ(p(k)) s.

Here, each entry of x and each column of Hᵀ(p(k)) correspond to a time of the observation signals of the N microphones; each row of Hᵀ(p(k)) and each entry of s correspond to a time of the source signal.

Since y = s must hold, G is the solution of

    Gᵀ Hᵀ(p(k)) = I,    (12)

and solving equation (12) by means of the pseudo-inverse yields the coefficients of a minimum-norm weighted delay-and-sum beamformer adapted to the moving sound source.
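A minimal noise-free sketch (our own, with randomly generated matrices standing in for the stored H(p(k), n), whose shapes follow step S010) of obtaining the beamforming coefficients from equation (12) via the pseudo-inverse and recovering the source with y = Gᵀx:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: N microphones, time-varying convolution matrices H_n
# of shape (L_s + L_h - 1, L_s), here random placeholders.
N, L_s, L_h = 3, 16, 4
H = [rng.normal(size=(L_s + L_h - 1, L_s)) for _ in range(N)]

s = rng.normal(size=L_s)                    # true (moving) source signal
x = np.concatenate([Hn @ s for Hn in H])    # stacked observations

# Beamforming coefficients from eq. (12): G^T is taken as the
# pseudo-inverse of the stacked convolution matrix, which gives the
# minimum-norm solution of G^T H^T = I.
H_stack = np.vstack(H)              # shape (N*(L_s+L_h-1), L_s)
G_T = np.linalg.pinv(H_stack)       # shape (L_s, N*(L_s+L_h-1))

y = G_T @ x                         # y = G^T x
# In this noise-free, full-column-rank case y recovers s exactly.
```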

Furthermore, if the movement pattern p_U(k) of a non-target sound source and the time-varying convolution matrix H(p_U(k)) from the non-target source to each observation point (the N microphones) are known, then obtaining the solution with the constraints

    Gᵀ Hᵀ(p_U(k)) = 0    (13)

added in sequence reduces the gain for sound from the non-target source, so that the sound of the target source can be extracted more accurately [Hiroshi Nakajima, "Realization of minimum average-sidelobe beamforming using indefinite terms", Journal of the Acoustical Society of Japan, Vol. 62, No. 10, pp. 726-737, 2006].

FIG. 4 illustrates a numerical experiment that confirms the operation of the sound extraction method of this embodiment. Two sources were used: a moving source S1 and a stationary source S2. As shown in FIG. 5, S1 was a 125 Hz sine wave and S2 a 400 Hz sine wave. The sampling frequency was 1 kHz, the speed of S1 was 20 m/s, and the signal length was 0.5 s. The aim was to extract the signal of S1 by beamforming with a three-element microphone array (M1, M2, M3).
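The raw test waveforms of this experiment (125 Hz and 400 Hz sinusoids, 1 kHz sampling, 0.5 s) can be generated as follows; note this sketch produces only the source signals themselves, not the Doppler-shifted propagation of S1.

```python
import numpy as np

fs = 1000                            # sampling frequency: 1 kHz
t = np.arange(0, 0.5, 1 / fs)        # 0.5 s signal length
s1 = np.sin(2 * np.pi * 125 * t)     # moving source S1: 125 Hz sine
s2 = np.sin(2 * np.pi * 400 * t)     # stationary source S2: 400 Hz sine
```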

In FIGS. 5 to 7, the horizontal axis shows time in seconds and the vertical axis shows sound pressure in pascals.

FIG. 6 shows the signal observed at microphone M1. The observation contains the (higher-frequency) signal of S2 and the signal of S1, whose amplitude and frequency vary.

In the method of this embodiment, the beamforming coefficient matrix was obtained as the solution of the simultaneous equations (12) and (13).

In a method used for comparison with this embodiment (hereinafter, the comparison method), a beamforming coefficient matrix was computed for each source direction at 10-degree intervals, an output value was computed with each coefficient matrix, and the largest output value was taken as the signal of S1.

FIG. 7(a) shows the S1 signal extracted by the comparison method. As shown in FIG. 7(a), although the high-frequency component of S2 is removed, amplitude changes due to the movement and discontinuities due to the coefficient switching remain in the S1 signal. In FIG. 7(a), the amplitude is low near 0.3 s because the direction of the focus is close to that of a null (blind spot).

FIG. 7(b) shows the S1 signal extracted by the method of this embodiment. As shown in FIG. 7(b), the S1 signal exhibits neither amplitude changes due to the movement nor discontinuities due to switching, and is reproduced accurately.

Features of the embodiments of the present invention are described below.

According to one embodiment of the present invention, the position of the target sound source is obtained, and, as a function of that position, the time-varying convolution matrices from the target source to each of a plurality of observation points are obtained; these matrices convert the source signal vector of the target source into the observation signal vector at each observation point. The beamforming coefficient matrix is then obtained as the pseudo-inverse of the time-varying convolution matrices from the target source to the respective observation points.

According to this embodiment, the coefficients of a minimum-norm weighted delay-and-sum beamformer adapted to the moving target source are obtained, so the sound of the moving target source can be extracted accurately.

According to another embodiment of the present invention, the position of a non-target sound source is further obtained, and, as a function of that position, the time-varying convolution matrices from the non-target source to the respective observation points are further obtained; these matrices convert the source signal vector of the non-target source into the observation signal vector at each observation point. The beamforming coefficient matrix is adjusted so that its product with the time-varying convolution matrices from the non-target source to the respective observation points is zero.

According to this embodiment, the gain for sound from the non-target source is reduced, so the sound of the target source can be extracted more accurately.

FIG. 1 shows the configuration of an apparatus for extracting sound from a moving sound source according to one embodiment of the present invention.
FIG. 2 illustrates the functions of the apparatus for extracting sound from a moving sound source.
FIG. 3 is a flowchart of a method of extracting a moving sound source according to one embodiment of the present invention.
FIG. 4 illustrates a numerical experiment for confirming the operation of the sound extraction method of this embodiment.
FIG. 5 shows the moving-source signal.
FIG. 6 shows the signal observed at microphone M1.
FIG. 7 shows the S1 signals extracted by the comparison method and by the method of this embodiment.

Explanation of symbols

101 ... sound data acquisition unit, 103 ... position detection unit, 105 ... arithmetic processing unit, 107 ... time-varying convolution matrix storage unit

Claims (6)

音源の位置を求め、
前記音源の位置の関数として、前記音源の音源信号ベクトルを複数の観測点のそれぞれの観測点における観測信号ベクトルに変換する、前記音源から前記それぞれの観測点までの時変畳み込み行列を求め、
前記音源から前記それぞれの観測点までの時変畳み込み行列を使用して、前記それぞれの観測点における観測信号ベクトルを目的音源の音源信号ベクトルに変換するビームフォーミング係数行列を求め、
前記それぞれの観測点における観測信号ベクトルおよび前記ビームフォーミング係数行列から前記目的音源の音源信号ベクトルを求める、移動音源からの音の抽出方法。
Find the location of the sound source,
As a function of the position of the sound source, the sound source signal vector of the sound source is converted into an observation signal vector at each observation point of a plurality of observation points, a time-varying convolution matrix from the sound source to the respective observation points is obtained,
Using a time-varying convolution matrix from the sound source to the respective observation points, a beam forming coefficient matrix for converting an observation signal vector at the respective observation points into a sound source signal vector of the target sound source is obtained.
A method for extracting sound from a moving sound source, wherein a sound source signal vector of the target sound source is obtained from observation signal vectors at the respective observation points and the beamforming coefficient matrix.
前記目的音源の位置を求め、
前記目的音源の位置の関数として、前記目的音源の音源信号ベクトルを複数の観測点のそれぞれの観測点における観測信号ベクトルに変換する、前記目的音源から前記それぞれの観測点までの時変畳み込み行列を求め、
前記ビームフォーミング係数行列を、前記目的音源から前記それぞれの観測点までの時変畳み込み行列の擬似逆行列として求める、請求項1に記載の移動音源からの音の抽出方法。
Determining the position of the target sound source;
A time-varying convolution matrix from the target sound source to each observation point, which converts the sound source signal vector of the target sound source into an observation signal vector at each observation point of a plurality of observation points as a function of the position of the target sound source. Seeking
The method for extracting sound from a moving sound source according to claim 1, wherein the beam forming coefficient matrix is obtained as a pseudo inverse matrix of a time-varying convolution matrix from the target sound source to each observation point.
非目的音源の位置をさらに求め、
前記非目的音源の位置の関数として、前記非目的音源の音源信号ベクトルを前記それぞれの観測点における観測信号ベクトルに変換する、前記非目的音源から前記それぞれの観測点までの時変畳み込み行列をさらに求め、
前記ビームフォーミング係数行列を、前記非目的音源から前記それぞれの観測点までの時変畳み込み行列と前記ビームフォーミング係数行列との積が0となるように調整する、請求項に記載の移動音源からの音の抽出方法。
Further find the location of the non-target sound source,
A time-variant convolution matrix from the non-target sound source to the respective observation points, which converts the sound source signal vector of the non-target sound source into an observation signal vector at the respective observation points as a function of the position of the non-target sound source; Seeking
3. The mobile sound source according to claim 2 , wherein the beam forming coefficient matrix is adjusted so that a product of a time-varying convolution matrix from the non-target sound source to each observation point and the beam forming coefficient matrix becomes zero. Sound extraction method.
複数の観測点における音の観測信号ベクトルを取得する音データ取得部と、
音源の位置を検出する位置検出部と、
前記音源の位置の関数として、前記音源の音源信号ベクトルを複数の観測点のそれぞれの観測点における観測信号ベクトルに変換する、前記音源から前記それぞれの観測点までの時変畳み込み行列を格納する時変畳み込み行列格納部と、
前記複数の観測点における音の観測信号ベクトルおよび前記音源から前記それぞれの観測点までの時変畳み込み行列から目的音源の音源信号ベクトルを求める演算処理部と、を備える移動音源からの音の抽出装置。
A sound data acquisition unit for acquiring sound observation signal vectors at a plurality of observation points;
A position detector for detecting the position of the sound source;
When storing a time-varying convolution matrix from the sound source to each observation point, converting the sound source signal vector of the sound source into an observation signal vector at each observation point of a plurality of observation points as a function of the position of the sound source A convolution matrix storage unit;
An apparatus for extracting sound from a moving sound source, comprising: an arithmetic processing unit that obtains a sound source signal vector of a target sound source from observation signal vectors of sound at the plurality of observation points and a time-varying convolution matrix from the sound source to the respective observation points .
前記位置検出部が、前記目的音源の位置を検出し、
前記時変畳み込み行列格納部が、前記目的音源の位置の関数として、前記目的音源の音源信号ベクトルを前記それぞれの観測点における観測信号ベクトルに変換する、前記目的音源から前記それぞれの観測点までの時変畳み込み行列を格納し、
前記演算処理部が、前記それぞれの観測点における観測信号ベクトルを前記目的音源の音源信号ベクトルに変換するビームフォーミング係数行列を、前記目的音源から前記それぞれの観測点までの時変畳み込み行列の擬似逆行列として求め、前記それぞれの観測点における観測信号ベクトルおよび前記ビームフォーミング係数行列から前記目的音源の音源信号ベクトルを求める、請求項4に記載の移動音源からの音の抽出装置。
The position detection unit detects the position of the target sound source;
The time-varying convolution matrix storage unit converts the sound source signal vector of the target sound source into an observation signal vector at the respective observation points as a function of the position of the target sound source, from the target sound source to the respective observation points. Store the time-varying convolution matrix
The arithmetic processing unit generates a beamforming coefficient matrix for converting an observation signal vector at each observation point into a sound source signal vector of the target sound source, and a pseudo inverse of a time-varying convolution matrix from the target sound source to each observation point. 5. The apparatus for extracting sound from a moving sound source according to claim 4, wherein a sound source signal vector of the target sound source is obtained from an observation signal vector at each observation point and the beam forming coefficient matrix.
The position detection unit further detects the position of a non-target sound source;
the time-varying convolution matrix storage unit further stores, as a function of the position of the non-target sound source, a time-varying convolution matrix from the non-target sound source to each observation point, which converts the sound source signal vector of the non-target sound source into the observation signal vector at the respective observation points; and
the arithmetic processing unit adjusts the beamforming coefficient matrix so that the product of the time-varying convolution matrix from the non-target sound source to the respective observation points and the beamforming coefficient matrix becomes zero: the apparatus for extracting sound from a moving sound source according to claim 5.
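The null constraint (the product of the non-target convolution matrix and the beamforming matrix being zero) can be realized by inverting the target and non-target transfer matrices jointly. The construction below is one standard way to satisfy such a constraint, offered as a sketch under assumed shapes, not as the patent's actual adjustment procedure:

```python
import numpy as np

rng = np.random.default_rng(1)
L, M = 32, 6
H_t = rng.standard_normal((M * L, L))   # target-to-mics convolution blocks
H_i = rng.standard_normal((M * L, L))   # non-target (interferer) blocks

# Stack both transfer matrices and invert jointly; the rows of the
# pseudo-inverse corresponding to the target yield a beamformer G with
# G @ H_t = I (distortionless target) and G @ H_i = 0 (null on interferer).
A = np.hstack([H_t, H_i])
G = np.linalg.pinv(A)[:L]

assert np.allclose(G @ H_t, np.eye(L))
assert np.allclose(G @ H_i, 0.0)
```

This requires the stacked matrix A to have full column rank, i.e. enough observation points (here M·L = 192 rows against 2L = 64 columns) to separate the two sources.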
JP2008034445A 2007-03-02 2008-02-15 Method and apparatus for extracting sound from moving sound source Expired - Fee Related JP5139111B2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US90443307P 2007-03-02 2007-03-02
US60/904,433 2007-03-02

Publications (2)

Publication Number Publication Date
JP2008219884A JP2008219884A (en) 2008-09-18
JP5139111B2 true JP5139111B2 (en) 2013-02-06

Family

ID=39855168

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2008034445A Expired - Fee Related JP5139111B2 (en) 2007-03-02 2008-02-15 Method and apparatus for extracting sound from moving sound source

Country Status (1)

Country Link
JP (1) JP5139111B2 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101254989B1 (en) 2011-10-14 2013-04-16 한양대학교 산학협력단 Dual-channel digital hearing-aids and beamforming method for dual-channel digital hearing-aids
KR102362121B1 (en) 2015-07-10 2022-02-11 삼성전자주식회사 Electronic device and input and output method thereof
EP3131311B1 (en) * 2015-08-14 2019-06-19 Nokia Technologies Oy Monitoring
WO2020121545A1 (en) * 2018-12-14 2020-06-18 日本電信電話株式会社 Signal processing device, signal processing method, and program
CN110530510B (en) * 2019-09-24 2021-01-05 西北工业大学 Method for measuring sound source radiation sound power by utilizing linear sound array beam forming

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3514714B2 (en) * 2000-08-21 2004-03-31 日本電信電話株式会社 Sound collection method and device
JP2006270903A (en) * 2005-03-22 2006-10-05 Nittobo Acoustic Engineering Co Ltd Nonlinear beam forming by microphone array of arbitrary arrangement
JP2006332736A (en) * 2005-05-23 2006-12-07 Yamaha Corp Microphone array apparatus
JP4760160B2 (en) * 2005-06-29 2011-08-31 ヤマハ株式会社 Sound collector
JP4760249B2 (en) * 2005-09-13 2011-08-31 ヤマハ株式会社 Speaker array device

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11310592B2 (en) 2015-04-30 2022-04-19 Shure Acquisition Holdings, Inc. Array microphone system and method of assembling the same
US11678109B2 (en) 2015-04-30 2023-06-13 Shure Acquisition Holdings, Inc. Offset cartridge microphones
US11832053B2 (en) 2015-04-30 2023-11-28 Shure Acquisition Holdings, Inc. Array microphone system and method of assembling the same
US11477327B2 (en) 2017-01-13 2022-10-18 Shure Acquisition Holdings, Inc. Post-mixing acoustic echo cancellation systems and methods
US11800281B2 (en) 2018-06-01 2023-10-24 Shure Acquisition Holdings, Inc. Pattern-forming microphone array
US11523212B2 (en) 2018-06-01 2022-12-06 Shure Acquisition Holdings, Inc. Pattern-forming microphone array
US11770650B2 (en) 2018-06-15 2023-09-26 Shure Acquisition Holdings, Inc. Endfire linear array microphone
US11297423B2 (en) 2018-06-15 2022-04-05 Shure Acquisition Holdings, Inc. Endfire linear array microphone
US11310596B2 (en) 2018-09-20 2022-04-19 Shure Acquisition Holdings, Inc. Adjustable lobe shape for array microphones
US11438691B2 (en) 2019-03-21 2022-09-06 Shure Acquisition Holdings, Inc. Auto focus, auto focus within regions, and auto placement of beamformed microphone lobes with inhibition functionality
US11778368B2 (en) 2019-03-21 2023-10-03 Shure Acquisition Holdings, Inc. Auto focus, auto focus within regions, and auto placement of beamformed microphone lobes with inhibition functionality
US11303981B2 (en) 2019-03-21 2022-04-12 Shure Acquisition Holdings, Inc. Housings and associated design features for ceiling array microphones
US11558693B2 (en) 2019-03-21 2023-01-17 Shure Acquisition Holdings, Inc. Auto focus, auto focus within regions, and auto placement of beamformed microphone lobes with inhibition and voice activity detection functionality
US11445294B2 (en) 2019-05-23 2022-09-13 Shure Acquisition Holdings, Inc. Steerable speaker array, system, and method for the same
US11800280B2 (en) 2019-05-23 2023-10-24 Shure Acquisition Holdings, Inc. Steerable speaker array, system and method for the same
US11688418B2 (en) 2019-05-31 2023-06-27 Shure Acquisition Holdings, Inc. Low latency automixer integrated with voice and noise activity detection
US11302347B2 (en) 2019-05-31 2022-04-12 Shure Acquisition Holdings, Inc. Low latency automixer integrated with voice and noise activity detection
US11750972B2 (en) 2019-08-23 2023-09-05 Shure Acquisition Holdings, Inc. One-dimensional array microphone with improved directivity
US11297426B2 (en) 2019-08-23 2022-04-05 Shure Acquisition Holdings, Inc. One-dimensional array microphone with improved directivity
US11552611B2 (en) 2020-02-07 2023-01-10 Shure Acquisition Holdings, Inc. System and method for automatic adjustment of reference gain
US11706562B2 (en) 2020-05-29 2023-07-18 Shure Acquisition Holdings, Inc. Transducer steering and configuration systems and methods using a local positioning system
US11785380B2 (en) 2021-01-28 2023-10-10 Shure Acquisition Holdings, Inc. Hybrid audio beamforming system

Also Published As

Publication number Publication date
JP2008219884A (en) 2008-09-18

Similar Documents

Publication Publication Date Title
JP5139111B2 (en) Method and apparatus for extracting sound from moving sound source
US8385562B2 (en) Sound source signal filtering method based on calculated distances between microphone and sound source
Ward et al. Particle filter beamforming for acoustic source localization in a reverberant environment
EP3484184A1 (en) Acoustic field formation device, method, and program
US20130082875A1 (en) Processing Signals
JP4812302B2 (en) Sound source direction estimation system, sound source direction estimation method, and sound source direction estimation program
WO2016179211A1 (en) Coprime microphone array system
EP1856948A1 (en) Position-independent microphone system
Nakamura et al. A real-time super-resolution robot audition system that improves the robustness of simultaneous speech recognition
KR102191736B1 (en) Method and apparatus for speech enhancement with artificial neural network
Padois Acoustic source localization based on the generalized cross-correlation and the generalized mean with few microphones
JP6763332B2 (en) Sound collectors, programs and methods
Padois et al. On the use of modified phase transform weighting functions for acoustic imaging with the generalized cross correlation
KR101086304B1 (en) Signal processing apparatus and method for removing reflected wave generated by robot platform
CN103688187A (en) Sound source localization using phase spectrum
Padois et al. On the use of geometric and harmonic means with the generalized cross-correlation in the time domain to improve noise source maps
JPH1141687A (en) Signal processing unit and signal processing method
Hosseini et al. Time difference of arrival estimation of sound source using cross correlation and modified maximum likelihood weighting function
Salvati et al. Acoustic source localization using a geometrically sampled grid SRP-PHAT algorithm with max-pooling operation
JP2008089312A (en) Signal arrival direction estimation apparatus and method, signal separation apparatus and method, and computer program
Jung et al. Distance estimation of a sound source using the multiple intensity vectors
JP6433630B2 (en) Noise removing device, echo canceling device, abnormal sound detecting device, and noise removing method
JP3862685B2 (en) Sound source direction estimating device, signal time delay estimating device, and computer program
EP3757598A1 (en) In device interference mitigation using sensor fusion
JP2004279845A (en) Signal separating method and its device

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20101126

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20120620

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20120626

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20120809

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20121023

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20121115

R150 Certificate of patent or registration of utility model

Ref document number: 5139111

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20151122

Year of fee payment: 3

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

LAPS Cancellation because of no payment of annual fees