JP2008304555A - Sound input apparatus - Google Patents

Sound input apparatus

Info

Publication number
JP2008304555A
JP2008304555A (application JP2007149570A)
Authority
JP
Japan
Prior art keywords
sound
sound pressure
dead point
sound source
differential value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2007149570A
Other languages
Japanese (ja)
Other versions
JP4894638B2 (en)
Inventor
Minoru Fukushima
実 福島
Kana Kawahigashi
香菜 川東
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Electric Works Co Ltd
Original Assignee
Panasonic Electric Works Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Panasonic Electric Works Co Ltd filed Critical Panasonic Electric Works Co Ltd
Priority to JP2007149570A priority Critical patent/JP4894638B2/en
Publication of JP2008304555A publication Critical patent/JP2008304555A/en
Application granted granted Critical
Publication of JP4894638B2 publication Critical patent/JP4894638B2/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Obtaining Desirable Characteristics In Audible-Bandwidth Transducers (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

PROBLEM TO BE SOLVED: To input only the sound generated from a desired sound source.
SOLUTION: The sound pressure O(f, k) output from a dead point forming means 2 contains only sound other than the sound generated from the sound source located at the dead point, that is, only the sound pressure of noise, while the sound pressure M(f, k) detected by a sound collection sensor means 1 contains both the sound pressure of the sound generated from the sound source located at the dead point and the sound pressure of noise. Since only the sound pressure S(f, k) of the sound generated from the sound source located at the dead point is extracted by a target speaker voice extraction means 3, only the sound generated from the sound source located at the dead point is input even when noise arrives from the direction of the dead point.
COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to an acoustic input device that inputs sound emitted from a sound source.

Various acoustic input devices have conventionally been provided that input only the sound emitted from a specific sound source in environments with ambient noise and reverberation, for example only the voice uttered by a person (the speaker's voice). In the conventional example described in Patent Document 1, for instance, two microphones are used, and the acoustic signal collected by each microphone is frequency-analyzed to obtain frequency components for two channels. Adaptive beamformer processing on the frequency components of each channel lowers the sensitivity in directions other than that of the target sound (the speaker's voice), yielding an acoustic signal in which ambient noise and the like are suppressed, and likewise lowers the sensitivity in directions other than that of the ambient noise, yielding an acoustic signal in which the target sound is suppressed. The target-sound direction and the ambient-noise direction are estimated from the filter coefficients used in the adaptive beamformer processing and corrected successively, and spectral subtraction then removes the ambient noise component from the former acoustic signal using the latter, so that only the target sound (the speaker's voice) is input.
JP 2000-47699 A

However, the above conventional example has the problem that, for example, when the speaker and a noise source lie in the same direction as seen from the microphones, the target sound cannot be separated from noise other than the target sound.

The present invention has been made in view of the above circumstances, and its object is to provide an acoustic input device capable of inputting only the sound emitted from a desired sound source.

To achieve the above object, the invention of claim 1 comprises: sound collection sensor means for detecting a sound pressure, the time differential value of that sound pressure, and the spatial differential values obtained by differentiating that sound pressure along each axial direction of a two-dimensional orthogonal coordinate system; dead point forming means for forming a dead point, at which the sound collection sensitivity is minimized, at a preset position of the target speaker, by applying a weighted sum with a predetermined coefficient vector and low-pass filtering to the sound pressure, time differential value, and spatial differential values detected by the sound collection sensor means; and target speaker voice extraction means for extracting only the sound pressure of the voice uttered by the target speaker, using the sound pressure detected by the sound collection sensor means and the sound pressure output from the dead point forming means.

According to the invention of claim 1, the sound pressure output from the dead point forming means contains only sound other than the sound emitted from the sound source located at the dead point, that is, only the sound pressure of noise, while the sound pressure detected by the sound collection sensor means contains both the sound pressure of the sound emitted from the sound source located at the dead point and the sound pressure of noise. Since the target speaker voice extraction means extracts only the sound pressure of the sound emitted from the sound source located at the dead point, only that sound is input even when noise arrives from the direction of the dead point.

The invention of claim 2 is characterized in that, in the invention of claim 1, the target speaker voice extraction means extracts the sound pressure of the sound emitted from the sound source by a spectral subtraction method.

The invention of claim 3 is characterized in that, in the invention of claim 1, the target speaker voice extraction means extracts the sound pressure of the sound emitted from the sound source by independent component analysis.

The invention of claim 4 is characterized in that, in the invention of claim 3, the target speaker voice extraction means performs principal component analysis before performing the independent component analysis.

The invention of claim 5 is characterized in that, in the invention of any one of claims 1 to 4, the sound collection sensor means comprises a plurality of microphones arranged facing directions orthogonal to the respective axes of the two-dimensional orthogonal coordinate system.

The invention of claim 6 is characterized in that, in the invention of any one of claims 1 to 4, the sound collection sensor means comprises a microphone having a biaxial orthogonal gimbal structure in which the diaphragm is supported at a single central point.

The invention of claim 7, in the invention of any one of claims 1 to 6, further comprises sound source position changing means for changing the sound source position, wherein the sound source position changing means changes the sound source position to a position estimated on the basis of the sound pressure, time differential value, and spatial differential values detected by the sound collection sensor means when the value obtained by dividing the instantaneous power of the sound pressure detected by the sound collection sensor means by the instantaneous power of the sound pressure output from the dead point forming means is equal to or greater than a predetermined threshold.

According to the invention of claim 7, even when the position of the sound source varies, the sound source position is updated by the sound source position changing means, so the source position and the dead point do not drift apart; as a result, only the sound emitted from the sound source can be input even when the source moves.

According to the present invention, even when noise arrives from the direction of the dead point, only the sound emitted from the sound source located at the dead point can be input.

Embodiments of the present invention will now be described in detail with reference to the drawings. Documents referred to in the description of the embodiments are denoted Reference 1, Reference 2, and so on, and a list of these references is given at the end.

(Embodiment 1)
As shown in FIG. 1, the acoustic input device of this embodiment comprises: sound collection sensor means 1 for detecting a sound pressure, the time differential value of that sound pressure, and the spatial differential values obtained by differentiating that sound pressure along each axial direction of a two-dimensional orthogonal coordinate system; dead point forming means 2 for forming a dead point, at which the sound collection sensitivity is minimized, at a preset position of the target speaker, by applying a weighted sum with a predetermined coefficient vector and low-pass filtering to the sound pressure, time differential value, and spatial differential values detected by the sound collection sensor means 1; and target speaker voice extraction means 3 for extracting only the sound pressure of the sound emitted from the sound source (speaker) located at the dead point, using the sound pressure detected by the sound collection sensor means 1 and the sound pressure output from the dead point forming means 2.

As shown in FIG. 1, the sound collection sensor means 1 comprises omnidirectional microphones 10A, 10B, 10C, 10D, a plurality of which (four in the illustrated example) are arranged facing the direction orthogonal to the x axis and y axis of a three-dimensional orthogonal coordinate system (the positive z direction), and a spatiotemporal gradient measurement processing unit 11 that performs spatiotemporal gradient measurement processing on the output signals f_A(t), f_B(t), f_C(t), f_D(t) of the microphones 10A-10D. From these output signals, the spatiotemporal gradient measurement processing unit 11 obtains the in-phase component M(t) of the sound pressure, its time differential value (temporal gradient component) M_t(t), its x-direction spatial differential value (x-direction spatial gradient component) M_x(t), and its y-direction spatial differential value (y-direction spatial gradient component) M_y(t) from the following equations:

M(t) = f_A(t) + f_B(t) + f_C(t) + f_D(t)
M_t(t) = df_A(t)/dt + df_B(t)/dt + df_C(t)/dt + df_D(t)/dt
M_x(t) = f_A(t) + f_B(t) - f_C(t) - f_D(t)
M_y(t) = f_A(t) - f_B(t) + f_C(t) - f_D(t)

The dead point forming means 2 forms a dead point by applying the spatiotemporal gradient method to the in-phase component M(t), time differential value M_t(t), x-direction spatial differential value M_x(t), and y-direction spatial differential value M_y(t) output from the sound collection sensor means 1. Before describing the dead point forming processing performed by the dead point forming means 2, the spatiotemporal gradient method itself is first explained in detail.
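As a concrete illustration of this measurement step, the following is a minimal Python sketch (the patent itself contains no code; the function name, the sampling rate fs, and the use of a finite-difference numpy.gradient for the time derivative are assumptions for illustration only):

```python
import numpy as np

def spatiotemporal_components(f_a, f_b, f_c, f_d, fs):
    """Compute M(t), M_t(t), M_x(t), M_y(t) from four microphone signals
    arranged as in FIG. 1, using the sum/difference equations above."""
    m = f_a + f_b + f_c + f_d        # in-phase component M(t)
    m_t = np.gradient(m, 1.0 / fs)   # time gradient M_t(t), finite difference
    m_x = f_a + f_b - f_c - f_d      # x-direction spatial gradient M_x(t)
    m_y = f_a - f_b + f_c - f_d      # y-direction spatial gradient M_y(t)
    return m, m_t, m_x, m_y
```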

The spatiotemporal gradient method was originally proposed as one technique for determining the optical flow, that is, the apparent velocity field in a moving image (see Reference 1). It is an analysis method based on an equation relating the optical flow velocity at a point (x, y) to the spatial and temporal gradients of the gray-level distribution of the moving image, under the assumption that the image function f(x, y, t) representing the gray-level pattern is kept invariant during motion (f(x, y, t) = f(x+δx, y+δy, t+δt)). The method is explained in detail below.

At time t+δt, Taylor-expanding the gray-level pattern f(x+δx, y+δy, t+δt) at coordinates (x+δx, y+δy) around (x, y, t) gives

f(x+δx, y+δy, t+δt) = f(x, y, t) + f_x δx + f_y δy + f_t δt + O(δx², δy², δt²)   …(1)

where O(δx², δy², δt²) denotes the terms of second and higher order in δx, δy, δt; being infinitesimally small, they are neglected below. When the gray-level pattern at coordinates (x, y) at time t moves to coordinates (x+δx, y+δy) after a time δt with its gray-level distribution kept constant, the correspondence gives

f(x, y, t) = f(x+δx, y+δy, t+δt) = f(x, y, t) + f_x δx + f_y δy + f_t δt   …(2)
f_x δx + f_y δy + f_t δt = 0   …(3)

Dividing both sides of equation (3) by δt yields

f_x δx/δt + f_y δy/δt + f_t = 0   …(4)

Assuming that δt is infinitesimal and letting δt → 0 gives the following equation.

xdx/dt+fydy/dt+ft=0 …(5)
オプティカルフロー速度v=(u,v)=(dx/dt,dy/dt)を用いると、式(5)は、
ufx+vfy+ft=0 …(6)
となり、式(6)は動画像の濃淡値の時間、空間に関する勾配とオプティカルフロー速度vとを関係付ける式である。
f x dx / dt + f y dy / dt + f t = 0 ... (5)
Using optical flow velocity v = (u, v) = (dx / dt, dy / dt), equation (5) becomes
uf x + vf y + f t = 0 ... (6)
Equation (6) is an equation that correlates the gradient of time and space of the gray value of the moving image with the optical flow velocity v.

Next, it is assumed that the velocity field can be approximated as nearly constant within a neighborhood Γ of the point of interest. Equation (6) must then hold everywhere within the region Γ, so the squared integral of its left-hand side,

J = ∫_Γ (u f_x + v f_y + f_t)² dx dy   …(7)

is used as the evaluation function and the velocity field is obtained by the method of least squares. Differentiating equation (7) with respect to u and v and setting the results to zero gives

u S_xx + v S_xy + S_xt = 0,  u S_xy + v S_yy + S_yt = 0   …(8)

where S_pq = ∫_Γ f_p f_q dx dy   …(9)

Solving equation (8), the velocity vector (u, v) is obtained as

u = (S_yt S_xy - S_xt S_yy)/(S_xx S_yy - S_xy²),  v = (S_xt S_xy - S_yt S_xx)/(S_xx S_yy - S_xy²)   …(10)
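A minimal Python sketch of this least-squares solve (illustrative only; the gradient arrays fx, fy, ft over the neighborhood Γ are assumed to be given, for example by finite differences):

```python
import numpy as np

def optical_flow_lk(fx, fy, ft):
    """Estimate the optical flow velocity (u, v) from gradient arrays over a
    neighborhood, per equations (8) and (10)."""
    s_xx = np.sum(fx * fx); s_yy = np.sum(fy * fy); s_xy = np.sum(fx * fy)
    s_xt = np.sum(fx * ft); s_yt = np.sum(fy * ft)
    det = s_xx * s_yy - s_xy ** 2            # denominator of equation (10)
    u = (s_yt * s_xy - s_xt * s_yy) / det
    v = (s_xt * s_xy - s_yt * s_xx) / det
    return u, v
```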

Next, applying the spatiotemporal gradient method used above to determine optical flow velocity in moving images, a method of localizing a sound source position is described, based on the linear relation that holds between the sound pressure at one point of the sound field a source creates in space and its spatiotemporal gradients (see Reference 2).

As shown in FIG. 2, take a three-dimensional orthogonal coordinate system with the observation point at the origin, and suppose there are several mutually uncorrelated point sources in front of it (z > 0). Let c be the speed of sound, (x_i, y_i, z_i) the coordinates of the i-th source, R_i = (x_i² + y_i² + z_i²)^(1/2) the distance from that source to the observation point, g_i(t) the source signal, and f_i(t) the sound field each source forms at the observation point. The composite sound field f formed at the observation point is then the sum of the spherical waves from the sources:

f(t) = Σ_i f_i(t) = Σ_i (1/R_i) g_i(t - R_i/c)   …(11)

Partial differentiation of this expression gives the x and y spatial differentials and the time differential of the sound field at the observation point:

f_x(t) = -Σ_i {ξ_i^x f_i(t) + τ_i^x ∂f_i(t)/∂t}   …(12)
f_y(t) = -Σ_i {ξ_i^y f_i(t) + τ_i^y ∂f_i(t)/∂t}   …(13)
f_t(t) = Σ_i ∂f_i(t)/∂t   …(14)

Here,

ξ_i^x = x_i/R_i²,  ξ_i^y = y_i/R_i²   …(15)

are called the strength gradients, and

τ_i^x = x_i/(cR_i),  τ_i^y = y_i/(cR_i)   …(16)

are called the time gradients in the x and y directions.

Next, for simplicity, the source localization method for the case of a single source is described. With one source, equations (12) and (13) become

f_x = -ξ_x f - τ_x f_t,  f_y = -ξ_y f - τ_y f_t   …(17)

and τ_x, τ_y, ξ_x, ξ_y are obtained by applying the method of least squares in the same manner as above. Over a short time window Γ, define the evaluation function

J = ∫_Γ {(f_x + ξ_x f + τ_x f_t)² + (f_y + ξ_y f + τ_y f_t)²} dt   …(18)

Partially differentiating equation (18) with respect to τ_x, τ_y, ξ_x, ξ_y and setting the results to zero gives

∂J/∂τ_x = ∫_Γ 2(f_x + ξ_x f + τ_x f_t)·f_t dt = 0,  ∂J/∂τ_y = ∫_Γ 2(f_y + ξ_y f + τ_y f_t)·f_t dt = 0   …(19)
∂J/∂ξ_x = ∫_Γ 2(f_x + ξ_x f + τ_x f_t)·f dt = 0,  ∂J/∂ξ_y = ∫_Γ 2(f_y + ξ_y f + τ_y f_t)·f dt = 0   …(20)

Writing the covariance quantities estimated from the observation window Γ as

S_pq = ∫_Γ f_p f_q dt   …(21)

where the subscripts p, q range over the signals f, f_x, f_y, f_t (so that, for example, S = ∫_Γ f² dt, S_t = ∫_Γ f f_t dt, S_xt = ∫_Γ f_x f_t dt), equations (19) and (20) are rewritten as

S_xt + ξ_x S_t + τ_x S_tt = 0,  S_yt + ξ_y S_t + τ_y S_tt = 0   …(22)
S_x + ξ_x S + τ_x S_t = 0,  S_y + ξ_y S + τ_y S_t = 0   …(23)

Solving equations (22) and (23), τ_x, τ_y, ξ_x, ξ_y are obtained as

τ_x = (S_x S_t - S S_xt)/(S S_tt - S_t²),  τ_y = (S_y S_t - S S_yt)/(S S_tt - S_t²)   …(24)
ξ_x = (S_xt S_t - S_x S_tt)/(S S_tt - S_t²),  ξ_y = (S_yt S_t - S_y S_tt)/(S S_tt - S_t²)   …(25)

The azimuth of the source, (x/R, y/R) = (cτ_x, cτ_y), is obtained from equations (21) and (24). The distance R to the source is obtained by applying the method of least squares to equations (15) and (16). Since these imply ξ_x = cτ_x/R and ξ_y = cτ_y/R, define the evaluation function

J = (ξ_x - cτ_x/R)² + (ξ_y - cτ_y/R)²   …(26)

Partially differentiating this with respect to 1/R and setting the result to zero gives

cτ_x(ξ_x - cτ_x/R) + cτ_y(ξ_y - cτ_y/R) = 0   …(27)

Solving this, the distance to the source is obtained as

R = c(τ_x² + τ_y²)/(τ_x ξ_x + τ_y ξ_y)   …(28)
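The localization computation of equations (21) to (28) can be sketched as follows in Python (illustrative; the sampled window and the speed of sound c = 340 m/s are assumptions):

```python
import numpy as np

def localize_single_source(f, f_x, f_y, f_t, c=340.0):
    """Estimate the azimuth cosines (x/R, y/R) and distance R of a single
    source from a short window of samples of f and its gradients."""
    S = np.sum(f * f); S_t = np.sum(f * f_t); S_tt = np.sum(f_t * f_t)
    S_x = np.sum(f_x * f); S_xt = np.sum(f_x * f_t)
    S_y = np.sum(f_y * f); S_yt = np.sum(f_y * f_t)
    den = S * S_tt - S_t ** 2
    tau_x = (S_x * S_t - S * S_xt) / den      # equation (24)
    tau_y = (S_y * S_t - S * S_yt) / den
    xi_x = (S_xt * S_t - S_x * S_tt) / den    # equation (25)
    xi_y = (S_yt * S_t - S_y * S_tt) / den
    azimuth = (c * tau_x, c * tau_y)          # (x/R, y/R)
    R = c * (tau_x**2 + tau_y**2) / (tau_x * xi_x + tau_y * xi_y)  # eq. (28)
    return azimuth, R
```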

Next, a method of controlling directivity using the spatiotemporal gradient of the sound field is described (see References 3-5). Assuming a single source, the spatial gradients of the sound pressure signal f(t) at the observation point in the x and y directions are, from equations (12) and (13),

f_x(t) = -ξ_x f(t) - τ_x f_t(t),  f_y(t) = -ξ_y f(t) - τ_y f_t(t)   …(29)

Rewriting this using the vector r = (x, y, z) from the sound source toward the observation point, with R = |r|, gives

∇f(t) = -(r/R²) f(t) - (r/(cR)) f_t(t)   …(30)

Next, when f(t), f_t(t), and ∇f(t) are observed, their weighted sum is expressed as

s(t) = u f(t) + u_t f_t(t) + w·∇f(t)   …(31)

where u and u_t are real constants and w = (w_x, w_y, 0) is a unit vector with the observation point as its origin, pointing in an arbitrary direction. Substituting equation (30) into equation (31) gives

s(t) = H(r) f(t) + H_t(r) f_t(t)   …(32)
H(r) = u - (w·r)/R²,  H_t(r) = u_t - (w·r)/(cR)   …(33)

Thus the weighted sum of the spatiotemporal gradients is expressed as the sum of two filters with distinct directivity characteristics H(r) and H_t(r) acting on f(t) and f_t(t), respectively. When H(r) = α, equation (33) can be rearranged (equations (34)-(36)); using the formula a·b = |a||b|cos θ for the angle θ between two vectors a and b (equations (37), (38)) together with |w| = 1, equation (36) is rewritten as the equation of a sphere (equations (39)-(41)). When u + α = 0, equation (35) reduces to

r·w = 0   …(42)

that is, a plane. Similarly, when H_t(r) = α, equation (34) is rearranged (equation (43)); writing θ(r) for the angle between the vectors r and w and using |w| = 1 (equation (44)), equation (43) becomes the equation of a cone about w (equation (45)).

From equations (41), (42), and (45), H(r) and H_t(r) have the following properties:
1) the two directivity patterns H(r) and H_t(r) are rotationally symmetric about the axis w;
2) when H(r) = 0, the locus of r is a spherical surface of diameter 1/u (u ≠ 0) or a plane (u = 0);
3) when H_t(r) = 0, the locus of r is a conical surface of apex angle 2cos⁻¹(cu_t) (u_t ≠ 0) or a plane (u_t = 0);
4) the intersection of the loci for H(r) = 0 and H_t(r) = 0 is a circle or a plane.

Transforming equation (32) into the frequency domain gives

S(ω) = {H(r) + jωH_t(r)} F(ω)   …(46)

Hence the frequency response T(r, ω) from a source at r to s(t) is

T(r, ω) = H(r) + jωH_t(r)   …(47)

and, if H(r) and H_t(r) are real, T(r, ω) = 0 requires

H(r) = 0,  H_t(r) = 0   …(48)

It follows from equation (47) that the distribution of zeros at which S(ω) = 0 does not depend on the frequency ω but only on the source position r. Therefore, when the time gradient and the x- and y-direction spatial gradients of the sound pressure at the observation point are available, a zero-sensitivity region (dead point) can be formed simply by taking, at each instant, a weighted sum of f, f_t, f_x, f_y and applying a compensation filter (low-pass filter).
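To make the frequency independence of these zeros concrete, here is a small Python check under the sign convention of equation (33) (all numerical values, and the choice of w, are illustrative assumptions):

```python
import numpy as np

C = 340.0  # assumed speed of sound [m/s]

def T(r, w, u, u_t, omega):
    """Frequency response T(r, omega) = H(r) + j*omega*H_t(r), equation (47)."""
    R = np.linalg.norm(r)
    wr = np.dot(w, r)
    H = u - wr / R**2           # directivity acting on f(t), equation (33)
    H_t = u_t - wr / (C * R)    # directivity acting on f_t(t), equation (33)
    return H + 1j * omega * H_t

# Choose u, u_t so that H(r0) = H_t(r0) = 0 for a dead point at r0 (eq. 48).
r0 = np.array([0.5, 0.5, 1.0])
w = np.array([1.0, 0.0, 0.0])
R0, wr0 = np.linalg.norm(r0), np.dot(w, r0)
u, u_t = wr0 / R0**2, wr0 / (C * R0)

for omega in 2 * np.pi * np.array([100.0, 1000.0, 4000.0]):
    assert abs(T(r0, w, u, u_t, omega)) < 1e-12  # null holds at every frequency
```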

Thus, in the dead point forming means 2 of this embodiment, the in-phase component M(t), time differential value M_t(t), x-direction spatial differential value M_x(t), and y-direction spatial differential value M_y(t) output from the sound collection sensor means 1 are substituted for f, f_t, f_x, f_y above, the vector M = (M(t) M_t(t) M_x(t) M_y(t))^T having these values as elements is defined, the weighted sum with the coefficient vector W = (W W_t W_x W_y)^T whose elements are the corresponding weights is computed, and low-pass filtering is then applied, whereby a dead point is formed at a predetermined arbitrary position. Specifically, the directivity characteristics H(r) and H_t(r) above are replaced by H_1(r_i) and H_2(r_i), defined as

H_1(r_i) = (1  0  -n_ix/|r_i|  -n_iy/|r_i|)^T,  H_2(r_i) = (0  1  -n_ix/c  -n_iy/c)^T

where r_i is the position vector of source i and n_ix, n_iy are the x and y components of r_i/|r_i|, respectively. By selecting a coefficient vector W such that the filter coefficients p = W^H H_1(r_i) and q = W^H H_2(r_i) of the low-pass filter 20, represented as (p + jωq)^(-1) in FIG. 1, both become zero, a frequency-independent dead point can be formed at the position r_i of source i. The output O(t) of the dead point forming means 2 then contains no sound pressure of the sound emitted from the sound source located at the dead point; in other words, it contains only the ambient noise and reverberation excluding the voice of the speaker at the dead point (hereinafter referred to as noise).
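A minimal sketch of this coefficient selection in Python (illustrative; it fixes W_x, W_y and solves the two null conditions implied by equation (17) for W and W_t, which is one valid choice among many rather than necessarily the patent's):

```python
import numpy as np

def dead_point_coefficients(r0, c=340.0, w_x=1.0, w_y=0.0):
    """Coefficient vector W = (W, W_t, W_x, W_y)^T whose weighted sum of
    (M, M_t, M_x, M_y) has zero response to a source at r0.
    Since f_x = -xi_x f - tau_x f_t (equation 17), the response is
    (W - W_x*xi_x - W_y*xi_y) f + (W_t - W_x*tau_x - W_y*tau_y) f_t,
    which vanishes for the choice below."""
    x, y, z = r0
    R = np.sqrt(x * x + y * y + z * z)
    xi_x, xi_y = x / R**2, y / R**2           # strength gradients, eq. (15)
    tau_x, tau_y = x / (c * R), y / (c * R)   # time gradients, eq. (16)
    W = w_x * xi_x + w_y * xi_y
    W_t = w_x * tau_x + w_y * tau_y
    return np.array([W, W_t, w_x, w_y])
```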

The target speaker voice extraction means 3 extracts the target sound S(t) from the sound pressure M(t) output from the sound collection sensor means 1 (a sound pressure containing both the target sound, i.e. the voice of the speaker at the dead point, and noise) and the noise component O(t) output from the dead point forming means 2, using the well-known spectral subtraction method (see Reference 6). First, in the target speaker voice extraction means 3, the in-phase component M(t) and the noise component O(t) are divided into unit-time (frame-time) segments by a frame dividing unit 30, and the divided sound pressure M(t, k) and noise component O(t, k) are converted from the time domain to the frequency domain by a fast Fourier transform unit 31 (k denotes the frame number). The average amplitude μ = E{|O(f, k)|} of the noise component O(f, k) is computed by a noise average amplitude calculation unit 32 and subtracted from the amplitude |M(f, k)| of the sound pressure M(f, k) computed by an amplitude calculation unit 33; the result (|M(f, k)| - μ) is multiplied by the phase exp{j∠M(f, k)} of the sound pressure M(f, k) computed by a phase calculation unit 34, yielding the noise-free output S(f, k) = (|M(f, k)| - μ)·exp{j∠M(f, k)}. Applying the inverse fast Fourier transform to this output S(f, k) to return from the frequency domain to the time domain yields only the target sound S(t), with the noise removed.
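A minimal Python sketch of this spectral subtraction pipeline (not from the patent: the frame length is an arbitrary assumption, rectangular framing is used for brevity, and the clamp to zero is the usual half-wave rectification safeguard rather than something the text specifies):

```python
import numpy as np

def extract_target(m, o, frame_len=512):
    """Spectral subtraction: subtract the mean noise magnitude of O(f, k)
    from |M(f, k)| and restore the phase of M(f, k)."""
    n = len(m) // frame_len
    M = np.array([np.fft.rfft(m[k*frame_len:(k+1)*frame_len]) for k in range(n)])
    O = np.array([np.fft.rfft(o[k*frame_len:(k+1)*frame_len]) for k in range(n)])
    mu = np.mean(np.abs(O), axis=0)            # mu = E{|O(f, k)|} over frames
    mag = np.maximum(np.abs(M) - mu, 0.0)      # (|M(f, k)| - mu), clamped at 0
    S = mag * np.exp(1j * np.angle(M))         # reattach the phase of M(f, k)
    return np.concatenate([np.fft.irfft(S[k], n=frame_len) for k in range(n)])
```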

Thus, according to the acoustic input device of this embodiment, the sound pressure (in-phase component) O(f, k) output from the dead point forming means 2 contains only sound other than that emitted from the sound source located at the dead point, that is, only the sound pressure of noise, while the sound pressure (in-phase component) M(f, k) detected by the sound collection sensor means 1 contains both the sound pressure of the sound emitted from the sound source located at the dead point and the sound pressure of noise. Since the target speaker voice extraction means 3 extracts only the sound pressure S(f, k) of the sound emitted from the sound source located at the dead point, only that sound is input even when noise arrives from the direction of the dead point. FIG. 3 shows an example of the frequency characteristics of the sound pressure M(f) containing the target sound and noise, the noise-only sound pressure O(f) not containing the target sound, and the sound pressure S(f) of the target sound extracted by the target speaker voice extraction means 3; it can be seen that the noise component contained in the sound pressure S(f) is sufficiently suppressed.

If the acoustic input device of this embodiment is installed in an interphone device (door phone entrance unit), only the voice of the speaker (visitor) can be extracted for the call even in environments with loud ambient noise.

In the sound collection sensor means 1 of this embodiment, the four microphones 10A-10D are arranged on the xy plane; however, as shown in FIG. 4, a microphone 12 having a biaxial orthogonal gimbal structure, in which a diaphragm 13 that is circular in plan view is supported at a single central point, may be used instead of the microphones 10A-10D. The diaphragm 13 is a thin disk overall, with double grooves 14, 15 formed on concentric circles in its central portion; it has a pair of beams 14a, 14a partitioning the inner groove 14 and a pair of beams 15a, 15a partitioning the outer groove 15, and it can rotate about the x axis and the y axis through torsion of the beams 14a, 15a, with the point (center) supported by a support rod 16 as the fulcrum (see References 7 and 8). Therefore, if four observation points A, B, C, D are set on the diaphragm 13 and the displacement at each observation point is substituted for the outputs of the microphones 10A-10D, the in-phase component M(t), time differential value M_t(t), x-direction spatial differential value M_x(t), and y-direction spatial differential value M_y(t) can be detected.

(Embodiment 2)
This embodiment is characterized in that independent component analysis is used instead of the spectral subtraction method as the extraction processing in the target speaker voice extraction means 3; the other configuration and operation are the same as in Embodiment 1, so common components are given the same reference numerals and their illustration and description are omitted.

The purpose of independent component analysis (ICA) is to express a set of observed variables as a linear combination of statistically independent variables; the independent variables computed from the observed variables are the independent components. For example, let X be the observed variable vector and assume it is given as a linear combination of an unknown independent variable vector S; then, with A the unknown mixing matrix, the relation X = AS holds. Independent component analysis is a technique for finding, from the observed data alone and without any knowledge of the independent components or the mixing matrix, a separation matrix W such that the components of the restored data vector Y = WX are mutually independent; ideally, the separation matrix W is the inverse of the mixing matrix A (W = A^(-1)). FIG. 5 shows the model of independent component analysis for two-dimensional observed data.

Here, let X(f, k) = [M(f, k) O(f, k)]^T be the observation matrix whose components are the sound pressure M(f, k) containing the target sound and noise and the noise sound pressure O(f, k) output from the dead point forming means 2, let S(f, k) = [S(f, k) N(f, k)]^T be the matrix whose components are the sound pressure S(f, k) of the target sound and the sound pressure N(f, k) of the noise source, and let A be the spatial transfer matrix; then

X(f, k) = A S(f, k)

In the target speaker voice extraction means 3, a separation matrix W satisfying S(f, k) = W_{i+1}(f, k) X(f, k) is identified adaptively, whereby only the target sound S(f, k) is separated and extracted from the sound pressure containing the target sound and noise. This separation matrix W_{i+1}(f) is expressed by an update of the form

W_{i+1}(f) = W_i(f) + η[diag(⟨Φ(Y(f, k)) Y(f, k)^H⟩) - ⟨Φ(Y(f, k)) Y(f, k)^H⟩] W_i(f)

(see Reference 7), where η is the update step size, ⟨·⟩ denotes the time average over frames, diag denotes the diagonal matrix, and Φ(Y) is a nonlinear vector function whose elements are approximated either by the sigmoid function φ(y) = (1 + exp(-y))^(-1) or by a polynomial such as φ(y) = (3/4)y^11 + (25/4)y^9 + (14/4)y^7 + (47/4)y^5 + (29/4)y^3. For example, when the above is approximated elementwise by the sigmoid function,

Φ(Y(f, k)) = [(1 + exp(-y_1(f, k)))^(-1)  (1 + exp(-y_2(f, k)))^(-1)]^T
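A sketch of one such update in Python for a single frequency bin (illustrative only: the elementwise complex sigmoid applied to real and imaginary parts, the step size eta, and the frame averaging are assumptions consistent with the update form quoted above):

```python
import numpy as np

def ica_step(W, X, eta=0.1):
    """One adaptive update of the 2x2 separation matrix W for one frequency
    bin. X is a (2, K) array of observations [M(f, k); O(f, k)] over K frames."""
    Y = W @ X                                         # current estimate Y = W X
    phi = 1.0 / (1.0 + np.exp(-Y.real)) + 1j / (1.0 + np.exp(-Y.imag))
    C = (phi @ Y.conj().T) / X.shape[1]               # <Phi(Y) Y^H> time average
    return W + eta * (np.diag(np.diag(C)) - C) @ W    # natural-gradient update
```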

If the number of observed variables exceeds the number of independent variables, the observed variables are linearly dependent and the separation matrix also performs dimensionality reduction. Moreover, if the variables are mutually independent they are uncorrelated, so the separation matrix is also a matrix that decorrelates the variables. Principal component analysis (PCA) is a statistical technique that performs decorrelation and the accompanying dimensionality reduction simultaneously, and it is sometimes used as a preprocessing step for independent component analysis.

Accordingly, in this embodiment as well, when reflections or reverberation are present as noise, principal component analysis may be performed as preprocessing for the independent component analysis in order to reduce the dimensionality of the observation matrix X(f, k). Since variables that are mutually independent are also uncorrelated, principal component analysis can perform decorrelation and dimensionality reduction at the same time.

For example, when the number of sound sources is r, the singular value decomposition of the m-dimensional observation matrix X(f, k) is

X = U Σ V^H

with U = [u_1 … u_m] and V unitary and Σ containing the singular values σ_1 ≥ σ_2 ≥ …. Writing the source spectra as S_1, S_2, …, S_r, the observation X is a linear combination of these r components, so only the first r singular values are nonzero. Here, considering the transformation

Z(f, k) = U_r^H X(f, k),  U_r = [u_1 … u_r]

its variance is

U_r^H E{X X^H} U_r = diag(σ_1², …, σ_r²)

up to a normalization constant. Since the variance matrix is diagonal, the transformed variables are mutually uncorrelated. Moreover, the m-dimensional observation matrix X can be compressed to r dimensions, which reduces the amount of processing in the subsequent independent component analysis.
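A minimal sketch of this SVD-based preprocessing in Python (assumed, not from the patent; X is an (m, K) array of observation vectors over K frames):

```python
import numpy as np

def pca_compress(X, r):
    """Decorrelate and reduce an (m, K) observation matrix X to r dimensions
    by projecting onto the first r left singular vectors."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    Z = U[:, :r].conj().T @ X   # (r, K); covariance of Z is diagonal
    return Z
```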

(Embodiment 3)
Embodiments 1 and 2 assume that the position of the target sound source (for example, a speaker) is known, and collect only the target sound emitted from that source by forming a dead point at its position. In many cases, however, the position of the target sound source (a visitor) is not uniquely determined, as with a door phone entrance unit. On the other hand, in environments with very little noise (ambient noise and reverberation), the source position can be estimated using the sound source localization technique based on the spatiotemporal gradient method described above; even when the source position is not uniquely determined, only the target sound emitted from the source can be input by estimating the source position and forming a dead point there.

To this end, this embodiment includes sound source position changing means that changes the sound source position to a position estimated from the sound pressure (in-phase component M), time differential value M_t, and spatial differential values M_x, M_y detected by the sound collection sensor means 1 when the value obtained by dividing the instantaneous power P_M(t) of the in-phase component M(t) detected by the sound collection sensor means 1 by the instantaneous power P_O(t) of the in-phase component O(t) output from the dead point forming means 2 (= P_M(t)/P_O(t)) is equal to or greater than a predetermined threshold δ. Since this sound source position changing means performs largely the same processing as the dead point forming means 2, the dead point forming means 2 can double as it. When the above condition is satisfied, the dead point forming means 2 estimates the sound source position and, if the estimated position differs from the current dead point position, moves the dead point to the estimated source position.
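A sketch of this trigger logic in Python (the threshold value and the localizer callable are illustrative assumptions; the localizer could be the single-source estimator sketched in Embodiment 1):

```python
import numpy as np

def maybe_update_dead_point(m, o, locate, delta=10.0):
    """If the instantaneous power ratio P_M/P_O exceeds the threshold delta,
    re-estimate the source position with the supplied `locate` callable and
    return the new dead-point position; otherwise return None."""
    p_m = np.mean(m ** 2)   # instantaneous power of M(t) over the window
    p_o = np.mean(o ** 2)   # instantaneous power of O(t)
    if p_o > 0 and p_m / p_o >= delta:
        return locate()     # position estimated from M, M_t, M_x, M_y
    return None
```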

As described above, according to this embodiment, even when the position of the sound source varies, the sound source position is updated by the sound source position changing means, so the source position and the dead point do not drift apart; as a result, only the sound emitted from the sound source can be input even when the source moves.
<List of references>
Reference 1: Shigeru Ando, "Velocity vector distribution measurement system using a spatiotemporal derivative algorithm for images", Transactions of the Society of Instrument and Control Engineers, Vol. 22, No. 12, pp. 1330-1336, 1986.
Reference 2: Shigeru Ando, Hiroyuki Shinoda, Katsuya Ogawa, Satoshi Mitsuyama, "Three-dimensional sound source localization sensor system based on the spatiotemporal gradient method", Transactions of the Society of Instrument and Control Engineers, Vol. 29, No. 5, pp. 520-528, 1993.
Reference 3: N. Ono, T. Arita, Y. Senjo, and S. Ando, "Directivity steering principle for biomimicry silicon microphone", Proc. Int. Conf. Solid State Sensors, Actuators, and Microsystems (Transducers '05), pp. 792-795, 2005.
Reference 4: Ono and Ando, "Measurement of sound fields and directivity control", Proc. 22nd Sensing Forum, pp. 305-310, 2005.
Reference 5: Ono, Arita, Senjo, and Ando, "Theory of directivity control and sound source separation based on spatiotemporal gradient measurement", Proc. 2005 Spring Meeting of the Acoustical Society of Japan, 2-6-13, pp. 607-608, 2005.
Reference 6: S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction", IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-27, No. 2, pp. 113-120, 1979.
Reference 7: Nobutaka Ono, Akihito Saito, Shigeru Ando, "Theory and experiment of an ultra-small sound source localization sensor mimicking the parasitoid fly (2nd report)", Proc. 19th Sensing Forum, pp. 379-382, 2002.
Reference 8: Nobutaka Ono, Akihito Saito, Shigeru Ando, "Theory and experiment of a differential-detection sound source localization sensor mimicking the parasitoid fly", Technical Report of the Auditory Research Meeting, pp. 187-192, 2002.

FIG. 1 is a block diagram showing Embodiment 1 of the present invention.
FIG. 2 is an explanatory diagram for explaining the spatiotemporal gradient method in Embodiment 1.
FIG. 3 is an explanatory diagram of the same embodiment.
FIG. 4 shows the gimbal-structure microphone in Embodiment 1, where (a) is a plan view of the diaphragm and (b) is a sectional view.
FIG. 5 is an explanatory diagram of the target speaker voice extraction means in Embodiment 2 of the present invention.

Explanation of symbols

1 Sound collection sensor means
2 Dead point forming means
3 Target speaker voice extraction means

Claims (7)

1. An acoustic input device comprising: sound collection sensor means for detecting a sound pressure, a time differential value of the sound pressure, and spatial differential values obtained by differentiating the sound pressure along each axial direction of a two-dimensional orthogonal coordinate system; dead point forming means for forming a dead point, at which the sound collection sensitivity is minimized, at a preset position of a target speaker, by applying a weighted sum with a predetermined coefficient vector and low-pass filtering to the sound pressure, time differential value, and spatial differential values detected by the sound collection sensor means; and target speaker voice extraction means for extracting only the sound pressure of the voice uttered by the target speaker, using the sound pressure detected by the sound collection sensor means and the sound pressure output from the dead point forming means.
2. The acoustic input device according to claim 1, wherein the target speaker voice extraction means extracts the sound pressure of the sound emitted from the sound source by a spectral subtraction method.
3. The acoustic input device according to claim 1, wherein the target speaker voice extraction means extracts the sound pressure of the sound emitted from the sound source by independent component analysis.
4. The acoustic input device according to claim 3, wherein the target speaker voice extraction means performs principal component analysis before performing the independent component analysis.
5. The acoustic input device according to any one of claims 1 to 4, wherein the sound collection sensor means comprises a plurality of microphones arranged facing directions orthogonal to the respective axes of the two-dimensional orthogonal coordinate system.
6. The acoustic input device according to any one of claims 1 to 4, wherein the sound collection sensor means comprises a microphone having a biaxial orthogonal gimbal structure in which a diaphragm is supported at a single central point.
7. The acoustic input device according to any one of claims 1 to 6, further comprising sound source position changing means for changing the sound source position, wherein the sound source position changing means changes the sound source position to a position estimated on the basis of the sound pressure, time differential value, and spatial differential values detected by the sound collection sensor means when a value obtained by dividing the instantaneous power of the sound pressure detected by the sound collection sensor means by the instantaneous power of the sound pressure output from the dead point forming means is equal to or greater than a predetermined threshold value.
JP2007149570A 2007-06-05 2007-06-05 Acoustic input device Expired - Fee Related JP4894638B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2007149570A JP4894638B2 (en) 2007-06-05 2007-06-05 Acoustic input device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2007149570A JP4894638B2 (en) 2007-06-05 2007-06-05 Acoustic input device

Publications (2)

Publication Number Publication Date
JP2008304555A true JP2008304555A (en) 2008-12-18
JP4894638B2 JP4894638B2 (en) 2012-03-14

Family

ID=40233363

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2007149570A Expired - Fee Related JP4894638B2 (en) 2007-06-05 2007-06-05 Acoustic input device

Country Status (1)

Country Link
JP (1) JP4894638B2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010066506A (en) * 2008-09-10 2010-03-25 Panasonic Electric Works Co Ltd Sound collecting device
JP2011179888A (en) * 2010-02-26 2011-09-15 Nissan Motor Co Ltd Method and device for calculating wave source position

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003271191A (en) * 2002-03-15 2003-09-25 Toshiba Corp Device and method for suppressing noise for voice recognition, device and method for recognizing voice, and program
JP2006058395A (en) * 2004-08-17 2006-03-02 Spectra:Kk Sound signal input/output device
WO2006131959A1 (en) * 2005-06-06 2006-12-14 Saga University Signal separating apparatus

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003271191A (en) * 2002-03-15 2003-09-25 Toshiba Corp Device and method for suppressing noise for voice recognition, device and method for recognizing voice, and program
JP2006058395A (en) * 2004-08-17 2006-03-02 Spectra:Kk Sound signal input/output device
WO2006131959A1 (en) * 2005-06-06 2006-12-14 Saga University Signal separating apparatus

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010066506A (en) * 2008-09-10 2010-03-25 Panasonic Electric Works Co Ltd Sound collecting device
JP2011179888A (en) * 2010-02-26 2011-09-15 Nissan Motor Co Ltd Method and device for calculating wave source position

Also Published As

Publication number Publication date
JP4894638B2 (en) 2012-03-14

Similar Documents

Publication Publication Date Title
Furukawa et al. Noise correlation matrix estimation for improving sound source localization by multirotor UAV
US7613310B2 (en) Audio input system
Dagamseh et al. Imaging dipole flow sources using an artificial lateral-line system made of biomimetic hair flow sensors
KR100754385B1 (en) Apparatus and method for object localization, tracking, and separation using audio and video sensors
US20100034397A1 (en) Sound source tracking system, method and robot
Jiang et al. Real-time vibration source tracking using high-speed vision
EP3227704B1 (en) Method for tracking a target acoustic source
JP6467736B2 (en) Sound source position estimating apparatus, sound source position estimating method, and sound source position estimating program
CN113692750A (en) Sound transfer function personalization using sound scene analysis and beamforming
JP2014137226A (en) Mobile object, and system and method for creating acoustic source map
Gala et al. Realtime active sound source localization for unmanned ground robots using a self-rotational bi-microphone array
Gala et al. Three-dimensional sound source localization for unmanned ground vehicles with a self-rotational two-microphone array
KR102316671B1 (en) Method for treating sound using cnn
Pan et al. Cognitive acoustic analytics service for Internet of Things
JP4894638B2 (en) Acoustic input device
CN113539288A (en) Voice signal denoising method and device
CN114690121A (en) Dynamic space-time beamforming
Hosseini et al. Time difference of arrival estimation of sound source using cross correlation and modified maximum likelihood weighting function
JP5086768B2 (en) Telephone device
Chau et al. Audio-visual SLAM towards human tracking and human-robot interaction in indoor environments
GB2604227A (en) Sensing via signal to signal translation
Tanigawa et al. Invisible-to-Visible: Privacy-Aware Human Segmentation using Airborne Ultrasound via Collaborative Learning Probabilistic U-Net
JP5060438B2 (en) Sound collector
Tanabe et al. Probabilistic 3d sound source mapping system based on monte carlo localization using microphone array and lidar
Nakadai et al. Humanoid active audition system improved by the cover acoustics

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20100217

RD04 Notification of resignation of power of attorney

Free format text: JAPANESE INTERMEDIATE CODE: A7424

Effective date: 20101019

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20110802

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20111003

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20111129

A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20111212

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20150106

Year of fee payment: 3

LAPS Cancellation because of no payment of annual fees