JP2020120261A

JP2020120261A - Sound pickup device, sound pickup program, and sound pickup method

Info

Publication number: JP2020120261A
Application number: JP2019009597A
Authority: JP
Inventors: 隆矢頭; Takashi Yato; 大藤枝; Masaru Fujieda
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2019-01-23
Filing date: 2019-01-23
Publication date: 2020-08-06

Abstract

To provide a sound pickup device, a sound pickup program, and a sound pickup method that improve sound quality when picking up a target area sound. SOLUTION: The sound pickup device comprises non-target area sound extraction means that applies spectral subtraction to the input signals from a plurality of directional microphones to extract, for each directional microphone, the non-target area sound present in the direction of the target area as seen from that microphone; and target area sound extraction means that spectrally subtracts the non-target area sound from the input signals to extract the target area sound. SELECTED DRAWING: Figure 1

Description

The present invention relates to a sound pickup device, a sound pickup program, and a sound pickup method, and can be applied, for example, to a sound pickup device and program that emphasize sound from a specific area and suppress sound from other areas.

When a voice communication system or a speech recognition system is used in a noisy environment, ambient noise mixed in with the desired target speech hinders communication and lowers the recognition rate. In environments with multiple sound sources, a beamformer (hereinafter "BF"; see Patent Document 2) using a microphone array has conventionally been used to separate and pick up only the sound arriving from a specific direction, so that the target sound is obtained without unwanted sound mixed in. A BF forms directivity by exploiting the differences in arrival time of a signal at the individual microphones.

FIG. 5 is a block diagram showing a conventional subtractive BF configuration with two microphones.

As shown in FIG. 5, consider sound arriving from the direction $\theta_L$ and received by two microphones M1 and M2 placed a distance $d$ apart. The sound wave first reaches the right microphone M1, which is closer to the source, and then travels a further distance $l$ before reaching the left microphone M2. The signal $x_2(t)$ received at M2 is therefore delayed relative to the signal $x_1(t)$ received at M1 by $\tau_L$, the time the wave needs to travel the distance $l$. In the configuration of FIG. 5, equations (1) and (2) hold, where $c$ denotes the speed of sound here and below.
$$\tau_L = \frac{d \sin\theta_L}{c} \tag{1}$$
$$x_2(t) = x_1(t - \tau_L) \tag{2}$$
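As a small worked example of equation (1), the sketch below evaluates the inter-microphone delay and converts it to samples. The speed of sound (340 m/s), the 16 kHz sampling rate, and the function names are illustrative assumptions, not values taken from the source.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s; assumed value for c


def arrival_delay(d: float, theta_deg: float, c: float = SPEED_OF_SOUND) -> float:
    """Equation (1): tau_L = d * sin(theta_L) / c, returned in seconds."""
    return d * math.sin(math.radians(theta_deg)) / c


def delay_in_samples(tau: float, fs: float) -> float:
    """Convert a delay in seconds into a (possibly fractional) sample count."""
    return tau * fs


# Hypothetical setup: 3 cm spacing, source 30 degrees off the front.
tau = arrival_delay(d=0.03, theta_deg=30.0)
print(f"tau_L = {tau * 1e6:.1f} us = {delay_in_samples(tau, 16000.0):.3f} samples at 16 kHz")
```

With closely spaced microphones the delay is typically a fraction of a sample, which is one reason the processing is often carried out in the frequency domain.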

Accordingly, in the configuration of FIG. 5, the delay unit 21 delays $x_1(t)$ by $\tau_L$; adding the result to $x_2(t)$ yields a signal $a(t)$ in which sound from the $\theta_L$ direction is emphasized (equation (3)). Conversely, taking the difference between $x_2(t)$ and the delayed $x_1(t)$ makes the two signals cancel, yielding a signal $b(t)$ with a null (blind spot) in the $\theta_L$ direction (equation (4)).
$$a(t) = x_2(t) + x_1(t - \tau_L) \tag{3}$$
$$b(t) = x_2(t) - x_1(t - \tau_L) \tag{4}$$
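Equations (3) and (4) can be sketched in the time domain as follows. For simplicity the sketch assumes that $\tau_L$ corresponds to a whole number of samples `k`; a fractional delay would need interpolation, which is not shown. The helper names are hypothetical.

```python
def delay(x, k):
    """Delay sequence x by k whole samples, zero-padding the start."""
    k = min(k, len(x))
    return [0.0] * k + list(x[:len(x) - k])


def add_and_subtract(x1, x2, k):
    """Return (a, b) per equations (3) and (4), with tau_L equal to k samples."""
    x1_delayed = delay(x1, k)
    a = [s2 + s1 for s1, s2 in zip(x1_delayed, x2)]  # (3): target direction emphasized
    b = [s2 - s1 for s1, s2 in zip(x1_delayed, x2)]  # (4): null toward the target direction
    return a, b


# Toy signals: x2 is x1 delayed by 2 samples, as in equation (2).
x1 = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0]
x2 = delay(x1, 2)
a, b = add_and_subtract(x1, x2, 2)
print(a)
print(b)
```

For a source exactly in the $\theta_L$ direction, `a` is twice `x2` and `b` is all zeros, which is the emphasis and null behavior the text describes.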

The addition and subtraction in equations (3) and (4) can be carried out in the frequency domain in the same way, in which case equations (3) and (4) become equations (5) and (6), respectively.
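Equations (5) and (6) themselves are not reproduced in this text. As a reconstruction rather than the source's own typesetting, applying the Fourier time-shift property to equations (3) and (4) would give the form below. Here $X_1(\omega)$ and $X_2(\omega)$ are assumed names for the spectra of $x_1(t)$ and $x_2(t)$, while $A(\omega)$ and $B(\omega)$ are the symbols that appear in equation (7).

$$A(\omega) = X_2(\omega) + e^{-j\omega\tau_L} X_1(\omega) \tag{5}$$
$$B(\omega) = X_2(\omega) - e^{-j\omega\tau_L} X_1(\omega) \tag{6}$$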

BFs fall broadly into two types: additive and subtractive. The subtractive BF has the particular advantage of forming directivity with fewer microphones than the additive BF. A representative subtractive method is a BF using spectral subtraction (hereinafter also simply "SS"; see Non-Patent Document 1).

FIG. 6 is a block diagram showing a conventional SS configuration.

The configuration of FIG. 6 is described as processing the acoustic signals in the frequency domain, using input signals that have been frequency-transformed by the time-frequency converter 31.

In the configuration of FIG. 6, the delay unit 32 first calculates the time difference $\tau_L$ between the signals arriving at microphones M1 and M2 from the target direction $\theta_L$ and applies a delay so that the sound signals from the target source direction are phase-aligned. When the target source lies toward M1 relative to the midpoint between M1 and M2, the delay unit 32 delays the input of M1. The adder 33 then performs the addition of equation (5), and the subtractor 34 performs the subtraction of equation (6). The addition emphasizes sound from the target source direction, and the subtraction extracts sound from other directions. Finally, the spectral subtractor 35 applies equation (7) to the added and subtracted data, emphasizing sound from the target source direction and suppressing everything else. In equation (7), $\beta$ is a coefficient that sets the strength of the SS.
$$Y(\omega) = A(\omega) - \beta B(\omega) \tag{7}$$
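A minimal sketch of equation (7) applied per frequency bin is shown below. The source writes the subtraction directly on $A(\omega)$ and $B(\omega)$; subtracting magnitudes, clamping negative results to a floor, and reusing the phase of $A(\omega)$ are common practice for spectral subtraction but are assumptions here, as is the function name.

```python
import cmath


def spectral_subtract(A, B, beta=1.0, floor=0.0):
    """Per-bin |Y| = max(|A| - beta * |B|, floor), keeping the phase of A."""
    out = []
    for a, b in zip(A, B):
        magnitude = max(abs(a) - beta * abs(b), floor)
        out.append(cmath.rect(magnitude, cmath.phase(a)))
    return out


# Toy spectra with two bins: the second bin is dominated by the subtracted term.
A = [3 + 4j, 1 + 0j]
B = [1 + 0j, 2 + 0j]
Y = spectral_subtract(A, B, beta=1.0)
print([abs(y) for y in Y])
```

The clamping step is the nonlinearity that the text later names as the cause of processing distortion and musical noise.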

With a BF alone, however, when other sound sources exist around the area to be picked up (hereinafter the "target area"), it is difficult to pick up only the sound present inside that area (hereinafter "target area sound"). Area sound pickup methods that use a plurality of microphone arrays to pick up the target area have therefore been proposed, for example in Patent Document 1.

FIG. 7 is an explanatory diagram of conventional processing that uses two microphone arrays MA1 and MA2 to pick up target area sound from a source in the target area.

FIG. 7(a) shows a configuration example of the microphone arrays. FIGS. 7(b) and 7(c) are schematic graphs showing, in the frequency domain, the BF outputs of microphone arrays MA1 and MA2 of FIG. 7(a), respectively.

In conventional area sound pickup, as shown in FIG. 7(a), the directivities of microphone arrays MA1 and MA2 are aimed from different directions so that they cross in the area to be picked up (the target area). In the state of FIG. 7(a), the directivity of each array captures not only sound inside the target area (target area sound) but also noise lying in the direction of the target area (non-target area sound). However, as FIGS. 7(b) and 7(c) show, when the two BF outputs are compared in the frequency domain, the target area sound component appears in both outputs, whereas the non-target area sound components differ from one array to the other.

Conventional area sound pickup exploits this property: by suppressing every component other than those common to the BF outputs of MA1 and MA2, only the target area sound is extracted.

Patent Document 1: JP 2005-195955 A
Patent Document 2: JP 2014-72708 A

Non-Patent Document 1: Futoshi Asano, "Sound Array Signal Processing: Localization/Tracking and Separation of Sound Source", The Acoustical Society of Japan, Corona Publishing, February 25, 2011

In the area sound pickup method, microphone array processing based on the BF principle is needed to form directivity toward the target area from different positions. Aiming the directivity precisely at the target sound requires a sharp beam. Achieving this with an additive BF would require a large number of microphones and is not practical. Conventional area sound pickup therefore combines a subtractive BF with SS (hereinafter "subtractive BF+SS processing" or simply "BF+SS processing") to form sharp directivity with few microphones.

However, because SS is itself a nonlinear process, processing distortion and musical noise are unavoidable. In addition, the frequency range in which a microphone array performs well depends strongly on the spacing between its microphones.

FIG. 8 shows the polar pattern of a subtractive BF with its null directed toward the front.

In FIG. 8 the spacing between microphones M1 and M2 is 3 cm. In this case the subtractive BF shows generally good characteristics from 1 kHz to 4 kHz, but obtains almost no gain in the low-frequency range at or below 500 Hz. Above 8 kHz the directional pattern gradually begins to deform, and at still higher frequencies spatial aliasing produces nulls in directions other than the front. Improving the low-frequency characteristics of the subtractive BF therefore requires a wider microphone spacing, whereas improving the high-frequency characteristics requires a narrower one. For this reason conventional subtractive BF+SS processing often adopts a spacing of around 3 cm as a compromise suited to conveying speech. Even then, measures such as a low-frequency boost are taken to recover gain at low frequencies, which increases the low-frequency components but also increases distortion as a side effect. Thus, although the conventional area sound pickup method has attracted attention as a way to pick up only the sound of a specific area, it has the problems that some distortion in the emphasized target sound is unavoidable and that low-frequency information is lost.
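The spacing trade-off can be illustrated with a simple model that is an assumption of this sketch, not taken from the source: two omnidirectional microphones, a far-field plane wave, $c$ = 340 m/s, and a front-facing null ($\tau_L = 0$, so the output is $X_2 - X_1$). Under that model the gain toward an angle $\theta$ is $2\,|\sin(\pi f d \sin\theta / c)|$.

```python
import math


def null_bf_gain_db(freq_hz, theta_deg, d=0.03, c=340.0):
    """Gain in dB of B = X2 - X1 (null at theta = 0) for a plane wave from theta."""
    tau = d * math.sin(math.radians(theta_deg)) / c
    gain = 2.0 * abs(math.sin(math.pi * freq_hz * tau))
    return 20.0 * math.log10(gain) if gain > 0.0 else float("-inf")


def first_extra_null_hz(theta_deg, d=0.03, c=340.0):
    """Lowest nonzero frequency at which the gain toward theta falls to zero."""
    s = abs(math.sin(math.radians(theta_deg)))
    return c / (d * s) if s > 0.0 else float("inf")


# Sideways arrival (90 degrees) with the 3 cm spacing used in FIG. 8.
for f in (250.0, 500.0, 1000.0, 2000.0, 4000.0):
    print(f"{f:6.0f} Hz: {null_bf_gain_db(f, 90.0):6.1f} dB")
print(f"first extra null toward 90 degrees: {first_extra_null_hz(90.0):.0f} Hz")
```

In this model the gain at 500 Hz is more than 10 dB below unity, and an additional null appears above 11 kHz. A wider spacing would raise the low-frequency gain but move that extra null down in frequency, which is the conflict the text describes.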

In view of these problems, a sound pickup device, a sound pickup program, and a sound pickup method that improve sound quality when picking up target area sound are desired.

The sound pickup device of the first aspect of the present invention comprises: (1) non-target area sound extraction means that applies spectral subtraction to the input signals from a plurality of directional microphones to extract, for each directional microphone, the non-target area sound present in the direction of the target area as seen from that microphone; and (2) target area sound extraction means that extracts the target area sound by spectrally subtracting the non-target area sound from the input signal.

The sound pickup program of the second aspect of the present invention causes a computer to function as: (1) non-target area sound extraction means that applies spectral subtraction to the input signals from a plurality of directional microphones to extract, for each directional microphone, the non-target area sound present in the direction of the target area as seen from that microphone; and (2) target area sound extraction means that extracts the target area sound by spectrally subtracting the non-target area sound from the input signal.

The third aspect of the present invention is a sound pickup method performed by a sound pickup device, wherein: (1) the sound pickup device has non-target area sound extraction means and target area sound extraction means; (2) the non-target area sound extraction means applies spectral subtraction to the input signals from a plurality of directional microphones to extract, for each directional microphone, the non-target area sound present in the direction of the target area as seen from that microphone; and (3) the target area sound extraction means extracts the target area sound by spectrally subtracting the non-target area sound from the input signal.

According to the present invention, it is possible to provide a sound pickup device, a sound pickup program, and a sound pickup method that improve sound quality when picking up target area sound.

FIG. 1 is a block diagram showing the functional configuration of the sound pickup device according to the embodiment.
FIG. 2 is a perspective view of the interference tube superdirective microphone used in the sound pickup device according to the embodiment.
FIG. 3 is an explanatory diagram showing an arrangement example of the interference tube superdirective microphones according to the embodiment.
FIG. 4 is a block diagram showing an example of the hardware configuration of the sound pickup device according to the embodiment.
FIG. 5 is a block diagram showing a conventional subtractive BF configuration with two microphones.
FIG. 6 is a block diagram showing a conventional SS (spectral subtraction) configuration.
FIG. 7 is an explanatory diagram showing an example of the signals processed in a conventional sound pickup device when the beamformer (BF) directivities of two microphone arrays are aimed at the target area from different directions.
FIG. 8 is an explanatory diagram showing the polar pattern of a conventional subtractive BF with its null directed toward the front.

(A) Main Embodiment
An embodiment of the sound pickup device, sound pickup program, and sound pickup method according to the present invention is described in detail below with reference to the drawings.

First, an outline of the configuration of the sound pickup device of this embodiment is given.

Area sound pickup requires directivity to be aimed at the target area from different positions. The sound pickup device of this embodiment performs area sound pickup using directional microphones in place of a plurality of microphone arrays.

Directional microphones conventionally include unidirectional microphones, bidirectional microphones, and superdirective microphones.

A unidirectional microphone, although called directional, merely rejects sound from the rear and picks up all sound from the front, so it is of no use as a microphone for area sound pickup. A bidirectional microphone likewise merely rejects sound from the sides and is just as unsuitable for area sound pickup processing. A superdirective microphone, by contrast, has sharper forward directivity than the other types and is a candidate for area sound pickup.

Superdirective microphones come in two structural types: the interference tube type, which uses an interference tube, and the second-order pressure gradient type, which uses two microphone units. Both are in common use, but a second-order pressure gradient microphone merely performs a subtractive BF electrically rather than by signal processing, so in principle it does not solve issues such as the frequency characteristics seen in area sound pickup processing. An interference tube microphone, on the other hand, has a long tube with slits cut in its side attached to the front of a dedicated microphone unit, narrowing the directivity acoustically. When an interference tube microphone is used for area sound pickup processing, there is almost none of the distortion of subtractive BF+SS processing, and the frequency response covers nearly the whole audible range from low to high frequencies.

The sound pickup device of this embodiment is therefore configured to perform area sound pickup processing using the outputs of a plurality of interference tube superdirective microphones. A specific configuration example of the sound pickup device of the present invention is described below.

(A-1) Configuration of the Embodiment
FIG. 1 is a block diagram showing the configuration of the sound pickup device according to the first embodiment of the present invention; it shows the connection configuration of the devices and the functional configuration of the sound pickup device 10.

The sound pickup device 10 picks up and outputs target area sound, that is, sound whose source is in the target area, based on the acoustic signals captured by two superdirective microphones M (M1, M2). The signal output by the sound pickup device 10 is hereinafter called the output signal Z.

FIG. 2 is a perspective view showing a configuration example of the superdirective microphones M1 and M2.

As shown in FIG. 2, the superdirective microphones M1 and M2 of this embodiment have an interference tube structure built around a tube with a plurality of slits MS formed in its side. In FIG. 2 the direction of the directivity MD of the superdirective microphones M1 and M2 is indicated by a dotted arrow. As long as they are of the interference tube type, the specific shape of the superdirective microphones M1 and M2 (for example, the shape of the tube and slits) is not limited to that of FIG. 2.

FIG. 3 is an explanatory diagram showing an example of the arrangement of the superdirective microphones M1 and M2 in this embodiment.

As shown in FIG. 3, the superdirective microphones M1 and M2 may be placed anywhere in the space containing the target area. In FIG. 3 the directivity of M1 is drawn with a dotted line and that of M2 with a dash-dot line.

As shown in FIG. 3, it suffices that the superdirective microphones M1 and M2 be positioned and oriented so that their directivities overlap only in the target area. For example, they may be placed facing each other across the target area. The number of superdirective microphones is not limited to two; when there are multiple target areas, enough microphones may be placed to cover all of them.

Next, the internal configuration of the sound pickup device 10 is described with reference to FIG. 1.

The sound pickup device 10 has a signal input unit 101, a frequency conversion unit 102, an amplitude correction coefficient calculation unit 103, and a target area sound extraction unit 104. Each element is described in detail later.

Next, the hardware configuration of the sound pickup device 10 is described with reference to FIG. 4.

The sound pickup device 10 may be implemented entirely in hardware (for example, a dedicated chip), or partly or wholly in software (a program). For example, it may be implemented by installing programs (including the sound pickup program of the embodiment) on a computer having a processor and memory.

FIG. 4 is a block diagram showing an example of the hardware configuration of the sound pickup device 10, for the case in which the device is implemented using software on a computer.

The sound pickup device 10 shown in FIG. 4 has, as its hardware component, a computer 200 on which programs (including the sound pickup program of the embodiment) are installed. If the computer 200 has no means of converting the analog signals supplied by the superdirective microphones M1 and M2 into digital signals, a separate conversion means (not shown) may be added to the sound pickup device 10. The computer 200 may be dedicated to the sound pickup program or shared with programs for other functions.

The computer 200 shown in FIG. 4 has a processor 201, a primary storage unit 202, and a secondary storage unit 203. The primary storage unit 202 serves as the working memory of the processor 201; fast memory such as DRAM (Dynamic Random Access Memory) can be used. The secondary storage unit 203 stores various data such as the OS (Operating System) and program data (including the data of the sound pickup program of the embodiment); nonvolatile memory such as flash memory or an HDD can be used. When the processor 201 starts up, it reads the OS and programs (including the sound pickup program of the embodiment) from the secondary storage unit 203, loads them into the primary storage unit 202, and executes them.

The specific configuration of the computer 200 is not limited to that of FIG. 4, and various configurations are possible. For example, if the primary storage unit 202 is nonvolatile memory (for example, flash memory), the secondary storage unit 203 may be omitted.

(A-2) Operation of the Embodiment
Next, the operation of the sound pickup device 10 of this embodiment, configured as described above, is explained.

The signal input unit 101 converts the acoustic signals picked up by the superdirective microphones M1 and M2 from analog signals into digital signals $y_1$ and $y_2$, respectively.

The frequency conversion unit 102 converts the input signals $y_1$ and $y_2$ from the time domain into the frequency-domain signals $Y_1(n)$ and $Y_2(n)$, for example by using a fast Fourier transform (FFT).
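The time-to-frequency conversion can be sketched as framing, windowing, and a transform per frame. The source names only the FFT; the Hann window, the frame and hop lengths, and the naive DFT used here to stay within the standard library are assumptions of this sketch, and a real implementation would normally call an FFT library instead.

```python
import cmath
import math


def hann(length):
    """Periodic Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / length) for i in range(length)]


def dft(frame):
    """Naive one-sided DFT (bins 0..N/2); stands in for an FFT in this sketch."""
    N = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / N) for t in range(N))
        for k in range(N // 2 + 1)
    ]


def to_frequency_domain(y, frame_len=8, hop=4):
    """Return frames[n][k], the spectrum of frame n at frequency bin k."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(y) - frame_len + 1, hop):
        segment = [y[start + i] * window[i] for i in range(frame_len)]
        frames.append(dft(segment))
    return frames


# Toy input: a cosine that completes 2 cycles per 8-sample frame.
y1 = [math.cos(2.0 * math.pi * 2.0 * t / 8.0) for t in range(16)]
Y1 = to_frequency_domain(y1)
print(len(Y1), [round(abs(v), 3) for v in Y1[0]])
```

Here `Y1[n]` plays the role of $Y_1(n)$, the spectrum of frame $n$.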

ある特定のエリア内に存在する音（目的エリア音）だけを収音したい場合、マイクロホンの指向性を向けるだけでは、そのエリアと同一方向の線上に存在する音源（非目的エリア音）も収音してしまう。そこで、この実施形態の収音装置１０では、特許文献２で提案されている、「複数のマイクロホンアレイを用い、それぞれ別々の方向から目的エリアへ指向性を向け、指向性を目的エリアで交差させることで目的エリア音を収音する（エリア収音処理）手法」を用いるものとする。ただし、この実施形態で用いられる干渉型の超指向性マイクロホンＭ１、Ｍ２はそれ自体が鋭い指向性を有しているため、この実施形態の収音装置１０では、エリア収音処理の過程で、特許文献２におけるＢＦとＳＳによって目的音方向に指向性を形成する処理は必要としない。 If you want to pick up only the sound that exists in a certain area (target area sound), you can also pick up the sound source (non-target area sound) that exists on the line in the same direction as that area simply by pointing the microphone directivity. Resulting in. Therefore, in the sound collecting device 10 of this embodiment, “a plurality of microphone arrays are used, directivity is directed from different directions to the target area, and the directivity is crossed in the target area, is proposed in Patent Document 2. Therefore, the method of collecting the target area sound (area sound collection processing)” is used. However, since the interference-type super-directional microphones M1 and M2 used in this embodiment have sharp directivity themselves, the sound pickup device 10 of this embodiment causes The process of forming directivity in the target sound direction by BF and SS in Patent Document 2 is not necessary.

In order to extract the target sound by the area sound pickup processing, the power of the target area sound component contained in each of the directional input signals Y₁(n) and Y₂(n) must be the same. Therefore, the amplitude correction coefficient calculation unit 103 calculates amplitude correction coefficients that correct the difference in magnitude of the target area sound component caused by the difference in distance between each of the super-directional microphones M1 and M2 and the target area. Various methods of calculating the correction coefficients are conceivable; here, the ratio of the amplitude spectra is calculated for each frequency, and the mode of those ratios is used as the correction coefficient (see equations (8) and (9)).
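The mode-of-ratios calculation described above could be sketched, purely as an illustration, as follows. Equations (8) and (9) are not reproduced in this text, so the direction of each ratio (α₁ from |Y₂|/|Y₁| and α₂ from |Y₁|/|Y₂|) is an assumption inferred from how α₁(n) and α₂(n) are used in equations (10) and (11). The histogram bin width used to take a mode of continuous values, and the names `mode_ratio` and `amplitude_correction`, are likewise hypothetical.

```python
from collections import Counter


def mode_ratio(num_amp, den_amp, bin_width=0.05, eps=1e-12):
    """Most frequent per-frequency amplitude ratio, found by binning.

    bin_width is an assumed quantization step; the text does not specify one.
    """
    bins = Counter()
    for a, b in zip(num_amp, den_amp):
        if b > eps:  # skip bins where the denominator is effectively zero
            bins[round((a / b) / bin_width)] += 1
    if not bins:
        return 1.0  # assumed fallback: no correction
    best_bin, _ = max(bins.items(), key=lambda kv: kv[1])
    return best_bin * bin_width


def amplitude_correction(Y1_amp, Y2_amp):
    """Return (alpha1, alpha2) from the amplitude spectra of one frame."""
    alpha1 = mode_ratio(Y2_amp, Y1_amp)  # assumed: scales Y1 to the level of Y2
    alpha2 = mode_ratio(Y1_amp, Y2_amp)  # assumed: scales Y2 to the level of Y1
    return alpha1, alpha2
```

Taking the mode rather than the mean makes the estimate less sensitive to frequency bins dominated by non-target sound, since those bins produce scattered ratios while bins dominated by the target area sound cluster around a common value.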

The target area sound extraction unit 104 performs SS on the time-frequency converted data Y₁(n) and Y₂(n) of the super-directional microphones M1 and M2 in accordance with equation (10) or equation (11), and extracts the non-target area sounds N₁(n) and N₂(n) present in the direction of the target area.
N₁(n) = Y₁(n) − α₂(n)Y₂(n) … (10)
N₂(n) = Y₂(n) − α₁(n)Y₁(n) … (11)
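Equations (10) and (11) could be sketched per frequency bin as follows. This is a minimal illustration that assumes the spectra are given as lists of amplitudes for one frame. Flooring negative results at zero is a common convention in spectral subtraction but is an assumption here, since the equations above do not state it; the function names are hypothetical.

```python
def spectral_subtract(a, b, weight, floor=0.0):
    """Per-bin a - weight * b, clipped at `floor` (clipping is an assumption)."""
    return [max(x - weight * y, floor) for x, y in zip(a, b)]


def extract_non_target(Y1, Y2, alpha1, alpha2):
    """Non-target area sounds N1(n) and N2(n) for one frame."""
    N1 = spectral_subtract(Y1, Y2, alpha2)  # equation (10)
    N2 = spectral_subtract(Y2, Y1, alpha1)  # equation (11)
    return N1, N2
```

Because the correction coefficients equalize the target area sound component in the two signals, the subtraction cancels that shared component and leaves what only one microphone observes along its own direction.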

Thereafter, the target area sound extraction unit 104 extracts the target area sound by performing SS of the non-target area sound from each BF output (in equations (12) and (13), the signals Y₁(n) and Y₂(n)). In equations (12) and (13), γ₁(n) and γ₂(n) are coefficients for changing the strength of the SS.
Z₁(n) = Y₁(n) − γ₁(n)N₁(n) … (12)
Z₂(n) = Y₂(n) − γ₂(n)N₂(n) … (13)
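Equations (12) and (13) could be sketched in the same per-bin form. As before, this is only an illustration: the amplitude-list representation, the zero floor, the default values of γ₁ and γ₂, and the name `extract_target` are assumptions, not details given in the embodiment.

```python
def extract_target(Y1, Y2, N1, N2, gamma1=1.0, gamma2=1.0, floor=0.0):
    """Target area sounds Z1(n) and Z2(n) for one frame.

    gamma1 and gamma2 set the subtraction strength; 1.0 is an assumed default.
    """
    Z1 = [max(y - gamma1 * n, floor) for y, n in zip(Y1, N1)]  # equation (12)
    Z2 = [max(y - gamma2 * n, floor) for y, n in zip(Y2, N2)]  # equation (13)
    return Z1, Z2
```

A larger γ suppresses more of the non-target area sound at the cost of a greater risk of removing part of the target area sound, which is why the coefficients are left adjustable.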

(A-3) Effects of the Embodiment
According to this embodiment, the following effects can be achieved.

The sound pickup device 10 of this embodiment uses interference tube type super-directional microphones instead of microphone arrays.

As a result, by using interference tube type super-directional microphones, the sound pickup device 10 of this embodiment mitigates the processing distortion caused by BF+SS, the loss of low-frequency characteristic components, the constraints on the high-frequency range, and the like, and thereby improves the sound quality of the extracted target sound.

In addition, because the sound pickup device 10 of this embodiment uses interference tube type super-directional microphones, the BF+SS processing for forming directivity that was required conventionally (when microphone arrays are used) is unnecessary, and the processing configuration can be greatly simplified.

(B) Other Embodiments
The present invention is not limited to the embodiment described above, and modified embodiments such as those exemplified below are also possible.

(B-1) In the embodiment described above, the super-directional microphones M1 and M2 are connected externally to the sound pickup device 10; however, the super-directional microphones M1 and M2 may instead be mounted in the sound pickup device 10 itself.

10... sound pickup device, 101... signal input unit, 102... frequency conversion unit, 103... amplitude correction coefficient calculation unit, 104... target area sound extraction unit, M1, M2... super-directional microphones.

Claims

A sound pickup device comprising:
non-target area sound extraction means for performing spectral subtraction on the respective input signals input to a plurality of directional microphones to extract the non-target area sound present in the target area direction as viewed from each of the directional microphones; and
target area sound extraction means for extracting a target area sound by performing spectral subtraction of the non-target area sound from the input signals.

The sound pickup device according to claim 1, wherein the directional microphones are of an interference tube type.

A sound pickup program causing a computer to function as:
non-target area sound extraction means for performing spectral subtraction on the respective input signals input to a plurality of directional microphones to extract the non-target area sound present in the target area direction as viewed from each of the directional microphones; and
target area sound extraction means for extracting a target area sound by performing spectral subtraction of the non-target area sound from the input signals.

A sound pickup method performed by a sound pickup device, wherein
the sound pickup device has non-target area sound extraction means and target area sound extraction means,
the non-target area sound extraction means performs spectral subtraction on the respective input signals input to a plurality of directional microphones to extract the non-target area sound present in the target area direction as viewed from each of the directional microphones, and
the target area sound extraction means extracts a target area sound by performing spectral subtraction of the non-target area sound from the input signals.