JP2000081900A

JP2000081900A - Sound collecting method, and device and program recording medium therefor

Info

Publication number: JP2000081900A
Application number: JP10252282A
Authority: JP
Inventors: Mariko Aoki; 真理子青木; Shigeaki Aoki; 茂明青木; Hiroyuki Matsui; 弘行松井
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1998-09-07
Filing date: 1998-09-07
Publication date: 2000-03-21
Anticipated expiration: 2018-09-07
Also published as: JP3435357B2

Abstract

PROBLEM TO BE SOLVED: To suppress even omnidirectional noise that fills an entire room. SOLUTION: The outputs of microphones 1 and 2 are divided into frequency bands by a band division unit 4. A detection unit 3 detects, for each band, the difference between the values of a parameter of the acoustic signals reaching the microphones, which varies with the positions of microphones 1 and 2. Based on the detected difference, the sound sources are separated by selecting the frequency components of each acoustic signal. An unnecessary component identification unit 6 distinguishes desired sound from undesired sound by the difference between their frequency characteristics, an unnecessary component attenuation unit 7 suppresses the undesired sound on the frequency axis, and the result is synthesized into the sound source signals. Undesired signals are also suppressed on the time axis by exploiting the difference in waveform periodicity between the desired and undesired sounds.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a sound collecting method for separating at least one sound source from a plurality of sound sources using a plurality of microphones, an apparatus therefor, and a program recording medium.

[0002]

2. Description of the Related Art

Conventionally, picking up speech for a telephone call, or running a recognition process such as speech recognition or speaker recognition, required the input signal to be collected in a clean state, free of noise. In practice, however, it is generally difficult to pick up only the target speech clearly when collecting sound or using a recognition process. Methods for improving the S/N of a picked-up signal and techniques for making recognition processes robust to noise have therefore been developed. In particular, for non-stationary noise whose frequency characteristics are unknown, or noise whose frequency characteristics resemble those of the target speech (for example, when the noise is itself speech), a method has been developed that improves the S/N of the signal by calculating the frequency characteristics of each sound source from the difference in the positions of the sources and then separating and extracting the signal of each source (sound source separation method, device and recording medium: Japanese Patent Application No. Hei 09-252312; "A study of a two-source separation method focusing on inter-channel information", 1-7-13, Proceedings of the Acoustical Society of Japan, September 1996, pp. 489-490). Furthermore, in this conventional method, multiple sound sources can be separated and extracted by frequency selection alone, by dividing the signal into frequency bands fine enough that each band contains the frequency components of only one sound source.

[0003]

PROBLEMS TO BE SOLVED BY THE INVENTION

However, because the conventional method relies on differences in the spatial arrangement of the sound sources, it has difficulty suppressing nondirectional noise, such as noise that fills an entire room. The present invention therefore adds a nondirectional noise suppression function to the conventional method, so that both directional and nondirectional noise are suppressed and the S/N improvement is further enhanced.

[0004]

MEANS FOR SOLVING THE PROBLEMS

According to the present invention, a plurality of microphones placed apart from one another are used. Based on the difference between the values of a parameter of the acoustic signals reaching the microphones, which varies with the microphone positions, the frequency components of each acoustic signal are selected, and each sound source is separated and extracted. In addition, the difference in frequency characteristics between the target speech and the noise is detected, and nondirectional noise is suppressed on the frequency axis. Furthermore, by exploiting the difference in waveform periodicity between the target sound and the noise, unnecessary signals are also suppressed on the time axis.

[0005] The present invention differs from the prior art as follows. Conventional methods used either only the difference in the spatial arrangement of the sound sources or only the difference in frequency characteristics between the target sound and the noise, whereas the present invention combines both processes and thereby improves the S/N of the signal further.

[0006]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

First, an embodiment of the invention of claim 1 will be described. FIG. 1 shows an example of the functional configuration of this embodiment. Microphones 1 and 2 are spaced apart, for example by about 20 cm; each picks up the acoustic signals from sound sources A and B and converts them into an electric signal. The output of microphone 1 is called the L-channel signal, and the output of microphone 2 is called the R-channel signal. The L-channel and R-channel signals are supplied to the inter-channel time difference/level difference detection unit 301 in the band-specific inter-channel parameter value difference detection unit 3, and to the band division unit 4. The band division unit 4 divides the L-channel and R-channel signals into a plurality of frequency band signals each and supplies them to the band-specific inter-channel time difference/level difference detection unit 302 and to the sound source determination signal selection unit 5.

[0007] According to the detection outputs of the detection units 301 and 302, the sound source determination signal selection unit 5 selects, for each band, one of the channel signals as the A component or the B component. For the selected A-component and B-component signals of each band, the unnecessary component identification unit 6 detects the difference in frequency characteristics between the target sound and the noise and identifies the noise components. The identified noise components are attenuated by the unnecessary component attenuation unit 7. Finally, the selected A-component and B-component signals of each band are synthesized by the sound source signal synthesis unit 8 and output separately as sound source signal A and sound source signal B.

[0008] As the parameter for obtaining the band-specific inter-channel time difference, for example, the phase difference obtained when the signals are frequency-decomposed is used. As the parameter for obtaining the band-specific inter-channel level difference, for example, the difference between the power spectra of the channels is used. As the parameter for obtaining the inter-channel time difference, for example, the cross-correlation between the channels is used. The band division unit 4 performs band division, for example, by converting each signal to the frequency domain with a discrete Fourier transform and then dividing it into frequency bands; the L-channel signal and the R-channel signal are divided into band signals L(f1) to L(fn) and R(f1) to R(fn), respectively.
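A minimal sketch of the parameters named in [0008], assuming one frame per channel and a naive DFT in which each DFT bin is treated as one band. The function names, the `eps` guard against zero power, and the L-minus-R sign convention are illustrative assumptions, not taken from the patent. The per-band phase difference stands in for the band-specific time difference, the per-band power-spectrum difference (in dB) for the level difference, and a plain cross-correlation search for the broadband inter-channel time difference.

```python
import cmath
import math


def dft(frame):
    """Naive DFT of a real frame; returns bins 0..len(frame)//2 (non-negative frequencies)."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
            for k in range(n // 2 + 1)]


def band_parameter_differences(left, right, eps=1e-12):
    """Per-band inter-channel differences, each taken as L minus R.

    Returns the band signals L(f1..fn) and R(f1..fn), the phase difference of
    each band in radians (proxy for the time difference), and the power-spectrum
    difference of each band in dB (the level difference).
    """
    spec_l, spec_r = dft(left), dft(right)
    # angle(L * conj(R)) = phase(L) - phase(R), wrapped to (-pi, pi]
    phase_diff = [cmath.phase(l * r.conjugate()) for l, r in zip(spec_l, spec_r)]
    level_diff = [10.0 * math.log10((abs(l) ** 2 + eps) / (abs(r) ** 2 + eps))
                  for l, r in zip(spec_l, spec_r)]
    return spec_l, spec_r, phase_diff, level_diff


def cross_correlation_lag(left, right, max_lag):
    """Broadband time difference: the lag (samples) maximizing
    sum_t left[t] * right[t + lag]; positive when R lags behind L."""
    def corr(lag):
        return sum(left[t] * right[t + lag]
                   for t in range(len(left)) if 0 <= t + lag < len(right))
    return max(range(-max_lag, max_lag + 1), key=corr)
```

A production implementation would use an FFT library and windowed, overlapping frames; the naive DFT is used here only to keep the sketch dependency-free.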

[0009] In the sound source determination signal selection unit 5, the sound source signal determination unit 501 uses the values detected by the band-specific inter-channel parameter value difference detection unit 3 to determine, for each corresponding pair of band signals L(f1) to L(fn) and R(f1) to R(fn), which of the two to select. For example, when sound source A is near microphone 1 on the L side and sound source B is near microphone 2 on the R side, and both the band-specific inter-channel time difference and level difference in the i-th band are positive (each difference being the L-side value minus the R-side value), the sound source signal selection unit 502 attenuates R(fi).
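The selection rule in [0009] can be sketched as a per-band mask. This assumes the per-band differences are already available (for example from a phase and power-spectrum analysis as described in [0008]). The mirror rule for negative differences and the treatment of bands whose two cues disagree are assumptions: the text only spells out the positive case.

```python
def select_bands(spec_l, spec_r, time_diff, level_diff, floor=0.0):
    """Assign each band to source A (near the L mic) or source B (near the R mic).

    time_diff and level_diff are per-band values taken as L minus R, as in the
    text. If both are positive, the band is kept from L for source A and R(fi)
    is attenuated (multiplied by floor). If both are negative, the mirror case is
    assumed. Bands whose two cues disagree are attenuated in both outputs; this
    is a hypothetical choice, since the text does not cover that case.
    """
    comp_a, comp_b = [], []
    for l, r, dt, dl in zip(spec_l, spec_r, time_diff, level_diff):
        if dt > 0 and dl > 0:      # source A dominates this band
            comp_a.append(l)
            comp_b.append(floor * r)
        elif dt < 0 and dl < 0:    # source B dominates (assumed mirror rule)
            comp_a.append(floor * l)
            comp_b.append(r)
        else:                      # conflicting cues: keep neither
            comp_a.append(floor * l)
            comp_b.append(floor * r)
    return comp_a, comp_b
```

Setting `floor` to a small positive value instead of 0 gives a softer mask, which usually produces fewer audible artifacts than hard zeroing.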

[0010] The unnecessary component identification unit 6 distinguishes, in the signal frequency-decomposed by the band division unit, the frequency components of the target sound from those of unnecessary sound, using the difference in their frequency characteristics. The unnecessary component attenuation unit 7 then attenuates the frequency components identified as unnecessary. The sound source signal synthesis unit 8 resynthesizes the output frequency components into an acoustic signal, returning it to a time waveform by, for example, an inverse Fourier transform. With the above processing, even in an environment where several sounds are produced simultaneously, the S/N of the signal can be improved by separating and extracting each signal. Detailed operation examples of the unnecessary component identification unit 6 and the unnecessary component attenuation unit 7 are given below.
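For the resynthesis step in [0010], a minimal inverse-DFT sketch, assuming the band components are the non-negative-frequency bins (0 to n/2) of one real frame. The negative-frequency bins are rebuilt as complex conjugates so that the output is real. The names `dft` and `synthesize` are illustrative, not from the patent.

```python
import cmath
import math


def dft(frame):
    """Naive DFT of a real frame; returns bins 0..len(frame)//2."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
            for k in range(n // 2 + 1)]


def synthesize(half_spectrum, n):
    """Inverse DFT of a one-sided spectrum (bins 0..n//2) back to a real frame of length n."""
    # Rebuild negative-frequency bins from Hermitian symmetry: X[n - k] = conj(X[k]).
    mirrored = [c.conjugate() for c in reversed(half_spectrum[1:(n + 1) // 2])]
    full = list(half_spectrum) + mirrored
    return [(sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(full)) / n).real
            for t in range(n)]
```

In a full pipeline, the selected (and possibly attenuated) band components of one source would be passed as `half_spectrum`; windowing and overlap-add across frames, which the text does not discuss, are omitted.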

[0011] The unnecessary component identification unit 6 and the unnecessary component attenuation unit 7 are preferably placed as late in the processing chain as possible, because the processing load is then lighter (for example, between the sound source determination signal selection unit 5 and the sound source synthesis unit 8, as shown in FIG. 1). The reason is that applying the unnecessary sound identification and attenuation processing only to the components already selected as frequency components of each sound source, rather than to all frequency components, reduces the number of frequency components to be processed. However, the same function can be realized at other locations, so any number of these units may be placed at any of the following positions: between the band division unit 4 and the band-specific inter-channel parameter value difference detection unit 3; between the band-specific inter-channel parameter value difference detection process and the sound source signal determination process; between the sound source signal determination unit 501 and the sound source signal selection unit 502; or between the sound source signal selection unit 502 and the sound source synthesis unit 8.

[0012] In this embodiment, the inter-channel time difference/level difference between the L-channel and R-channel signals is estimated by the inter-channel time difference/level difference detection unit 301. Alternatively, this inter-channel time difference/level difference may be treated as known and supplied as data to the sound source determination signal selection unit 5. Next, the unnecessary component identification unit 6 and the unnecessary component attenuation unit 7 will be described.

[0013] Here, as an example, the method is described for the case in which the unnecessary component identification unit 6 and the unnecessary component attenuation unit 7 are placed between the band division unit 4 and the band-specific inter-channel parameter value difference detection unit 3. FIG. 2 shows the processing flow, and the description follows it. In this embodiment, noise is suppressed by exploiting the difference in statistical properties between the target sound and the noise; as an example, the signal to be suppressed is taken to be a wideband signal. In this case, wideband components must be identified among the frequency components whose power is at or above a certain value. First, the frequency components are received from the band division unit 4 (2-01). For the received frequency components, a histogram of the power Pow(kk) of each band is computed (2-02); the histogram counts the number of bands whose power falls within each power range. FIG. 3 shows an example of the frequency characteristic of a signal containing no wideband noise, FIG. 4A shows an example of its histogram, and FIG. 4B shows the power of each power section. FIG. 5 shows an example of the frequency characteristic of a signal containing wideband noise, and FIG. 6 shows an example of its histogram.

[0014] When no wideband noise is present, components other than the harmonic structure have only small power, as shown in FIG. 3. Therefore, in the histogram of FIG. 4 as well, among the power levels below that of the harmonic structure, the sections with prominent counts are limited to sections of very small power (section 1 in FIG. 4A). When wideband noise is present, however, many bands outside the harmonic components of the signal have relatively large power, as shown in FIG. 5. Therefore, in the histogram of FIG. 6, among the sections with power lower than that of the harmonic structure, a section with relatively large power has a prominent count, for example the hatched portion (section 3).

[0015] Accordingly, if the number of bands falling within a predetermined range (power section; section 3 in FIG. 6) exceeds a predetermined value (threshold) x (2-03), it is determined that unnecessary components are present (2-04), and the unnecessary component attenuation process begins (2-05). In the unnecessary component attenuation process (2-05), the bands whose power falls in the section where the histogram is prominent (section 3 in FIG. 6) are attenuated. One attenuation method is, for example, to multiply the frequency components of the bands to be attenuated by a predetermined value α (0 ≤ α < 1). The value of α may be the same for all bands or may differ from band to band. Furthermore, the attenuated bands need not be limited to those in the section where the histogram is prominent; for example, all bands with power comparable to or lower than that section may be attenuated. With the above method, the unnecessary component identification unit 6 determines whether wideband noise is present, and the unnecessary component attenuation unit 7 attenuates the wideband noise components.
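The histogram test of [0013] to [0015] can be sketched as follows, assuming band powers are measured in dB and the power sections are given as explicit boundaries. The section layout, the threshold `x`, and `alpha` are hypothetical parameters that the text leaves open. Only the bands in the predetermined section are attenuated, which is the narrower of the two options described in [0015].

```python
import math


def suppress_wideband(spectrum, edges_db, noise_section, x, alpha):
    """Histogram-based wideband-noise check and attenuation (steps 2-02 to 2-05).

    spectrum      : complex band components from the band division step
    edges_db      : ascending power-section boundaries in dB (hypothetical layout)
    noise_section : index of the predetermined section (e.g. section 3 in FIG. 6)
    x             : threshold on the number of bands in that section
    alpha         : attenuation factor, 0 <= alpha < 1
    Returns (output spectrum, histogram, noise_detected).
    """
    powers = [10.0 * math.log10(abs(c) ** 2 + 1e-12) for c in spectrum]

    def section_of(p):
        for i in range(len(edges_db) - 1):
            if edges_db[i] <= p < edges_db[i + 1]:
                return i
        return None  # power lies outside every section

    sections = [section_of(p) for p in powers]
    # 2-02: count how many bands fall in each power section
    histogram = [sections.count(i) for i in range(len(edges_db) - 1)]
    # 2-03 / 2-04: too many bands in the chosen section means wideband noise
    if histogram[noise_section] <= x:
        return list(spectrum), histogram, False
    # 2-05: multiply the bands whose power lies in that section by alpha
    out = [alpha * c if s == noise_section else c for c, s in zip(spectrum, sections)]
    return out, histogram, True
```

For example, with sections at -60/-20/10/40/80 dB, a spectrum of two strong harmonics plus seven mid-level bands trips the detector when `x` is 4, while a spectrum whose non-harmonic bands are all near -40 dB does not.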

[0016] Note that the unnecessary component identification unit 6 and the unnecessary component attenuation unit 7 may be placed, in any number, between the band division unit 4 and the band-specific inter-channel parameter value difference detection unit 3, between the band-specific inter-channel parameter value difference detection unit 3 and the sound source signal determination unit 501, between the sound source signal determination unit 501 and the sound source signal selection unit 502, or between the sound source signal selection unit 502 and the sound source synthesis unit 8. For example, compare placing them between the sound source signal selection unit 502 and the sound source synthesis unit 8 with placing them between the band division unit 4 and the band-specific inter-channel parameter value difference detection unit 3. The former has the advantage that only the components already selected as frequency components of each sound source are examined, so fewer bands need to be identified and the processing is lighter.

[0017] Next, another embodiment will be described. In this embodiment, the target sound is distinguished from noise by using the harmonic structure of the target sound, and the noise is suppressed. Accordingly, as an example, the noise to be suppressed is periodic or quasi-periodic noise (for example, speech or animal cries). In this case, the harmonic structure of the sound source signal is estimated from the components determined by the sound source signal determination unit 501 (which mainly contain the frequency components of each sound source), and components outside that structure that have large power are judged to be noise. FIG. 7 shows the processing procedure, which is described below.

[0018] Here, the case in which the unnecessary component identification unit 6 and the unnecessary component attenuation unit 7 are placed between the sound source signal selection unit 502 and the sound source signal synthesis unit 8 is described. First, the frequency components selected by the sound source signal selection unit 502 are received (7-01). From the received frequency components, a predetermined number of bands LL(f) with the largest power are extracted in descending order of power (7-02). Next, the fundamental frequency is selected from these bands. Since the fundamental frequency of human speech lies roughly between 100 Hz and 250 Hz, only those bands selected in step 7-02 that fall within, for example, 100 Hz to 250 Hz are retained (7-03). Next, to estimate the fundamental frequency from the bands LLs(f) selected in step 7-03, the power of the harmonics of each selected band LLs(f) is summed, and the sum is denoted SumLLs(f) (7-04). The LLs(f) for which SumLLs(f) is largest is determined to be the fundamental frequency (7-05).
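A sketch of the harmonic-sum estimate in steps 7-02 to 7-05, assuming band i is a DFT bin whose centre frequency is i * `bin_hz`. The number of strongest bands kept (`n_peaks`) and the number of harmonics summed (`n_harmonics`) are hypothetical parameters, and including the candidate's own power in SumLLs(f) is an interpretive choice.

```python
def estimate_f0(powers, bin_hz, n_peaks=8, f_lo=100.0, f_hi=250.0, n_harmonics=5):
    """Harmonic-sum estimate of the fundamental frequency (steps 7-02 to 7-05).

    powers[i] is the power of band i, whose centre frequency is assumed to be
    i * bin_hz. Returns the estimated fundamental in Hz, or None if none of the
    strongest bands lies in the f_lo..f_hi range.
    """
    # 7-02: indices of the n_peaks strongest bands, in descending order of power
    strongest = sorted(range(len(powers)), key=lambda i: powers[i], reverse=True)[:n_peaks]
    # 7-03: keep only candidates inside the typical speech F0 range
    candidates = [i for i in strongest if f_lo <= i * bin_hz <= f_hi]
    if not candidates:
        return None

    def harmonic_sum(i):
        # 7-04: SumLLs(f), the power at f, 2f, ..., n_harmonics * f
        return sum(powers[h * i] for h in range(1, n_harmonics + 1) if h * i < len(powers))

    # 7-05: the candidate with the largest harmonic sum is taken as the fundamental
    return max(candidates, key=harmonic_sum) * bin_hz
```

Summing harmonics makes the estimate robust to a single strong non-harmonic band: a lone peak at 200 Hz loses to a 150 Hz candidate whose multiples at 300, 450, 600 and 750 Hz also carry power.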

[0019] Next, it is determined whether a band mm with large power exists among the components other than the determined fundamental frequency and its harmonics (7-06). For example, if a band other than the fundamental and its harmonic components has power exceeding a predetermined value x, that band is judged to be an unnecessary component; otherwise, no unnecessary component is judged to be present. If an unnecessary component is judged to be present, the unnecessary component attenuation unit 7 attenuates the power of the band mm judged unnecessary (7-07), for example by multiplying the power of band mm by a value α (0 ≤ α < 1). As noted for the embodiment of FIG. 2, the unnecessary component identification unit 6 and the unnecessary component attenuation unit 7 may be placed, in any number, between the band division unit 4 and the band-specific inter-channel parameter value difference detection unit 3, between the band-specific inter-channel parameter value difference detection unit 3 and the sound source signal determination unit 501, between the sound source signal determination unit 501 and the sound source signal selection unit 502, or between the sound source signal selection unit 502 and the sound source synthesis unit 8. However, if they are placed only between the band division unit 4 and the band-specific inter-channel parameter value difference detection unit 3, or only between the band-specific inter-channel parameter value difference detection unit 3 and the sound source signal determination unit 501, the harmonic structure is estimated before the components of each sound source have been determined, so the estimation accuracy of each source's harmonic structure may degrade.
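Steps 7-06 and 7-07 can be sketched as below, assuming the fundamental is given as a DFT bin index (for example, the estimated fundamental frequency divided by the bin spacing). Because the text multiplies the band power by α, the sketch scales each complex amplitude by the square root of α. The tolerance `tol` around each harmonic is a hypothetical parameter.

```python
import math


def suppress_non_harmonic(spectrum, f0_bin, x, alpha, tol=0):
    """Steps 7-06 and 7-07: attenuate strong bands that lie off the harmonic grid.

    spectrum : complex band components indexed by DFT bin
    f0_bin   : bin index of the estimated fundamental (from step 7-05)
    x        : power threshold for judging a band unnecessary
    alpha    : power attenuation factor, 0 <= alpha < 1
    tol      : hypothetical tolerance, in bins, around each harmonic
    Returns (output spectrum, indices of the attenuated bands mm).
    """
    gain = math.sqrt(alpha)  # scaling amplitude by sqrt(alpha) scales power by alpha
    out, flagged = [], []
    for k, c in enumerate(spectrum):
        nearest = round(k / f0_bin) * f0_bin  # closest multiple of the fundamental
        on_grid = nearest >= f0_bin and abs(k - nearest) <= tol
        if not on_grid and abs(c) ** 2 > x:
            out.append(gain * c)
            flagged.append(k)
        else:
            out.append(c)
    return out, flagged
```

With real DFT bins the true harmonics rarely fall exactly on integer multiples of `f0_bin`, so a `tol` of one or two bins is likely needed in practice.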

[0020] Finally, yet another embodiment will be described. Here, unnecessary sound is identified in the time-waveform domain, and the identified signal is suppressed. FIG. 8 shows the processing procedure. The case in which the unnecessary sound identification unit and the unnecessary sound attenuation unit are placed between the microphones 1, 2 and the band division unit 4 is described. First, a signal segment of fixed length is read (8-01); in FIG. 8, as an example, one frame length used for the Fourier transform is read. Next, it is determined whether the read segment is a speech segment. For example, the autocorrelation function of the signal is computed and its peak values are obtained (8-02). Whether a peak exists is determined, for example, as follows: if the difference between the largest and the second-largest values of the autocorrelation function exceeds a predetermined threshold, a peak exists; otherwise, no peak exists (8-03). If a peak exists, the segment is judged to be a speech segment and is sent on to the band division unit 4 (8-04). If no peak is detected, the segment is judged to be an unnecessary sound segment and is sent to the unnecessary sound attenuation unit, where the signal amplitude is attenuated by a predetermined amount (8-05); the attenuated segment is then sent to the band division unit 4 for the next processing stage. The unnecessary sound identification unit and the unnecessary sound attenuation unit may also be placed after the sound source synthesis unit 8, or at both positions.
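A sketch of the time-domain stage in [0020]. The text's own test compares the largest and second-largest autocorrelation values without saying which lags are compared, so this sketch substitutes a simpler, commonly used voicing test: the peak of the autocorrelation, normalized by the lag-0 energy, over a pitch-lag range (100 to 400 Hz here, an assumed range) is compared against a threshold. The fixed `gain` stands in for the "predetermined amount" of attenuation; all parameter values are assumptions.

```python
import math
import random


def is_voiced(frame, fs, threshold=0.5, f_lo=100.0, f_hi=400.0):
    """Periodicity test on one frame via its normalized autocorrelation.

    Substitute criterion (assumption): the frame counts as periodic when the
    largest autocorrelation value, divided by the lag-0 energy, over the lags
    corresponding to f_lo..f_hi Hz exceeds threshold.
    """
    energy = sum(v * v for v in frame)
    lo = int(fs / f_hi)
    hi = min(int(fs / f_lo), len(frame) - 1)
    if energy == 0.0 or hi < lo:
        return False
    peak = max(sum(frame[t] * frame[t + k] for t in range(len(frame) - k)) / energy
               for k in range(lo, hi + 1))
    return peak > threshold


def suppress_unvoiced(frame, fs, gain=0.1, **kwargs):
    """Steps 8-02 to 8-05: pass periodic frames through, scale the others by gain."""
    if is_voiced(frame, fs, **kwargs):
        return list(frame), True
    return [gain * v for v in frame], False


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2 * math.pi * 200 * t / fs) for t in range(256)]
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(256)]
    print(is_voiced(tone, fs), is_voiced(noise, fs))
```

Run as a script, it prints the decisions for a 200 Hz tone and for Gaussian noise; the tone should be classified as periodic and the noise as not.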

[0021] A signal whose noise has been suppressed by the method described above can be used not only for direct human listening, such as in telephone calls, but also, for example, as the input signal of a speech recognition device or a speaker recognition device. The present invention thus improves the S/N of a signal and yields a clear signal. The units described above need not be implemented in hardware; each function can also be executed by having a computer read, decode, and execute a program. The technique shown in FIG. 2 or FIG. 7 can also be used together with the technique shown in FIG. 8.

[0022]

EFFECTS OF THE INVENTION

As described above, according to the present invention, the unnecessary sound identification unit identifies components, such as voices and noise, that are unnecessary relative to the target signal, and the unnecessary sound attenuation unit attenuates them. As a result, even nondirectional noise that fills an entire room can be suppressed, the S/N of the signal is improved, and a clear signal is obtained.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the functional configuration of an embodiment of the present invention.

FIG. 2 is a flowchart showing an example of the processing procedure of the unnecessary component identification unit 6 and the unnecessary component attenuation unit 7 in FIG. 1.

FIG. 3 is a diagram showing an example of the frequency characteristic when no wideband noise is present.

FIG. 4A is a diagram showing an example of the histogram when no wideband noise is present, and FIG. 4B is a diagram showing an example of the value of each power section.

FIG. 5 is a diagram showing an example of the frequency characteristic when wideband noise is present.

FIG. 6 is a diagram showing an example of the histogram when wideband noise is present.

FIG. 7 is a flowchart showing another example of the processing procedure of the unnecessary component identification unit 6 and the unnecessary component attenuation unit 7.

FIG. 8 is a flowchart showing still another example of the processing procedure of the unnecessary sound identification unit 6 and the unnecessary sound attenuation unit 7.

Continued from the front page. (72) Inventor: Hiroyuki Matsui, c/o Nippon Telegraph and Telephone Corporation, 3-19-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo. F-terms (reference): 5D015 CC14 DD02 EE04

Claims


1. A sound collecting method for separating at least one sound source from a plurality of sound sources by using a plurality of microphones placed apart from each other, the method comprising: a band division step of dividing each output channel signal of the microphones into a plurality of frequency bands; a band-specific inter-channel parameter value difference detection step of detecting, for each band common to the output channel signals divided in the band division step, the difference between the values of a parameter of the acoustic signals reaching the microphones, which varies with the positions of the microphones, as a band-specific inter-channel parameter value difference; a sound source signal determination step of determining, based on the band-specific inter-channel parameter value difference of each band, which of the band-divided output channel signals to select; a sound source signal selection step of selecting, based on the determination result, at least one signal input from a sound source from among the band-divided output channel signals; and a sound source synthesis step of synthesizing, as a sound source signal, a plurality of band signals selected in the sound source signal selection step as signals from the same sound source; the method being characterized by further comprising, in at least one of the intervals between the band division step and the band-specific inter-channel parameter value difference detection step, between the band-specific inter-channel parameter value difference detection step and the sound source signal determination step, between the sound source signal determination step and the sound source signal selection step, and between the sound source signal selection step and the sound source synthesis step: an unnecessary component identification step of identifying, among the frequency components of each output channel signal divided in the band division step, unnecessary frequency components other than the signal from the sound source to be selected; and an unnecessary component attenuation step of attenuating the frequency components identified in the unnecessary component identification step.

2. The sound collecting method according to claim 1, characterized in that the unnecessary component identifying step identifies the unnecessary frequency components by focusing on the statistical properties of the signal from the sound source to be selected and of the unnecessary signal.

3. The sound collecting method according to claim 1, characterized in that the unnecessary component identifying step identifies the frequency components of the unnecessary signal by focusing on the harmonic structure of the signal from the sound source to be selected.

4. A sound collecting method for separating at least one sound source from a plurality of sound sources by using a plurality of microphones provided apart from each other, the method comprising: a band dividing step of dividing each output channel signal of the microphones into a plurality of frequency bands; a band-by-band inter-channel parameter value difference detecting step of detecting, for each corresponding band of the output channel signals divided in the band dividing step, the difference in the value of a parameter of the acoustic signal reaching the microphones, the parameter varying with the positions of the plurality of microphones, as a band-by-band inter-channel parameter value difference; a sound source signal selecting step of selecting, on the basis of the band-by-band inter-channel parameter value difference of each band, at least one signal input from the same sound source from among the band-divided output channel signals; and a sound source synthesizing step of synthesizing, as a sound source signal, the plurality of band signals selected in the sound source signal selecting step as signals from the same sound source; the method being characterized by further comprising, between the plurality of microphones and the band dividing step or after the sound source synthesizing step, an unnecessary sound identifying step of identifying an unnecessary signal by focusing on the signal level, and an unnecessary sound attenuating step of attenuating the signal so identified.

5. A sound collecting device for separating at least one sound source from a plurality of sound sources by using a plurality of microphones provided apart from each other, the device comprising: band dividing means for dividing each output channel signal of the microphones into a plurality of frequency bands; band-by-band inter-channel parameter value difference detecting means for detecting, for each corresponding band of the output channel signals divided by the band dividing means, the difference in the value of a parameter of the acoustic signal reaching the microphones, the parameter varying with the positions of the plurality of microphones, as a band-by-band inter-channel parameter value difference; sound source signal determining means for determining, on the basis of the band-by-band inter-channel parameter value difference of each band, which of the band-divided output channel signals is to be selected; sound source signal selecting means for selecting, on the basis of the determination result, at least one signal input from the same sound source from among the band-divided output channel signals; and sound source synthesizing means for synthesizing, as a sound source signal, the plurality of band signals selected by the sound source signal selecting means as signals from the same sound source;
the device being characterized by further comprising, at one or more of the positions between the band dividing means and the band-by-band inter-channel parameter value difference detecting means, between the band-by-band inter-channel parameter value difference detecting means and the sound source signal determining means, between the sound source signal determining means and the sound source signal selecting means, and between the sound source signal selecting means and the sound source synthesizing means: unnecessary component identifying means for identifying, among the frequency components of each output channel signal divided by the band dividing means, unnecessary frequency components other than the signal from the sound source to be selected; and unnecessary component attenuating means for attenuating the frequency components identified by the unnecessary component identifying means.

6. The sound collecting device according to claim 5, characterized in that the unnecessary component identifying means comprises means for creating, for each of the band-divided signals, a histogram of the power of that band, and means for identifying an unnecessary component on the basis of whether the frequency of occurrence in a predetermined power interval is equal to or greater than a predetermined value.

7. The sound collecting device according to claim 5, characterized in that the unnecessary component identifying means comprises: means for selecting components LLs(f) of large power within a predetermined frequency band from among the frequency components selected by the sound source signal selecting means; means for summing the powers of the harmonic components of each selected component LLs(f); and means for identifying, as frequency components of the unnecessary signal, the frequency components other than the harmonic components of the fundamental component LLs(f) whose summed power is maximum.

8. A sound collecting device for separating at least one sound source from a plurality of sound sources by using a plurality of microphones provided apart from each other, the device comprising: band dividing means for dividing each output channel signal of the microphones into a plurality of frequency bands; band-by-band inter-channel parameter value difference detecting means for detecting, for each corresponding band of the output channel signals divided by the band dividing means, the difference in the value of a parameter of the acoustic signal reaching the microphones, the parameter varying with the positions of the plurality of microphones, as a band-by-band inter-channel parameter value difference; sound source signal selecting means for selecting, on the basis of the band-by-band inter-channel parameter value difference of each band, at least one signal input from the same sound source from among the band-divided output channel signals; and sound source synthesizing means for synthesizing, as a sound source signal, the plurality of band signals selected by the sound source signal selecting means as signals from the same sound source; the device being characterized in that unnecessary sound identifying means, for calculating the peak value of the autocorrelation function of a signal and identifying an unnecessary signal according to whether or not the peak value exists, and unnecessary sound attenuating means, for attenuating a signal identified as unnecessary by the unnecessary sound identifying means, are provided between the plurality of microphones and the band dividing means or downstream of the sound source synthesizing means.

9. A computer-readable recording medium recording a program for separating at least one sound source from a plurality of sound sources by using a plurality of microphones provided apart from each other, the program comprising: band division processing for dividing each output channel signal of the microphones into a plurality of frequency bands; band-by-band inter-channel parameter value difference detection processing for detecting, for each corresponding band of the output channel signals divided by the band division processing, the difference in the value of a parameter of the acoustic signal reaching the microphones, the parameter varying with the positions of the plurality of microphones, as a band-by-band inter-channel parameter value difference; sound source signal determination processing for determining, on the basis of the band-by-band inter-channel parameter value difference of each band, which of the band-divided output channel signals is to be selected; sound source signal selection processing for selecting, on the basis of the determination result, at least one signal input from the same sound source from among the band-divided output channel signals; and sound source synthesis processing for synthesizing, as a sound source signal, the plurality of band signals selected in the sound source signal selection processing as signals from the same sound source;
the recording medium being characterized in that the program further comprises, at one or more of the positions between the band division processing and the band-by-band inter-channel parameter value difference detection processing, between the band-by-band inter-channel parameter value difference detection processing and the sound source signal determination processing, between the sound source signal determination processing and the sound source signal selection processing, and between the sound source signal selection processing and the sound source synthesis processing: unnecessary component identification processing for identifying, among the frequency components of each output channel signal divided by the band division processing, unnecessary frequency components other than the signal from the sound source to be selected; and unnecessary component attenuation processing for attenuating the frequency components identified in the unnecessary component identification processing.

10. The recording medium according to claim 9, characterized in that the unnecessary component identification processing includes processing for creating, for each of the band-divided signals, a histogram of the power of that band, and processing for identifying an unnecessary component on the basis of whether the frequency of occurrence in a predetermined power interval is equal to or greater than a predetermined value.

11. The recording medium according to claim 9, characterized in that the unnecessary component identification processing includes: processing for selecting components LLs(f) of large power within a predetermined frequency band from among the frequency components selected in the sound source signal selection processing; processing for summing the powers of the harmonic components of each selected component LLs(f); and processing for identifying, as frequency components of the unnecessary signal, the frequency components other than the harmonic components of the fundamental component LLs(f) whose summed power is maximum.

12. A computer-readable recording medium recording a program for separating at least one sound source from a plurality of sound sources by using a plurality of microphones provided apart from each other, the program comprising: band division processing for dividing each output channel signal of the microphones into a plurality of frequency bands; band-by-band inter-channel parameter value difference detection processing for detecting, for each corresponding band of the output channel signals divided by the band division processing, the difference in the value of a parameter of the acoustic signal reaching the microphones, the parameter varying with the positions of the plurality of microphones, as a band-by-band inter-channel parameter value difference; sound source signal selection processing for selecting, on the basis of the band-by-band inter-channel parameter value difference of each band, at least one signal input from the same sound source from among the band-divided output channel signals; and sound source synthesis processing for synthesizing, as a sound source signal, the plurality of band signals selected in the sound source signal selection processing as signals from the same sound source; the recording medium being characterized in that the program further comprises, between the plurality of microphones and the band division processing or after the sound source synthesis processing, unnecessary sound identification processing for calculating the peak value of the autocorrelation function of a signal and identifying an unnecessary signal according to whether or not the peak value exists, and unnecessary sound attenuation processing for attenuating a signal identified as unnecessary in the unnecessary sound identification processing.
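The band-wise separation common to claims 1, 4, 5, 8, 9 and 12 can be illustrated with a minimal Python sketch for two microphones and a single frame. It assumes that the position-dependent parameter is the inter-channel level difference (the claims do not restrict it to level), that one DFT bin stands in for one frequency band, and that a band belongs to the source nearer microphone 1 when that channel is louder by more than a threshold. The names `dft`, `idft`, `separate_frame` and `threshold_db` are hypothetical, and the naive O(N^2) transform is used only to keep the example dependency-free.

```python
import cmath
import math


def dft(x):
    """Naive DFT of one frame; each bin stands in for one frequency band (band dividing step)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT back to a real time signal (sound source synthesizing step)."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def separate_frame(ch1, ch2, threshold_db=0.0):
    """Return the part of ch1 attributed to the source nearer microphone 1.

    Assumption (hypothetical): the inter-channel parameter is the per-band level
    difference in dB between channel 1 and channel 2.
    """
    spec1, spec2 = dft(ch1), dft(ch2)
    eps = 1e-12  # avoids taking the log of zero in empty bands
    selected = []
    for a, b in zip(spec1, spec2):
        # band-by-band inter-channel parameter value difference
        level_diff_db = 20 * math.log10((abs(a) + eps) / (abs(b) + eps))
        # determination and selection: keep the band only if channel 1 dominates
        selected.append(a if level_diff_db > threshold_db else 0j)
    return idft(selected)
```

For example, if source A = cos(2π·3t/32) reaches microphone 1 with gain 1.0 and microphone 2 with gain 0.3, while source B = cos(2π·7t/32) arrives with the gains reversed, bin 3 is about 10.5 dB louder in channel 1 and bin 7 about 10.5 dB quieter, so `separate_frame` recovers A. Real signals with overlapping spectra would additionally need short-time framing and overlap-add, which this sketch leaves out.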
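The histogram test of claims 6 and 10 could look like the following sketch. It assumes that each band's power is observed over several frames, converted to dB and counted into fixed power intervals; a band is flagged as unnecessary (for instance, one dominated by steady background noise) when the count in one predetermined interval reaches a predetermined number. The interval edges, the interval index `target_bin`, the threshold `min_count` and the function names are hypothetical choices, since the claims leave them unspecified.

```python
import math


def power_histogram(powers_db, edges):
    """Count how many values fall into each interval [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for p in powers_db:
        for i in range(len(edges) - 1):
            if edges[i] <= p < edges[i + 1]:
                counts[i] += 1
                break
    return counts


def unnecessary_bands(band_powers, edges, target_bin, min_count):
    """Return the indices of bands judged unnecessary.

    band_powers[b] is the list of per-frame linear powers of band b.
    Hypothetical rule: flag band b when its histogram count in interval
    `target_bin` is equal to or greater than `min_count`.
    """
    flagged = []
    for b, powers in enumerate(band_powers):
        powers_db = [10 * math.log10(p + 1e-12) for p in powers]  # small offset avoids log(0)
        counts = power_histogram(powers_db, edges)
        if counts[target_bin] >= min_count:
            flagged.append(b)
    return flagged
```

With edges at -80, -40, -20, 0 and 20 dB, a band that sits at -30 dB in all ten frames is flagged through the -40 to -20 dB interval, whereas a speech-like band whose level swings over several intervals is not.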
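Claims 7 and 11 identify unnecessary components from the harmonic structure of the wanted signal. A minimal sketch, assuming a power spectrum indexed by DFT bin so that the harmonics of bin `f0` sit at bins `f0, 2*f0, 3*f0, ...`, and assuming that the "components of large power in a predetermined frequency band" are the `n_candidates` strongest bins between `lo` and `hi`. The function name and all parameters are hypothetical.

```python
def harmonic_unnecessary_bins(power, lo, hi, n_candidates=3):
    """Return (f0, unnecessary) for a power spectrum `power` indexed by bin.

    Candidates LLs(f): the n_candidates strongest bins in [lo, hi]
    (assumes 1 <= lo <= hi < len(power)). The candidate whose harmonics carry
    the largest summed power is taken as the fundamental; every bin that is
    not one of its harmonics is reported as an unnecessary component.
    """
    n = len(power)
    candidates = sorted(range(lo, hi + 1), key=lambda k: power[k], reverse=True)[:n_candidates]

    def harmonics(f0):
        # bins f0, 2*f0, 3*f0, ... that still fit inside the spectrum
        return range(f0, n, f0)

    def harmonic_power(f0):
        return sum(power[k] for k in harmonics(f0))

    f0 = max(candidates, key=harmonic_power)
    keep = set(harmonics(f0))
    unnecessary = [k for k in range(n) if k not in keep]
    return f0, unnecessary
```

If the wanted sound has unit-power harmonics at bins 5, 10, ..., 35 and an interfering tone of power 3 sits at bin 7, the tone is the strongest single candidate but collects little harmonic power, so bin 5 is chosen and bin 7 is reported as unnecessary. The sketch does not guard against octave errors (a sub-multiple of the true fundamental also collects its harmonics), which a practical implementation would need to handle.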
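Claims 8 and 12 detect unnecessary sound in the time domain from the peak of the autocorrelation function, in line with the abstract's remark about differences in waveform periodicity between wanted and unwanted sound. The sketch below assumes that "a peak exists" means the energy-normalised autocorrelation reaches a threshold somewhere in an assumed pitch-lag range, and that unnecessary frames are attenuated by a fixed gain. The frame length, lag range, threshold, gain and function names are hypothetical.

```python
import math
import random


def normalized_autocorr(x, lag):
    """Autocorrelation at `lag`, normalised by the frame energy (about 1.0 at lag 0)."""
    num = sum(x[t] * x[t + lag] for t in range(len(x) - lag))
    den = sum(v * v for v in x) + 1e-12
    return num / den


def is_unnecessary(frame, min_lag, max_lag, threshold=0.5):
    """Flag a frame as unnecessary when no autocorrelation peak reaches `threshold`."""
    peak = max(normalized_autocorr(frame, lag) for lag in range(min_lag, max_lag + 1))
    return peak < threshold


def attenuate_frames(signal, frame_len, min_lag, max_lag, gain=0.1, threshold=0.5):
    """Scale frames judged unnecessary by `gain`; pass periodic frames unchanged."""
    out = []
    for start in range(0, len(signal), frame_len):
        frame = signal[start:start + frame_len]
        g = gain if is_unnecessary(frame, min_lag, max_lag, threshold) else 1.0
        out.extend(g * v for v in frame)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    voiced = [math.sin(2 * math.pi * t / 40) for t in range(400)]  # period of 40 samples
    noise = [rng.gauss(0.0, 1.0) for _ in range(400)]              # aperiodic frame
    y = attenuate_frames(voiced + noise, 400, 20, 100)
    print(max(abs(v) for v in y[:400]), max(abs(v) for v in y[400:]))
```

In the demo, the sinusoidal frame has an autocorrelation of about 0.9 at a lag of 40 samples and passes unchanged, while the Gaussian-noise frame shows no comparable peak in the 20 to 100 sample range and is scaled by the assumed gain of 0.1.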